November 17, 2025

Collaborative research by Microsoft and NVIDIA on real-time immunity

AI-Powered Threats Demand AI-Powered Defense 
 

While AI fuels growth and innovation, it is also reshaping the threat landscape, forcing organizations to confront faster, more adaptive security risks. AI-driven security threats, including “vibe-hacking”, are evolving faster than traditional defenses can adapt. Attackers can now combine reinforcement learning (RL) with LLM capabilities in code generation, tool use, and multi-step reasoning to create agents that act as autonomous, adaptive cyber weapons. These agents can mutate attacks and bypass defenses in real time, outpacing human response teams.

Traditional security tools, built on static rules and signatures, are quickly becoming obsolete. To stay protected, enterprises need to adopt AI-powered cybersecurity systems that learn, anticipate, and respond as intelligently as attackers. This is where Adversarial Learning, a critical new frontier in security, comes in. By continuously training attack and defense models together, we can build an autonomic defense system against weaponized AI. However, achieving real-time security requires scaling transformer-based architectures and optimizing them for ultra-low-latency inference at massive scale.
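
To make the co-training idea concrete, here is a minimal sketch of such a loop in PyTorch, assuming a toy string-mutation attacker and a tiny byte-level stand-in for the transformer classifier; the names (PayloadClassifier, mutate_payload) and mutation tricks are illustrative, not Microsoft's pipeline.

```python
import random
import torch
import torch.nn as nn

class PayloadClassifier(nn.Module):
    """Tiny byte-level classifier standing in for the transformer model."""
    def __init__(self, vocab=256, dim=64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)   # mean-pooled byte embeddings
        self.head = nn.Linear(dim, 2)            # benign vs. malicious

    def forward(self, ids, offsets):
        return self.head(self.emb(ids, offsets))

def mutate_payload(payload: str) -> str:
    """Toy 'attacker': obfuscations an RL-driven agent might discover."""
    tricks = [
        lambda s: s.replace(" ", "/**/"),   # SQL comment injection
        lambda s: s.swapcase(),             # case flipping
        lambda s: s.replace("'", "%27"),    # URL encoding
    ]
    return random.choice(tricks)(payload)

def encode(batch):
    """Flatten a batch of strings into EmbeddingBag's (ids, offsets) form."""
    ids = torch.tensor([b for s in batch for b in s.encode()], dtype=torch.long)
    offsets = torch.tensor([0] + [len(s.encode()) for s in batch[:-1]]).cumsum(0)
    return ids, offsets

model = PayloadClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

malicious = ["' OR 1=1 --", "<script>alert(1)</script>"]
benign = ["GET /index.html HTTP/1.1", "user=alice&page=2"]

for step in range(100):
    # Attack side: mutate known-malicious seeds to evade the current model.
    attacks = [mutate_payload(s) for s in malicious]
    batch = attacks + benign
    labels = torch.tensor([1] * len(attacks) + [0] * len(benign))
    # Defense side: retrain the classifier on the freshly mutated variants.
    loss = loss_fn(model(*encode(batch)), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In production the attack side would be an RL agent and the defender a full transformer, but the alternation, attack generation followed by defensive retraining, is the same.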

This post highlights how Microsoft and NVIDIA are transforming adversarial learning research into real-time, production-grade cyber defense—leveraging GPU-accelerated computing to deliver scalable, adaptive protection.  

Strategic Collaboration: Building Real-Time Threat Detection   

Once the models are trained, deploying them for live traffic analysis demands an inference engine that can match the volume and velocity of production workloads without compromising detection accuracy. Through joint engineering efforts, Microsoft and NVIDIA achieved breakthrough performance by transitioning from CPU to GPU compute:

Metric             | CPU Baseline | GPU Baseline (Triton on NVIDIA H100) | GPU Optimized (Triton on NVIDIA H100, further optimizations)
End-to-End Latency | 1239.67 ms   | 17.8 ms                              | 7.67 ms
Throughput         | 0.81 req/s   | 57 req/s                             | > 130 req/s
Detection Accuracy | –            | –                                    | >95% on adversarial benchmarks

This end-to-end latency, which includes network latency, demonstrates the viability of deploying adversarial learning at enterprise scale: the fully optimized GPU path cuts latency by roughly 160× relative to the CPU baseline.

Microsoft’s Contributions: Adversarial Learning, Model Training & Optimization  

To achieve high detection accuracy on adversarial traffic, Microsoft researchers trained and optimized transformer-based classifiers to detect malicious payloads.  

Key innovations included:  

  • Adversarial learning pipeline  
  • Model distillation and architecture  
  • Security-specific input segmentation that enabled NVIDIA to develop parallel tokenization  

These enhancements laid the foundation for high-precision detection and for AI models that can generalize across diverse attack variants.
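
The model distillation step above typically trains a smaller student model against a larger teacher's soft outputs. A minimal sketch of a standard distillation objective follows; the temperature and loss mix are illustrative assumptions, not Microsoft's training recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL (teacher knowledge) with hard-label cross-entropy.

    T is the softening temperature; alpha weights the two terms. Both are
    illustrative hyperparameters, tuned per task in practice.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```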

NVIDIA's Contributions: Accelerating Inference at Scale

Beyond baseline GPU acceleration, two NVIDIA innovations were critical to achieving real-time latency targets:   

  1. Optimized GPU Classifier (NVIDIA Triton + TensorRT):  

NVIDIA engineered a custom TensorRT implementation of Microsoft's classifier, fusing key operations into a single CUDA kernel to minimize memory traffic and launch overhead. In particular, normalization operations were automatically fused into the kernels of preceding operations by TensorRT, while custom CUDA kernels were developed to optimize both sliding window attention and dense layer activation functions. All custom kernels were then compiled together into a TensorRT engine and served via the Triton-TensorRT C++ backend to minimize host overhead.

Overall, the NVIDIA solution led to significant performance boosts compared to standard GPU solutions, reducing forward-pass latency from 9.45 ms to 3.39 ms. This represented a 2.8× speedup and contributed 6.06 ms of the total 10.13 ms end-to-end latency reduction reported in the performance breakdown above.   
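
As a rough illustration of the engine-build step described above, here is a minimal sketch using the TensorRT 8.x Python API, assuming an ONNX export of the classifier; the file names, input tensor name, and shape profile are illustrative assumptions, not the production configuration.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# "classifier.onnx" is an assumed export of the threat classifier.
with open("classifier.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels where profitable

# Dynamic sequence lengths: (batch, tokens) min/opt/max values are assumptions.
profile = builder.create_optimization_profile()
profile.set_shape("input_ids", min=(1, 16), opt=(8, 512), max=(32, 4096))
config.add_optimization_profile(profile)

# TensorRT applies its automatic fusions (e.g., folding normalization into
# preceding kernels) during this build; custom plugins would be registered
# before parsing.
engine_bytes = builder.build_serialized_network(network, config)
with open("classifier.plan", "wb") as f:
    f.write(engine_bytes)
```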

  

  2. Domain-Specific Tokenization

After optimizing the threat-detection classifier, the data pre-processing pipeline emerged as the next major performance bottleneck. Traditional tokenization techniques often fall short when it comes to leveraging parallelism within a sequence. While whitespace-based segmentation may suffice for conventional content like articles or documentation, it proves inadequate for densely packed request strings. These strings, common in security-sensitive environments, resist balanced segmentation, leading to inefficiencies in downstream processing.   
   
To address the challenges of processing dense machine-generated payloads, NVIDIA engineered a domain-specific tokenizer optimized for low-latency environments. By integrating segmentation points developed by Microsoft, tailored to the structural nuances of machine data, the tokenizer unlocked finer-grained parallelism, delivering a 3.5× reduction in tokenization latency. These cumulative engineering breakthroughs will enable Microsoft to deploy a high-performance threat-detection classifier capable of efficiently handling a wide range of sequence lengths in real time.
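
A minimal sketch of this idea: pre-segment the payload at structural delimiters so each chunk can be tokenized independently, and hence in parallel. The delimiter set and the tokenize_chunk stand-in below are illustrative assumptions, not the production tokenizer, which would run in native/GPU code rather than Python threads.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Machine-data boundaries: path slashes, query separators, key/value signs.
SEGMENT_RE = re.compile(r"[/?&=;]")

def segment(payload: str) -> list[str]:
    """Split a dense request string at structural boundaries so each chunk
    can be tokenized independently."""
    parts, last = [], 0
    for m in SEGMENT_RE.finditer(payload):
        parts.append(payload[last:m.end()])
        last = m.end()
    if last < len(payload):
        parts.append(payload[last:])
    return parts

def tokenize_chunk(chunk: str) -> list[int]:
    """Stand-in byte-level tokenizer; a real system uses its own vocabulary."""
    return list(chunk.encode())

def parallel_tokenize(payload: str, workers: int = 4) -> list[int]:
    chunks = segment(payload)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        token_lists = pool.map(tokenize_chunk, chunks)
    return [t for toks in token_lists for t in toks]

print(parallel_tokenize("GET /search?q=%27%20OR%201%3D1&lang=en"))
```

Because the segmentation points are fixed by the payload's structure rather than whitespace, the chunks stay balanced even for dense machine-generated strings, which is what makes the parallelism worthwhile.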

Inference Stack:   

  • Serving: NVIDIA Triton Inference Server  
  • Model: NVIDIA TensorRT implementation of Microsoft’s threat classifier   
  • Tokenizer: Custom tokenizer optimized for security data  
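
For reference, a minimal client-side sketch of how a request might reach this stack, using Triton's Python HTTP client; the model and tensor names ("threat_classifier", "input_ids", "logits") are illustrative assumptions rather than the production configuration.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Pre-tokenized payload; in the real pipeline these IDs come from the
# custom security tokenizer described above.
token_ids = np.array([[71, 101, 116, 32, 47]], dtype=np.int64)
inp = httpclient.InferInput("input_ids", token_ids.shape, "INT64")
inp.set_data_from_numpy(token_ids)

result = client.infer("threat_classifier", inputs=[inp])
logits = result.as_numpy("logits")
print("malicious" if logits[0].argmax() == 1 else "benign")
```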

 

Custom CUDA Kernels:   

  • Embedding + LayerNorm  
  • Residual Add + LayerNorm  
  • GeGLU activation  
  • Bidirectional sliding window flash attention  
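
For intuition, the reference PyTorch expressions below show two of the fused patterns in their unfused form; the custom CUDA kernels compute the same math in a single pass over memory instead of separate kernel launches. The function names and the GeGLU formulation (a GELU-gated product of two projections) are a standard rendering, not the exact production kernels.

```python
import torch
import torch.nn.functional as F

def residual_add_layernorm(x, residual, weight, bias, eps=1e-5):
    """Fusion target: one read/write pass instead of add, then normalize."""
    return F.layer_norm(x + residual, x.shape[-1:], weight, bias, eps)

def geglu(x, w_gate, w_up):
    """GeGLU activation used in the classifier's dense layers."""
    return F.gelu(x @ w_gate) * (x @ w_up)
```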

 

Real-World Impact   

Speed: Real-time classification enables truly inline adversarial detection for production traffic, without introducing queueing delays.  

Scale: Sustained GPU throughput (> 130 req/s on H100) supports high-traffic endpoints and bursty workloads.

Accuracy: >95% detection accuracy on representative adversarial inputs provides robust coverage against rapidly evolving attack variants.   

What’s Next   

The roadmap and the deep engineering collaboration behind it continue to push the boundaries of real-time threat detection. Future efforts will explore advanced model architectures for adversarial robustness and further acceleration techniques such as quantization. The next phase will significantly broaden the impact of adversarial learning in practical cybersecurity applications. By training models on malicious patterns, we're equipping them to handle higher traffic volumes and increasingly intricate payloads while maintaining strict latency constraints. These innovations collectively lay the foundation for faster, more robust defenses that can keep pace with the escalating scale and complexity of today's AI-driven cyber threats.

To learn more about this research, join us at the Security Preday event on Monday, November 17 starting at 1 pm Pacific, or at the NVIDIA booth on Thursday, November 20 at 10:35 am Pacific. Please visit the Ignite event website (https://ignite.microsoft.com/en-US/home) for details on how to register.
 

Special thanks to key contributors to this research: Sami Ait Ouahmane (Microsoft), Rachel Allen (NVIDIA), Mohit Ayani (NVIDIA), Francis Beckert (Microsoft), Nora Hajjar (Microsoft), Rakib Hasan (NVIDIA), Yingqi Liu (Microsoft), Navid Nobakht (Microsoft), Rohan Varma (NVIDIA), and Bryan Xia (Microsoft).
