Public Preview: Application Gateway for Containers – Inference gateway
Share
Services
Application Gateway for Containers is extending its ingress gateway feature set with new **AI gateway** capabilities. The new inference gateway capability brings the Kubernetes Gateway API Inference Extension to Application Gateway for Containers, enabling **load and model-aware routing** for self-hosted generative AI workloads on AKS.
Application Gateway for Containers' inference gateway capability is purpose-built for serving LLMs and other inference workloads, routing requests based on model server signals rather than generic load balancing — improving Time to First Token (TTFT) and reducing timeouts under load.
* **Managed Body-Based Router (BBR):** A fully managed router that inspects the request body (such as the model field on OpenAI-compatible APIs) to drive model-aware routing — no custom proxy tier required.
* **Bring your own Endpoint Picker (EPP):** Choose the EPP implementation that fits your workload to drive metrics-aware endpoint selection across your InferencePool.
* **Secure inference:** Pairs natively with the existing web application firewall (WAF) capability in Application Gateway for Containers, applying OWASP-aligned protections to AI traffic before it reaches your model servers.
* **Improved performance and reliability:** Routes around saturated replicas and leverages model server state to lower TTFT, reduce timeouts, and improve GPU utilization.
**Learn more:**
* [Application Gateway for Containers – AI gateway - Inference gateway](https://aka.ms/agc/aigateway "https://aka.ms/agc/aigateway")
* [Application Gateway for Containers - Overview](https://aka.ms/agc "https://aka.ms/agc")
What else is happening at Microsoft Azure?
Read update
Services
Share
Public Preview: Azure Site Recovery support for Linux Azure VMs with NVMe disk controllers.
June 8th, 2026
Services
Share