Maintained with ☕️ by
IcePanel logo

Public Preview: Application Gateway for Containers – Inference gateway

Share

Services

Application Gateway for Containers is extending its ingress gateway feature set with new **AI gateway** capabilities. The new inference gateway capability brings the Kubernetes Gateway API Inference Extension to Application Gateway for Containers, enabling **load and model-aware routing** for self-hosted generative AI workloads on AKS. Application Gateway for Containers' inference gateway capability is purpose-built for serving LLMs and other inference workloads, routing requests based on model server signals rather than generic load balancing — improving Time to First Token (TTFT) and reducing timeouts under load. * **Managed Body-Based Router (BBR):** A fully managed router that inspects the request body (such as the model field on OpenAI-compatible APIs) to drive model-aware routing — no custom proxy tier required. * **Bring your own Endpoint Picker (EPP):** Choose the EPP implementation that fits your workload to drive metrics-aware endpoint selection across your InferencePool. * **Secure inference:** Pairs natively with the existing web application firewall (WAF) capability in Application Gateway for Containers, applying OWASP-aligned protections to AI traffic before it reaches your model servers. * **Improved performance and reliability:** Routes around saturated replicas and leverages model server state to lower TTFT, reduce timeouts, and improve GPU utilization. **Learn more:** * [Application Gateway for Containers – AI gateway - Inference gateway](https://aka.ms/agc/aigateway "https://aka.ms/agc/aigateway") * [Application Gateway for Containers - Overview](https://aka.ms/agc "https://aka.ms/agc")