
GKE Inference Gateway is generally available (GA) and ready for production



## Feature

GKE Inference Gateway is generally available (GA) and ready for production workloads. This release introduces major performance, security, and usability enhancements since the Public Preview.

* **Stable v1 API**: The API has graduated to v1. The `InferenceModel` resource is replaced by the `InferenceObjective` resource for a clearer definition of serving goals. A zero-downtime migration path is available.
* **Prefix-Aware Routing**: A new, intelligent routing feature inspects request context and routes requests with shared prefixes (like in conversational AI) to the same model replica. This can maximize KV cache hits and improve Time-to-First-Token (TTFT) latency by up to 96%.
* **API Key Authentication**: Secure your endpoints by enforcing API key validation through a new integration with Apigee.
* **Body-Based Routing**: The gateway can route requests using the model field directly from the HTTP request body, which enables native compatibility with the OpenAI API specification.

For more information, see [About GKE Inference Gateway](https://docs.cloud.google.com/kubernetes-engine/docs/concepts/about-gke-inference-gateway) and [Deploy GKE Inference Gateway](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/deploy-gke-inference-gateway).

## Issue

Starting with version 1.33.2-gke.4655000, the GCSFuse CSI Driver automatically applies performance-tuning defaults for Cloud Storage FUSE volumes used on nodes with [high-performance machine types](https://docs.cloud.google.com/storage/docs/cloud-storage-fuse/automated-configurations). However, in GKE versions 1.34.1-gke.1431000 to 1.34.1-gke.3403001, these defaults are not being applied. This is due to an issue where GCSFuse fails to recognize the machine type from the configuration file provided by the GCSFuse CSI Driver.

To apply the performance defaults, explicitly set the machine-type as a gcsfuse mount option.
Use the command-line flag format, with the key and value separated by an equals sign (`=`). For example: `machine-type=n2-standard-4`.

Ensure the Pod using the GCSFuse volume is scheduled on a node that matches the specified machine type. These settings are optimized for high-performance machine types and might not be suitable for other node types. For more information on scheduling, see the Kubernetes documentation on [assigning Pods to Nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/).
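
The workaround above can be sketched in Python as a minimal, unofficial illustration. The script checks whether a GKE version string falls inside the affected range quoted in this note, then builds a Pod manifest that passes `machine-type=<value>` as a gcsfuse mount option and pins the Pod to a matching node. Several details are assumptions that are not stated in this note and should be verified against the GKE documentation: the range is treated as inclusive at both ends, the volume uses the CSI ephemeral volume form (`gcsfuse.csi.storage.gke.io` with a `mountOptions` volume attribute and the `gke-gcsfuse/volumes` annotation), and scheduling relies on the well-known `node.kubernetes.io/instance-type` node label. The Pod name, bucket name, image, and mount path are hypothetical placeholders, and the service account setup needed for bucket access is omitted.

```python
import json
import re

# Affected range quoted in the note; treating both ends as inclusive is an assumption.
AFFECTED_MIN = "1.34.1-gke.1431000"
AFFECTED_MAX = "1.34.1-gke.3403001"


def parse_gke_version(version: str) -> tuple[int, ...]:
    """Split 'MAJOR.MINOR.PATCH-gke.BUILD' into integers so versions compare numerically."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version lies in the range where the defaults are not applied."""
    low = parse_gke_version(AFFECTED_MIN)
    high = parse_gke_version(AFFECTED_MAX)
    return low <= parse_gke_version(version) <= high


def build_pod_manifest(machine_type: str, bucket: str) -> dict:
    """Build a Pod manifest that sets machine-type explicitly as a gcsfuse mount option."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "gcsfuse-example",  # hypothetical name
            # Assumed annotation that enables the GCSFuse sidecar for this Pod.
            "annotations": {"gke-gcsfuse/volumes": "true"},
        },
        "spec": {
            # Schedule onto a node whose machine type matches the mount option.
            "nodeSelector": {"node.kubernetes.io/instance-type": machine_type},
            "containers": [
                {
                    "name": "app",
                    "image": "busybox",  # placeholder image
                    "command": ["sleep", "3600"],
                    "volumeMounts": [{"name": "gcs-volume", "mountPath": "/data"}],
                }
            ],
            "volumes": [
                {
                    "name": "gcs-volume",
                    "csi": {
                        "driver": "gcsfuse.csi.storage.gke.io",
                        "volumeAttributes": {
                            "bucketName": bucket,
                            # Flag format from the note: key=value, joined by '='.
                            "mountOptions": f"machine-type={machine_type}",
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    cluster_version = "1.34.1-gke.2000000"  # hypothetical cluster version
    if is_affected(cluster_version):
        manifest = build_pod_manifest("n2-standard-4", "my-bucket")
        print(json.dumps(manifest, indent=2))
```

The script prints JSON, which `kubectl apply -f` accepts alongside YAML. If the volume already carries other mount options, the `machine-type` entry would be appended to the same comma-separated `mountOptions` string rather than replacing it.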