



## Deprecation notice

Saxml on GKE is de-prioritized beginning April 24, 2025. This means the project won't get further updates. Existing Saxml deployments will continue to function as is, without disruption.

We _strongly suggest_ that you migrate to [JetStream](https://github.com/google/JetStream), Google's up-to-date open source inference framework for high-performance LLM serving on TPUs and GPUs. JetStream offers continuous batching and quantization for better throughput and memory efficiency. For a migration example, see [Serve Gemma using TPUs on GKE with JetStream](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-tpu-jetstream).