## Feature
**vLLM TPU**
[vLLM TPU](https://cloud.google.com/vertex-ai/generative-ai/docs/open-models/vllm/use-vllm-tpu), a highly efficient serving framework for large language models (LLMs) that's optimized for [Cloud TPU](https://cloud.google.com/tpu) hardware, is available through Model Garden.