Hex-LLM: High-Efficiency Large Language Model Serving is now generally available (GA)
## Feature
[Hex-LLM: High-Efficiency Large Language Model Serving](https://cloud.google.com/vertex-ai/generative-ai/docs/open-models/use-hex-llm) is now [generally available (GA)](https://cloud.google.com/products#product-launch-stages).
This launch adds support for the following models:
* Llama 3.1
* Llama 3.2
* Phi-3
* Qwen2 and Qwen2.5
The launch also adds support for the following features:
* Multi-host serving.
* Disaggregated serving (experimental).
* Prefix caching.
* AWQ quantization.