
## Announcement

This release note announces the general availability of Trillium, also known as v6e. Trillium is the sixth-generation and latest Cloud TPU. It is fully integrated with our AI Hypercomputer architecture to deliver compelling value to our Google Cloud Platform AI customers. We used Trillium TPUs to train the new Gemini 2.0, Google's most capable AI model yet, and now enterprises and startups alike can take advantage of the same powerful, efficient, and sustainable infrastructure.

Today, Trillium is generally available to Google Cloud customers, and this week we will deliver our first large tranches of Trillium capacity to some of our biggest Google Cloud Platform customers.

Here are some of the key improvements that Trillium delivers over the prior generations, v5e and v5p:

* Over 4x improvement in training performance.
* Up to 3x increase in inference throughput.
* A 67% increase in energy efficiency.
* An impressive 4.7x increase in peak compute performance per chip.
* Double the High Bandwidth Memory (HBM) capacity.
* Double the Interchip Interconnect (ICI) bandwidth.
* 100,000 Trillium chips per Jupiter network fabric with 13 Petabits/sec of bisection bandwidth, capable of scaling a single distributed training job to hundreds of thousands of accelerators.
* Up to a 2.1x increase in performance per dollar over Cloud TPU v5e and up to a 2.5x increase in performance per dollar over Cloud TPU v5p when training dense LLMs such as Llama2-70b and Llama3.1-405b.
* GKE integration enables seamless AI workload orchestration using Google Compute Engine MIGs, including XPK for faster iterative development.
* Multislice training with Trillium scales from one to hundreds of thousands of chips across pods using DCN (a minimal JAX sketch of a multislice device mesh appears after the lists below).
* Training and serving fungibility enables use of the same Cloud TPU quota for both training and inference.
* Support for collection scheduling, with collection SLOs defended.
* Full-host VM support to enable inference for larger models (70B+ parameters).
* Official libtpu releases that guarantee stability across all three frameworks (JAX, PyTorch/XLA, and TensorFlow); a quick device check is sketched at the end of this note.

These enhancements enable Trillium to excel across a wide range of AI workloads, including:

* Scaling AI training workloads like LLMs, including dense and Mixture of Experts (MoE) models
* Inference performance and collection scheduling
* Acceleration of embedding-intensive models
* Delivering training and inference price-performance
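
To make the Multislice bullet above more concrete, here is a minimal sketch, assuming a JAX job already launched across two Trillium slices, of how a hybrid device mesh can place model parallelism on the ICI links inside each slice and data parallelism on DCN between slices. The mesh shapes, axis names, and array sizes below are illustrative assumptions, not recommended settings.

```python
# Minimal Multislice mesh sketch (assumes a multi-slice TPU environment;
# it will not run on a single host whose device count does not match).
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Illustrative shapes: 8-way model parallelism over ICI inside each slice,
# 2-way data parallelism over DCN across two slices (assumed topology).
ici_mesh_shape = (1, 8)
dcn_mesh_shape = (2, 1)

devices = mesh_utils.create_hybrid_device_mesh(ici_mesh_shape, dcn_mesh_shape)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard a toy activation batch across both mesh axes.
x = jnp.ones((128, 8192), dtype=jnp.bfloat16)
sharded_x = jax.device_put(x, NamedSharding(mesh, P("data", "model")))
print(sharded_x.sharding)  # NamedSharding over the ("data", "model") mesh
```

In a real job, the ICI and DCN mesh shapes would be derived from the slice topology and the number of slices requested, and the resulting mesh would be used in sharding annotations for `jax.jit`-compiled training steps.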
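For the libtpu bullet, the following minimal check, written against JAX (one of the three supported frameworks), verifies that the installed TPU runtime is visible on a Trillium VM. The exact `device_kind` string reported for v6e is an assumption and may vary by release.

```python
# Quick sanity check that the installed libtpu/JAX stack sees the TPUs.
import jax

devices = jax.devices()
print(f"backend: {jax.default_backend()}, device count: {len(devices)}")
for d in devices:
    # device_kind typically names the TPU generation (string may differ by release).
    print(d.id, d.device_kind)
```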