Nvidia’s dominance in the GPU market has prompted many companies to explore non-GPU alternatives, and one of the latest options to emerge is Google’s TPU v5e AI chip.
The TPU v5e is Google’s first AI chip designed for large-scale orchestration of AI workloads in virtual environments, and it comes with a suite of software and tools. It is now available in preview for Google Cloud customers. The chip succeeds the TPU v4, which Google used to train advanced language models such as PaLM and PaLM 2, the models behind its search, mapping, and productivity apps.
While Google has often compared its TPUs to Nvidia’s GPUs, the TPU v5e launch was more measured. Google emphasized its focus on providing a variety of AI chip options to meet different customer needs, such as using Nvidia’s H100 GPUs in the A3 supercomputer and the TPU v5e for both training and inferencing.
The TPU v5e is also notable for being the first Google AI chip to be available outside the U.S. The previous generation, TPU v4, was only accessible in North America. Now, the TPU v5e will be installed in data centers in the Netherlands for the EMEA region and in Singapore for the Asia-Pacific markets.
The development of the TPU v5 chip, however, has been surrounded by controversy. In 2021, Google researchers informally announced the chip and claimed that AI had been used to help design it, including planning its physical layout in under six hours, a task that takes human experts considerably longer. The claims sparked debate inside and outside the company, and a researcher who disputed the findings was later fired amid the controversy over the paper, which was published in Nature. Other academics questioned the validity of Google’s claims, with one researcher from the University of California, San Diego, reverse-engineering Google’s process and concluding that human chip designers and automated tools could sometimes be more efficient than the AI-based method Google promoted.
Despite this controversy, Google has pushed ahead with its AI ambitions, and TPUs remain central to its strategy. The company’s large language models are optimized to run on TPUs, and the new chips are integral to Google’s data centers as it adds AI features to its products.
The TPU v5e is designed more for inferencing than training, offering peak performance of 393 teraflops of INT8 per chip, up from the TPU v4’s peak of 275 teraflops. It falls short in BF16 performance, however, with 197 teraflops against the TPU v4’s 275 teraflops.
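The gap between those two figures comes down to datatype: BF16 is the usual training format, while INT8 is the workhorse of quantized inference. As a rough illustration (not Google’s own code), here is what the two precisions look like in JAX, one of the frameworks the chips support; the shapes and the naive quantization scheme are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
a = jax.random.normal(k1, (1024, 1024), dtype=jnp.bfloat16)
b = jax.random.normal(k2, (1024, 1024), dtype=jnp.bfloat16)

# BF16 matmul, the training path: runs on the TPU's matrix units
# with FP32 accumulation.
y_bf16 = jnp.dot(a, b, preferred_element_type=jnp.float32)

# INT8 matmul, the quantized-inference path. This is deliberately naive
# post-training quantization; real schemes calibrate scales per tensor
# or per channel.
scale_a = 127.0 / jnp.max(jnp.abs(a)).astype(jnp.float32)
scale_b = 127.0 / jnp.max(jnp.abs(b)).astype(jnp.float32)
a_i8 = jnp.round(a.astype(jnp.float32) * scale_a).astype(jnp.int8)
b_i8 = jnp.round(b.astype(jnp.float32) * scale_b).astype(jnp.int8)
y_i8 = jnp.dot(a_i8, b_i8, preferred_element_type=jnp.int32)

# Dequantize to compare against the BF16 result.
y_deq = y_i8.astype(jnp.float32) / (scale_a * scale_b)
```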
When used in clusters, the TPU v5e could outperform the TPU v4. While the TPU v4 was limited to configurations of 4,096 chips, the TPU v5e can scale further: a new “Multislice” technology lets users network tens of thousands of chips together, extending AI model scaling beyond the previous physical limits of a single TPU pod.
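Google has not published Multislice internals, but from a JAX program’s point of view, scaling across chips looks like an ordinary device mesh that the runtime simply extends across slices. The sketch below, a minimal data-parallel example with an assumed one-dimensional mesh and a toy computation, shards a batch across whatever TPU chips are visible to the program.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

n = jax.device_count()  # all TPU chips visible to the program
mesh = Mesh(mesh_utils.create_device_mesh((n,)), axis_names=("data",))

# Shard the batch dimension of an array across every chip in the mesh.
x = jax.device_put(jnp.ones((n * 8, 1024)),
                   NamedSharding(mesh, P("data", None)))

@jax.jit
def step(x):
    # XLA inserts the cross-chip communication (here, a global reduction).
    return jnp.sum(x * x)

print(step(x))
```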
Another key feature of the TPU v5e is its ability to run multiple virtual machines simultaneously. Google has also introduced Kubernetes support for managing AI workloads across both the TPU v4 and v5e. The largest configuration deploys 64 virtual machines across a 256-chip TPU v5e pod. The chips work with machine learning frameworks such as PyTorch, JAX, and TensorFlow.
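In a multi-VM topology like that, each VM hosts only a subset of the slice’s chips, and a framework such as JAX distinguishes between the chips attached to the local VM and the slice-wide total. A minimal check, assuming a TPU VM environment:

```python
import jax

# Global chip count across all VMs in the slice vs. chips attached
# to this particular VM.
print("global devices:", jax.device_count())
print("local devices: ", jax.local_device_count())
for d in jax.local_devices():
    print(d.id, d.platform, d.device_kind)
```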
The TPU v5e offers eight virtual machine configurations, ranging from a single chip to a full 256-chip slice, letting users tailor their setups to different sizes of large language models and generative AI tasks.
Each TPU v5e chip is equipped with four matrix multiplication units, a vector processing unit, and a scalar processing unit, all connected to HBM2 memory. Google’s data centers are equipped with high-bandwidth infrastructure that uses optical switches to link AI chips and clusters, allowing for flexible and dynamic network reconfiguration.
In terms of cost-efficiency, the TPU v5e performs significantly better than the TPU v4: Google says it delivers up to twice the training performance and 2.5 times the inference performance per dollar. At $1.20 per chip-hour, versus $3.20 for the TPU v4, the TPU v5e is an attractive option for companies looking to train and deploy more complex AI models at lower cost.
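Taking the list prices and peak BF16 figures quoted above at face value, a quick back-of-the-envelope check shows the raw numbers land in the same neighborhood as Google’s claim:

```python
# Rough perf-per-dollar arithmetic from the figures quoted above.
# Raw peak BF16 throughput only; Google's 2x/2.5x claims come from
# end-to-end benchmarks, not peak numbers.
v5e_price, v4_price = 1.20, 3.20    # USD per chip-hour
v5e_bf16, v4_bf16 = 197.0, 275.0    # peak BF16 teraflops per chip

v5e_per_dollar = v5e_bf16 / v5e_price   # ~164 peak TFLOPS per dollar-hour
v4_per_dollar = v4_bf16 / v4_price      # ~86 peak TFLOPS per dollar-hour
print(f"v5e/v4 peak-BF16-per-dollar: {v5e_per_dollar / v4_per_dollar:.2f}x")
```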
Google has detailed TPU v5e configurations on its website, categorizing them by training and inferencing use cases. The training configurations are aimed at workloads such as transformer, text-to-image, and CNN training, fine-tuning, and serving.
Meanwhile, Google’s A3 supercomputer, which is equipped with up to 26,000 Nvidia H100 GPUs, will be generally available next month. The A3 is designed for businesses working with massive language models in sectors like finance, pharmaceuticals, and engineering.
The growing competition in AI infrastructure is also evident from other tech giants like Amazon and Intel. Amazon’s AWS has integrated its own custom AI chips, Trainium and Inferentia, while Intel has a pipeline of orders for its Gaudi2 and Gaudi3 chips. As companies look for ways to handle the demand for AI chips, Google’s TPU v5e offers an alternative for those seeking flexibility and affordability in their AI deployments.