Our eighth generation TPUs: two chips for the agentic era

Original link: https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/

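Google's new TPU 8t and TPU 8i chips, co-designed with Gemini, represent a leap in AI performance and efficiency. Built for the demands of large reasoning models, they use a new Boardfly topology, expanded SRAM (in TPU 8i, sized for the KV cache of reasoning models), and the high-bandwidth Virgo Network fabric. Notably, both chips now run on Google's own Axion ARM-based CPU host, so the full system, not just the chip, can be optimized. The TPUs support popular frameworks such as JAX, PyTorch, and vLLM, and offer bare metal access that simplifies development and deployment. A key focus is power efficiency: innovations in hardware *and* software, including integrated power management and advanced liquid cooling, deliver up to twice the performance-per-watt of the previous generation, Ironwood. Google's full-stack control, from chip to data center design, enables substantial energy savings: its data centers now deliver six times more computing power per unit of electricity than they did five years ago. This holistic approach makes TPU 8t and TPU 8i a powerful and sustainable option for demanding AI workloads.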

Original text

Co-designed for Gemini, open for everyone

This eighth-generation TPU is also the latest expression of our co-design philosophy, in which every spec is chosen to address AI’s biggest hurdles.

Boardfly topology was designed specifically for the communication demands of today's most capable reasoning models.
SRAM capacity in TPU 8i was sized for the KV cache footprint of reasoning models at production scale.
Virgo Network fabric's bandwidth targets were derived from the parallelism requirements of trillion-parameter training.
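The KV cache sizing point can be made concrete with the standard footprint formula for a decoder-only transformer: one key tensor and one value tensor per layer, per KV head, per token. The source gives no model dimensions or SRAM capacities, so the numbers below are hypothetical and serve only to show that the footprint grows linearly with context length and batch size, which is why long reasoning traces put pressure on memory. `kv_cache_bytes` is an illustrative helper, not a Google API.

```python
# Minimal sketch: KV cache footprint of a decoder-only transformer.
# All model dimensions below are hypothetical; they are not TPU 8i or Gemini specs.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Return the bytes needed to cache keys and values for every token in flight.

    Factor of 2: one key vector and one value vector per token, per layer, per KV head.
    bytes_per_elem=2 corresponds to bfloat16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Hypothetical model: 64 layers, 8 KV heads (grouped-query attention), head_dim 128.
    per_token = kv_cache_bytes(64, 8, 128, seq_len=1, batch_size=1)
    print(f"per token: {per_token // 1024} KiB")       # per token: 256 KiB
    # A single request carrying a 32K-token reasoning trace.
    full = kv_cache_bytes(64, 8, 128, seq_len=32_768, batch_size=1)
    print(f"32K-token trace: {full / GiB:.0f} GiB")     # 32K-token trace: 8 GiB
```

Doubling either the context length or the number of concurrent requests doubles the cache, so serving many long reasoning sessions at once multiplies this figure.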

And for the first time, both chips run on Google’s own Axion ARM-based CPU host, allowing us to optimize the full system, not just the chip, for performance and efficiency.

Both platforms support native JAX, MaxText, PyTorch, SGLang and vLLM (the frameworks developers already use) and offer bare metal access, giving customers direct hardware access without the overhead of virtualization. Open-source contributions, including MaxText reference implementations and Tunix for reinforcement learning, provide turnkey paths from capability to production deployment.
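As a hedged illustration of what "native JAX" means in practice, the snippet below is ordinary JAX that XLA compiles for whichever backend is available (CPU, GPU, or TPU). Nothing in it is specific to TPU 8t or TPU 8i, and `attention_scores` is a hypothetical example function, not part of any Google library.

```python
import jax
import jax.numpy as jnp

# List the accelerators JAX can see. On a TPU host this reports TPU devices;
# on a laptop it falls back to CPU. The code below is unchanged either way.
print(jax.devices())

@jax.jit
def attention_scores(q, k):
    # Scaled dot-product attention scores, compiled by XLA for the default backend.
    return jnp.einsum("qd,kd->qk", q, k) / jnp.sqrt(q.shape[-1])

key_q, key_k = jax.random.split(jax.random.PRNGKey(0))
q = jax.random.normal(key_q, (4, 64))   # 4 query vectors of width 64
k = jax.random.normal(key_k, (8, 64))   # 8 key vectors of width 64
scores = attention_scores(q, k)
print(scores.shape)  # (4, 8)
```

The point of the sketch is portability: the same jitted function runs wherever JAX finds a backend, which is what lets existing JAX code move onto TPUs without a rewrite.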

Designing for power efficiency at scale

In today’s data centers, power, not just chip supply, is a binding constraint. To address it, we have optimized efficiency across the entire stack, with integrated power management that dynamically adjusts power draw based on real-time demand. TPU 8t and TPU 8i deliver up to two times the performance-per-watt of the previous generation, Ironwood.
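Why performance-per-watt matters once power is the limit can be shown with an idealized relation. Assume a fixed facility power budget $P$, fully available to accelerators running at equal utilization in both generations (a simplification the source does not make). Then sustainable throughput $T$ is the perf-per-watt $\eta$ times $P$; the symbols are introduced here for illustration only.

```latex
% Idealized: fixed power budget P, equal utilization across generations
T = \eta P, \qquad \eta = \frac{\text{operations/s}}{\text{W}} = \frac{\text{operations}}{\text{J}}

% Same P for both generations: the throughput ratio equals the perf-per-watt ratio
\frac{T_{\text{8th gen}}}{T_{\text{Ironwood}}} = \frac{\eta_{\text{8th gen}}}{\eta_{\text{Ironwood}}} \le 2

% Energy to complete a fixed workload of N operations
E = \frac{N}{\eta} \quad\Longrightarrow\quad E_{\text{8th gen}} \ge \tfrac{1}{2}\,E_{\text{Ironwood}}
```

The inequalities mirror the source's "up to": real workloads land at or below the 2x bound.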

But efficiency at Google is not just a chip-level metric; it’s also a system-level commitment that runs from silicon to the data center. For example, we integrate network connectivity with compute on the same chip, significantly reducing the power costs of moving data across the TPU pod. Even our data centers are co-designed with our TPUs. We innovated across hardware and software to enable our data centers to deliver six times more computing power per unit of electricity than they did just five years ago.
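The six-fold gain over five years is a cumulative figure. Assuming, purely for illustration, that it compounded at a constant rate (the source does not say how it was distributed across those years), the implied average annual improvement is the fifth root of 6. `compound_annual_factor` is a hypothetical helper.

```python
def compound_annual_factor(total_gain: float, years: float) -> float:
    """Constant yearly multiplier that compounds to `total_gain` over `years`."""
    return total_gain ** (1.0 / years)

# 6x more computing power per unit of electricity over 5 years (from the text above).
annual = compound_annual_factor(6.0, 5.0)
print(f"{annual:.3f}x per year, about {(annual - 1) * 100:.0f}% improvement annually")
# 1.431x per year, about 43% improvement annually
```

Real gains may have arrived in uneven steps, so treat this as a smoothed average rather than a yearly figure the source reports.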

TPU 8t and TPU 8i continue that trajectory. Both are supported by our fourth-generation liquid cooling technology, which sustains performance densities that air cooling cannot. By owning the full stack, from Axion host to accelerator, we can optimize system-level energy efficiency in ways that are not achievable when the host and chip are designed independently.