Our eighth generation TPUs: two chips for the agentic era

原始链接: https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/

Google's new TPU 8t and 8i chips, co-designed with Gemini, represent a leap in AI performance and efficiency. They are purpose-built for the demands of large reasoning models, featuring a novel Boardfly topology, increased SRAM, and the high-bandwidth Virgo network. Notably, both chips now run on Google's own Axion ARM-based CPU host, enabling full-system optimization. The TPUs support popular frameworks such as JAX, PyTorch, and vLLM, and offer bare-metal access to simplify development and deployment. A key focus is power efficiency: innovations across hardware *and* software, including integrated power management and advanced liquid cooling, deliver up to twice the performance-per-watt of the previous generation. Google's full-stack control, from chip to data-center design, has yielded substantial energy savings, raising compute per unit of electricity six-fold in just five years. This holistic approach makes TPU 8t and 8i a powerful, sustainable platform for demanding AI workloads.

Google recently unveiled its eighth-generation Tensor Processing Units (TPUs), a significant advance in AI compute. A single TPU 8t superpod now scales to 9,600 chips with 2 PB of memory, delivering 121 ExaFlops of compute and twice the chip-to-chip bandwidth of the previous generation. The announcement, published on Google's blog and discussed on Hacker News, sparked conversation about Google's quiet resurgence in the AI race. Commenters argued that while other companies grab headlines, Google's vertically integrated infrastructure and steady progress are putting it in an advantageous position. Discussion points included the competitive edge this hardware confers, the appeal of buying hardware outright versus renting it in the cloud, and the possibility that Google (or Apple) ultimately "wins" the AI landscape thanks to control of the entire stack. The density and cooling of the new TPUs also drew attention.
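
As a rough sanity check on those pod-level figures, the per-chip numbers below are derived from the announced specs, not stated in the announcement:

```python
# Back-of-the-envelope per-chip figures from the announced TPU 8t
# superpod specs: 9,600 chips, 2 PB memory, 121 ExaFlops.
CHIPS = 9_600
POD_MEMORY_PB = 2
POD_COMPUTE_EXAFLOPS = 121

# Per-chip memory in GB (decimal units: 1 PB = 1,000,000 GB).
mem_per_chip_gb = POD_MEMORY_PB * 1_000_000 / CHIPS
# Per-chip compute in PetaFlops (1 ExaFlop = 1,000 PetaFlops).
flops_per_chip_pf = POD_COMPUTE_EXAFLOPS * 1_000 / CHIPS

print(f"~{mem_per_chip_gb:.0f} GB and ~{flops_per_chip_pf:.1f} PFLOPs per chip")
```

That works out to roughly 208 GB of memory and about 12.6 PFLOPs per chip, illustrating how much of the headline number comes from sheer pod scale.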

Original article

Co-designed for Gemini, open for everyone

This eighth generation TPU is also the latest expression of our co-design philosophy, where every spec is built to solve AI’s biggest hurdles.

  • Boardfly topology was designed specifically for the communication demands of today's most capable reasoning models.
  • SRAM capacity in TPU 8i was sized for the KV cache footprint of reasoning models at production scale.
  • Virgo Network fabric's bandwidth targets were derived from the parallelism requirements of trillion-parameter training.
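
To see why KV-cache footprint drives on-chip memory sizing, here is a minimal sketch of the standard decoder KV-cache size formula; the model dimensions are illustrative assumptions, not TPU 8i specs:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Size of the key/value cache a decoder holds during inference.

    Two tensors (K and V) per layer, each of shape
    [batch, kv_heads, seq_len, head_dim], at dtype_bytes per element.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# Hypothetical reasoning model: 80 layers, 8 KV heads (grouped-query
# attention), head_dim 128, a 32k-token context, batch of 1, bf16 values.
size = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=32_768, batch=1)
print(f"{size / 1e9:.1f} GB per sequence")
```

Even with grouped-query attention, a single long-context sequence can need on the order of 10 GB of cache, and the footprint grows linearly with both context length and batch size, which is why SRAM capacity matters at production scale.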

And for the first time, both chips run on Google’s own Axion ARM-based CPU host, allowing us to optimize the full system, not just the chip, for performance and efficiency.

Both platforms support native JAX, MaxText, PyTorch, SGLang and vLLM — the frameworks developers already use — and offer bare metal access, giving customers direct hardware access without the overhead of virtualization. Open-source contributions, including MaxText reference implementations and Tunix for reinforcement learning support, provide turnkey paths from capability to production deployment.

Designing for power efficiency at scale

In today’s data centers, power, not just chip supply, is a binding constraint. To solve this, we have optimized efficiency across the entire stack, with integrated power management that dynamically adjusts the power draw based on real-time demand. TPU 8t and TPU 8i deliver up to two times better performance-per-watt over the previous generation, Ironwood.

But efficiency at Google is not just a chip-level metric; it’s also a system-level commitment that runs from silicon to the data center. For example, we integrate network connectivity with compute on the same chip, significantly reducing the power costs of moving data across the TPU pod. Even our data centers are co-designed with our TPUs. We innovated across hardware and software to enable our data centers to deliver six times more computing power per unit of electricity than they did just five years ago.
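
A six-fold gain over five years corresponds to a compound annual improvement; a quick sketch of that arithmetic:

```python
# Six times more compute per unit of electricity over five years
# implies this compound annual improvement factor.
improvement = 6
years = 5
annual = improvement ** (1 / years)
print(f"~{annual:.2f}x per year")  # roughly 43% more compute per watt, each year
```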

TPU 8t and TPU 8i continue that trajectory. Both are supported by our fourth-generation liquid cooling technology that sustains performance densities air cooling cannot. By owning the full stack, from Axion host to accelerator, we can optimize system-level energy efficiency in ways that simply cannot be achieved when the host and chip are designed independently.
