Ironwood: The first Google TPU for the age of inference

原始链接: https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/

At Google Cloud Next '25, Google announced Ironwood, its seventh-generation Tensor Processing Unit (TPU), designed specifically for AI inference. As Google's most performant and scalable custom AI accelerator, Ironwood marks a shift toward proactive AI agents that generate insights and interpretation rather than just data. This "age of inference" is enabled by Ironwood's ability to handle the computational and communication demands of generative AI. Ironwood links up to 9,216 liquid-cooled chips through a breakthrough Inter-Chip Interconnect (ICI) network and is a key component of Google Cloud's AI Hypercomputer, an architecture that optimizes hardware and software together for demanding AI workloads. Developers can use Google's Pathways software stack to easily harness the combined power of tens of thousands of Ironwood TPUs. Ironwood promises unmatched performance, cost efficiency, and power efficiency for AI training and serving.

Hacker News: Ironwood: The first Google TPU for the age of inference (blog.google)
meetpateltech | 1 hour ago | 16 points | 1 comment

fancyfredbot | 2 minutes ago: Looks great, but I wish we could stop playing these silly benchmark games. Why compare FP8 performance against architectures that don't support FP8? Why leave TPUv6 out of the comparison? Why compare the FP64 FLOPS of the El Capitan supercomputer with the FP8 FLOPS of a TPU, especially when FP64 is roughly 8x harder than FP8 and your TPU would still come out faster?
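To make the commenter's point concrete: if one accepts the rough estimate that an FP64 operation costs about 8x an FP8 operation (the commenter's figure, not one from the article), a like-for-like comparison would scale the FP8 number down before comparing, i.e. FP8_FLOPS / 8 ≈ FP64-equivalent FLOPS, rather than setting the raw FP8 and FP64 figures side by side.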

Original article

Today at Google Cloud Next 25, we’re introducing Ironwood, our seventh-generation Tensor Processing Unit (TPU) — our most performant and scalable custom AI accelerator to date, and the first designed specifically for inference. For more than a decade, TPUs have powered Google’s most demanding AI training and serving workloads, and have enabled our Cloud customers to do the same. Ironwood is our most powerful, capable and energy efficient TPU yet. And it's purpose-built to power thinking, inferential AI models at scale.

Ironwood represents a significant shift in the development of AI and the infrastructure that powers its progress. It’s a move from responsive AI models that provide real-time information for people to interpret, to models that provide the proactive generation of insights and interpretation. This is what we call the “age of inference” where AI agents will proactively retrieve and generate data to collaboratively deliver insights and answers, not just data.

Ironwood is built to support this next phase of generative AI and its tremendous computational and communication requirements. It scales up to 9,216 liquid cooled chips linked with breakthrough Inter-Chip Interconnect (ICI) networking spanning nearly 10 MW. It is one of several new components of Google Cloud AI Hypercomputer architecture, which optimizes hardware and software together for the most demanding AI workloads. With Ironwood, developers can also leverage Google’s own Pathways software stack to reliably and easily harness the combined computing power of tens of thousands of Ironwood TPUs.
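Pathways itself is Google's internal orchestration layer, so there is no public "Pathways API" to call directly; what Cloud developers see is JAX's single-controller, sharded-array programming model running on top of it. Below is a minimal JAX sketch, assuming a TPU slice (or any set of local devices) is visible to the process; the mesh axis name, array shapes, and forward function are illustrative choices for this example, not details from the article.

    # Minimal sketch: sharding an inference-style matmul across TPU chips with JAX.
    import numpy as np
    import jax
    import jax.numpy as jnp
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    devices = np.array(jax.devices())            # chips visible to this program
    mesh = Mesh(devices, axis_names=("data",))   # 1-D logical mesh over those chips

    # Shard the activation batch along the "data" axis; replicate the weights.
    batch_sharding = NamedSharding(mesh, P("data", None))
    weight_sharding = NamedSharding(mesh, P(None, None))

    x = jax.device_put(jnp.ones((8 * len(devices), 1024)), batch_sharding)
    w = jax.device_put(jnp.ones((1024, 4096)), weight_sharding)

    @jax.jit
    def forward(x, w):
        # XLA partitions this matmul across the mesh; on a real slice the
        # resulting inter-chip traffic runs over the ICI links described above.
        return jnp.dot(x, w)

    y = forward(x, w)
    print(y.shape, y.sharding)

On a single CPU the same code runs on a one-device mesh, which makes the sketch easy to test; on an actual Ironwood slice the only change is the set of devices the runtime exposes.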

Here’s a closer look at how these innovations work together to take on the most demanding training and serving workloads with unparalleled performance, cost and power efficiency.
