GPU Rack Power Density, 2015–2025

Original link: https://syaala.com/blog/gpu-rack-density-timeline-2026

## The Cooling Challenge of Next-Generation GPUs

NVIDIA's Blackwell GPUs, drawing up to 1,000W each, are driving a sharp rise in rack power density: from 15kW to as much as 240kW within the decade. This surge in heat output makes traditional air cooling inadequate, as it physically struggles to dissipate more than 50-100kW per rack. Put simply, liquid cooling is no longer optional for modern AI infrastructure; it is a necessity dictated by physics. Three main liquid cooling approaches exist: **rear-door heat exchangers** (30-50kW, a retrofit option), **direct-to-chip liquid cooling** (100-200+kW, NVIDIA's recommendation for the GB200), and **immersion cooling** (200+kW, requiring major infrastructure changes). Planning for 2026-2027 AI deployments *must* prioritize cooling. H100/H200 GPUs *may* still manage with high-density air cooling, but Blackwell (GB200) requires liquid cooling, and next-generation systems will demand even more advanced liquid solutions. Carefully assessing cooling requirements is essential to a successful AI infrastructure deployment.

## GPU Power and the Cooling Crisis

A recent Hacker News discussion highlights a rapidly escalating thermal-management problem driven by the AI revolution. GPU power density has grown dramatically, from 15kW per rack in 2017 to a projected 240kW in 2026, largely due to NVIDIA's new Blackwell GPUs at up to 1,000 watts per chip. The core problem? Because of air's limited thermal conductivity, air cooling cannot handle more than 50-100kW per rack. Liquid cooling is therefore no longer optional; it is a *physical requirement* for modern AI infrastructure. Three liquid cooling approaches are emerging: rear-door heat exchangers (for moderate density increases), direct-to-chip cooling (NVIDIA's recommendation for current GB200 systems), and immersion cooling (for the highest densities). Facilities planning for next-generation GPUs must prioritize advanced liquid cooling to avoid overheating and ensure reliable performance.

## Original Article

Key figures: 1,000W per Blackwell chip; 132kW current rack density; 50-100kW air cooling limit.

The Physics Problem

NVIDIA's latest Blackwell GPUs generate up to 1,000 watts per chip - over three times more heat than GPUs from just seven years ago. Traditional air cooling physically cannot dissipate heat at these densities. Above 50-100kW per rack, liquid cooling isn't optional - it's physics.

Sources: Lombard Odier, Tom's Hardware, Data Center Dynamics

The Power Density Evolution

Understanding how we got here helps contextualize the infrastructure challenge. In less than a decade, rack power density has increased nearly 10x for AI workloads.

- 15 kW per rack: standard enterprise workloads
- 40-60 kW per rack: AI workloads with H100 GPUs
- 132 kW per rack: NVIDIA GB200 NVL72 systems
- 240 kW per rack: next-generation systems (expected)
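For a sense of the slope, the endpoints of that progression imply a steep compound growth rate. A quick check, using the article's 2017 (15 kW) and 2025 (132 kW) figures:

```python
# Implied compound annual growth rate of rack power density,
# from 15 kW (2017) to 132 kW (2025), per the article's figures.
start_kw, end_kw = 15.0, 132.0
years = 2025 - 2017

cagr = (end_kw / start_kw) ** (1 / years) - 1
print(f"~{cagr * 100:.0f}% per year")  # roughly 31%/yr
```

Power delivery and cooling capacity in existing facilities were never provisioned for compounding at anything near that rate.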

Why Air Cooling Fails

Air has fundamental limitations as a heat transfer medium. Its thermal conductivity is roughly 25 times lower than water. At densities above 50-100kW per rack, you simply cannot move enough air through the system to dissipate heat effectively.
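A rough airflow estimate makes the limit concrete. The sketch below is a back-of-the-envelope calculation, not from the article: it assumes air at roughly 1.2 kg/m³, a specific heat of 1005 J/(kg·K), and a 15 K allowable inlet-to-exhaust temperature rise, then solves Q = ṁ·c_p·ΔT for the airflow needed at each density milestone:

```python
# Rough airflow required to remove a rack's heat load by air alone.
# Q = m_dot * c_p * dT  =>  m_dot = Q / (c_p * dT)
AIR_DENSITY = 1.2   # kg/m^3 (assumed, near sea level)
AIR_CP = 1005.0     # J/(kg*K), specific heat of air
DELTA_T = 15.0      # K, assumed inlet-to-exhaust temperature rise

def airflow_m3_per_s(heat_load_w: float) -> float:
    """Volumetric airflow (m^3/s) needed to carry away heat_load_w watts."""
    mass_flow = heat_load_w / (AIR_CP * DELTA_T)  # kg/s of air
    return mass_flow / AIR_DENSITY                # convert to m^3/s

for kw in (15, 50, 132, 240):
    m3s = airflow_m3_per_s(kw * 1000)
    cfm = m3s * 2118.88  # 1 m^3/s ~= 2118.88 CFM
    print(f"{kw:>4} kW rack: {m3s:5.1f} m^3/s (~{cfm:,.0f} CFM)")
```

Even under these generous assumptions, a 132 kW rack needs on the order of seven cubic meters of air per second, and 240 kW nearly twice that, which is far beyond what rack-level fans and raised-floor delivery can realistically move.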

Critical Threshold

Traditional air cooling cannot dissipate heat at current GPU densities. Air cooling fails above 50-100kW per rack. Current GB200 systems operate at 132kW. Next-generation systems will push to 240kW.

Source: Data Center Dynamics, "Data centers: The ten main trends for 2026"

The implications are straightforward: any facility planning to deploy current-generation or next-generation GPU infrastructure must plan for liquid cooling. This is not a feature preference - it's a physical requirement.

Liquid Cooling Approaches

Three primary approaches address high-density cooling requirements:

Rear-Door Heat Exchangers (RDHx)

Capacity: 30-50 kW per rack

Retrofit solution for existing facilities. Captures heat at the rack exhaust. Suitable for moderate density increases but insufficient for current GPU requirements.

Direct-to-Chip Liquid Cooling

Capacity: 100-200+ kW per rack

Cold plates directly attached to CPU/GPU surfaces. Most efficient heat capture at the source. Required for high-density AI workloads. This is what NVIDIA recommends for GB200 deployments.

Immersion Cooling

Capacity: 200+ kW per rack

Servers fully submerged in dielectric fluid. Highest density support possible. Requires significant operational changes and specialized equipment.
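The contrast with air is easy to quantify, because water carries vastly more heat per unit volume. A back-of-the-envelope sketch (my own numbers, not from the article: water c_p of 4186 J/(kg·K) and an assumed 10 K coolant temperature rise) shows the liquid flow a direct-to-chip loop would need for a GB200-class rack:

```python
# Water flow needed to remove a rack-scale heat load via a liquid loop.
# Q = m_dot * c_p * dT  =>  m_dot = Q / (c_p * dT)
WATER_CP = 4186.0   # J/(kg*K), specific heat of water
DELTA_T = 10.0      # K, assumed coolant temperature rise

def water_flow_l_per_min(heat_load_w: float) -> float:
    """Approximate water flow (L/min) to carry away heat_load_w watts."""
    kg_per_s = heat_load_w / (WATER_CP * DELTA_T)  # mass flow, kg/s
    return kg_per_s * 60.0                         # ~L/min (1 kg water ~= 1 L)

print(f"{water_flow_l_per_min(132_000):.0f} L/min for a 132 kW rack")
```

Under these assumptions, a 132 kW rack needs only about 190 L/min of water, a flow comfortably delivered through rack manifolds and cold plates, which is why liquid becomes the practical medium at these densities.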

What This Means for Planning

If you're planning AI infrastructure for 2026-2027, cooling strategy is not optional:

| GPU Generation | Rack Density | Cooling Requirement |
| --- | --- | --- |
| H100/H200 | 40-80 kW | High-density air may work |
| GB200 (Blackwell) | 132 kW | Liquid cooling required |
| Next-gen (2026+) | 240 kW | Advanced liquid cooling mandatory |
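That planning guidance can be condensed into a simple rule of thumb. The helper below is a hypothetical sketch (the function name and exact cutoffs are my own, taken from the capacity ranges cited in this article) mapping a planned rack density to the minimum viable cooling tier:

```python
def required_cooling(rack_kw: float) -> str:
    """Minimum cooling tier for a planned rack density in kW,
    per the capacity thresholds cited in the article."""
    if rack_kw <= 50:
        return "rear-door heat exchanger or high-density air"
    if rack_kw <= 100:
        return "high-density air (marginal); liquid recommended"
    if rack_kw <= 200:
        return "direct-to-chip liquid cooling"
    return "immersion or advanced direct-to-chip liquid"

for kw in (40, 132, 240):  # H100-class, GB200, next-gen
    print(f"{kw:>3} kW -> {required_cooling(kw)}")
```

Real deployments would also weigh facility water availability, retrofit cost, and vendor support, but the density thresholds alone already rule air out for current-generation systems.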


Sources Cited

  1. Lombard Odier - "AI supercharges the race" (January 2026)
  2. Tom's Hardware - "The data center cooling state of play"
  3. Data Center Dynamics - "Data centers: The ten main trends for 2026"
  4. MLQ AI - Data Center Cooling Market Research
  5. Schneider Electric - Liquid Cooling Reference Designs