Furiosa: 3.5x efficiency over H100s

Original link: https://furiosa.ai/blog/introducing-rngd-server-efficient-ai-inference-at-data-center-scale

## NXT RNGD Server: AI Infrastructure for Existing Data Centers

The NXT RNGD Server is a high-performance AI system designed for easy integration into standard data center environments. Built on Furiosa's RNGD accelerators (up to 4 petaFLOPS FP8), it accelerates AI workloads with remarkable power efficiency: each system runs at just 3 kW, avoiding the need for costly liquid-cooling upgrades.

The server ships with the Furiosa SDK and LLM runtime preinstalled for rapid deployment, and supports popular frameworks such as Kubernetes and Helm. It carries 384 GB of HBM3 memory and ample storage, with all components connected over standard PCIe and no proprietary infrastructure required.

LG AI Research has already validated the server, with strong results on its EXAONE models. The NXT RNGD Server offers a cost-effective solution for enterprises that need local data control and scalability. It is positioned to meet growing demand for AI infrastructure, especially in the more than 80% of data centers that are air-cooled and power-constrained. Orders are being taken now, with delivery expected in January 2026.

## Furiosa AI Chips and the Future of Inference

A new AI chip from Furiosa.ai is drawing discussion on Hacker News, with a claim of 3.5x the efficiency of Nvidia's H100 GPU under a specific and uncommon PCIe configuration. The core debate is whether Nvidia is hitting a power/cooling bottleneck that would require massive, expensive data center upgrades to keep scaling LLMs.

Commenters find Furiosa's architecture promising *because* it does not require entirely new infrastructure. Skepticism remains, however, about the scope of the benchmarks (focused on older Llama models) and how well the results generalize across different models and workloads.

A key question is the economic viability of current AI infrastructure. Some argue that LLMs do not yet deliver enough value to justify the enormous cost of dedicated "AI data centers."

The discussion also touches on potential benefits for TSMC (reducing over-reliance on Nvidia), the usefulness of Furiosa's chips for organizations outside the AI space, and questions about potential ecosystem lock-in.

## Original article

Built around our RNGD accelerators, NXT RNGD Server is an optimized system that delivers high performance on today’s most important AI workloads while fitting seamlessly into existing data center environments.

With NXT RNGD Server, enterprises can move from experimentation to deployment faster than ever. The system ships with the Furiosa SDK and Furiosa LLM runtime preinstalled, so applications can begin serving immediately after installation. We optimized the platform around standard PCIe interconnects, eliminating the need for proprietary fabrics or exotic infrastructure.

Designed for compatibility, NXT RNGD Server runs at just 3 kW per system, allowing organizations to scale AI within the power and cooling limits of most modern facilities. This makes NXT RNGD Server a practical and cost-effective system to build out AI factories inside the data centers enterprises already operate.

Technical Specifications

  • Compute: Up to 8 × RNGD accelerators (4 petaFLOPS FP8 per server) with dual AMD EPYC processors. Supports BF16, FP8, INT8, and INT4

  • Memory: 384 GB HBM3 (12 TB/s bandwidth) plus 1 TB DDR5 system memory

  • Storage: 2 × 960 GB NVMe M.2 (OS), 2 × 3.84 TB NVMe U.2 (internal)

  • Networking: 1G management NIC plus 2 × 25G data NICs

  • Power & Cooling: 3 kW system power, redundant 2,000 W Titanium PSUs, air-cooled

  • Security & Management: Secure Boot, TPM, BMC attestation, dual management paths (PCIe + I2C)

  • Software: Preinstalled Furiosa SDK and Furiosa LLM runtime with native Kubernetes and Helm integration
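The server-level numbers above also imply per-card figures for the full 8-accelerator configuration. A back-of-the-envelope sketch (simple division of the stated server specs, not an official per-card datasheet):

```python
# Per-card figures implied by the server specs above, assuming an
# 8-accelerator configuration. Derived numbers, not official RNGD specs.
CARDS = 8

server_fp8_pflops = 4.0   # petaFLOPS FP8 per server
server_hbm3_gb = 384      # GB HBM3 per server
server_hbm_bw_tbs = 12.0  # TB/s aggregate HBM bandwidth

print(f"FP8 compute per card:   {server_fp8_pflops / CARDS * 1000:.0f} TFLOPS")
print(f"HBM3 per card:          {server_hbm3_gb // CARDS} GB")
print(f"HBM bandwidth per card: {server_hbm_bw_tbs / CARDS:.1f} TB/s")
# -> 500 TFLOPS FP8, 48 GB HBM3, 1.5 TB/s per card
```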

Real-world benefits and proven performance

NXT RNGD Server’s superior power efficiency significantly lowers businesses’ TCO. Enterprise customers can run advanced AI efficiently at scale within current infrastructure and power limitations – using on-prem servers or cloud data centers. This is crucial for leveraging existing infrastructure, since more than 80% of data centers today are air-cooled and operate at 8 kW per rack or less.

For businesses with sensitive workloads, regulatory compliance requirements, or enhanced privacy and security needs, NXT RNGD Server offers complete control over enterprise data, with model weights running entirely on local infrastructure.

Global enterprises have validated NXT RNGD Server’s performance. In July, LG AI Research announced that it has adopted RNGD for inference computing with its EXAONE models. Running LG’s EXAONE 3.5 32B model on a single server with four RNGD cards and a batch size of one, LG AI Research achieved 60 tokens/second with a 4K context window and 50 tokens/second with a 32K context window.
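At batch size one, throughput in tokens per second maps directly onto inter-token latency. A quick conversion of the figures reported above, using only the numbers in this article:

```python
# Inter-token latency implied by the reported batch-size-1 throughput
# for EXAONE 3.5 32B on four RNGD cards (figures from the text above).
for context, tok_per_s in [("4K", 60), ("32K", 50)]:
    latency_ms = 1000 / tok_per_s
    print(f"{context} context: {tok_per_s} tok/s -> {latency_ms:.1f} ms/token")
# -> 16.7 ms/token at 4K, 20.0 ms/token at 32K
```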

We are now working with LG AI Research to supply NXT RNGD servers to enterprises using EXAONE across key sectors, including electronics, finance, telecommunications, and biotechnology.

Making rapid deployment of advanced AI available to everyone

With global data center demand at 60 GW in 2024 and expected to triple by the end of the decade, the industry faces a once-in-a-generation transformation. More than 80 percent of facilities today are air-cooled and operate at 8 kW per rack or less, making them poorly suited for GPU-based systems that require liquid cooling and 10 kW+ per server.
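To make the rack-level arithmetic concrete, here is a minimal sketch of how many servers fit inside a typical 8 kW air-cooled rack budget, using only the power figures cited above (the 10 kW GPU server is the article's own characterization, not a specific product):

```python
# How many servers fit within a typical air-cooled rack power budget?
# Power figures are taken from the text above; illustrative only.
RACK_BUDGET_KW = 8.0

for name, server_kw in [("NXT RNGD Server (3 kW)", 3.0),
                        ("GPU server (10 kW+)", 10.0)]:
    fits = int(RACK_BUDGET_KW // server_kw)
    print(f"{name}: {fits} per 8 kW rack")
# -> 2 NXT RNGD Servers per rack; a 10 kW GPU server exceeds the budget outright
```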

NXT RNGD Server provides a practical path forward. It allows organizations to deploy advanced AI within their existing facilities, without prohibitive energy costs or disruptive retrofits. Engineered as a plug-and-play system, NXT RNGD combines AI-optimized silicon with Furiosa LLM, a vLLM-compatible serving framework featuring built-in OpenAI API support, enabling organizations to deploy and scale AI workloads from day one.
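Because Furiosa LLM exposes an OpenAI-compatible API, a standard OpenAI client can talk to it directly. A minimal sketch, assuming the runtime is reachable at http://localhost:8000/v1 and serves a model named "exaone-3.5-32b" (both the endpoint and the model name are illustrative placeholders, not documented Furiosa defaults):

```python
# Minimal client against an OpenAI-compatible serving endpoint, such as
# the vLLM-compatible Furiosa LLM runtime described above.
# ASSUMPTIONS: base_url, api_key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local serving endpoint (assumed)
    api_key="EMPTY",                      # local servers often ignore the key
)

response = client.chat.completions.create(
    model="exaone-3.5-32b",               # placeholder model name
    messages=[{"role": "user", "content": "Summarize RNGD in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

Since the framework is vLLM-compatible, existing applications written against the OpenAI API should need only a base-URL change to target the local server.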

By combining silicon and system design, NXT RNGD Server makes efficient, enterprise-ready, and future-proof AI infrastructure a reality.

Availability

We are taking inquiries and orders for January 2026.

Download the datasheet here and sign up for RNGD updates here.
