AMD funded a drop-in CUDA implementation built on ROCm: It's now open-source

Original link: https://www.phoronix.com/review/radeon-cuda-zluda

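A recent development is a drop-in CUDA implementation for Radeon GPUs that lets software written for NVIDIA's CUDA API run on AMD's Radeon Open Compute (ROCm) platform. On AMD hardware, it provides nearly identical behavior for many popular CUDA-enabled software packages, so users can run them without modification. The solution, named ZLUDA (previously released to enable CUDA support on Intel graphics), has seen major improvements over the past two years of development under Andrzej Janik, a former Intel employee who was contracted by AMD. Although it is not yet feature-complete, early indications are that it offers surprisingly impressive capabilities for a fully open-source, single-developer project. The code is dual-licensed under Apache 2.0 or MIT and is written in Rust. Notably, Radeon graphics cards, which were previously disguised under the generic name "Graphics Device", will soon be reported by their actual name in an updated version.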

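Some Hacker News readers questioned AMD's decision not to focus on the compute GPU market, especially as Nvidia continues to dominate AI and the data center. The discussion then turned to the factors behind AMD's choice, including higher margins in other businesses and supply constraints in underdeveloped markets. Some argued that, based on simple math, focusing on an underserved market is a no-brainer. Readers highlighted the popularity of GPU-based compute, with one noting that Nvidia's annual revenue had quickly risen to $4 billion. Another reader pointed out that although an oversaturated GPU market may offer fewer opportunities, its high margins contribute far more to company profits than CPU-centric businesses do. Other comments suggested that, given Intel's success with its Alchemist GPUs, AMD must recognize that further investment in GPU compute could bring significant advantages. Overall, while some believe competition can lower costs for customers, business choices are ultimately driven more by economics than by serving consumers or meeting technical needs.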

Original article

While AMD has made efforts over the years to make it easier to port codebases targeting NVIDIA's CUDA API to run atop HIP/ROCm, doing so still requires work on the part of developers. The tooling has improved, such as HIPIFY for automatically translating CUDA source code into HIP, but it isn't a simple, instant, or guaranteed solution, especially when striving for optimal performance. Over the past two years, though, AMD has quietly been funding an effort to bring binary compatibility, so that many NVIDIA CUDA applications can run atop the AMD ROCm stack at the library level: a drop-in replacement with no need to adapt source code. In practice, for many real-world workloads, it lets end-users run CUDA-enabled software without any developer intervention. Here is more information on this "skunkworks" project, which is now available as open-source, along with some of my own testing and performance benchmarks of this CUDA implementation built for Radeon GPUs.
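To make the contrast concrete: much of what the source-level route automates is renaming CUDA runtime identifiers to their HIP counterparts, after which the code still has to be rebuilt. The following is a minimal, hypothetical sketch of that kind of renaming. The function toy_hipify and its small rename table are illustrative only and are not HIPIFY's actual implementation. A binary-compatible layer such as ZLUDA skips this step entirely, because neither the source nor the compiled binary is touched.

```python
import re

# A few genuine CUDA-runtime -> HIP-runtime name pairs. HIPIFY's real tables
# cover far more identifiers, plus headers, libraries and kernel-launch syntax.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
}

# \b on both sides means only whole identifiers match, so "cudaMemcpy"
# is never rewritten inside "cudaMemcpyHostToDevice".
_IDENT = re.compile(r"\b(" + "|".join(map(re.escape, CUDA_TO_HIP)) + r")\b")


def toy_hipify(source: str) -> str:
    """Hypothetical toy: swap the runtime header and rename known CUDA identifiers."""
    source = source.replace("#include <cuda_runtime.h>", "#include <hip/hip_runtime.h>")
    return _IDENT.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


if __name__ == "__main__":
    cuda_src = (
        "#include <cuda_runtime.h>\n"
        "float *d_x;\n"
        "cudaMalloc(&d_x, n * sizeof(float));\n"
        "cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);\n"
        "cudaFree(d_x);\n"
    )
    print(toy_hipify(cuda_src))
```

Running it turns cudaMalloc into hipMalloc, cudaMemcpyHostToDevice into hipMemcpyHostToDevice, and so on. A real port would then be rebuilt with the HIP toolchain, which is exactly the developer work that a binary-level drop-in replacement avoids.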

You may recall ZLUDA from several years ago, a project for enabling CUDA support on Intel graphics. That open-source project aimed to provide a drop-in CUDA implementation on Intel graphics built atop Intel oneAPI Level Zero. ZLUDA was discontinued for private reasons, but it turns out that the developer behind it, Andrzej Janik (who was also employed by Intel at the time), was contracted by AMD in 2022 to effectively adapt ZLUDA for use on AMD GPUs with HIP/ROCm. Before Janik was contracted by AMD, Intel had considered ZLUDA development, but it ultimately turned down the idea and did not provide funding for the project.

Andrzej Janik spent the past two years bringing ZLUDA to Radeon GPUs, and it works: many CUDA applications can run on HIP/ROCm without any modifications or other extra steps. Just run the binaries as you normally would while ensuring that ZLUDA's replacement CUDA libraries are loaded. For reasons unknown to me, AMD decided this year to discontinue funding the effort and not to release it as a software product. The good news is that the contract included a clause for this eventuality: Janik could open-source the work if and when the contract ended.
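On Linux, one common way to make an unmodified binary pick up replacement shared libraries is to put their directory first on LD_LIBRARY_PATH. The sketch below assumes that approach. The directory path and the helper names zluda_env and run_with_zluda are hypothetical, and the exact invocation for ZLUDA (including any Windows equivalent) should be taken from the project's own documentation.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def zluda_env(zluda_dir: str, base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy the environment with zluda_dir searched first.

    Assumes zluda_dir holds ZLUDA's replacement CUDA libraries and that the
    dynamic loader honours LD_LIBRARY_PATH (the case for most non-setuid
    binaries on Linux).
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Prepend so zluda_dir is searched before any directories already listed
    # in LD_LIBRARY_PATH, which might contain NVIDIA's own driver libraries.
    env["LD_LIBRARY_PATH"] = zluda_dir + (os.pathsep + existing if existing else "")
    return env


def run_with_zluda(cmd: Sequence[str], zluda_dir: str) -> int:
    """Hypothetical helper: run an unmodified CUDA binary and return its exit code."""
    return subprocess.run(list(cmd), env=zluda_env(zluda_dir), check=False).returncode
```

For example, run_with_zluda(["./my_cuda_app"], "/opt/zluda") would launch a placeholder application with the assumed ZLUDA directory searched first. Nothing about the application itself changes, only the environment it is started in.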

Radeon ZLUDA Git commit

Andrzej Janik reached out and provided access to the new ZLUDA implementation for AMD ROCm so that I could test and benchmark it ahead of today's planned public announcement. I've been testing it for a few days and it's been a positive experience: CUDA-enabled software does indeed run atop ROCm without any changes. Even proprietary renderers and the like work with this "CUDA on Radeon" implementation.

The ZLUDA implementation isn't 100% fail-safe, though: NVIDIA OptiX is not fully supported, and some cases, such as software not using PTX assembly code, aren't currently handled. But for the most part, this implementation is surprisingly capable for a single-developer effort.

For those wondering about the open-source code, it's dual-licensed under either Apache 2.0 or MIT. Rust fans will be excited to know the Rust programming language is leveraged for this Radeon implementation.

NOTE: In my screenshots, and throughout the past two years of development, the device name exposed for Radeon GPUs via CUDA has been the generic "Graphics Device" rather than the name of the actual AMD Radeon graphics adapter running ROCm. This was done so that CUDA benchmarks that auto-report results, and other software with automated telemetry, would not leak the fact that Radeon GPUs were being used under CUDA. I'm told that, as part of today's open-sourcing of the ZLUDA on Radeon code, a change will be in place to expose the actual Radeon graphics card name rather than the generic "Graphics Device" placeholder.
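The name a benchmark reports typically comes from asking the CUDA driver API for it. The sketch below uses Python's ctypes to call the real driver entry points cuInit, cuDeviceGet and cuDeviceGetName. The helper name cuda_device_name is hypothetical, and the library name libcuda.so.1 is the standard Linux driver library, assumed here to be what ZLUDA's replacement stands in for. Per the note above, during development this query would have returned "Graphics Device" on a Radeon card.

```python
import ctypes
from typing import Optional

CUDA_SUCCESS = 0  # CUresult value for success in the CUDA driver API


def cuda_device_name(ordinal: int = 0, libname: str = "libcuda.so.1") -> Optional[str]:
    """Hypothetical helper: return the device name reported via the CUDA driver API.

    Returns None if the library cannot be loaded, lacks the expected symbols,
    or any driver call fails (for example, when no device is present).
    """
    try:
        cuda = ctypes.CDLL(libname)
        cu_init = cuda.cuInit
        cu_device_get = cuda.cuDeviceGet
        cu_device_get_name = cuda.cuDeviceGetName
    except (OSError, AttributeError):
        return None

    # Declare C signatures; CUresult is an enum and CUdevice an int handle.
    cu_init.argtypes = [ctypes.c_uint]
    cu_init.restype = ctypes.c_int
    cu_device_get.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int]
    cu_device_get.restype = ctypes.c_int
    cu_device_get_name.argtypes = [ctypes.c_char_p, ctypes.c_int, ctypes.c_int]
    cu_device_get_name.restype = ctypes.c_int

    if cu_init(0) != CUDA_SUCCESS:
        return None
    device = ctypes.c_int()
    if cu_device_get(ctypes.byref(device), ordinal) != CUDA_SUCCESS:
        return None
    name = ctypes.create_string_buffer(256)
    if cu_device_get_name(name, len(name), device.value) != CUDA_SUCCESS:
        return None
    return name.value.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(cuda_device_name())
```

Because the application only sees whatever string the loaded driver library hands back, changing the reported name is entirely a matter of what the replacement library returns.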
