NVIDIA Transitions Fully Towards Open-Source Linux GPU Kernel Modules

原始链接: https://developer.nvidia.com/blog/nvidia-transitions-fully-towards-open-source-gpu-kernel-modules/

In May 2022, NVIDIA released open-source Linux GPU kernel modules with the R515 driver under dual GPL and MIT licensing. The initial release targeted datacenter GPUs, with GeForce and workstation GPUs in an alpha state; the goal was to deliver full, high-performance Linux support for GeForce and workstation GPUs in later releases. Since then, the open-source GPU kernel modules have shown equivalent or better application performance and gained additional capabilities such as heterogeneous memory management (HMM) and confidential computing.

Now, after two years of development, NVIDIA plans to use the open-source GPU kernel modules exclusively, starting with the upcoming R560 driver release. Not every GPU can use these open-source modules, due to compatibility constraints. Cutting-edge platforms such as NVIDIA Grace Hopper and Blackwell require them. Newer GPUs based on the Turing, Ampere, Ada Lovelace, or Hopper architectures should make the switch. Older GPUs based on the Maxwell, Pascal, or Volta architectures cannot use the open-source modules and should stay on the proprietary driver, as should mixed deployments that combine older and newer GPUs in the same system.

For those unsure which driver to use, NVIDIA provides a helper script to guide the choice. In addition, the default driver installed by all installation methods is switching from proprietary to open source. Some scenarios need special attention, such as using package managers with the CUDA metapackage or other non-standard procedures. There are also minor changes to the installation flow when selecting the proprietary driver option in the runfile or when using automation tools such as Ansible. For further guidance, see the "Using the installation helper script" section of this post and NVIDIA's driver installation documentation.

Recently, Nvidia announced plans to open-source part of its graphics driver, specifically the kernel portion, while keeping most of the user-space driver closed. The move promises benefits such as greater kernel-side flexibility, allowing features like clock recovery inside Nvidia GPUs, but it also brings challenges such as an unstable application binary interface (ABI) and large firmware files. These issues make it harder for developers to update and test open-source driver builds, and firmware updates must be merged with extra care.

The decision appears to be driven largely by historical reasons rather than an outright aversion to open source. Early Nvidia products used the same hardware in professional and consumer cards, which led users to flash the professional BIOS onto the cheaper consumer cards. To protect sales, Nvidia took countermeasures such as embedding identifying resistors on the board, and eventually developed more sophisticated ways to block BIOS flashing. Now that professional and gaming cards use different hardware, Nvidia can safely begin shipping open-source drivers without worrying about BIOS compatibility issues.

Original text

With the R515 driver, NVIDIA released a set of Linux GPU kernel modules in May 2022 as open source with dual GPL and MIT licensing. The initial release targeted datacenter compute GPUs, with GeForce and Workstation GPUs in an alpha state. 

At the time, we announced that more robust and fully-featured GeForce and Workstation Linux support would follow in subsequent releases and the NVIDIA Open Kernel Modules would eventually supplant the closed-source driver. 

NVIDIA GPUs share a common driver architecture and capability set. The same driver for your desktop or laptop runs the world’s most advanced AI workloads in the cloud. It’s been incredibly important to us that we get it just right. 

Two years on, we’ve achieved equivalent or better application performance with our open-source GPU kernel modules and added substantial new capabilities:

  • Heterogeneous memory management (HMM) support
  • Confidential computing
  • The coherent memory architectures of our Grace platforms
  • And more

We’re now at a point where transitioning fully to the open-source GPU kernel modules is the right move, and we’re making that change in the upcoming R560 driver release.

Supported GPUs

Not every GPU is compatible with the open-source GPU kernel modules.

For cutting-edge platforms such as NVIDIA Grace Hopper or NVIDIA Blackwell, you must use the open-source GPU kernel modules. The proprietary drivers are unsupported on these platforms.

For newer GPUs from the Turing, Ampere, Ada Lovelace, or Hopper architectures, NVIDIA recommends switching to the open-source GPU kernel modules.

For older GPUs from the Maxwell, Pascal, or Volta architectures, the open-source GPU kernel modules are not compatible with your platform. Continue to use the NVIDIA proprietary driver.

For mixed deployments with older and newer GPUs in the same system, continue to use the proprietary driver.
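The selection rules above can be sketched as a small shell helper. This is purely illustrative of the decision table, assuming architecture names as lowercase strings; it is not NVIDIA's actual detection script (that is covered in the next section).

```shell
# Illustrative mapping of GPU architecture to the recommended kernel
# module type, per the rules above. Hypothetical helper, not an NVIDIA tool.
recommend_kernel_modules() {
  case "$1" in
    # Open kernel modules are required on the newest platforms.
    grace-hopper|blackwell) echo "open (required)" ;;
    # Open modules are recommended from Turing onward.
    turing|ampere|ada-lovelace|hopper) echo "open (recommended)" ;;
    # Pre-Turing architectures must stay on the proprietary driver.
    maxwell|pascal|volta) echo "proprietary" ;;
    *) echo "unknown architecture: $1" >&2; return 1 ;;
  esac
}

recommend_kernel_modules ampere   # -> open (recommended)
recommend_kernel_modules pascal   # -> proprietary
```

Mixed deployments fall outside this per-GPU mapping: if any GPU in the system resolves to "proprietary", the whole system stays on the proprietary driver.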

If you are not sure, NVIDIA provides a new detection helper script to help guide you on which driver to pick. For more information, see the Using the installation helper script section later in this post.

Installer changes

In general, the default version of the driver installed by all installation methods is switching from the proprietary driver to the open-source driver. There are a few specific scenarios that deserve special attention:

  • Package managers with the CUDA metapackage
  • Runfile
  • Installation helper script
  • Package manager details
  • Windows Subsystem for Linux
  • CUDA Toolkit

Using package managers with the CUDA metapackage

When you are installing CUDA Toolkit using a package manager (not the .run file), installation metapackages exist and are commonly used. By installing a top-level cuda package, you install a combination of CUDA Toolkit and the associated driver release. For example, by installing cuda during the CUDA 12.5 release time frame, you get the proprietary NVIDIA driver 555 along with CUDA Toolkit 12.5. 

Figure 1 shows this package structure.

Previously, using the open-source GPU kernel modules meant that you could not use the top-level metapackage. You had to install the distro-specific NVIDIA driver open package along with the cuda-toolkit-X-Y package of your choice.

Beginning with the CUDA 12.6 release, the flow effectively switches places (Figure 2).
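The boundary can be encoded in a small shell helper (a hypothetical sketch, not an NVIDIA tool) that answers which driver flavor the top-level cuda metapackage pulls in by default for a given toolkit release:

```shell
# Hypothetical helper: which driver flavor does the top-level `cuda`
# metapackage install by default for a given CUDA Toolkit release?
# Per this post: proprietary before 12.6, open from 12.6 onward.
default_driver_flavor() {
  local version="$1"               # e.g. "12.5" or "12.6"
  local major="${version%%.*}"     # text before the first dot
  local minor="${version#*.}"      # text after the first dot
  if [ "$major" -gt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -ge 6 ]; }; then
    echo "open"
  else
    echo "proprietary"
  fi
}

default_driver_flavor 12.5   # -> proprietary
default_driver_flavor 12.6   # -> open
```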

Using the runfile

If you install CUDA or the NVIDIA drivers using the .run file, the installer queries your hardware and automatically installs the best-fit driver for your system. UI toggles are also available to select between the proprietary driver and the open source driver, as you choose.

If you’re installing through the CUDA .run file and using the ncurses user interface, you now see a menu similar to the following:

┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Driver                                                                  │
│   [ ] Do not install any of the OpenGL-related driver files                  │
│   [ ] Do not install the nvidia-drm kernel module                            │
│   [ ] Update the system X config file to use the NVIDIA X driver             │
│ - [X] Override kernel module type                                            │
│      [X] proprietary                                                         │
│      [ ] open                                                                │
│   Change directory containing the kernel source files                        │
│   Change kernel object output directory                                      │
│   Done                                                                       │
│                                                                              │
│                                                                              │
│                                                                              │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘

If you’re installing through the driver .run file, you see a similar choice presented (Figure 3).

You can also pass overrides using the command line to install without the user interface or if you are using automation tools such as Ansible.

# sh ./cuda_12.6.0_560.22_linux.run --override --kernel-module-type=proprietary

# sh ./NVIDIA-Linux-x86_64-560.run --kernel-module-type=proprietary
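For automation, it can help to build the runfile invocation from a variable rather than hard-coding it. The following is a hypothetical wrapper (the function name and validation are assumptions; only the `--kernel-module-type` flag comes from this post), which returns the command string so a tool like Ansible can run or log it:

```shell
# Hypothetical wrapper for unattended installs: builds the runfile
# command line for a chosen kernel module type. It echoes the command
# rather than running it, so it can be inspected or passed to automation.
build_runfile_cmd() {
  local runfile="$1" module_type="$2"   # module_type: open | proprietary
  case "$module_type" in
    open|proprietary) ;;
    *) echo "module type must be 'open' or 'proprietary'" >&2; return 1 ;;
  esac
  echo "sh ./$runfile --kernel-module-type=$module_type"
}

build_runfile_cmd NVIDIA-Linux-x86_64-560.run open
# -> sh ./NVIDIA-Linux-x86_64-560.run --kernel-module-type=open
```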

Using the installation helper script

As mentioned earlier, if you’re unsure which driver to pick for the GPUs in your system, NVIDIA created a helper script to guide you through the selection process. 

To use it, first install the nvidia-driver-assistant package with your package manager, then run the script:

$ nvidia-driver-assistant

Package manager details

For a consistent experience, NVIDIA recommends that you use package managers to install CUDA Toolkit and the drivers. However, the specific details of which package management systems are used by different distributions or how packages are structured can vary depending on your particular distribution. 

This section outlines the specific details, caveats, or migration steps needed for various platforms. 

apt: Ubuntu and Debian-based distributions

Run the following command:

$ sudo apt-get install nvidia-open

To upgrade using the cuda metapackage on Ubuntu 20.04, first switch to open kernel modules:

$ sudo apt-get install -V nvidia-kernel-source-open

$ sudo apt-get install nvidia-open

dnf: Red Hat Enterprise Linux, Fedora, Kylin, Amazon Linux, or Rocky Linux

Run the following command:

$ sudo dnf module install nvidia-driver:open-dkms

To upgrade using the cuda metapackage on dnf-based distros, module streams must be disabled:

$ echo "module_hotfixes=1" | sudo tee -a /etc/yum.repos.d/cuda*.repo
$ sudo dnf install --allowerasing nvidia-open
$ sudo dnf module reset nvidia-driver

zypper: SUSE Linux Enterprise Server, or OpenSUSE

Run one of the following commands:

# default kernel flavor
$ sudo zypper install nvidia-open
# azure kernel flavor (sles15/x86_64)
$ sudo zypper install nvidia-open-azure
# 64kb kernel flavor (sles15/sbsa) required for Grace-Hopper
$ sudo zypper install nvidia-open-64k
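The flavor-to-package mapping above can be captured in a small helper (a hypothetical sketch; the package names are the ones listed in this post):

```shell
# Hypothetical helper: map a SUSE kernel flavor to the matching
# open kernel module package named above.
zypper_open_package() {
  case "$1" in
    default) echo "nvidia-open" ;;        # standard SLES/openSUSE kernel
    azure)   echo "nvidia-open-azure" ;;  # azure kernel flavor (sles15/x86_64)
    64k)     echo "nvidia-open-64k" ;;    # 64 kB pages (sles15/sbsa, Grace Hopper)
    *) echo "unknown kernel flavor: $1" >&2; return 1 ;;
  esac
}
```

For example, `sudo zypper install "$(zypper_open_package azure)"` installs the azure-flavor package.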

Package manager summary

For simplification, we’ve condensed the package manager recommendations in table format. All releases beyond driver version 560 and CUDA Toolkit 12.6 will use these packaging conventions.

Distro              Install the latest                           Install a specific release
Fedora/RHEL/Kylin   dnf module install nvidia-driver:open-dkms   dnf module install nvidia-driver:560-open
openSUSE/SLES       zypper install nvidia-open{-azure,-64k}      zypper install nvidia-open-560{-azure,-64k}
Debian              apt-get install nvidia-open                  apt-get install nvidia-open-560
Ubuntu              apt-get install nvidia-open                  apt-get install nvidia-open-560

Table 1. Package manager installation recommendations

For more information, see NVIDIA Datacenter Drivers.

Windows Subsystem for Linux

Windows Subsystem for Linux (WSL) uses the NVIDIA kernel driver from the host Windows operating system. You shouldn’t install any driver into this platform specifically. If you are using WSL, no change or action is required.

CUDA Toolkit

The installation of CUDA Toolkit remains unchanged through package managers. Run the following command:

$ sudo apt-get/dnf/zypper install cuda-toolkit

More information

For more information about how to install NVIDIA drivers or the CUDA Toolkit, including how to ensure that you install the proprietary drivers if you’re unable to migrate to the open-source GPU kernel modules at this time, see Driver Installation in the CUDA Installation Guide.
