Use your Nvidia GPU's VRAM as swap space on Linux

Original link: https://github.com/c0dejedi/nbd-vram

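This project lets Linux users use an NVIDIA GPU's video memory (VRAM) as additional swap space, giving laptops with limited soldered memory a significant memory extension. By using the CUDA driver API and the NBD (Network Block Device) protocol, it works around the NVIDIA P2P API restrictions on consumer GeForce cards. The system sets up an overflow hierarchy: RAM fills first, then fast VRAM (over PCIe), then compressed zram, and finally the SSD. Because it uses the standard NBD driver rather than a custom kernel module, the solution stays stable across kernel and driver updates.

**Key features:**

* **Performance:** About 1.3 GB/s of sequential throughput, with latency the author reports as lower than NVMe SSD swap.
* **Flexibility:** Automatically adjusts the VRAM allocation to what is available; includes optional power-aware management that can disable the swap when AC power is disconnected.
* **Compatibility:** Works with any CUDA-capable NVIDIA GPU.
* **Simplicity:** Needs no special kernel or NVIDIA symbols; installation is a single script.

The tool is a good fit for users who want to maximize addressable memory where hardware upgrade paths are limited. The project is open source (MIT license) and available on [GitHub](https://github.com/c0dejedi/nbd-vram).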

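A project that recently appeared on GitHub lets Linux users use idle Nvidia VRAM as swap space. The tool mainly targets laptops with soldered, non-upgradable memory, and aims to reduce SSD wear and improve performance on systems that would otherwise spill to slower storage.

Reaction on Hacker News was mixed. Supporters see it as a clever use of idle VRAM, especially for users who have a lot of VRAM but are not always running heavy AI or gaming workloads. Technical commenters, however, pointed to significant efficiency obstacles: because the implementation runs in user space and relies on the NBD driver, it currently suffers from high latency, substantial context-switch overhead, and poor PCIe bus utilization. Their conclusion was that, although the concept has theoretical value for specific niche use cases, its current performance is still slower than swapping to a modern NVMe SSD. Participants also discussed the role of swap on modern Linux systems. Some argued that better memory management options such as ZRAM are more effective for low-memory devices; others noted that future standards such as CXL could make PCIe-based memory expansion more practical and cache-coherent.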

Original text

Use your NVIDIA GPU's VRAM as swap space on Linux.

Built for laptops with soldered memory and no upgrade path. If you have an RTX card sitting there with 8GB of VRAM and you're getting swapped to SSD, this puts that VRAM to work.

Tested on: RTX 3070 Laptop (GA104M, 16 GB physical RAM, 8 GB VRAM), driver 580.159.03, kernel 6.17, Pop!_OS. Allocated 7 GB of VRAM for swap. Total addressable memory, including zram and SSD swap, came to ~46 GB, roughly triple the 16 GB of physical RAM. Overflow order: RAM fills first, then VRAM absorbs the spill (fast, over PCIe), then zram compresses the rest (on the CPU), and SSD is used only if everything else is exhausted.

A small daemon allocates VRAM via the CUDA driver API, then serves it as a block device using the NBD (Network Block Device) protocol over a Unix socket. The kernel's built-in nbd driver connects to it and exposes /dev/nbdX. From there it's a normal swap device.

Data path: kernel swap subsystem -> /dev/nbdX -> nbd kernel driver -> Unix socket -> nbd-vram daemon -> cuMemcpyHtoD/DtoH -> GPU VRAM.
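As a rough illustration of the socket leg of this path, the sketch below handles a single NBD transmission-phase request in Python. The 28-byte request layout, 16-byte simple-reply layout, magic numbers, and command codes follow the NBD protocol specification. Everything else is an assumption made for illustration: `handle_request` is a hypothetical name, and a `bytearray` stands in for VRAM (the real daemon copies with cuMemcpyHtoD/DtoH, and its source is not reproduced here).

```python
import struct

# Constants from the NBD protocol specification (transmission phase).
NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_DISC = 0, 1, 2
NBD_EINVAL = 22

REQUEST = struct.Struct(">IHHQQI")  # magic, flags, type, cookie, offset, length
REPLY = struct.Struct(">IIQ")       # magic, error, cookie


def handle_request(store: bytearray, header: bytes, payload: bytes = b"") -> bytes:
    """Return the bytes to send back for one request (hypothetical helper).

    `store` is a stand-in for the VRAM allocation; a real daemon would copy
    to and from device memory here instead of slicing a bytearray.
    """
    magic, _flags, cmd, cookie, offset, length = REQUEST.unpack(header)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("bad NBD request magic")
    if cmd == NBD_CMD_DISC:
        return b""  # disconnect requests get no reply
    out_of_range = offset + length > len(store)
    if cmd == NBD_CMD_WRITE and not out_of_range and len(payload) == length:
        store[offset:offset + length] = payload  # host -> "device" copy
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, cookie)
    if cmd == NBD_CMD_READ and not out_of_range:
        data = bytes(store[offset:offset + length])  # "device" -> host copy
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, cookie) + data
    return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, NBD_EINVAL, cookie)
```

The cookie is echoed back unchanged so the kernel can match each reply to its outstanding request.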

No kernel module to write or maintain. No NVIDIA kernel symbols. Survives kernel and driver updates without rebuilding anything.

Why not the NVIDIA P2P API?

The "obvious" approach is nvidia_p2p_get_pages_persistent, which pins VRAM pages in BAR1 so the CPU can access them directly via ioremap_wc. Every existing project that tried this route has hit the same wall: the NVIDIA driver returns EINVAL on consumer GeForce GPUs. This happens with both the persistent and non-persistent variants and with both flag values. It's gated at the RM level for Quadro/datacenter SKUs only, regardless of driver version.

The other approach - directly ioremap_wc the BAR1 physical address without going through the P2P API - also doesn't work. The GPU's internal page tables only have ~16 MiB of BAR1 mapped (just the display framebuffer). Reads from the rest return zeros. mkswap appears to succeed, then swapon fails because the swap header isn't actually there.
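The missing-header failure can be checked directly. A Linux swap area carries the signature `SWAPSPACE2` in the last 10 bytes of its first page, so a device that reads back zeros will not have it. The sketch below is a minimal check under the assumption of a 4096-byte page size; `has_swap_signature` is a hypothetical helper, not part of the project.

```python
import io

SWAP_MAGIC = b"SWAPSPACE2"  # signature written by mkswap


def has_swap_signature(dev, page_size: int = 4096) -> bool:
    """Check whether a seekable binary file object holds a swap header.

    page_size=4096 is an assumption; it must match the system page size.
    """
    dev.seek(page_size - len(SWAP_MAGIC))
    return dev.read(len(SWAP_MAGIC)) == SWAP_MAGIC


# A first page that reads back as zeros has no signature, which is
# consistent with swapon rejecting the device.
zero_page = io.BytesIO(bytes(4096))
print(has_swap_signature(zero_page))
```

On a real system the same function could be pointed at a device opened in binary read mode (root required).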

The NBD approach sidesteps all of this. cuMemcpyHtoD and cuMemcpyDtoH work on any CUDA GPU without any special permissions.

Requirements:

NVIDIA GPU with CUDA support (any consumer RTX/GTX card)
NVIDIA driver with libcuda.so.1 (no CUDA toolkit needed)
Linux kernel 3.0+ (nbd module, built into most distros)
nbd-client package
gcc, make

Install:

git clone https://github.com/c0dejedi/nbd-vram
cd nbd-vram
sudo ./install.sh
sudo systemctl start vram-swap-nbd

Verify:

swapon --show
# NAME       TYPE      SIZE USED PRIO
# /dev/nbd0  partition   7G   0B 1500

The service is enabled on install, so it comes up automatically on every boot.

Edit /etc/systemd/system/vram-swap-nbd.service:

Environment=VRAM_SETUP_SIZE_MB=7168    # how much VRAM to use
Environment=VRAM_SWAP_PRIORITY=1500   # swap priority (higher = used first)

The daemon tries the requested size first and backs off in 512 MiB steps if the GPU is short on memory - so it will grab as much as it can even if the display compositor is already loaded. VRAM_SETUP_SIZE_MB is the ceiling, not a hard requirement.
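The back-off behaviour described above can be sketched as a simple loop. This is an illustration, not the daemon's code: `allocate_with_backoff` and the `try_alloc` callback are hypothetical names, and the callback stands in for whatever CUDA allocation call the daemon makes.

```python
from typing import Callable, Optional

STEP_MB = 512  # back-off step stated in the text


def allocate_with_backoff(
    ceiling_mb: int,
    try_alloc: Callable[[int], bool],
    step_mb: int = STEP_MB,
) -> Optional[int]:
    """Try the ceiling first, then shrink by step_mb until an attempt succeeds.

    try_alloc is a hypothetical callback that returns True when an
    allocation of the given size (in MiB) succeeds.
    Returns the size obtained, or None if every attempt failed.
    """
    size = ceiling_mb
    while size > 0:
        if try_alloc(size):
            return size
        size -= step_mb
    return None


# Example: pretend only 5000 MiB of VRAM is free (illustrative figure).
got = allocate_with_backoff(7168, lambda mb: mb <= 5000)
print(got)  # 4608: the first size in the 512 MiB ladder that fits
```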

After changing, run sudo systemctl daemon-reload && sudo systemctl restart vram-swap-nbd.

The installer asks whether to enable power-aware management on first install. If enabled, the service automatically stops when you unplug from AC (or when battery drops below a threshold), and restarts when power is restored. Manual systemctl stop is always respected and won't be overridden.

To change settings after install, edit /etc/nbd-vram.conf. Changes take effect on the next poll (within 60 seconds) or immediately on the next AC plug/unplug event.
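One way such a poll could work is to read the kernel's power-supply sysfs class. The sketch below is an assumption about the general approach, not the project's implementation: the function names and the `min_battery_pct` parameter are hypothetical, and the actual keys in /etc/nbd-vram.conf are not shown in this document.

```python
from pathlib import Path
from typing import Optional, Tuple


def read_power_state(
    root: Path = Path("/sys/class/power_supply"),
) -> Tuple[bool, Optional[int]]:
    """Return (on_ac, battery_percent) from the power_supply sysfs class.

    Only supplies of type "Mains" are treated as AC here; some chargers
    report type "USB" instead, which this sketch ignores.
    """
    on_ac = False
    battery_pct: Optional[int] = None
    for supply in sorted(root.iterdir()):
        type_file = supply / "type"
        if not type_file.is_file():
            continue
        kind = type_file.read_text().strip()
        if kind == "Mains" and (supply / "online").is_file():
            on_ac = on_ac or (supply / "online").read_text().strip() == "1"
        elif kind == "Battery" and (supply / "capacity").is_file():
            battery_pct = int((supply / "capacity").read_text().strip())
    return on_ac, battery_pct


def should_run(
    on_ac: bool,
    battery_pct: Optional[int],
    min_battery_pct: Optional[int] = None,
) -> bool:
    """Hypothetical policy: run on AC; on battery, run only above a threshold.

    With min_battery_pct=None the swap runs on AC power only.
    """
    if on_ac:
        return True
    if min_battery_pct is None or battery_pct is None:
        return False
    return battery_pct >= min_battery_pct
```

A supervisor loop would call `read_power_state()` on each poll and start or stop the service when `should_run` changes value.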

Smoke test (without installing)

Allocates VRAM, connects the NBD device, does a 1 MiB write/readback check, activates swap, then prints teardown instructions. install.sh handles teardown automatically if a test instance is running.
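A write/readback check of this kind can be sketched in a few lines. This is not the project's test script: `readback_check` is a hypothetical helper, and the check is destructive, so it must never be pointed at a device that holds data or is active as swap.

```python
import os

MIB = 1024 * 1024


def readback_check(path: str, size: int = MIB, offset: int = 0) -> bool:
    """Write a random pattern to `path` and verify it reads back unchanged.

    Destructive: overwrites `size` bytes at `offset`.
    Caveat: without O_DIRECT the read may be served from the page cache,
    so this sketch does not prove the data reached the backing device.
    """
    pattern = os.urandom(size)
    fd = os.open(path, os.O_RDWR)
    try:
        written = os.pwrite(fd, pattern, offset)
        if written != size:
            raise OSError(f"short write: {written} of {size} bytes")
        os.fsync(fd)
        return os.pread(fd, size, offset) == pattern
    finally:
        os.close(fd)
```

Random data is used instead of zeros so that a device returning all zeros, as in the BAR1 case described earlier, fails the comparison.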

To stress the full partition after the smoke test passes:

Writes the entire VRAM partition with zeros, verifies a sample read back, then auto-restores swap on exit.

Measured on RTX 3070 Laptop via test-fill.sh (7 GiB sequential write, 4M blocks):

Sequential throughput: ~1.3 GB/s
Latency is lower than NVMe since the path goes through PCIe to GPU rather than storage
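For readers who want to reproduce a comparable number, the sketch below times a sequential write of zeros in 4 MiB blocks and reports decimal GB/s. It is a stand-in under stated assumptions, not the contents of test-fill.sh: `sequential_write_gbps` is a hypothetical name, and the target path is whatever scratch file or inactive device the reader chooses.

```python
import os
import time

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB blocks, matching the figure in the text


def sequential_write_gbps(
    path: str, total_bytes: int, block_size: int = BLOCK_SIZE
) -> float:
    """Write total_bytes of zeros sequentially and return throughput in GB/s.

    Destructive: overwrites the target from offset 0. The timing includes
    the final fsync so buffered data is counted only once it is flushed.
    """
    block = memoryview(bytes(block_size))
    written = 0
    fd = os.open(path, os.O_WRONLY)
    try:
        start = time.perf_counter()
        while written < total_bytes:
            remaining = total_bytes - written
            written += os.write(fd, block[:remaining])
        os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return written / elapsed / 1e9
```

At the reported ~1.3 GB/s, filling a 7 GiB partition (about 7.5 GB) would take a little under 6 seconds.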

For laptops already using zram, set VRAM swap at a higher priority so it absorbs overflow before hitting SSD.
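To confirm the resulting order, the priorities can be read from /proc/swaps, where the kernel uses higher-priority areas first. The sketch below parses that table and lists devices in the order they would fill. The sample text is illustrative only: the 1500 priority comes from the configuration above, while the zram and swapfile entries, sizes, and priorities are made-up values.

```python
from typing import List

# Illustrative sample in /proc/swaps format (sizes in KiB).
# Only the nbd0 priority is taken from this document; the rest is invented.
SAMPLE = """\
Filename                Type        Size        Used    Priority
/swapfile               file        16777212    0       -2
/dev/zram0              partition   8388604     0       100
/dev/nbd0               partition   7340028     0       1500
"""


def overflow_order(proc_swaps_text: str) -> List[str]:
    """Return swap areas sorted from highest to lowest priority."""
    entries = []
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 5:
            continue
        name, _kind, _size_kib, _used_kib, priority = fields
        entries.append((int(priority), name))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [name for _, name in entries]


print(overflow_order(SAMPLE))
```

On a live system the input would come from reading /proc/swaps instead of the sample string.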

MIT - Sean Lobjoit (c0dejedi)