Easy dynamic dispatch using GLIBC Hardware Capabilities

Original link: https://www.kvr.at/posts/easy-dynamic-dispatch-using-GLIBC-hardware-capabilities/

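GLIBC hwcaps (introduced in 2.33) provide a straightforward way to optimize libraries dynamically based on CPU features. You can build multiple versions of a library, each targeting a different CPU instruction set level (for example x86-64-v2, x86-64-v3, x86-64-v4). These optimized libraries are placed in subdirectories of the standard library path, named after the corresponding CPU level (for example `/usr/lib/glibc-hwcaps/x86-64-v4/libfoo.so`). The dynamic linker/loader automatically selects the highest supported version at run time: if the CPU supports AVX512 (x86-64-v4), that version is loaded; otherwise, the loader falls back to a lower level. The base library, targeting the lowest supported level (x86-64-v1), sits in the standard library path, which keeps the library loadable on systems without hwcaps. This approach gains performance by exploiting the available CPU features while maintaining broad compatibility. It also works with private library directories when RUNPATH or LD_LIBRARY_PATH is used.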

Original text

TL;DR With GLIBC 2.33+, you can build a shared library multiple times targeting various optimization levels, and the dynamic linker/loader will pick the highest version supported by the current CPU. For example, with the layout below, on a Ryzen 9 5900X, x86-64-v3/libfoo0.so would be loaded:

/usr/lib/glibc-hwcaps/x86-64-v4/libfoo0.so
/usr/lib/glibc-hwcaps/x86-64-v3/libfoo0.so
/usr/lib/glibc-hwcaps/x86-64-v2/libfoo0.so
/usr/lib/libfoo0.so

Longer Version

GLIBC Hardware Capabilities or "hwcaps" are an easy, almost trivial way to add a simple form of dynamic dispatch to any amd64 or POWER build, provided that either the build target or the compiler's optimizations can make use of certain CPU extensions.

Mo Zhou pointed me towards this when I was faced with the challenge of creating a performant Debian package for ggml, the tensor library behind llama.cpp and whisper.cpp.

The Challenge

A performant yet universally loadable library needs to make use of some form of dynamic dispatch to leverage the most effective SIMD extensions available on any given CPU it may run on. Last January, when I first started with the packaging of ggml for Debian, ggml did have support for this through its GGML_CPU_ALL_VARIANTS=ON option, but this was limited to amd64. This meant that on all the other architectures that Debian supports, I would need to target some ancient baseline, thus effectively crippling the package there.

Dynamic Dispatch using hwcaps

hwcaps were introduced in GLIBC 2.33 and replace the (now) Legacy Hardware Capabilities, which were removed in 2.37. The way hwcaps work is delightfully simple: the dynamic linker/loader looks for a shared library not just in the standard library paths, but also in their subdirectories of the form glibc-hwcaps/<level>, starting with the highest <level> that the current CPU supports. The levels are predefined per architecture; the examples below use the amd64 levels.
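
The lookup order described above can be modelled in a few lines of Python. This is only an illustrative sketch, not glibc's actual implementation: the function name `candidate_paths` and the hard-coded level list are assumptions made for this example, and the real loader deals with caches, multiple search directories, and other details that are ignored here.

```python
from pathlib import PurePosixPath

# amd64 hwcaps levels, highest first. x86-64-v1 is the baseline and has
# no subdirectory of its own.
AMD64_LEVELS = ["x86-64-v4", "x86-64-v3", "x86-64-v2"]


def candidate_paths(libdir, soname, highest_supported):
    """Return the paths tried for `soname` in `libdir`, in priority order.

    `highest_supported` is the highest level the CPU supports, or None
    for a baseline-only (x86-64-v1) CPU. Hypothetical helper.
    """
    base = PurePosixPath(libdir)
    paths = []
    if highest_supported is not None:
        start = AMD64_LEVELS.index(highest_supported)
        # Levels are cumulative: a v3 CPU can also run v2 code.
        for level in AMD64_LEVELS[start:]:
            paths.append(str(base / "glibc-hwcaps" / level / soname))
    # The plain library path is always the final fallback.
    paths.append(str(base / soname))
    return paths


for path in candidate_paths("/usr/lib", "libfoo0.so", "x86-64-v3"):
    print(path)
```

The first path in the returned list that actually exists on disk corresponds to the library that would be loaded in this simplified model.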

For ggml, this meant that I could simply build the library in multiple passes, each time targeting a different <level>, and install the result in the corresponding subdirectory, which resulted in the following layout (reduced to libggml.so for brevity):

/usr/lib/x86_64-linux-gnu/ggml/glibc-hwcaps/x86-64-v4/libggml.so
/usr/lib/x86_64-linux-gnu/ggml/glibc-hwcaps/x86-64-v3/libggml.so
/usr/lib/x86_64-linux-gnu/ggml/glibc-hwcaps/x86-64-v2/libggml.so
/usr/lib/x86_64-linux-gnu/ggml/libggml.so
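
The multi-pass build behind this layout can be sketched as a small planning script. The sketch only computes a compiler flag and an install directory for each pass; it does not run any build. The `-march=x86-64-v2`, `-march=x86-64-v3` and `-march=x86-64-v4` flags are accepted by GCC 11+ and Clang 12+, but the function name `build_plan` and the structure of the plan are hypothetical, and the actual Debian packaging of ggml may well do this differently.

```python
from pathlib import PurePosixPath

# Levels that get their own glibc-hwcaps subdirectory on amd64.
LEVELS = ["x86-64-v2", "x86-64-v3", "x86-64-v4"]


def build_plan(libdir, levels=LEVELS):
    """Map each build pass to its -march flag and install directory.

    Hypothetical helper for illustration; it builds nothing itself.
    """
    base = PurePosixPath(libdir)
    # Baseline pass: installed where the library would normally go.
    plan = [{"level": "x86-64", "cflags": "-march=x86-64", "destdir": str(base)}]
    for level in levels:
        plan.append({
            "level": level,
            "cflags": f"-march={level}",
            "destdir": str(base / "glibc-hwcaps" / level),
        })
    return plan


for step in build_plan("/usr/lib/x86_64-linux-gnu/ggml"):
    print(f'{step["cflags"]:<20} -> {step["destdir"]}')
```

Each entry would then drive one configure, build, and install cycle of the same source tree.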

In practice, this means that on a CPU supporting AVX512, the linker/loader loads x86-64-v4/libggml.so if it exists, and otherwise continues through the lower levels, all the way down to the baseline library. On a CPU that supports nothing beyond SSE4.2, the same process ends with x86-64-v2/libggml.so being picked. With QEMU, all of this was quickly verified.
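
To predict which variant a given machine would pick, one can compare its CPU flags against the feature sets of the levels. The following is a rough sketch under stated assumptions: the flag names are meant to be the spellings Linux uses in `/proc/cpuinfo` (for example `pni` for SSE3 and `abm` for LZCNT, with `xsave` standing in for OSXSAVE), the function names are hypothetical, and glibc performs its own, authoritative detection that this script does not replicate.

```python
# Extra CPU flags each level requires on top of the previous one, using
# /proc/cpuinfo spellings (assumed; "pni" is SSE3, "abm" is LZCNT).
LEVEL_FLAGS = [
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def highest_level(cpu_flags):
    """Return the highest level whose requirements are all met."""
    flags = set(cpu_flags)
    level = "x86-64-v1"
    for name, required in LEVEL_FLAGS:
        if not required <= flags:
            break  # levels are cumulative, so stop at the first gap
        level = name
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag set of the first CPU listed (Linux on x86 only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    try:
        print(highest_level(read_cpu_flags()))
    except OSError:
        print("could not read /proc/cpuinfo")
```

On GLIBC 2.33 and later, running the dynamic loader itself with `--help` also lists the glibc-hwcaps subdirectories it supports, which is the more reliable check.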

Note that the lowest-level library, targeting x86-64-v1, is not installed to a subdirectory, but to the path where the library would normally have been installed. This has the nice property that on systems not using GLIBC, and thus not having hwcaps available, package installation will still result in a loadable library, albeit the version with the worst performance. And a careful observer might have noticed that in the example above, the library is installed to a private ggml/ directory, so this mechanism also works when using RUNPATH or LD_LIBRARY_PATH.

Debian's ggml package will soon switch to the GGML_CPU_ALL_VARIANTS=ON option mentioned above, but hwcaps were still quite a useful feature to discover.
