Next steps for BPF support in the GNU toolchain

Original link: https://lwn.net/Articles/1039827/

## GNU toolchain and BPF development: a 2025 update

The 2025 GNU Tools Cauldron focused on strengthening GNU toolchain support for BPF (Berkeley Packet Filter), an area currently dominated by LLVM. A key goal is to make GCC a primary compiler for BPF code. Progress is being made on reunifying the BPF Type Format (BTF) and the Compact C Type Format (CTF), which are central to representing kernel data structures and to debugging. The developers aim to integrate BTF generation directly into the kernel build to simplify development; adopting BTF fully beyond the kernel, however, requires better user-space coverage, including a representation for floating-point data and a way to refer to structure members. A major obstacle for GCC is support for the `btf_decl_tag` and `btf_type_tag` attributes, which the BPF verifier relies on to ensure memory safety; a patch addressing this was posted shortly after the session. Generating verifiable code remains a challenge: GCC's optimizations can produce code that the verifier considers unsafe. A proposed `-fverifiable` flag aims to address this, though optimization passes will need to be adapted to avoid breaking verification. Finally, the developers stressed the need for community contributions to speed up the GCC BPF port and to improve the BPF ecosystem as a whole.

## BPF support in the GNU toolchain: summary

A recent LWN.net article on upcoming BPF (extended Berkeley Packet Filter) support in the GNU toolchain sparked debate over technical exposition and licensing. The core discussion revolved around the complexity of integrating BPF, a kernel facility that lets small programs attach to "hook points" in the system, with existing tools such as LLVM. Commenters debated whether the article should explicitly define acronyms like BPF for a broader audience or assume a certain level of technical knowledge in its readers. A key point was the licensing mismatch between the GNU toolchain (generally GPL) and LLVM (Apache 2.0), and the challenges of collaboration even though both are open source. Some suggested "clean-room" implementations to avoid copyright problems, while others pointed to legal precedent indicating that mere knowledge of an idea is not enough to make something a derivative work. The conversation highlighted the tension in the Linux kernel development community among accessibility, technical depth, and the complexities of open-source licensing.

Original article

By Jonathan Corbet
October 6, 2025
Cauldron
Support for BPF in the kernel has been tied to the LLVM toolchain since the advent of extended BPF. There has been a growing effort to add BPF support to the GNU toolchain as well, though. At the 2025 GNU Tools Cauldron, the developers involved got together with representatives of the kernel community to talk about the state of that work and what needs to happen next.

Integrating BTF and CTF

The BPF type format (BTF) represents the types of kernel data structures and functions; it is used to enable BPF programs to run on multiple kernels, and by the verifier to ensure program correctness, among other uses. It is derived from the Compact C Type Format (CTF), which is a more general-purpose format that makes debugging information available for compiled programs. Nick Alcock gave a high-speed presentation of his work to reunify those two formats.

[Nick Alcock]

The libctf library, which works with CTF, is now able to both produce and consume BTF, he began. It can also work with an under-development "CTFv4" format that adds support for some of the trickier cases. This work is being tied into the kernel build, which would allow the creation of BTF directly when building the kernel, rather than as a separate step using the pahole utility as is done now.

There are a couple of enhancements that are needed before BTF can completely replace CTF beyond the kernel, though. A string header field is needed to be able to separate the BTF from each translation unit when the results are all combined. Some sort of agreement on a format for referring to structure members in archives (holding BTF data for multiple translation units) is required for compaction purposes. To be able to use this format in user space, there has to be a representation for floating-point data — a feature the kernel has never needed. With those in place, the extra capabilities provided by CTF would only be needed to represent huge structures (rather larger than would ever make sense in the kernel) and conflicting types with the same name. Then, GCC could create BTF for both kernel and user space, with the toolchain performing deduplication as well.
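For reference, the existing BTF layout (defined in the kernel's include/uapi/linux/btf.h) has a single header describing one type section and one string section per BTF blob; the per-translation-unit string header field discussed above would be an addition to a layout along these lines. The structure below is reproduced from memory and may differ in detail from the current UAPI header:

    /* Types such as __u16/__u32 come from <linux/types.h>. */
    struct btf_header {
            __u16   magic;          /* 0xeB9F; also reveals byte order */
            __u16   version;
            __u8    flags;
            __u8    pad;
            __u32   hdr_len;        /* length of this header */

            /* Offsets are relative to the end of the header */
            __u32   type_off;       /* offset of the type section   */
            __u32   type_len;       /* length of the type section   */
            __u32   str_off;        /* offset of the string section */
            __u32   str_len;        /* length of the string section */
    };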

Alexei Starovoitov questioned the need for these features, saying that BTF is a kernel-specific format that does not have to support user space. José Marchesi agreed to an extent, but said that wider availability and usage of the format is needed to ensure high-quality toolchain support. Sam James asked whether BTF could represent C++ programs; the answer was that CTF is still needed for those. Handling C++ with BTF would be possible, Alcock said, with the addition of some new type codes and not much more.

GCC port status

Marchesi then shifted the discussion to the status of the GCC BPF backend (or "port" in GCC jargon); the goal of that project, he said, is to turn GCC into the primary compiler for BPF code. That is a relatively new objective, he added; the previous goal had been to produce something that worked at all, with no ambitions beyond that.

[Alexei Starovoitov]

Starovoitov took over to communicate his highest-priority request: the addition of support for the btf_decl_tag and btf_type_tag attributes to GCC. Their absence, he said, is the biggest blocker to adoption of GCC for compilation to BPF. Pointers in the kernel can carry annotations like __rcu or __user to indicate, respectively, that the pointed-to memory is protected by read-copy-update or is located in user space. When these annotations are reflected in BTF with the requested attributes, the BPF verifier can use them to check that memory is being accessed in a valid and safe way. There are a lot of hacky workarounds in place to cope with their absence now, but Starovoitov would love to be able to replace them with proper attribute support: "Please do it yesterday".
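As a rough illustration of what that support enables, a kernel built with a compiler that understands these attributes can define its pointer annotations in terms of them, so the information survives into BTF for the verifier to use. The macros below are simplified stand-ins, not the kernel's exact definitions (those live in include/linux/compiler_types.h and are conditional on compiler support):

    /* Type tags annotate the pointed-to type and are emitted as BTF
     * type-tag records; declaration tags annotate declarations (functions,
     * variables, struct members) and become BTF decl-tag records. */
    #define __user   __attribute__((btf_type_tag("user")))
    #define __rcu    __attribute__((btf_type_tag("rcu")))
    #define __tag(s) __attribute__((btf_decl_tag(s)))

    struct shared_state;

    struct example {
            void __user *ubuf;                    /* points into user space  */
            struct shared_state __rcu *state;     /* protected by RCU        */
            int flags __tag("example:internal");  /* arbitrary decl-tag text */
    };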

Notably, David Faust, who was in the session, posted a patch series adding this support the following day. Interested readers will find much more information about how these attributes work in the cover letter.

Marchesi returned to quickly go over a number of other bits of news regarding the BPF backend. There is now an extensive test suite in GCC to validate BPF compilation, which is a nice step forward. The BPF port mostly works, but there are various bugs in the compiler that still need to be addressed. It may be necessary to add support for the may_goto instruction to the assembler. And, naturally, there is the constant challenge of producing code that will not run afoul of the BPF verifier — a topic to which the group returned shortly thereafter.

The status update concluded with a request for help from the GCC community to finish getting the BPF port into shape. He and the others working on this code do not do so full time, and BPF itself is an area of active development that is hard to keep up with. A bit of assistance, he said, would enable the job to be finished sooner. Starovoitov answered that BPF developers tend to work with LLVM instead because they can get their changes accepted quickly; the GCC process is slower and harder to work with. Marchesi said that the GCC community can be strict, but it tends to be strict in the right places. Work there can take time, but the quality of the result will be excellent.

Verification challenges

[José Marchesi]

Marchesi then moved on to the generation of code by GCC that can pass the BPF verifier. Without due care, the compiler will produce code that the verifier is unable to prove correct and which, as a result, will not be loadable into the kernel. He has been promoting the idea of a new optimization mode, -Overifiable, focused on producing verifiable code. He then introduced Eduard Zingerman, who delved more deeply into the problem.

The core challenge, Zingerman began, is that the various optimization passes made by the compiler can transform the code significantly, producing a result that is hard or impossible to verify. The verifier is a path-tracing machine, which tracks the state of the stack and registers as it steps through the code, forking its representation at each branch point. It is able to track the ranges of variables through a number of operations, but is unable to track the relationships between scalars and pointers. That inability makes itself felt in a number of ways.

For example, a programmer might write code like:

    offset = ...;
    if (offset < 42) {
        ptr = packet + offset;
        /* ... */

If the verifier knows that the length of the data pointed to by packet is at least 42, it can determine that this pointer assignment is safe. But an optimizer might hoist some of the calculation outside of the conditional branch, producing code like:

    offset = ...;
    ptr = packet + offset;
    if (offset < 42) {
        /* ... */

Now the verifier is not able to verify that the assignment of ptr is correct, so the code is no longer verifiable. The LLVM BPF port, he said, works around this kind of problem by injecting calls to special intrinsic functions that inhibit this kind of optimization.
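BPF programmers sometimes apply a similar workaround by hand: libbpf's bpf_helpers.h provides a barrier_var() macro, an empty volatile inline-asm statement that keeps the compiler from reasoning about (and moving code around) a value. Here is a minimal sketch of the idea, with the macro written out so the example stands alone; the intrinsics that LLVM injects internally serve the same purpose without requiring barriers in the source:

    /* The volatile asm emits no instructions, but the compiler must assume
     * it may modify var, so it cannot be removed or reordered freely. */
    #define barrier_var(var) asm volatile("" : "+r"(var))

    static inline char *checked_ptr(char *packet, unsigned long offset)
    {
            char *ptr = 0;

            if (offset < 42) {
                    /* The barrier cannot be hoisted out of this branch, and
                     * the addition depends on its result, so the compiler
                     * cannot move packet + offset above the bounds check. */
                    barrier_var(offset);
                    ptr = packet + offset;
            }
            return ptr;
    }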

[Eduard Zingerman]

Zingerman provided a couple of other examples of how optimization can break verification and the sorts of workarounds that the LLVM developers have adopted to make things work. But, he said, the strategy in the LLVM camp has been almost entirely reactive — wait until something breaks, then figure out a way to prevent it. What, he asked, is the GCC approach? Marchesi replied that, so far, there is no strategy at all, but that needs to change.

In the resulting discussion, it was suggested that the proposed new compiler flag should be -fverifiable instead, a suggestion that seemed to find general acceptance. The actual implementation of that option is a harder task, though. Nick Clifton asked whether the developers could just maintain a list of optimization passes that are known to break verification and should just be skipped. The problem with that approach, Faust said, is that the problems usually come about as the result of specific transformations within a pass that makes a number of other optimizations that are still wanted.

Marchesi added that optimization in general is needed for BPF output; among other things, programs may exceed the limits on the number of BPF instructions without it. His plan is to put the new flag in place, then start adapting the problematic optimization passes to avoid breaking verification. Clifton noted that the verifier might improve over time and accept code that is rejected now, so the compiler needs to be told which version of the verifier is being built for. Others pointed out that there are multiple verifiers in existence, complicating the situation further.

There was a brief mention of Krister Walfridsson's smtgcc tool, which is designed to catch optimization problems in general. Walfridsson, who was present, was not convinced that smtgcc would be helpful for this specific problem, though.

As the time for this extended session ran out, Clifton said that he found the whole idea of verifier-aware compilation to be a bit "distasteful". The more that the compiler avoids verification problems, the less pressure there is on the verifier itself to fix those problems for real. Perhaps it would be better to put effort into improving the verifier instead, he suggested. Marchesi replied that the verifier exists to make it possible to load programs into the kernel and run them safely. The pressure to make that work should be shared among all parties, he said.

[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to this event.]


