New features in GCC 16: Improved error messages and SARIF output

Original link: https://developers.redhat.com/articles/2026/04/28/gcc-16-improved-error-messages-sarif-output

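The upcoming GCC 16 release brings several improvements to diagnostics and static analysis:

* **C++ diagnostics:** Error messages now use a hierarchical, nested-bullet layout by default, which makes complex template and type-mismatch errors considerably easier to read.
* **SARIF output:** The machine-readable JSON format has been updated to support nested logical locations (such as namespaces) and non-standard control flow.
* **Experimental HTML output:** A new visualization option generates HTML reports with code highlighting, callouts, nested stack frames, and interactive memory state diagrams.
* **Static analyzer (`-fanalyzer`):** Internal data structures were rewritten for better performance and clarity. The analyzer has started to use data from the Ranger value-tracking project, and it gains initial C++ support by handling exceptions and Named Return Value Optimization (NRVO), although scaling limits currently restrict C++ analysis to small examples.

These features can already be tried in Compiler Explorer and on Fedora 44.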

Original text

I work at Red Hat on the GNU Compiler Collection (GCC). GCC 16 is about to be released, so I'm sharing some of the new features I worked on this year. Some changes are visible to users, while others improve the system more subtly.

New C++ error improvements

A well-known challenge for C++ developers is the readability of template-related error messages. C++ compilers tend to either provide too little information or spew screenfuls of text at you. Either way, the errors can be difficult to decipher.

GCC error messages have a hierarchical structure to them. In GCC 15, I added an experimental option that shows this structure as a collection of nested bullet points.

In GCC 16, this behavior is now the default. You can return to the previous behavior using -fno-diagnostics-show-nesting or -fdiagnostics-plain-output. I fixed several bugs and made use of the hierarchical structure in more places. For example, it is easy to get declarations and definitions out of sync when manually adding const to a parameter:

class foo
{
  public:
    void test(int i, int j, void *ptr, int k);
};
    
// Wrong "const"-ness of param 3.
void foo::test(int i, int j, const void *ptr, int k)
{
}

In GCC 15, we emitted the following output:

<source>:8:6: error: no declaration matches 'void foo::test(int, int, const void*, int)'
    8 | void foo::test(int i, int j, const void *ptr, int k)
      |      ^~~
<source>:4:10: note: candidate is: 'void foo::test(int, int, void*, int)'
    4 |     void test(int i, int j, void *ptr, int k);
      |          ^~~~
<source>:1:7: note: 'class foo' defined here
    1 | class foo
      |       ^~~

In GCC 16, we now emit this:

<source>:8:6: error: no declaration matches 'void foo::test(int, int, const void*, int)'
    8 | void foo::test(int i, int j, const void *ptr, int k)
      |      ^~~
  • there is 1 candidate
    • candidate is: 'void foo::test(int, int, void*, int)'
      <source>:4:10:
          4 |     void test(int i, int j, void *ptr, int k);
            |          ^~~~
      • parameter 3 of candidate has type 'void*'...
        <source>:4:35:
            4 |     void test(int i, int j, void *ptr, int k);
              |                             ~~~~~~^~~
      • ...which does not match type 'const void*'
        <source>:8:42:
            8 | void foo::test(int i, int j, const void *ptr, int k)
              |                              ~~~~~~~~~~~~^~~
<source>:1:7: note: 'class foo' defined here
    1 | class foo
      |       ^~~

This pinpoints the exact location of the problem. Use this Compiler Explorer link to see how color highlights and contrasts mismatched types in both the messages and the quoted source code.
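For comparison, here is a minimal sketch of the repaired code, assuming the const on parameter 3 was the intended signature (the opposite fix, dropping const from the definition, would work equally well). Once the declaration and the out-of-line definition agree, the "no declaration matches" error goes away:

```cpp
class foo
{
  public:
    // Declaration now carries the same "const" on parameter 3
    // as the out-of-line definition below.
    void test(int i, int j, const void *ptr, int k);
};

// Signature matches the declaration, so the candidate is found.
void foo::test(int i, int j, const void *ptr, int k)
{
    // Body left empty, as in the original example; the casts only
    // silence unused-parameter warnings.
    (void)i; (void)j; (void)ptr; (void)k;
}
```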

Updated SARIF machine-readable output

By default, GCC writes its diagnostics (errors and warnings) as text to stderr. Parsing this output with regular expressions has become difficult as the compiler's capabilities have grown. In GCC 13, I added the ability to write diagnostics in machine-readable form using the Static Analysis Results Interchange Format (SARIF). This JSON-based format allows us to separate the data of the diagnostic from the way the diagnostic is presented.

GCC 16 includes several improvements to the generated SARIF output. For example, when reporting a missing return *this in an assignment operator:

namespace foo {
namespace bar {
class baz {
  baz&
  operator= (const baz &other)
  {
    m_val = other.m_val;
  }
  int m_val;
};
} // namespace bar
} // namespace foo

The SARIF output now captures the nested structure of logical locations. This allows a SARIF viewer to filter for diagnostics within the foo::bar namespace:

           "logicalLocations": [{"name": "foo",
                                 "fullyQualifiedName": "foo",
                                 "kind": "namespace",
                                 "index": 0},
                                {"name": "bar",
                                 "fullyQualifiedName": "foo::bar",
                                 "kind": "namespace",
                                 "parentIndex": 0,
                                 "index": 1},
                                {"name": "baz",
                                 "kind": "type",
                                 "parentIndex": 1,
                                 "index": 2},
                                {"name": "operator=",
                                 "fullyQualifiedName": "foo::bar::baz::operator=",
                                 "decoratedName": "_ZN3foo3bar3bazaSERKS1_",
                                 "kind": "function",
                                 "parentIndex": 2,
                                 "index": 3}]
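To illustrate how a consumer might use the parentIndex links, here is a rough sketch that rebuilds qualified names and tests whether one logical location is nested inside another. It assumes the JSON has already been parsed into a plain vector; the LogicalLocation struct and the functions qualified_name, is_within, and example_locations are hypothetical names made up for this example and are not part of GCC or any SARIF library. It also assumes well-formed input in which the parent links contain no cycles.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of one SARIF "logicalLocations" entry.
struct LogicalLocation {
    std::string name;
    std::string kind;
    std::optional<std::size_t> parent_index;  // absent for top-level entries
};

// Build "a::b::c" by following parentIndex links up to the root.
// Assumes the links are acyclic.
std::string qualified_name(const std::vector<LogicalLocation> &locs,
                           std::size_t index)
{
    std::string result = locs.at(index).name;
    std::optional<std::size_t> parent = locs.at(index).parent_index;
    while (parent) {
        result = locs.at(*parent).name + "::" + result;
        parent = locs.at(*parent).parent_index;
    }
    return result;
}

// True if the entry at `index` is, or is nested inside, the entry at `ancestor`.
bool is_within(const std::vector<LogicalLocation> &locs,
               std::size_t index, std::size_t ancestor)
{
    std::optional<std::size_t> current = index;
    while (current) {
        if (*current == ancestor)
            return true;
        current = locs.at(*current).parent_index;
    }
    return false;
}

// The four entries from the SARIF excerpt above, in the same order.
std::vector<LogicalLocation> example_locations()
{
    std::vector<LogicalLocation> locs;
    locs.push_back(LogicalLocation{"foo", "namespace", std::nullopt});
    locs.push_back(LogicalLocation{"bar", "namespace", std::size_t{0}});
    locs.push_back(LogicalLocation{"baz", "type", std::size_t{1}});
    locs.push_back(LogicalLocation{"operator=", "function", std::size_t{2}});
    return locs;
}
```

With the example data, qualified_name(locs, 3) walks 3 → 2 → 1 → 0 and reproduces the fullyQualifiedName from the excerpt, while is_within(locs, 3, 1) is the kind of check a viewer could use to keep only results under foo::bar.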

GCC 16 also adds data to SARIF output to better express non-standard control flow (such as exception-handling and longjmp) within code paths. This data is included in the upcoming SARIF 2.2 standard.

New HTML output option

In GCC 15, I added -fdiagnostics-add-output= to allow for multiple kinds of diagnostic output simultaneously. Plain text output has limitations, so GCC 16 includes a new experimental-html option.

Figure 1 shows the first example (the foo::test declaration mismatch) rendered with -fdiagnostics-add-output=experimental-html.

Figure 1: An experimental HTML diagnostic in GCC 16 showing a "no declaration matches" error with highlighted code snippets and callouts.

You can see the full generated page here.

As the name suggests, this feature is experimental, but I've already found it helpful for debugging the GCC built-in static analyzer. When you enable the tool with the -fanalyzer option, it explores interprocedural paths through your source code to find bugs at compile time. I often need to debug fiddly issues in this code, and the more visualization the better. The HTML output displays nested stack frames in an execution path, using drop shadows to represent the stack visually (Figure 2).

Figure 2: Experimental HTML output from the GCC static analyzer, illustrating a 26-event execution path with nested frames and visual drop shadows to represent the call stack.

The full example is here. This version also includes an easter egg (generated via -fdiagnostics-add-output=experimental-html:show-state-diagrams=yes). Pressing j and k moves forward and backward through the path, with diagrams showing the predicted state of memory at each event and which pointers point to which buffers (Figure 3).

Figure 3: An experimental GCC state diagram visualizing the heap and stack memory transitions, illustrating a pointer referencing a freed buffer.

Static analyzer improvements

The visualization made it easier to spot and fix problems in the analyzer, leading to several internal improvements in GCC 16. The analyzer's core data structure for tracking code (the "supergraph") had become difficult to work with, with various dark corners for bugs to hide in. I've rewritten it in GCC 16. The new code has much clearer separation of concerns (between places in the user's code versus operations that occur on the transitions between these places), and I'm already finding it makes it easier to add new features.
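The split between places and operations can be pictured as a graph whose nodes are places and whose edges carry the operations. The following is only an illustrative sketch of that idea, not GCC's actual supergraph code: Place, Transition, Graph, operations_from, and example_graph are all hypothetical names, and the operations are plain strings where a real analyzer would use structured statements.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A place in the user's code. Hypothetical: holds only a label here.
struct Place {
    std::string label;
};

// A transition between two places, carrying the operation performed on it.
struct Transition {
    std::size_t from;
    std::size_t to;
    std::string operation;
};

struct Graph {
    std::vector<Place> places;
    std::vector<Transition> transitions;
};

// Collect the operations on transitions leaving a given place.
std::vector<std::string> operations_from(const Graph &g, std::size_t place)
{
    std::vector<std::string> ops;
    for (const Transition &t : g.transitions)
        if (t.from == place)
            ops.push_back(t.operation);
    return ops;
}

// Three places joined by two transitions: allocate, then free.
Graph example_graph()
{
    Graph g;
    g.places.push_back(Place{"entry"});
    g.places.push_back(Place{"after malloc"});
    g.places.push_back(Place{"after free"});
    g.transitions.push_back(Transition{0, 1, "p = malloc(16)"});
    g.transitions.push_back(Transition{1, 2, "free(p)"});
    return g;
}
```

Keeping operations on the edges means a place never has to describe what happens at it, only where it is, which is one way to read the "separation of concerns" described above.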

I also updated the data structure that tracks simulated memory buffer contents, replacing a rather clunky hashing approach with a simple std::map from bit ranges to contents. The new approach is both easier to understand and faster.
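As a rough illustration of why an ordered map keyed on bit ranges is pleasant to work with, here is a small sketch. It is not the analyzer's code: BitRange, ByStart, BufferContents, contents_at, and example_buffer are hypothetical names, the contents are plain strings standing in for whatever symbolic values the analyzer really stores, and the sketch assumes that stored ranges never overlap.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical half-open range of bits [start, start + size) within a buffer.
struct BitRange {
    std::uint64_t start;
    std::uint64_t size;
};

// Order ranges by start bit so the map keeps them sorted by position.
// Assumes stored ranges never overlap, so the start alone identifies a range.
struct ByStart {
    bool operator()(const BitRange &a, const BitRange &b) const
    {
        return a.start < b.start;
    }
};

// Maps each bound bit range to a description of its contents.
using BufferContents = std::map<BitRange, std::string, ByStart>;

// Find the contents bound to the range covering `bit`, if any.
std::optional<std::string> contents_at(const BufferContents &buf,
                                       std::uint64_t bit)
{
    // First range starting strictly after `bit`; the only range that
    // can cover `bit` is the one immediately before it.
    auto it = buf.upper_bound(BitRange{bit, 0});
    if (it == buf.begin())
        return std::nullopt;
    --it;
    if (bit < it->first.start + it->first.size)
        return it->second;
    return std::nullopt;
}

// A 32-bit field at bit 0, a gap, then a 64-bit pointer at bit 64.
BufferContents example_buffer()
{
    BufferContents buf;
    buf[BitRange{0, 32}] = "int field";
    buf[BitRange{64, 64}] = "pointer to heap buffer";
    return buf;
}
```

Because the map is sorted by start bit, a lookup is a single upper_bound call, and iterating the map visits the bindings in memory order, which is handy when dumping state for debugging.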

My colleague Andrew MacLeod has spent several years on a project called Ranger to improve how GCC tracks properties of values in the user's code for use by the optimizer. This covers things such as knowing whether a given integer is in the valid range to be used as an array index, or whether individual bits are known to be true or false. In GCC 16 I've started wiring up -fanalyzer to these data structures. Like the above changes, this is unlikely to be directly visible, but should lead to more accurate analysis and fewer false positives.
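To give a feel for the kind of reasoning involved, here is a deliberately simplified sketch of range tracking for a single integer. It is not Ranger's API: IntRange, intersect, and always_in_bounds are hypothetical names, a single closed interval is a much cruder model than a real value-range implementation, and known-bits tracking is not shown at all.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical closed interval [lo, hi] of values an integer may take.
struct IntRange {
    std::int64_t lo;
    std::int64_t hi;
};

// Narrow a range using knowledge from another, e.g. a branch condition.
// Returns nullopt when the ranges are disjoint, meaning no value
// satisfies both and the path is infeasible.
std::optional<IntRange> intersect(IntRange a, IntRange b)
{
    IntRange r{std::max(a.lo, b.lo), std::min(a.hi, b.hi)};
    if (r.lo > r.hi)
        return std::nullopt;
    return r;
}

// True if every value in `index` is a valid subscript for an array
// with `length` elements.
bool always_in_bounds(IntRange index, std::int64_t length)
{
    return index.lo >= 0 && index.hi < length;
}
```

For example, an index known only to lie in [-100, 100] is not provably safe for a 10-element array, but after a guard such as `if (i >= 0 && i < 10)` the range narrows to [0, 9] and the access can be accepted without a warning. A disjoint intersection is what lets an analysis discard a path as impossible instead of reporting on it.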

Before GCC 16, -fanalyzer only worked on C code; running it on C++ code often produced irrelevant results. The problem was that my code was ignoring how GCC internally represents (a) exception-handling, leading to it inventing impossible paths through the code, and (b) C++'s Named Return Value Optimization (NRVO), leading to lots of false reports about supposed memory leaks.
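For readers less familiar with the two constructs, here is a small self-contained sketch of each. The function names are made up for illustration, and nothing here is a claim about what -fanalyzer reports for this particular code.

```cpp
#include <stdexcept>
#include <vector>

// NRVO candidate: `result` is a named local returned by value, so the
// compiler may construct it directly in the caller's storage instead
// of copying or moving it on return.
std::vector<int> make_squares(int count)
{
    std::vector<int> result;
    for (int i = 0; i < count; ++i)
        result.push_back(i * i);
    return result;
}

// May throw, so control can leave this function without reaching
// the "return" statement.
int checked_divide(int numerator, int denominator)
{
    if (denominator == 0)
        throw std::invalid_argument("division by zero");
    return numerator / denominator;
}

// The catch block is reachable only through the exceptional edge out
// of checked_divide; there is no ordinary path into it.
int divide_or_default(int numerator, int denominator, int fallback)
{
    try {
        return checked_divide(numerator, denominator);
    } catch (const std::invalid_argument &) {
        return fallback;
    }
}
```

A tool that does not model the exceptional edge can end up reasoning about paths that cannot occur at run time, and one that does not understand NRVO can mistake the returned object's storage for something that was never cleaned up.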

I have good news and bad news here. The good news: In GCC 16, I implemented support for exception handling and NRVO, allowing -fanalyzer to work with C++ code.

The bad news is that the feature is currently limited to small examples. On complex code it can hit scaling issues: the analyzer spends its entire analysis budget on a small fraction of the code and then gives up, burning CPU cycles without generating useful results. False positives have been replaced by false negatives, so I still can't recommend using it on C++ code. It's better to be correct than to be fast, and I'm looking at ways of scaling things up for GCC 17 in the hope of making -fanalyzer practical for use on production C++.

Try GCC 16

These features are just a small sample of the many improvements in GCC 16, which is about to be officially released upstream. You can try the new version now in Compiler Explorer; as I write this, it's listed as GCC trunk. Putting my downstream hat on, you can also try it in Fedora 44. Have fun!