Cost of enum-to-string: C++26 reflection vs. the old ways

Original link: https://vittorioromeo.com/index/blog/refl_enum_to_string.html

Two months ago I published “the hidden compile-time cost of C++26 reflection”, where I measured what including <meta> and doing some basic reflection actually costs per translation unit. If you haven’t read it, start there – this post builds directly on top of it.

That article used a prerelease GCC 16 snapshot. Since then, GCC 16 has been officially released and is now widely available, which seemed like a good excuse to revisit the topic with a more realistic example: enum-to-string conversion.

Enum-to-string is the “hello world” of reflection – but it’s also genuinely useful in real projects, for things like logging, serialization, and debugging. If you adopt reflection in a real codebase, it might be the first thing you write.

So: how much does reflection-based enum-to-string actually cost, in compile time, compared to the alternatives?

the three approaches

I benchmarked three implementations of the same operation: given an enum value, return a std::string_view with its enumerator name.

1. reflection (c++26)

No macros, no boilerplate, works for any enum. The implementation follows “What the heck is Reflection?” by Murat Hepeyiler – I think it’s a pretty idiomatic example.
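
As a rough sketch of what that looks like – assuming the P2996 reflection API (^^, std::meta::enumerators_of, std::meta::identifier_of, [: :] splices) and P1306 expansion statements, not necessarily the exact code used in the benchmark:

```cpp
#include <meta>         // C++26 reflection
#include <string_view>
#include <type_traits>

// Walk the enumerators of E at compile time and compare against the runtime
// value. Needs a compiler with P2996 + P1306 support (e.g. GCC 16 with its
// reflection switch enabled).
template <typename E>
    requires std::is_enum_v<E>
constexpr std::string_view enum_to_string(E value)
{
    template for (constexpr auto e : std::meta::enumerators_of(^^E))
    {
        if (value == [:e:])
            return std::meta::identifier_of(e);
    }

    return "<unknown>";
}
```

Calling it on any enumerator yields its name, for any enum – scoped or unscoped – with no per-enum setup.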

2. enchantum (c++17)

A C++17 header-only library by ZXShady, available on GitHub, that achieves enum reflection through __PRETTY_FUNCTION__ parsing tricks: no macros at the call site, no reflection flag needed.
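
For illustration, usage might look roughly like this (the header path and the enchantum::to_string entry point are assumptions on my part; check the repository for the exact API):

```cpp
#include <enchantum/enchantum.hpp> // header path is an assumption
#include <string_view>

enum class Color { Red, Green, Blue };

// No macros, no registration: the library derives enumerator names by probing
// values through __PRETTY_FUNCTION__ at compile time.
constexpr std::string_view name = enchantum::to_string(Color::Green); // assumed entry point
```

3. x-macros

The old-school, dependency-free approach: list the enumerators once in a macro and expand that list twice, once to declare the enum and once to generate the name-lookup switch. The benchmark uses two variants, one returning const char* and one returning std::string_view; here is a minimal sketch of the const char* variant (the enum and its enumerators are illustrative):

```cpp
// Enumerator list, maintained in one place.
#define COLOR_ENUMERATORS(X) \
    X(Red)                   \
    X(Green)                 \
    X(Blue)

enum class Color
{
#define X(name) name,
    COLOR_ENUMERATORS(X)
#undef X
};

// No standard library headers needed at all.
constexpr const char* to_string(Color value)
{
    switch (value)
    {
#define X(name) case Color::name: return #name;
        COLOR_ENUMERATORS(X)
#undef X
    }

    return "<unknown>";
}
```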

benchmarking setup

Same as the previous article: hyperfine inside a Fedora 44 Docker container on a 13th Gen i9-13900K. Two things are different this time:

  • Compiler: gcc 16.1.1 20260501 (Red Hat 16.1.1-1, release build) – the officially released GCC 16, not a prerelease snapshot.

  • Noise control: the container ran with --cpuset-cpus=0-7, the host was set to the performance CPU governor, and the compiler process was pinned to a P-core with taskset -c 0. hyperfine was run with --warmup 5 --min-runs 20.

Usual disclaimer: measurements aren’t strictly rigorous, my hardware is beefy (YMMV), and single-TU numbers undersell project-wide cost.

Also, these measurements are specific to GCC 16’s current reflection and module implementation; other compilers may exhibit very different behavior.

benchmark results

total per-TU compile time

| N | X-macro (const char*) | X-macro (string_view) | enchantum | Reflection |
|---|---|---|---|---|
| Baseline (int main()) | 25.8 ms | 25.7 ms | 25.8 ms | 25.7 ms |
| Header include only | 25.7 ms | 136.0 ms | 147.1 ms | 180.8 ms |
| 4 | 26.6 ms | 137.6 ms | 170.6 ms | 186.7 ms |
| 16 | 26.9 ms | 138.1 ms | 170.9 ms | 187.7 ms |
| 64 | 28.0 ms | 141.2 ms | 172.8 ms | 191.1 ms |
| 256 | 32.5 ms | 153.0 ms | 184.1 ms | 215.0 ms |
| 1024 | 54.7 ms | 204.5 ms | 272.0 ms | 255.0 ms |

algorithm-only cost (TU time minus include-only time)

This approximates the additional reflection work beyond header inclusion:

| N | X-macro (const char*) | X-macro (string_view) | enchantum | Reflection |
|---|---|---|---|---|
| 4 | 0.9 ms | 1.6 ms | 23.5 ms | 5.9 ms |
| 16 | 1.2 ms | 2.1 ms | 23.8 ms | 6.9 ms |
| 64 | 2.3 ms | 5.2 ms | 25.7 ms | 10.3 ms |
| 256 | 6.8 ms | 17.0 ms | 37.0 ms | 34.2 ms |
| 1024 | 29.0 ms | 68.5 ms | 124.9 ms | 74.2 ms |

per-enumerator scaling

| Approach | ms / enumerator |
|---|---|
| X-macro (const char*) | ~0.027 |
| X-macro (string_view) | ~0.06 |
| Reflection | ~0.07 |
| enchantum | O(scan range), not O(N) |

enchantum does not scale with the actual enum size – it scales with the configured scan range, since it has to probe every possible value in that range. That’s why an N=4 enum costs almost as much as N=64.
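
To make that concrete, here is a heavily simplified sketch of the underlying trick (the real library also handles scoped vs. unscoped enums, name prefixes, and per-compiler format differences):

```cpp
#include <string_view>

// For a named enumerator, __PRETTY_FUNCTION__ embeds its name, e.g.
//   "... raw_name() [with auto V = Color::Green]"
// while a value with no enumerator shows up as a cast, e.g.
//   "... raw_name() [with auto V = (Color)42]".
template <auto V>
constexpr std::string_view raw_name()
{
    return __PRETTY_FUNCTION__;
}

// Libraries in this family instantiate raw_name<static_cast<E>(I)>() for every
// I in a fixed scan range (say, -128..127), and keep only the instantiations
// whose output looks like a named enumerator. The compile-time work is
// therefore proportional to the scan range, not to how many enumerators the
// enum actually has.
```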

reflection with PCH and modules

Since the reflection variant pays a ~155 ms header tax for <meta>, the obvious question is: does precompiling the header or switching to C++20 modules eliminate it?

I re-ran the reflection benchmark with two extra configurations:

  • PCH: precompiled <meta>, <string_view>, <type_traits> once, then compiled the TUs with -include pch.hpp.

  • Modules: pre-built the std and std.compat modules and the <bits/stdc++.h> header unit once via GCC 16’s new --compile-std-module flag (this cost is not included in the measurement), then compiled the TUs with -fmodules so that #include <meta> is transparently translated into import <bits/stdc++.h>.

| N | Reflection (plain #include) | Reflection + PCH | Reflection + modules |
|---|---|---|---|
| Header include only | 180.8 ms | 73.8 ms | 397.4 ms |
| 4 | 186.7 ms | 80.6 ms | 403.4 ms |
| 16 | 187.7 ms | 81.0 ms | 403.1 ms |
| 64 | 191.1 ms | 84.4 ms | 409.4 ms |
| 256 | 215.0 ms | 97.5 ms | 423.2 ms |
| 1024 | 255.0 ms | 147.9 ms | 482.5 ms |

PCH is the clear winner – a ~2.3x speedup at every enum size, dropping N=4 from 187 ms to 81 ms. With PCH in place, reflection beats both enchantum and the string_view X-macro variant outright.
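
For reference, the PCH configuration boils down to something like the following (the header set matches the description above; the build lines are standard GCC PCH usage, plus whatever reflection flags your GCC 16 build requires):

```cpp
// pch.hpp - precompiled once, then injected into every TU with -include.
#include <meta>
#include <string_view>
#include <type_traits>

// Build the PCH once (produces pch.hpp.gch next to the header):
//   g++ -std=c++26 <reflection flags> -x c++-header pch.hpp
//
// Compile each TU against it (flags must match the ones used for the PCH,
// or GCC silently ignores the .gch file):
//   g++ -std=c++26 <reflection flags> -include pch.hpp some_tu.cpp -c
```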

Modules are the opposite: a ~2.2x slowdown. I verified that the std module artifacts were genuinely cached (mtime unchanged across runs) and that GCC was loading them (confirmed via -flang-info-module-cmi and -ftime-report, which attributed ~190 ms to module import plus another ~190 ms to template instantiation work triggered by the module).

Explicit import std; performs essentially the same as the transparent translation, because GCC’s std module is currently implemented as a thin wrapper around the <bits/stdc++.h> header unit – both routes end up loading the same ~34 MB artifact.

insights

  1. The header is the cost. Not the reflection. The reflection algorithm is fast – asymptotically ~0.07 ms per enumerator, essentially the same as the hand-rolled switch in the X-macro version (~0.06 ms). What makes reflection look expensive is <meta>: just including it costs ~155 ms per TU over the baseline.

  2. The X-macro with const char* is the fastest tested approach. With zero standard library headers, an N=4 enum compiles in 26.6 ms – within the noise of the baseline. Even N=1024 (54.7 ms) is faster than just #include <meta> with no reflection work at all. Most of what we call “slow C++ compilation” is really slow standard library compilation.

  3. enchantum has the smallest include cost of the non-trivial approaches (~147 ms vs reflection’s ~181 ms), but the heaviest per-call work (~24 ms even for tiny enums, because it always scans the full configured range, regardless of how many enumerators you actually have). That’s why it wins on small enums and loses on large ones.

  4. Reflection has the best ergonomics but the highest header tax. It works for any enum – sparse, scoped, unscoped – with no special setup at the declaration site. But every TU that touches <meta> pays ~155 ms before any reflection happens.

  5. PCH closes the gap, modules widen it. Precompiling <meta> cuts reflection compile time by ~2.3x and makes it the fastest of the three approaches. C++20 modules in GCC 16, surprisingly, go the other way – ~2.2x slower than the plain include path.

what this means in a real codebase

The single-TU numbers look small. They are not, at scale.

A large C++ codebase can easily have a few hundred translation units that pull in the enum-to-string header, perhaps transitively. Picking 500 TUs as a round number, and an N=16 enum as a typical size:

| Approach | Per-TU cost (N=16) | Project-wide cost (500 TUs) |
|---|---|---|
| X-macro (const char*) | 26.9 ms | ~13 seconds |
| X-macro (string_view) | 138.1 ms | ~69 seconds |
| enchantum | 170.9 ms | ~85 seconds |
| Reflection | 187.7 ms | ~94 seconds |

A few hundred milliseconds per TU turns into over a minute of compile time at the project level. That’s the difference between a sub-15-second clean build and a minute and a half. Incremental builds won’t always save you, because every TU that includes an affected header pays the full price.

This is multiplied by every header in your project that has similar overhead. Real codebases don’t have one heavy header – they have dozens. The few hundred milliseconds you see in a microbenchmark become minutes once you multiply.

On the other hand, the numbers shown here do not take parallelism into account. E.g. the ~94 CPU-seconds would be ~6s on a 16-core machine, assuming perfect parallelism.

what to do about it

If you’re adopting reflection-based enum-to-string in a large codebase:

  1. Use PCH for <meta> – not modules. As shown above, a PCH cuts the header cost by ~2.3x and makes reflection the fastest of the three approaches. C++20 modules in GCC 16 do the opposite right now (they make things ~2.2x slower), so this is one of those rare cases where the older mechanism is the right answer.

  2. Don’t include the enum-to-string header from other headers. Push it as far down the include graph as possible – ideally, only into the .cpp files that actually need it. Every transitive include multiplies the cost (see the sketch after this list).

  3. For compile-time-sensitive code, X-macros are still the right answer. They look ugly, they don’t compose, they force you to list enumerators in a macro. But 27 ms per TU is hard to beat. For a project like my fork of SFML, where the entire codebase rebuilds in ~4 seconds, anything with <meta> in it is a non-starter.

  4. For library authors, think twice before exposing reflection through your public headers. A library header that drags <meta> into every consumer’s TU is something I personally would not want to depend on. enchantum or an X-macro is a much friendlier choice until <meta> gets lighter or modules become ubiquitous.
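
As a hypothetical sketch for point 2: for the enums you actually care about, you can confine <meta> (and the reflection work) to a single .cpp and expose only plain per-enum declarations from the header – the file and enum names below are illustrative:

```cpp
// color_strings.hpp - no <meta>, no reflection; cheap for anyone to include.
#pragma once
#include <string_view>

enum class Color { Red, Green, Blue };

std::string_view to_string(Color value);

// color_strings.cpp - the only TU that pays the <meta> include cost.
#include "color_strings.hpp"
#include <meta>

std::string_view to_string(Color value)
{
    template for (constexpr auto e : std::meta::enumerators_of(^^Color))
    {
        if (value == [:e:])
            return std::meta::identifier_of(e);
    }

    return "<unknown>";
}
```

The trade-off is that callers lose genericity and compile-time evaluation; in exchange, only one TU in the whole project includes <meta>.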

conclusion

This benchmark confirms what the original article argued, now with concrete numbers from an operation that people will actually write:

The cost of C++26 reflection is not the reflection. It’s <meta>.

The reflection algorithm itself seems to be quite fast. The reason a reflection-based to_string ends up ~7x slower to compile than the bare-metal const char* X-macro variant is almost entirely <meta> (and its transitively-included headers).

I’m still excited about reflection. It will replace a lot of ugly macro boilerplate and unlock libraries that weren’t really possible before. But until <meta> gets lighter (or until modules get much faster), every project that adopts it should know what the bill looks like.

shameless self-promotion

  • I offer training, mentoring, and consulting services.

    If you are interested, check out romeo.training, alternatively you can reach out at mail (at) vittorioromeo (dot) com or on Twitter.

  • Check out my games on Steam, powered by VRSFML!

    • BubbleByte: A clicker-idle game where you recruit cats to pop bubbles. That’s it. Or is it…?

    • Open Hexagon: A fast-paced open-source arcade game with user-created content – the de-facto community-driven spiritual successor to Terry Cavanagh’s critically acclaimed Super Hexagon.

  • My book “Embracing Modern C++ Safely” is available from all major resellers.
