Accessing inactive union members through char

Original link: https://www.sandordargo.com/blog/2026/03/04/char-representation-and-UB

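## A surprising explanation of C++ union behavior

Recent discussion of the new `std::is_within_lifetime` facility in C++26 highlighted an example from the proposal that looks like undefined behavior. The code uses a union of a `bool` and a `char`, and reads the `char` member while the `bool` is active. Accessing an inactive union member is normally undefined behavior, but the C++ standard contains a specific exception.

The standard allows the representation of any object to be read through the types `char`, `unsigned char`, or `std::byte`. This is a legacy of C, where a `char*` can alias any memory address and essentially acts as a pointer to a byte. Because a `bool` is represented as 0 or 1, both of which are valid `char` values, the comparison through the `char` member is perfectly legal.

Reading the `bool` directly while the `char` is active, however, *would* be undefined behavior. This exception to the strict aliasing rules is not widely known, but it is essential for understanding why the example code is valid, and it shows a subtle but important detail of the C++ language.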

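This Hacker News discussion revolves around a feature of C++ programming: access to inactive members of a `union`. The core issue stems from the compiler's "strict aliasing" rules. Optimization is hindered whenever a write to one data type might unexpectedly affect a value of another data type.

Normally, the compiler assumes that different types cannot alias each other, which enables performance improvements. `char` and `std::byte` are exceptions: the compiler *must* allow for the possibility that a write to a `char` affects other types (such as a `double`) within a `union`.

This exception allows inactive `union` members to be accessed through `char*` or `std::byte*` without running into strict aliasing restrictions, though possibly at the cost of some optimizations. The discussion clarified that this concerns C++ unions rather than labor unions, and briefly touched on the complexity of compiler optimization.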
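The aliasing assumption described in this summary can be sketched with two small functions. The names `store_float_then_load` and `store_char_then_load` are hypothetical and exist only for illustration; what a given compiler actually emits depends on the compiler and its optimization level.

```cpp
// Hypothetical helpers illustrating what the aliasing rules let a compiler assume.

// An int and a float may be assumed not to overlap, so an optimizer is free
// to return the constant 1 without reloading *i after the store to *f.
// (Calling this with pointers to the same storage would be undefined behavior.)
int store_float_then_load(int* i, float* f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}

// A char may alias any object, so the compiler has to assume the store
// through c might have changed *i and must reload it before returning.
int store_char_then_load(int* i, char* c) {
    *i = 1;
    *c = 2;
    return *i;
}
```

Passing `reinterpret_cast<char*>(&n)` together with `&n` to the second function is well defined, and the returned value reflects the modified byte. The exact number depends on the platform's byte order.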

Original article

I recently published an article on a new C++26 standard library facility, std::is_within_lifetime. As one of my readers, Andrey, pointed out, one of the examples contains code that seems like undefined behavior. But it’s also taken — almost directly — from the original proposal, so it’s probably not UB. And that’s correct, it’s not undefined behavior.

Let’s first examine the example and the UB suspect, then dive into the fine print of C++ to explain why it’s not what it seems to be.

The suspect

Let’s carefully examine the code below and focus on the else branch of the compile-time conditional.

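#include <type_traits>  // std::is_within_lifetime (C++26)

struct OptBool {
  union { bool b; char c; };

  // Constructors omitted from the excerpt; the empty state stores the sentinel in c.
  constexpr OptBool() : c(2) {}
  constexpr OptBool(bool value) : b(value) {}

  constexpr auto has_value() const -> bool {
    if consteval {
      // Constant evaluation cannot read an inactive member, so ask which one is alive.
      return std::is_within_lifetime(&b);
    } else {
      return c != 2;  // sentinel value
    }
  }

  // Precondition: has_value() is true, i.e. b is the active member.
  constexpr bool value() const {
    return b;
  }
};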

Did it raise your eyebrows?

What happens if the active member of the union is b? We still make our comparison through c. But isn’t accessing an inactive member of a union always undefined behavior?

The fine print almost nobody knows about

From my already referenced article, let’s jump back to the original proposal. It explicitly states that this is not UB, though it doesn’t explain it in depth. However, it does reference some details about type similarity and certain exceptions:

If a program attempts to access the stored value of an object through a glvalue whose type is not similar to one of the following types the behavior is undefined:

(11.1) the dynamic type of the object,

(11.2) a type that is the signed or unsigned type corresponding to the dynamic type of the object, or

(11.3) a char, unsigned char, or std::byte type.

The rule states that accessing an object through a glvalue of an unrelated type is undefined behavior unless the glvalue type is the dynamic type, the signed/unsigned corresponding type, or a char, unsigned char, or std::byte.

In our case, the only exception that applies is the third one.
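To make the three cases concrete, here is a minimal sketch that reads a single `int` object through each permitted kind of glvalue. The `Views` struct and the `inspect` function are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>  // std::byte

// Hypothetical aggregate collecting what each permitted access observes.
struct Views {
    int as_int;
    unsigned as_unsigned;
    unsigned char first_byte;
    std::byte first_std_byte;
};

Views inspect(const int& object) {
    Views v{};
    // (11.1) the dynamic type of the object
    v.as_int = object;
    // (11.2) the unsigned type corresponding to the dynamic type
    v.as_unsigned = *reinterpret_cast<const unsigned*>(&object);
    // (11.3) a character type or std::byte: sees the first byte of the representation
    v.first_byte = *reinterpret_cast<const unsigned char*>(&object);
    v.first_std_byte = *reinterpret_cast<const std::byte*>(&object);
    // Not permitted: *reinterpret_cast<const float*>(&object) would be undefined
    // behavior, because float matches none of the three cases.
    return v;
}
```

For a non-negative `int`, `as_int` and `as_unsigned` hold the same value, while the two byte views agree with each other but depend on the platform's byte order.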

Let’s consider the following case:

OptBool ob(true);   // active member is bool b
ob.has_value();     // reads c

In this example, we’re accessing the union storage through c, which is a char glvalue, while the active object is bool.

While accessing an inactive union member would normally be undefined behavior, the aliasing rule quoted above provides an exception. It explicitly allows reading any object’s representation through a char (or unsigned char or std::byte). In other words, any object may be inspected via a char pointer or reference.

On mainstream implementations, a bool is stored as a single byte holding either 0 or 1, so comparing that byte to 2 (or to any other char value) is perfectly valid and not UB.

On the other hand, reading b directly while c is the active member would be UB, because bool is not a character type listed in the exception above, and it’s not similar to char either.
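The asymmetry between the two directions can be sketched as follows. The helper names `raw_byte` and `decode` are hypothetical, and the sketch assumes the common case where a `bool` occupies one byte holding 0 or 1, which the standard does not guarantee.

```cpp
#include <optional>

// Assumption for this sketch: bool occupies exactly one byte.
static_assert(sizeof(bool) == 1, "sketch assumes a one-byte bool");

// Allowed direction: inspect a bool object through a character type.
unsigned char raw_byte(const bool& flag) {
    return *reinterpret_cast<const unsigned char*>(&flag);
}

// Opposite direction done safely: never read a char object as if it were a
// bool; decode the byte value instead and reject anything that is not 0 or 1.
std::optional<bool> decode(unsigned char byte) {
    if (byte == 0) return false;
    if (byte == 1) return true;
    return std::nullopt;  // e.g. the sentinel value 2
}
```

Decoding by value keeps the code within the rule: the only cross-type access is the read through `unsigned char`, and the sentinel is never observed through a `bool` glvalue.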

Why character types matter

But why is talking about character types or bytes so important?

The morning after I wrote the first draft of this article, I stumbled upon an interesting section in Patrice Roy’s book Memory Management in C++. He explains that even though C++ tries to repurpose char as a representation of a character and uses std::byte to represent a byte, due to the language’s C roots, “a char* can alias any address in memory”. It’s really just a “pointer to a byte”.

That explains everything.
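Treating a character pointer as a "pointer to a byte" is what makes generic byte inspection possible. The following is a minimal sketch; `hex_dump` is a hypothetical helper, and it assumes 8-bit bytes so that each byte prints as two hex digits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: renders the object representation of any object as hex.
// Assumes CHAR_BIT == 8, so every byte maps to exactly two hex digits.
template <typename T>
std::string hex_dump(const T& object) {
    static constexpr char digits[] = "0123456789abcdef";
    // An unsigned char pointer may alias any object, whatever its type.
    const auto* bytes = reinterpret_cast<const unsigned char*>(&object);
    std::string out;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out += digits[bytes[i] >> 4];    // high nibble
        out += digits[bytes[i] & 0x0F];  // low nibble
    }
    return out;
}
```

The output always has `2 * sizeof(T)` characters. For multi-byte types the order of the digit pairs follows the platform's byte order.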

Conclusion

When you see code that accesses an inactive union member through a char, your immediate reaction might be “that’s undefined behavior!” And you’d usually be right — except when you’re not.

The C++ standard contains a special exception to its strict aliasing rules. Character types (char, unsigned char, and std::byte) can alias any object in memory. This isn’t just a quirk of the language; it reflects the fundamental nature of these types as representations of raw bytes rather than semantic values.

This exception is what makes the OptBool example from the std::is_within_lifetime proposal work. At runtime, we can safely read the union storage through a char member even when bool is active, because we’re not interpreting the value — we’re just inspecting its raw representation.

It’s a small detail buried in the fine print of the standard. But it’s these kinds of details that separate code that merely looks correct from code that actually is correct according to the language rules.

Have you encountered other surprising exceptions in the C++ aliasing rules?
