Best Performance of a C++ Singleton

Original link: https://andreasfertig.com/blog/2026/03/best-performance-of-a-cpp-singleton/

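This article examines performance considerations when implementing the singleton pattern in C++. Using `DisplayManager` as an example, the author shows how much the choice of constructor (defaulted vs. user-defined) affects the generated code and its performance. A user-defined constructor requires the compiler to insert a guard variable and a check on every `Instance()` call, which introduces overhead from functions such as `__cxa_guard_acquire` and `__cxa_guard_release`. A defaulted constructor, by contrast, produces simpler and faster code.

The article also compares a block-local static variable (as in the original implementation) with a private static data member. With a defaulted constructor, both perform the same; when a user-defined constructor is needed, the static data member approach *outperforms* the block-local static because it needs no guard variable.

The author concludes by recommending a static data member when you need a constructor and want the best performance, and a block-local static for simpler code when the default constructor can be defaulted. The article includes Compiler Explorer links for detailed code comparisons.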


Original article

In my January post, I focused on implementing a singleton correctly. This time I want to add performance to the mix and show you the best way to implement your singleton... or give you guidance to pick your best way.

Setting the scene

I'm using a display manager as an example, like GDM, LightDM, or others in the Linux world. Here is the motivating implementation for today:

// A
enum class Resolution
{
  r640x480,
  r800x600,
  // ...
};

// B
class DisplayManager {
  Resolution mResolution{};

  // C
  DisplayManager(const DisplayManager&) = default;
  DisplayManager(DisplayManager&&)      = default;

  DisplayManager& operator=(const DisplayManager&) = default;
  DisplayManager& operator=(DisplayManager&&)      = default;

  DisplayManager() = default;  // D

public:
  static DisplayManager& Instance() noexcept
  {
    static DisplayManager dspm{};  // E

    return dspm;
  }

  void SetResolution(Resolution newRes) { mResolution = newRes; }

  Resolution GetResolution() const { return mResolution; }
};

Let me quickly go through the various parts. In A, you see the data type Resolution, which lists two example resolutions; you can imagine the rest. Next, in B, you find the DisplayManager implementation. Diving into the implementation, you can see that I followed my own advice from my last post and made the copy and move operations private in C. This is all just setup for today's focus.

To complete the picture, here is how I use the object:

Resolution Use()
{
  auto& s = DisplayManager::Instance();
  s.SetResolution(Resolution::r640x480);

  return s.GetResolution();
}
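If you want to try the setup yourself, here is a minimal single-file sketch that combines the pieces above. The static_assert lines are an addition of mine, not part of the original listing: they check at compile time that, with the constructor and the copy and move operations private, code outside the class can neither create nor copy a DisplayManager, so Instance is the only way in.

```cpp
#include <cassert>
#include <type_traits>

enum class Resolution {
  r640x480,
  r800x600,
};

class DisplayManager {
  Resolution mResolution{};

  // C: copy and move operations are private.
  DisplayManager(const DisplayManager&) = default;
  DisplayManager(DisplayManager&&)      = default;
  DisplayManager& operator=(const DisplayManager&) = default;
  DisplayManager& operator=(DisplayManager&&)      = default;

  // D: user-declared (defaulted) default constructor, also private.
  DisplayManager() = default;

public:
  static DisplayManager& Instance() noexcept
  {
    static DisplayManager dspm{};  // E: block-local static
    return dspm;
  }

  void SetResolution(Resolution newRes) { mResolution = newRes; }
  Resolution GetResolution() const { return mResolution; }
};

// Outside the class, nobody can create, copy, or move a DisplayManager.
static_assert(!std::is_default_constructible_v<DisplayManager>);
static_assert(!std::is_copy_constructible_v<DisplayManager>);
static_assert(!std::is_move_constructible_v<DisplayManager>);

Resolution Use()
{
  auto& s = DisplayManager::Instance();
  s.SetResolution(Resolution::r640x480);
  return s.GetResolution();
}
```

Every call to DisplayManager::Instance() returns a reference to the same dspm object, so a resolution set through one reference is visible through any other.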

Let's talk performance

Going back to the DisplayManager implementation, the interesting part starts with D, the default constructor, which of course must be private in a singleton. More on that in a moment. As a last item, you see E, where I use a block local static for the variable dspm.

Let's talk performance. With D and E we have two places where different implementation choices influence the performance of DisplayManager objects, or, more precisely, of accessing them. But you might not always have the full freedom to pick all the options.

In my DisplayManager implementation, I present you with a simple case. The default constructor can be defaulted since DisplayManager only holds an object of type Resolution, an enum class which boils down to an integer type. I don't need any code inside the constructor's body. There are cases where this doesn't apply and you need to write code in the constructor body. That gives us two cases to distinguish:

  • defaultable default constructor (user-declared constructor)
  • a constructor with implementation (user-defined constructor)
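The two cases can be sketched side by side. This is a reduced illustration under assumptions: the class names DeclaredDM and DefinedDM and the helper QueryCurrentResolution are hypothetical (the helper stands in for real work, such as asking the driver for the current mode), and the private copy and move operations are left out for brevity. The comments describe the initialization kind each variant gets; whether a guard is emitted in the second case is up to the compiler, but it is what you typically get.

```cpp
#include <cassert>

enum class Resolution {
  r640x480,
  r800x600,
};

// Hypothetical stand-in for real constructor work. Deliberately not constexpr.
Resolution QueryCurrentResolution() { return Resolution::r800x600; }

// Case 1: user-declared, defaulted default constructor.
class DeclaredDM {
  Resolution mResolution{};
  DeclaredDM() = default;  // implicitly constexpr here

public:
  static DeclaredDM& Instance() noexcept
  {
    // Constant initialization: dspm can be set up at compile time,
    // so no guard variable is needed.
    static DeclaredDM dspm{};
    return dspm;
  }
  Resolution GetResolution() const { return mResolution; }
};

// Case 2: user-defined default constructor with code in its body.
class DefinedDM {
  Resolution mResolution{};
  DefinedDM() { mResolution = QueryCurrentResolution(); }

public:
  static DefinedDM& Instance() noexcept
  {
    // Dynamic initialization: runs the first time control passes here and
    // must be thread-safe, so the compiler typically emits a guard variable.
    static DefinedDM dspm{};
    return dspm;
  }
  Resolution GetResolution() const { return mResolution; }
};
```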

If you look at the generated assembly for DisplayManager with a user-declared constructor, you'll see this:

Use():
        mov     DWORD PTR DisplayManager::Instance()::dspm[rip], 0
        xor     eax, eax
        ret
main:
        xor     eax, eax
        ret
DisplayManager::Instance()::dspm:
        .zero   4

For now, let's say that's good.

If you look at the generated code for an implementation with a user-defined constructor, you'll get this:

Use():
        movzx   eax, BYTE PTR guard variable for DisplayManager::Instance()::dspm[rip]
        test    al, al
        je      .L13
        mov     DWORD PTR DisplayManager::Instance()::dspm[rip], 0
        xor     eax, eax
        ret
.L13:
        sub     rsp, 8
        mov     edi, OFFSET FLAT:guard variable for DisplayManager::Instance()::dspm
        call    __cxa_guard_acquire
        test    eax, eax
        jne     .L14
.L3:
        mov     DWORD PTR DisplayManager::Instance()::dspm[rip], 0
        xor     eax, eax
        add     rsp, 8
        ret
.L14:
        mov     DWORD PTR DisplayManager::Instance()::dspm[rip], 0
        mov     edi, OFFSET FLAT:guard variable for DisplayManager::Instance()::dspm
        call    __cxa_guard_release
        jmp     .L3
main:
        xor     eax, eax
        ret
guard variable for DisplayManager::Instance()::dspm:
        .zero   8
DisplayManager::Instance()::dspm:
        .zero   4

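Now you can see why I called the user-declared version good. With the defaulted constructor, dspm can be initialized at compile time. Once the constructor is user-defined (and not constexpr), dspm is initialized at runtime, the first time control passes through its declaration, and that initialization must be thread-safe. The compiler therefore has to insert a guard variable and check its state each time you access Instance, which adds up to a good amount of code. Please note that at this point you're looking at code generated with GCC 15 at -O3, and I did not even call SetResolution or GetResolution.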

Another thing to consider is that __cxa_guard_acquire and __cxa_guard_release, the runtime functions that serialize the initialization between threads, introduce slight delays to your program whenever they are called.
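To make the guard mechanism more tangible, here is a hand-written model of what the compiler does for a block local static with a user-defined constructor. It is a conceptual sketch, not the actual Itanium C++ ABI code behind __cxa_guard_acquire and __cxa_guard_release: an atomic flag plays the role of the guard variable, and a mutex plays the role of the acquire/release pair. The type Widget and the function InstanceByHand are hypothetical names for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

// Hypothetical type with a user-defined constructor; it counts its constructions.
struct Widget {
  static inline int constructions = 0;
  int value;
  Widget() : value{42} { ++constructions; }
};

// Conceptual model of
//   static Widget w{};
// inside a function.
Widget& InstanceByHand()
{
  // These helpers have constexpr constructors, so they are constant-initialized.
  static std::atomic<bool> initialized{false};  // role of the guard variable
  static std::mutex initMutex;                  // role of __cxa_guard_acquire/release
  alignas(Widget) static unsigned char storage[sizeof(Widget)];

  // Fast path, taken on every call: one load plus one branch.
  if (!initialized.load(std::memory_order_acquire)) {
    // Slow path, taken only until construction has finished:
    // serialize the one-time construction between threads.
    std::lock_guard<std::mutex> lock{initMutex};
    if (!initialized.load(std::memory_order_relaxed)) {
      // If the constructor throws, the flag stays false and the next call retries.
      ::new (static_cast<void*>(storage)) Widget{};
      initialized.store(true, std::memory_order_release);
    }
  }
  // Simplification: unlike the real mechanism, no destructor is registered at exit.
  return *std::launder(reinterpret_cast<Widget*>(storage));
}
```

After the first call, only the fast path remains. In the generated Use() listing, that corresponds to the movzx, test, and je at the top, which run on every access to Instance.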

Here is a Compiler Explorer link that shows the two options.

All right, what else can we do? Right, you can use a different approach in E.


Using a static data member

Instead of implementing the singleton pattern using a block local static variable, you can go for a private static data member. Time to see how this implementation behaves. Here is my implementation where I kept the labels stable:

// B
class DisplayManager {
  static DisplayManager mDspm;  // E1

  Resolution mResolution{};

  // C
  DisplayManager(const DisplayManager&) = default;
  DisplayManager(DisplayManager&&)      = default;

  DisplayManager& operator=(const DisplayManager&) = default;
  DisplayManager& operator=(DisplayManager&&)      = default;

  DisplayManager() = default;  // D

public:
  static DisplayManager& Instance() noexcept
  {
    // E2
    return mDspm;
  }

  void SetResolution(Resolution newRes) { mResolution = newRes; }

  Resolution GetResolution() const { return mResolution; }
};

// Imagine this code is in an implementation file.
DisplayManager DisplayManager::mDspm{};  // E3

You can see that the changes are only in E1, E2, and E3. E3 is the definition the static data member requires; I show it for completeness. The interesting change is in E2, where I no longer use a block local static but the static data member from E1. You still have the same two options: a user-declared or a user-defined constructor.

For a user-declared constructor, my code results in:

Use():
        mov     DWORD PTR DisplayManager::mDspm[rip], 0
        xor     eax, eax
        ret
main:
        xor     eax, eax
        ret
DisplayManager::mDspm:
        .zero   4

This is exactly the same code as for the previous implementation. Things get interesting when you look at the user-defined constructor case:

Use():
        mov     DWORD PTR DisplayManager::mDspm[rip], 0
        xor     eax, eax
        ret
main:
        xor     eax, eax
        ret
_GLOBAL__sub_I_DisplayManager::mDspm:
        mov     DWORD PTR DisplayManager::mDspm[rip], 0
        ret
DisplayManager::mDspm:
        .zero   4

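That code looks much better than the one before. The object is now initialized once at program startup by _GLOBAL__sub_I_DisplayManager::mDspm, before main runs, so Instance needs neither a guard variable nor locks. That not only leads to less assembly code but also to faster code.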
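Here is a sketch of the static data member variant with a user-defined constructor, written so you can observe when the construction happens. The constructor body and the counter gConstructions are hypothetical additions for illustration. The standard permits an implementation to defer this kind of dynamic initialization, but on GCC and Clang it runs at startup, before main, which matches the _GLOBAL__sub_I_DisplayManager::mDspm function in the listing.

```cpp
#include <cassert>

enum class Resolution {
  r640x480,
  r800x600,
};

// Hypothetical counter, only here to observe when the constructor runs.
// It is constant-initialized, so it is ready before any dynamic initialization.
inline int gConstructions = 0;

class DisplayManager {
  static DisplayManager mDspm;  // E1: declaration only

  Resolution mResolution{};

  DisplayManager(const DisplayManager&) = default;
  DisplayManager(DisplayManager&&)      = default;
  DisplayManager& operator=(const DisplayManager&) = default;
  DisplayManager& operator=(DisplayManager&&)      = default;

  // User-defined constructor; the body is a hypothetical placeholder.
  DisplayManager() : mResolution{Resolution::r800x600} { ++gConstructions; }

public:
  static DisplayManager& Instance() noexcept
  {
    return mDspm;  // E2: no guard variable, no lock, just the address of mDspm
  }

  void SetResolution(Resolution newRes) { mResolution = newRes; }
  Resolution GetResolution() const { return mResolution; }
};

// E3: in a real project, this definition lives in exactly one implementation file.
// Its initializer runs once, during dynamic initialization at program startup.
DisplayManager DisplayManager::mDspm{};
```

One trade-off to keep in mind: because mDspm is initialized during startup, code that runs during the dynamic initialization of globals in another translation unit and calls Instance could access mDspm before its constructor has run (the static initialization order problem). The block local static from E does not have that issue, since it is constructed on first use.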

You'll find the two versions here on Compiler Explorer.

Summary

If you want good performance for your singleton implementation and you need to provide a constructor, you should go for the static data member implementation. If you can default the default constructor, the two implementation strategies are equivalent performance-wise. In that case, I would suggest using the block local approach, as it saves you from having to define and initialize the singleton object in an implementation file.

Andreas
