Bug in reader/writer locks in Windows API

Original link: https://old.reddit.com/r/cpp/comments/1b55686/maybe_possible_bug_in_stdshared_mutex_on_windows/

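A C++ program that uses std::shared_mutex from Microsoft Visual C++'s standard library (the MSVC STL) appears to have a latent problem: it occasionally deadlocks when six or more threads run at once (in the repro, the main thread plus five readers). The reporter sees this only when compiling with MSVC, not with MinGW or on other platforms. After further analysis, some argued that the fault lies in the user's code rather than in the STL implementation itself. The author's edit notes that explicitly initializing the atomic counter to 0 does not change the behavior, so an uninitialized counter is not the cause. Separately, the initialization of std::atomic itself changed over time: before C++20 its default constructor left the value uninitialized, and C++20 changed it to value-initialize.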

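Overall, the feedback systems tech companies offer are often ineffective and inefficient, which frustrates users. Some workarounds involve paying for extra support or using third-party software, but the underlying problem remains: users find it hard to write clear, concise bug reports, and companies struggle to handle the input they receive. Dissatisfied users then look for answers elsewhere, in social media groups, forums, and chat rooms. Spurious feedback makes matters worse by burying genuine requests for help under large volumes of useless or fake submissions. In addition, an emphasis on following protocol rather than solving the specific problem means issues can persist despite repeated attempts on both sides. Ultimately, diagnosing and fixing reported faults takes cooperation among stakeholders, including tech company representatives and end users.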

Original post

A team at my company ran into a peculiar and unexpected behavior with std::shared_mutex. This behavior only occurs on Windows w/ MSVC. It does not occur with MinGW or on other platforms.

At this point the behavior is pretty well understood. The question isn't "how to work around this". The questions are:

Is this a bug in std::shared_mutex?
Is this a bug in the Windows SlimReaderWriter implementation?

I'm going to boldly claim "definitely yes" and "yes, or the SRW behavior needs to be documented". Your reaction is surely "it's never a bug, it's always user error". I appreciate that sentiment. Please hold that thought for just a minute and read on.

Here's the scenario:

Main thread acquires exclusive lock
Main thread creates N child threads
Each child thread:
1. Acquires a shared lock
2. Yields until all children have acquired a shared lock
3. Releases the shared lock
Main thread releases the exclusive lock

This works most of the time. However 1 out of ~1000 times it "deadlocks". When it deadlocks exactly 1 child successfully acquires a shared lock and all other children block forever in lock_shared(). This behavior can be observed with std::shared_mutex, std::shared_lock/std::unique_lock, or simply calling SRW functions directly.

If the single child that succeeds calls unlock_shared() then the other children will wake up. However if we're waiting for all readers to acquire their shared lock then we will wait forever. Yes, we could achieve this behavior in other ways, that's not the question.

I made a StackOverflow post that has had some good discussion. The behavior has been confirmed. However at this point we need a language lawyer, u/STL, or quite honestly Raymond Chen to declare whether this is "by design" or a bug.

Here is code that can be trivially compiled to repro the error.

#include <atomic>
#include <cstdint>
#include <iostream>
#include <memory>
#include <shared_mutex>
#include <thread>
#include <vector>

struct ThreadTestData {
    int32_t numThreads = 0;
    std::shared_mutex sharedMutex = {};
    std::atomic<int32_t> readCounter = 0;
};

int DoStuff(ThreadTestData* data) {
    // Acquire reader lock
    data->sharedMutex.lock_shared();

    // wait until all read threads have acquired their shared lock
    data->readCounter.fetch_add(1);
    while (data->readCounter.load() != data->numThreads) {
        std::this_thread::yield();
    }

    // Release reader lock
    data->sharedMutex.unlock_shared();

    return 0;
}

int main() {
    int count = 0;
    while (true) {
        ThreadTestData data = {};
        data.numThreads = 5;

        // Acquire write lock
        data.sharedMutex.lock();

        // Create N threads
        std::vector<std::unique_ptr<std::thread>> readerThreads;
        readerThreads.reserve(data.numThreads);
        for (int i = 0; i < data.numThreads; ++i) {
            readerThreads.emplace_back(std::make_unique<std::thread>(DoStuff, &data));
        }

        // Release write lock
        data.sharedMutex.unlock();

        // Wait for all readers to succeed
        for (auto& thread : readerThreads) {
            thread->join();
        }

        // Cleanup
        readerThreads.clear();

        // Spew so we can tell when it's deadlocked
        count += 1;
        std::cout << count << std::endl;
    }

    return 0;
}

Personally I don't think the function lock_shared() should ever be allowed to block forever when there is not an exclusive lock. That, to me, is a bug. One that only appears for std::shared_mutex in the SRW-based Windows MSVC implementation. Maybe it's allowed by the language spec? I'm not a language lawyer.

I'm also inclined to call the SRW behavior either a bug or something that should be documented. There's a 2017 Raymond Chen post that discusses EXACTLY this behavior. He implies it is user error. Therefore I'm inclined to boldly, and perhaps wrongly, call this an SRW bug.

What do y'all think?

Edit: Updated to explicitly set readCounter to 0. That is not the problem.
