Free-threaded CPython is ready to experiment with

Original link: https://labs.quansight.org/blog/free-threaded-python-rollout

This article introduces a new development in Python called "free threading". It is a major change that allows multiple threads to run in parallel within the same CPython interpreter. The goal is better multi-threaded performance, which matters more and more as core counts in modern CPUs keep growing while clock speeds stay flat. To achieve this, a free-threaded interpreter runs without the global interpreter lock (GIL), which had been protecting thread-unsafe C, C++, Cython, and Fortran code. The change comes with complications, however: pure Python code should keep working unchanged, but code written in other languages or using the CPython C API may face unexpected outcomes such as crashes, intermittent bugs, or hard-to-explain runtime errors. In addition, the binary interface (ABI) differs between the default and free-threaded CPython builds, so some packages need extra wheel builds. This development requires substantial work from developers to ensure compatibility and avoid thread-safety problems. The author encourages developers to start exploring a free-threaded interpreter through their preferred installation method (macOS/Linux/Windows, via python.org/pyenv/apt/yum/conda). The post also shares experiences with thread-safety issues encountered during the rollout and suggestions for addressing them, giving examples such as intermittent failures in popular Python packages like NumPy and SciPy. Despite these difficulties, the author believes that solid thread-safety work and improved testing strategies can mitigate the problems. To help developers, the team has created a resource hub with documentation on adding free-threaded support, as well as a status tracker showing the adoption progress across various open source projects. At the upcoming SciPy conference, a Birds-of-a-Feather (BoF) session titled "Supporting free-threaded Python" will focus on this exciting new capability, and the author invites developers to join in the transition to a faster, more efficient Python.

When writing code, it is important to clean up after yourself. Both parent and child processes need proper cleanup, especially around open files, network connections, and memory usage. Neglecting cleanup can lead to problems such as deadlocked processes, resource leaks, data corruption, orphaned processes, and unexpected state changes in files. While asynchronous tasks can keep a process looking responsive, that does not necessarily mean the process completed successfully or released all the necessary resources, so developers should make sure every resource is properly closed after a task finishes, even when using asynchronous APIs or frameworks. In addition, annotating a function's properties (such as expensive computation, network requirements, or potential side effects) can improve the understanding and organization of the code. Criticisms of asynchronous constructs like async/await, such as their perceived complexity or concerns about redundancy when moving between synchronous and asynchronous programming models, vary from developer to developer. Ultimately, the decision to adopt new programming paradigms and best practices depends on personal preference, team consensus, and the nature of the task at hand.

Original text

First, a few announcements:

Yesterday, py-free-threading.github.io launched! It's both a resource with documentation around adding support for free-threaded Python, and a status tracker for the rollout across open source projects in the Python ecosystem. We hope and expect both of these to be very useful, with the status tracker providing a one-stop-shop to check the support status of the dependencies of your project (e.g., "what was the first release of a package on PyPI to support free-threaded Python?" or "are there nightly wheels and where can I find them?") and get an overview of ecosystem-wide progress:

Tracking website for package compatibility with free-threaded CPython.

Later today, the Birds-of-a-Feather session "Supporting free-threaded Python" will be held at the SciPy 2024 conference (co-organized by one of our team members, Nathan Goldbaum, together with Madicken Munk), focusing on knowledge and experience sharing.

Free-threaded CPython - what, why, how?

You may be wondering by now what "free threading" or "free-threaded CPython" is, and why you should care. In summary: it is a major change to CPython that allows running multiple threads in parallel within the same interpreter. It is becoming available as an experimental feature in CPython 3.13. A free-threaded interpreter can run with the global interpreter lock (GIL) disabled - a capability that is finally arriving as a result of the efforts that went into PEP 703 - Making the Global Interpreter Lock Optional in CPython.

Why? Performance. Multi-threaded performance. It makes it significantly easier to write code that efficiently runs in parallel and will utilize multiple CPU cores effectively. The core counts in modern CPUs continue to grow, while clock speeds do not grow, so multi-threaded performance will continue to grow in importance.
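
As a rough illustration (a minimal sketch of ours, not taken from the post), the snippet below runs a deliberately naive CPU-bound function four times, first sequentially and then on four threads. On a default (GIL) build the threaded run takes roughly as long as the sequential one; on a free-threaded build it can scale with the number of cores:

import time
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit):
    # Deliberately naive, CPU-bound work with no I/O.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    tasks = [50_000] * 4

    start = time.perf_counter()
    for limit in tasks:
        count_primes(limit)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(count_primes, tasks))
    print(f"4 threads:  {time.perf_counter() - start:.2f}s")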

How? It's now easy to get started by installing a free-threaded interpreter: macOS/Linux/Windows & python.org/pyenv/apt/yum/conda - your preferred option is probably available now.

Sounds awesome - what's the catch?

Implementing free-threading in CPython itself is a massive effort already, and worthy of its own (series of) blog post(s). For the wider ecosystem, there's also a ton of work involved, mainly due to two problems:

  1. Thread-safety. While pure Python code should work unchanged, code written in other languages or using the CPython C API may not. The GIL was implicitly protecting a lot of thread-unsafe C, C++, Cython, Fortran, etc. code - and now it no longer does. Which may lead to all sorts of fun outcomes (crashes, intermittent incorrect behavior, etc.).
  2. ABI incompatibility between the default and free-threaded CPython builds. The result of a free-threaded interpreter having a different ABI is that each package that has extension modules must now build extra wheels.
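
For wheel builds and CI it can be handy to detect which flavour of interpreter is running. A minimal sketch, assuming CPython 3.13+, where the Py_GIL_DISABLED build config variable is set to 1 on free-threaded builds:

import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds (the cp313t ABI) and 0 or unset
# on default builds and older interpreters.
if sysconfig.get_config_var("Py_GIL_DISABLED"):
    print("free-threaded build: cp313t wheels apply")
else:
    print("default build: regular cp313 wheels apply")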

Out of these two, the thread-safety one is the more hairy problem. Having to implement and maintain extra wheel build jobs is not ideal, but the work itself is well-understood - it just needs doing for each project with extension modules. Thread-safety on the other hand is harder to understand, improve, and even test reliably. Because multithreaded code is usually sensitive to the timing of how multiple threads run and access shared state, bugs may manifest rarely. And a crash or failure that is hard to reproduce locally is harder to fix than one that is always reproducible.

Here are a couple of examples of such intermittent failures:

numpy#26690 shows an example where a simple call to the .sum() method of a numpy array fails with a fairly mysterious


RuntimeError: Identity cache already includes the item.


when used with the Python threading and queue modules. This was noticed in a scikit-learn CI job - it never failed in NumPy's own CI (scikit-learn has more tests involving parallelism). After the bug report with a reproducer was submitted, the fix to a numpy-internal cache wasn't that hard.
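
To give a sense of what such a reproducer looks like, here is a hypothetical sketch of the general pattern (not the exact script from numpy#26690): several threads pull arrays off a queue and call .sum() concurrently, which on affected free-threaded builds could intermittently trigger the error above.

import queue
import threading

import numpy as np

work = queue.Queue()
for _ in range(1_000):
    work.put(np.ones(100))

def worker():
    # Drain the queue, summing each array; many threads do this at once.
    while True:
        try:
            arr = work.get_nowait()
        except queue.Empty:
            return
        arr.sum()  # the call that failed intermittently on affected builds

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()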

pywavelets#758 was a report of another fairly obscure failure in a test using concurrent.futures:


TypeError: descriptor '__enter__' for '_thread.RLock' objects doesn't apply to a '_thread.lock' object


That looked a lot like a problem in CPython, and after some investigating it was found there as well (cpython#121368) and fixed fairly quickly (the fix required some deep expertise in both CPython internals and multithreaded programming in C though).

There are a fair amount of examples like that, e.g. undefined behavior in Cython code that no longer worked due to changes in CPython 3.13, a crash from C code in scipy.signal that hadn't been touched for 24 years (it was always buggy, but the GIL offered enough protection), and a crash in Pillow due to Python C API usage that wasn't supported.

It's encouraging though that issues like the ones above do get understood and resolved fairly quickly. With a good test strategy, and over time also test suites of libraries that cover Python-level threading better (such tests are largely non-existent now in most packages), detecting or guarding against thread-safety issues does seem doable. That test strategy will have to be multi-pronged: from writing new tests and running tests in loops with pytest-repeat & co., to getting ThreadSanitizer to work in CI and doing integration-level and real-world testing with users.
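
As an illustration of the "writing new tests" prong, here is a sketch of the kind of threading regression test meant above (our own example, not a test taken from any particular project). A threading.Barrier releases all workers at the same moment to maximize contention, and running the test under pytest-repeat (e.g. pytest --count=100) re-rolls the dice on the timing:

import threading
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def test_concurrent_sum():
    n_threads = 8
    barrier = threading.Barrier(n_threads)
    arr = np.arange(10_000)

    def hammer():
        barrier.wait()  # start all workers as close together as possible
        for _ in range(100):
            assert arr.sum() == 49_995_000

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(hammer) for _ in range(n_threads)]
        for f in futures:
            f.result()  # re-raises any exception from a worker thread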

The road ahead & what our team will be working on

Free-threaded CPython becoming the default, and eventually the only, build of CPython is several years away. What we're hoping to see, and help accomplish, is that for Python 3.13 many projects will work on compatibility and start releasing cp313t wheels on PyPI (and possibly nightly builds too, for projects with a lot of dependencies), so users and packages further downstream can start experimenting as well. After a full year of maturing support in the ecosystem and further improvements in performance in CPython itself, we should have a good picture of both the benefits and the remaining challenges with robustness.

Our team (currently Nathan, Ken Jin, Lysandros, Edgar, and myself) has now been working on this topic for a few months, starting at the bottom of the PyData stack (most effort so far has gone to NumPy, Cython, and CPython), and slowly working our way up from there.

For each package, the approach has been similar so far - and we think a lot of it can be used as a template by others. The steps are roughly:

  1. Add a first CI job, usually Linux x86-64 with the latest Python 3.13 pre-release candidate, and ensure the test suite passes,
  2. Based on knowledge from maintainers, fix known issues with thread-safety and shared/global state in native code,
  3. Add free-threaded support to the wheel build CI jobs, and start uploading nightly wheels (if appropriate for the project),
  4. Do some stress testing locally and monitor CI jobs, and fix failures that are observed (take the opportunity to add regression tests using threading or concurrent.futures.ThreadPoolExecutor),
  5. Mark extension modules as supporting running without the GIL (a quick runtime check for this is sketched after the list),
  6. Move on to the next package (e.g., a key dependency) and use its test suite to exercise the first package more, circling back to fix issues or address follow-up actions as needed.
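
One way to sanity-check step 5 (a sketch based on our understanding of CPython 3.13 behaviour, not a snippet from the projects above): on a free-threaded build the interpreter re-enables the GIL at import time for any extension module that has not been marked as supporting free-threading, and the internal helper sys._is_gil_enabled() reports the current state.

import sys

import numpy  # hypothetical: replace with the package whose modules were just marked

if hasattr(sys, "_is_gil_enabled"):
    # On a free-threaded build this should print False only if every imported
    # extension module declared support for running without the GIL.
    print("GIL currently enabled:", sys._is_gil_enabled())
else:
    print("no sys._is_gil_enabled(); this is a pre-3.13 interpreter")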

Our main takeaway so far: it's challenging, but tractable! And fun as well :)

We've only just scratched the surface; there'll be a lot to do - from key complex packages like PyO3 (important for projects using Rust) and PyTorch, to the sheer volume of smaller packages with extension modules. The lessons we are learning, as far as they are reusable, are going into the documentation at py-free-threading.github.io. The repository that contains the sources for that website also has an issue tracker that is used to link to the relevant project-specific tracking issues for free-threaded support, as well as for ecosystem-wide issues and tasks (contributions and ideas are very welcome here!).

Furthermore, we'd like to spend time on whatever may be impactful in helping the ecosystem adopt free-threaded CPython, from answering questions to helping with debugging - please don't hesitate to reach out or ping one of us directly on GitHub!

Conclusion & acknowledgements

We're really excited about what is becoming possible with free-threaded CPython! While our team is busy with implementing CI jobs and fixing thread-safety issues, we are as curious as anyone to see what performance improvements and interesting experiments are going to show up with real-world code soon.

It's hard to acknowledge and thank everyone involved in moving free-threaded CPython forward, because so much activity is happening. First of all we have to thank Meta for funding the efforts of our team to help the ecosystem adopt free-threaded CPython at the pace that will be needed to make this whole endeavour a success, and Sam Gross and the whole Python Runtime team at Meta for the close collaboration. Then the list is long - from the Python Steering Council, for its thoughtful approach to (and acceptance of) PEP 703, to the many library maintainers and community members who are proactively adding support to their own projects, or who guide and review our contributions whenever we work on projects we do not ourselves maintain.
