This article explains the new features in Python 3.15, compared to 3.14.
New features
PEP 799: A dedicated profiling package
A new profiling module has been added to organize Python’s built-in
profiling tools under a single, coherent namespace. This module contains:
- profiling.tracing: the deterministic (tracing) profiler previously available as cProfile.
- profiling.sampling: the new statistical sampling profiler, Tachyon (described below).
The cProfile module remains as an alias for backwards compatibility.
The profile module is deprecated and will be removed in Python 3.17.
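As a minimal migration sketch, assuming profiling.tracing exposes the same interface as cProfile (which the backwards-compatibility aliasing implies), existing code can switch to the new namespace:

import profiling.tracing  # the deterministic profiler's new home

def busy() -> int:
    # Some CPU-bound work to profile.
    return sum(i * i for i in range(100_000))

# Equivalent to the old cProfile.run(...); assumes the run() helper is
# exposed unchanged under the new namespace.
profiling.tracing.run("busy()", sort="cumulative")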
See also
PEP 799 for further details.
(Contributed by Pablo Galindo and László Kiss Kollár in gh-138122.)
Tachyon: High-frequency statistical sampling profiler
A new statistical sampling profiler (Tachyon) has been added as
profiling.sampling. This profiler enables low-overhead performance analysis of
running Python processes without requiring code modification or process restart.
Unlike deterministic profilers (such as profiling.tracing) that instrument
every function call, the sampling profiler periodically captures stack traces from
running processes. This approach provides virtually zero overhead while achieving
sampling rates of up to 1,000,000 Hz, making it the fastest sampling profiler
available for Python (at the time of its contribution) and ideal for debugging
performance issues in production environments, where traditional profiling
approaches would be too intrusive.
Key features include:
Zero-overhead profiling: Attach to any running Python process without affecting its performance. Ideal for production debugging where you can’t afford to restart or slow down your application.
No code modification required: Profile existing applications without restart. Simply point the profiler at a running process by PID and start collecting data.
Flexible target modes (see the command-line sketch after this list):
- Profile running processes by PID (attach) - attach to already-running applications
- Run and profile scripts directly (run) - profile from the very start of execution
- Execute and profile modules (run -m) - profile packages run as python -m module
Multiple profiling modes: Choose what to measure based on your performance investigation:
- Wall-clock time (--mode wall, default): Measures real elapsed time including I/O, network waits, and blocking operations. Use this to understand where your program spends calendar time, including when waiting for external resources.
- CPU time (--mode cpu): Measures only active CPU execution time, excluding I/O waits and blocking. Use this to identify CPU-bound bottlenecks and optimize computational work.
- GIL-holding time (--mode gil): Measures time spent holding Python’s Global Interpreter Lock. Use this to identify which threads dominate GIL usage in multi-threaded applications.
- Exception handling time (--mode exception): Captures samples only from threads with an active exception. Use this to analyze exception handling overhead.
Thread-aware profiling: Option to profile all threads (-a) or just the main thread, essential for understanding multi-threaded application behavior.
Multiple output formats: Choose the visualization that best fits your workflow:
- --pstats: Detailed tabular statistics compatible with pstats. Shows function-level timing with direct and cumulative samples. Best for detailed analysis and integration with existing Python profiling tools.
- --collapsed: Generates collapsed stack traces (one line per stack). This format is specifically designed for creating flamegraphs with external tools like Brendan Gregg’s FlameGraph scripts or speedscope.
- --flamegraph: Generates a self-contained interactive HTML flamegraph using D3.js. Opens directly in your browser for immediate visual analysis. Flamegraphs show the call hierarchy where width represents time spent, making it easy to spot bottlenecks at a glance.
- --gecko: Generates Gecko Profiler format compatible with Firefox Profiler (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced timeline-based analysis with features like stack charts, markers, and network activity.
- --heatmap: Generates an interactive HTML heatmap visualization with line-level sample counts. Creates a directory with per-file heatmaps showing exactly where time is spent at the source code level.
Live interactive mode: Real-time TUI profiler with a top-like interface (--live). Monitor performance as your application runs with interactive sorting and filtering.
Async-aware profiling: Profile async/await code with task-based stack reconstruction (--async-aware). See which coroutines are consuming time, with options to show only running tasks or all tasks including those waiting.
Opcode-level profiling: Gather bytecode opcode information for instruction-level profiling (--opcodes). Shows which bytecode instructions are executing, including specializations from the adaptive interpreter.
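Putting these pieces together, the following command-line sketch combines only the options named above. The python -m profiling.sampling entry point and the exact flag combinations shown are assumptions rather than a complete reference, and my_script.py and my_package are placeholders (see profiling.sampling below for the authoritative syntax):

# Attach to the running process with PID 12345; wall-clock mode is the
# default, and --pstats prints tabular statistics (entry point and flag
# combinations assumed from the feature list above):
python -m profiling.sampling attach 12345 --pstats

# Run a script under the profiler from the very start of execution,
# measuring CPU time only and writing an interactive HTML flamegraph:
python -m profiling.sampling run --mode cpu --flamegraph my_script.py

# Profile a package the way "python -m my_package" runs it, sampling
# all threads in the live top-like interface:
python -m profiling.sampling run -m -a --live my_package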
See profiling.sampling for the complete documentation, including all
available output formats, profiling modes, and configuration options.
(Contributed by Pablo Galindo and László Kiss Kollár in gh-135953 and gh-138122.)
Improved error messages
The interpreter now provides more helpful suggestions in AttributeError exceptions when an attribute does not exist on an object but a similar attribute is available through one of its members. For example, if the object has an attribute that itself exposes the requested name, the error message will suggest accessing it via that inner attribute:
from dataclasses import dataclass
from math import pi

@dataclass
class Circle:
    radius: float

    @property
    def area(self) -> float:
        return pi * self.radius**2

class Container:
    def __init__(self, inner: Circle) -> None:
        self.inner = inner

circle = Circle(radius=4.0)
container = Container(circle)
print(container.area)
Running this code now produces a clearer suggestion:
Traceback (most recent call last):
  File "/home/pablogsal/github/python/main/lel.py", line 42, in <module>
    print(container.area)
          ^^^^^^^^^^^^^^
AttributeError: 'Container' object has no attribute 'area'. Did you mean: 'inner.area'?
Upgraded JIT compiler
Results from the pyperformance benchmark suite report a 3-4% geometric-mean
performance improvement for the JIT over the standard CPython interpreter
built with all optimizations enabled. Results for individual benchmarks in
JIT builds versus non-JIT builds range from a roughly 20% slowdown to a
speedup of more than 100% (ignoring the unpack_sequence microbenchmark) on
x86-64 Linux and AArch64 macOS systems.
Attention
These results are not yet final.
The major upgrades to the JIT are:
LLVM 21 build-time dependency
New tracing frontend
Basic register allocation in the JIT
More JIT optimizations
Better machine code generation
LLVM 21 build-time dependency
The JIT compiler now uses LLVM 21 for build-time stencil generation. As always, LLVM is only needed when building CPython with the JIT enabled; end users running Python do not need LLVM installed. Instructions for installing LLVM can be found in the JIT compiler documentation for all supported platforms.
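For reference, a minimal build sketch follows; the --enable-experimental-jit configure flag is the one used by recent CPython releases and is assumed here to still apply in 3.15:

# Build CPython with the JIT enabled. LLVM 21 is required at build time
# only; the resulting interpreter does not need LLVM at run time.
# The configure flag follows recent CPython releases and is assumed
# to still apply in 3.15.
./configure --enable-experimental-jit
make -j8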
(Contributed by Savannah Ostrowski in gh-140973.)
New tracing frontend
The JIT compiler now supports significantly more bytecode operations and control flow than in Python 3.14, enabling speedups on a wider variety of code. For example, simple Python object creation is now understood by the 3.15 JIT compiler. Overloaded operations and generators are also partially supported. This was made possible by an overhauled JIT tracing frontend that records actual execution paths through code, rather than estimating them as the previous implementation did.
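For example, a hot loop like the following, which creates simple objects and drives a generator, is the kind of pattern the new frontend can now trace. This is an illustrative sketch only; whether any particular function is JIT-compiled depends on the build and on runtime heuristics:

# Illustrative sketch: the 3.15 JIT can now trace simple object
# creation, and generators are partially supported. Whether this code
# is actually JIT-compiled depends on the build and runtime heuristics.
class Point:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

def offsets(n: int):
    # A generator driven from the hot loop below.
    for i in range(n):
        yield i * 0.5

def total(n: int) -> float:
    acc = 0.0
    for dx in offsets(n):
        p = Point(dx, -dx)  # simple object creation in a hot loop
        acc += p.x - p.y
    return acc

print(total(100_000))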
(Contributed by Ken Jin in gh-139109. Support for Windows added by Mark Shannon in gh-141703.)
Basic register allocation in the JIT
A basic form of register allocation has been added to the JIT compiler’s optimizer. This allows the JIT to avoid certain stack operations altogether and operate on registers instead, producing more efficient traces by eliminating reads and writes to memory.
(Contributed by Mark Shannon in gh-135379.)
More JIT optimizations
More constant propagation is now performed: when the JIT compiler detects that certain user code produces constant values, it can simplify that code accordingly.
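As a hypothetical illustration of code that can benefit (whether any particular expression is folded is an implementation detail of the optimizer):

LIMIT = 1 << 16  # a module-level value that never changes

def count_below(values: list[int]) -> int:
    n = 0
    for v in values:
        # LIMIT is loaded as a global each iteration; if the JIT
        # observes that it never changes, constant propagation lets it
        # treat the load as the constant 65536 and simplify the check.
        if v < LIMIT:
            n += 1
    return n

print(count_below(list(range(100_000))))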
(Contributed by Ken Jin and Savannah Ostrowski in gh-132732.)
The JIT now avoids reference counting operations where possible. This generally reduces the cost of most operations in Python.
(Contributed by Ken Jin, Donghee Na, Zheao Li, Savannah Ostrowski, Noam Cohen, Tomas Roun, PuQing in gh-134584.)
Better machine code generation
The JIT compiler’s machine code generator now produces better machine code for x86-64 and AArch64 macOS and Linux targets. In general, users should see lower memory usage for generated machine code and more efficient machine code than the old JIT produced.
(Contributed by Brandt Bucher in gh-136528. Implementation for AArch64 contributed by Mark Shannon in gh-139855. Additional optimizations for AArch64 contributed by Mark Shannon and Diego Russo in gh-140683 and gh-142305.)