This article explains the new features in Python 3.15, compared to 3.14.
New features
PEP 799: A dedicated profiling package
A new profiling module has been added to organize Python’s built-in
profiling tools under a single, coherent namespace. This module contains:
- profiling.tracing: the deterministic (tracing) profiler previously available as cProfile.
- profiling.sampling: the new statistical sampling profiler, Tachyon (described below).
The cProfile module remains as an alias for backwards compatibility.
The profile module is deprecated and will be removed in Python 3.17.
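As a minimal migration sketch, assuming profiling.tracing exposes the same interface as cProfile (which the backwards-compatibility aliasing implies), existing code can switch to the new namespace:

import profiling.tracing  # the deterministic profiler's new home

def busy() -> int:
    # Some CPU-bound work to profile.
    return sum(i * i for i in range(100_000))

# Equivalent to the old cProfile.run(...); assumes the run() helper is
# exposed unchanged under the new namespace.
profiling.tracing.run("busy()", sort="cumulative")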
See also
PEP 799 for further details.
(Contributed by Pablo Galindo and László Kiss Kollár in gh-138122.)
Tachyon: High-frequency statistical sampling profiler
A new statistical sampling profiler (Tachyon) has been added as
profiling.sampling. This profiler enables low-overhead performance analysis of
running Python processes without requiring code modification or process restart.
Unlike deterministic profilers (such as profiling.tracing) that instrument
every function call, the sampling profiler periodically captures stack traces from
running processes. This approach provides virtually zero overhead while achieving
sampling rates of up to 1,000,000 Hz, making it the fastest sampling profiler
available for Python (at the time of its contribution) and ideal for debugging
performance issues in production environments, where traditional profiling
approaches would be too intrusive.
Key features include:
Zero-overhead profiling: Attach to any running Python process without affecting its performance. Ideal for production debugging where you can’t afford to restart or slow down your application.
No code modification required: Profile existing applications without restart. Simply point the profiler at a running process by PID and start collecting data.
Flexible target modes (see the command-line sketch after this list):
- Profile running processes by PID (attach) - attach to already-running applications
- Run and profile scripts directly (run) - profile from the very start of execution
- Execute and profile modules (run -m) - profile packages run as python -m module
Multiple profiling modes: Choose what to measure based on your performance investigation:
- Wall-clock time (--mode wall, default): Measures real elapsed time including I/O, network waits, and blocking operations. Use this to understand where your program spends calendar time, including when waiting for external resources.
- CPU time (--mode cpu): Measures only active CPU execution time, excluding I/O waits and blocking. Use this to identify CPU-bound bottlenecks and optimize computational work.
- GIL-holding time (--mode gil): Measures time spent holding Python’s Global Interpreter Lock. Use this to identify which threads dominate GIL usage in multi-threaded applications.
- Exception handling time (--mode exception): Captures samples only from threads with an active exception. Use this to analyze exception handling overhead.
Thread-aware profiling: Option to profile all threads (-a) or just the main thread, essential for understanding multi-threaded application behavior.
Multiple output formats: Choose the visualization that best fits your workflow:
- --pstats: Detailed tabular statistics compatible with pstats. Shows function-level timing with direct and cumulative samples. Best for detailed analysis and integration with existing Python profiling tools.
- --collapsed: Generates collapsed stack traces (one line per stack). This format is specifically designed for creating flamegraphs with external tools like Brendan Gregg’s FlameGraph scripts or speedscope.
- --flamegraph: Generates a self-contained interactive HTML flamegraph using D3.js. Opens directly in your browser for immediate visual analysis. Flamegraphs show the call hierarchy where width represents time spent, making it easy to spot bottlenecks at a glance.
- --gecko: Generates Gecko Profiler format compatible with Firefox Profiler (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced timeline-based analysis with features like stack charts, markers, and network activity.
- --heatmap: Generates an interactive HTML heatmap visualization with line-level sample counts. Creates a directory with per-file heatmaps showing exactly where time is spent at the source code level.
Live interactive mode: Real-time TUI profiler with a top-like interface (--live). Monitor performance as your application runs with interactive sorting and filtering.
Async-aware profiling: Profile async/await code with task-based stack reconstruction (--async-aware). See which coroutines are consuming time, with options to show only running tasks or all tasks including those waiting.
Opcode-level profiling: Gather bytecode opcode information for instruction-level profiling (--opcodes). Shows which bytecode instructions are executing, including specializations from the adaptive interpreter.
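Putting these pieces together, the following command-line sketch combines only the options named above. The python -m profiling.sampling entry point and the exact flag combinations shown are assumptions rather than a complete reference, and my_script.py and my_package are placeholders (see profiling.sampling below for the authoritative syntax):

# Attach to the running process with PID 12345; wall-clock mode is the
# default, and --pstats prints tabular statistics (entry point and flag
# combinations assumed from the feature list above):
python -m profiling.sampling attach 12345 --pstats

# Run a script under the profiler from the very start of execution,
# measuring CPU time only and writing an interactive HTML flamegraph:
python -m profiling.sampling run --mode cpu --flamegraph my_script.py

# Profile a package the way "python -m my_package" runs it, sampling
# all threads in the live top-like interface:
python -m profiling.sampling run -m -a --live my_package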
See profiling.sampling for the complete documentation, including all
available output formats, profiling modes, and configuration options.
(Contributed by Pablo Galindo and László Kiss Kollár in gh-135953 and gh-138122.)
Improved error messages
The interpreter now provides more helpful suggestions in AttributeError exceptions when an attribute does not exist on an object but a similar attribute is available through one of its members. For example, if the object has an attribute that itself exposes the requested name, the error message will suggest accessing it via that inner attribute:
from dataclasses import dataclass
from math import pi

@dataclass
class Circle:
    radius: float

    @property
    def area(self) -> float:
        return pi * self.radius**2

class Container:
    def __init__(self, inner: Circle) -> None:
        self.inner = inner

circle = Circle(radius=4.0)
container = Container(circle)
print(container.area)
Running this code now produces a clearer suggestion:
Traceback (most recent call last):
  File "/home/pablogsal/github/python/main/lel.py", line 42, in <module>
    print(container.area)
          ^^^^^^^^^^^^^^
AttributeError: 'Container' object has no attribute 'area'. Did you mean: 'inner.area'?
Upgraded JIT compiler
Results from the pyperformance benchmark suite report a 3-4% geometric-mean
performance improvement for the JIT over the standard CPython interpreter
built with all optimizations enabled. Results for individual benchmarks in
JIT builds versus non-JIT builds range from a roughly 20% slowdown to a
speedup of more than 100% (ignoring the unpack_sequence microbenchmark) on
x86-64 Linux and AArch64 macOS systems.
Attention
These results are not yet final.
The major upgrades to the JIT are:
LLVM 21 build-time dependency
New tracing frontend
Basic register allocation in the JIT
More JIT optimizations
Better machine code generation
LLVM 21 build-time dependency
The JIT compiler now uses LLVM 21 for build-time stencil generation. As always, LLVM is only needed when building CPython with the JIT enabled; end users running Python do not need LLVM installed. Instructions for installing LLVM can be found in the JIT compiler documentation for all supported platforms.
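For reference, a minimal build sketch follows; the --enable-experimental-jit configure flag is the one used by recent CPython releases and is assumed here to still apply in 3.15:

# Build CPython with the JIT enabled. LLVM 21 is required at build time
# only; the resulting interpreter does not need LLVM at run time.
# The configure flag follows recent CPython releases and is assumed
# to still apply in 3.15.
./configure --enable-experimental-jit
make -j8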
(Contributed by Savannah Ostrowski in gh-140973.)
New tracing frontend
The JIT compiler now supports significantly more bytecode operations and control flow than in Python 3.14, enabling speedups on a wider variety of code. For example, simple Python object creation is now understood by the 3.15 JIT compiler. Overloaded operations and generators are also partially supported. This was made possible by an overhauled JIT tracing frontend that records actual execution paths through code, rather than estimating them as the previous implementation did.
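For example, a hot loop like the following, which creates simple objects and drives a generator, is the kind of pattern the new frontend can now trace. This is an illustrative sketch only; whether any particular function is JIT-compiled depends on the build and on runtime heuristics:

# Illustrative sketch: the 3.15 JIT can now trace simple object
# creation, and generators are partially supported. Whether this code
# is actually JIT-compiled depends on the build and runtime heuristics.
class Point:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

def offsets(n: int):
    # A generator driven from the hot loop below.
    for i in range(n):
        yield i * 0.5

def total(n: int) -> float:
    acc = 0.0
    for dx in offsets(n):
        p = Point(dx, -dx)  # simple object creation in a hot loop
        acc += p.x - p.y
    return acc

print(total(100_000))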
(Contributed by Ken Jin in gh-139109. Support for Windows added by Mark Shannon in gh-141703.)
Basic register allocation in the JIT
A basic form of register allocation has been added to the JIT compiler’s optimizer. This allows the JIT to avoid certain stack operations altogether and operate on registers instead, producing more efficient traces by eliminating reads and writes to memory.
(Contributed by Mark Shannon in gh-135379.)
More JIT optimizations
More constant propagation is now performed: when the JIT compiler detects that certain user code produces constant values, it can simplify that code accordingly.
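As a hypothetical illustration of code that can benefit (whether any particular expression is folded is an implementation detail of the optimizer):

LIMIT = 1 << 16  # a module-level value that never changes

def count_below(values: list[int]) -> int:
    n = 0
    for v in values:
        # LIMIT is loaded as a global each iteration; if the JIT
        # observes that it never changes, constant propagation lets it
        # treat the load as the constant 65536 and simplify the check.
        if v < LIMIT:
            n += 1
    return n

print(count_below(list(range(100_000))))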
(Contributed by Ken Jin and Savannah Ostrowski in gh-132732.)
The JIT now avoids reference counting operations where possible. This generally reduces the cost of most operations in Python.
(Contributed by Ken Jin, Donghee Na, Zheao Li, Savannah Ostrowski, Noam Cohen, Tomas Roun, PuQing in gh-134584.)
Better machine code generation
The JIT compiler’s machine code generator now produces better machine code for x86-64 and AArch64 macOS and Linux targets. In general, users should see lower memory usage for generated machine code and more efficient machine code than the old JIT produced.
(Contributed by Brandt Bucher in gh-136528. Implementation for AArch64 contributed by Mark Shannon in gh-139855. Additional optimizations for AArch64 contributed by Mark Shannon and Diego Russo in gh-140683 and gh-142305.)