Python/optimizer.c
cpython 3.14 @ ab2d84fe1023/Python/optimizer.c
The tier-2 optimizer is CPython's second compilation tier, introduced
experimentally in 3.12 and stabilized through 3.13/3.14. Where tier-1
specialization rewrites individual instructions, tier-2 records entire
execution traces across loop back-edges, runs peephole passes over the
trace, and installs an _PyExecutorObject into the code object. On
subsequent loop iterations the tier-1 eval loop detects the executor and
transfers control to the tier-2 micro-op dispatch (in ceval.c) or
to the JIT-compiled native code (when _Py_JIT is defined).
The optimizer operates entirely in terms of _PyUOpInstruction, a
compact fixed-size struct that encodes one micro-op. Micro-ops correspond roughly to
bytecode instructions but are lower-level: a single bytecode can expand
into several micro-ops, and some micro-ops are pure optimizer artifacts
(guards, version checks) with no direct bytecode counterpart.
Map
| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-120 | _PyUOpInstruction, _PyExecutorObject, executor vtable | Core data types. One micro-op is {uint16_t opcode, uint8_t oparg, uint8_t format, uint64_t operand}. | vm/eval_gen.go |
| 121-280 | _Py_Optimizer, optimizer vtable, _PyOptimizer_NewUOpOptimizer | Base optimizer object and the default UOpOptimizer constructor. | vm/eval_gen.go |
| 281-500 | _PyOptimizer_Optimize | Entry point called from the tier-1 back-edge hot path when the counter fires. Delegates to opt->optimize. | vm/eval_gen.go |
| 501-900 | translate_bytecode_to_trace | Converts a bytecode sequence starting at the back-edge target into a _PyUOpInstruction array. Handles guards, branches, and loop headers. | vm/eval_gen.go |
| 901-1100 | uop_optimize | Peephole pass over a recorded trace: dead-guard elimination, constant folding, type propagation. | vm/eval_gen.go |
| 1101-1300 | make_executor / _PyExecutorObject installation | Allocates the executor, copies the optimized trace, and stores it in code->co_executors. | vm/eval_gen.go |
| 1301-1400 | executor_clear / executor_dealloc | Executor lifecycle: clear back-pointers in co_executors and free. | vm/eval_gen.go |
| 1401-1500 | _PyOptimizer_GetDefaultOptimizer, _PyOptimizer_SetDefaultOptimizer, stats | Public API and per-optimizer statistics. | vm/eval_gen.go |
Reading
_Py_Optimizer interface (lines 121 to 280)
cpython 3.14 @ ab2d84fe1023/Python/optimizer.c#L121-280
The optimizer is exposed as a C object with a vtable:
typedef struct _PyOptimizerObject {
    PyObject_HEAD
    optimize_func optimize;
    /* Stats: */
    uint32_t calls_to_optimize;
    uint32_t optimizations_attempted;
    uint32_t optimizations_completed;
    uint32_t optimizations_failed;
} _PyOptimizerObject;

typedef int (*optimize_func)(
    _PyOptimizerObject *self,
    PyCodeObject *code,
    _Py_CODEUNIT *instr,
    _PyExecutorObject **exec_ptr,
    int curr_stackdepth,
    bool progress_needed);
This design lets embedders and test suites swap in a custom optimizer by
calling _PyOptimizer_SetDefaultOptimizer. The default is the
UOpOptimizer created by _PyOptimizer_NewUOpOptimizer. Its optimize
pointer is set to a driver that first calls translate_bytecode_to_trace
and then runs the uop_optimize peephole pass. The interpreter holds a reference to the current
default optimizer in _PyRuntime.optimizer; the back-edge counter
decrement in JUMP_BACKWARD reads this pointer without locking since it
is only mutated at startup.
translate_bytecode_to_trace (lines 501 to 900)
cpython 3.14 @ ab2d84fe1023/Python/optimizer.c#L501-900
This is the trace recorder. Starting from the instruction pointed to by the
back-edge target, it walks forward through the bytecode, emitting one or
more _PyUOpInstruction entries per bytecode. Each bytecode has a
corresponding _PyUOps_* expansion defined in Python/uop_metadata.h.
static int
translate_bytecode_to_trace(
    PyCodeObject *code, _Py_CODEUNIT *instr,
    _PyUOpInstruction *trace, int max_length,
    _PyBloomFilter *dependencies)
{
    ...
    for (;;) {
        uint8_t opcode = instr->op.code;
        uint8_t oparg = instr->op.arg;
        ...
        switch (opcode) {
        case JUMP_BACKWARD:
            if (instr + 2 - backwards_jump == trace_length) {
                ADD_TO_TRACE(_JUMP_TO_TOP, 0, 0);
            }
            goto done;
        case LOAD_FAST:
            ADD_TO_TRACE(_LOAD_FAST, oparg, 0);
            break;
        ...
        default:
            if (!opcode_has_uop_expansion(opcode)) {
                goto cannot_specialize;
            }
            emit_uop_expansion(opcode, oparg, instr, trace, ...);
        }
        instr++;
    }
done:
    return trace_length;
}
The function stops recording when it hits a branch that cannot be inlined
(a conditional whose arms are both taken at runtime), when the trace exceeds
UOP_MAX_TRACE_LENGTH, or when it loops back to the starting instruction,
at which point it emits _JUMP_TO_TOP to close the loop. Branches whose
target is within the trace are inlined as guards (_GUARD_* uops followed
by the non-taken-branch body).
Type version checks emitted during translation are registered with the
dependencies bloom filter. If a type's version tag changes after the
executor is installed, executor_clear is called to invalidate it.
_PyExecutorObject installation (lines 1101 to 1300)
cpython 3.14 @ ab2d84fe1023/Python/optimizer.c#L1101-1300
After translate_bytecode_to_trace and uop_optimize both succeed,
make_executor allocates a variable-length _PyExecutorObject whose
trailing array holds the optimized _PyUOpInstruction sequence:
static _PyExecutorObject *
make_executor(int length, _PyUOpInstruction *buffer,
              _PyBloomFilter *dependencies)
{
    /* Variable-length allocation: tp_itemsize covers the trailing
       trace array of `length` micro-ops. */
    _PyExecutorObject *executor = PyObject_GC_NewVar(
        _PyExecutorObject, &_PyUOpExecutor_Type, length);
    if (executor == NULL) {
        return NULL;
    }
    memcpy(executor->trace, buffer, length * sizeof(_PyUOpInstruction));
    ...
    return executor;
}
The executor is then stored in code->co_executors at the slot
corresponding to the back-edge instruction's oparg. The tier-1 dispatch
for JUMP_BACKWARD checks this slot on every iteration; a non-NULL
executor causes a branch to enter_tier_two: in ceval.c, which picks up
next_uop = executor->trace and starts the micro-op dispatch loop.
When a type that the executor depends on is mutated, executor_clear
walks code->co_executors and NULLs out any entry whose dependency bloom
filter intersects the changed type's version. This makes optimization
invalidation lazy: the slot is cleared but the executor object is not freed
until its refcount drops to zero, which happens on the next back-edge that
finds a NULL slot and decrements the old executor reference.
gopy mirror
vm/eval_gen.go contains the tier-2 dispatch loop and the
_PyUOpInstruction struct. The trace recorder and peephole optimizer from
this file are partially ported: translate_bytecode_to_trace handles only
the 14 micro-ops shipped in v0.12.0. The
_PyExecutorObject and its installation into code->co_executors are
present; invalidation via bloom filter dependencies is stubbed.
The _Py_JIT path (which replaces the micro-op dispatch entirely with
native code) is intentionally not ported. gopy has no JIT; the
_Py_JIT preprocessor symbol is permanently undefined in the Go port.
CPython 3.14 changes worth noting
- The _PyUOpInstruction format field was added in 3.13 to distinguish operand encodings (16-bit oparg vs 64-bit operand vs pointer); 3.14 inherits this layout.
- In 3.14 the default optimizer is no longer installed unconditionally at interpreter startup. It is activated only when sys.flags.optimize_uops is non-zero, matching the -X optimize_uops flag or the PYTHON_OPTIMIZATIONS environment variable.
- uop_optimize gained a type-propagation sub-pass in 3.13 that eliminates redundant _CHECK_MANAGED_OBJECT_HAS_VALUES guards; this pass is included in the 3.14 tree.
- The free-threaded build requires executor invalidation to be thread-safe. In the GIL build a simple co_executors[i] = NULL is sufficient; in the free-threaded build a compare-and-swap is used instead to avoid data races with threads that are concurrently executing the tier-2 loop.