optimizer.c
Python/optimizer.c implements CPython's tier-2 optimizer, the trace-based optimization
pipeline behind the specializing adaptive interpreter. When a backward branch is taken often
enough, the optimizer traces the hot loop into a linear sequence of micro-ops (uops), runs
simplification passes over that sequence, and installs an executor object that the tier-1
dispatch loop can jump into directly on subsequent iterations.
Map
| Lines | Symbol | Role |
|---|---|---|
| 80–140 | _PyOptimizer_NewUOpOptimizer | Allocates and returns the default uop optimizer |
| 200–380 | translate_bytecode_to_trace | Converts tier-1 bytecodes to uops, inserting type guards |
| 400–550 | _Py_Optimize_Uops | Entry point: traces a hot counter, calls translate, installs executor |
| 560–720 | _Py_uop_optimize | Constant-folding and dead-code-elimination passes over a trace |
| 730–900 | executor_iternext / executor_dealloc | Executor object protocol: tp_iternext drives uop dispatch |
| 900–1200 | per-opcode uop emitters | One emitter per tier-1 opcode; some expand to multiple uops |
| 1200–1600 | type-guard insertion helpers | emit_type_guard, specialize_load_attr, etc. |
| 1600–1800 | _PyUOpInstruction array helpers | Resize, append, and finalize the instruction buffer |
| 1800–2000 | statistics and debug hooks | _Py_uop_stats, OPT_STAT_INC macros |
Reading
Creating the optimizer
_PyOptimizer_NewUOpOptimizer constructs the singleton optimizer that _PyInterpreterState
holds. It sets the hot-threshold counter and wires the optimize function pointer to
_Py_Optimize_Uops.
```c
// CPython: Python/optimizer.c:95 _PyOptimizer_NewUOpOptimizer
_PyOptimizerObject *
_PyOptimizer_NewUOpOptimizer(void)
{
    _PyUOpOptimizerObject *opt = PyObject_New(
        _PyUOpOptimizerObject, &_PyUOpOptimizer_Type);
    if (opt == NULL) {
        return NULL;
    }
    opt->base.optimize = _Py_Optimize_Uops;
    opt->base.resume_threshold = RESUME_BACK_EDGE_THRESHOLD;
    opt->base.backedge_threshold = JUMP_BACKWARD_INITIAL_VALUE;
    return (_PyOptimizerObject *)opt;
}
```
Tracing: bytecode to uops
translate_bytecode_to_trace walks forward from a hot backward-branch target, converting
each tier-1 instruction to one or more uops. Instructions that cannot be expressed as uops
(rare opcodes, unresolvable CALL targets) abort the trace. Type guards are inserted
whenever a specialization assumption must be checked at runtime.
```c
// CPython: Python/optimizer.c:210 translate_bytecode_to_trace
static int
translate_bytecode_to_trace(
    PyCodeObject *code,
    _Py_CODEUNIT *instr,
    _PyUOpInstruction *trace,
    int buffer_size,
    _PyBloomFilter *dependencies)
{
    /* Walk instructions; for each opcode emit uops into trace[]. */
    ...
}
```
The function returns the number of uops written, or a negative value on abort. The caller
in _Py_Optimize_Uops retries with a larger buffer on TRACE_TOO_LONG.
Optimization passes
_Py_uop_optimize runs two passes over the raw trace in place. The first pass propagates
known constants through _Py_UNARY_NUMERIC_OVERLOAD uops and eliminates branches whose
condition is statically determined. The second pass removes any uop whose output is never
consumed (dead stores inside the trace window).
```c
// CPython: Python/optimizer.c:562 _Py_uop_optimize
int
_Py_uop_optimize(
    _PyInterpreterFrame *frame,
    _PyUOpInstruction *buffer,
    int length,
    _PyBloomFilter *dependencies,
    int curr_stacklen)
{
    ...
    /* Pass 1: constant folding */
    /* Pass 2: dead-code elimination */
    return new_length;
}
```
The executor object
The executor is a heap object whose tp_iternext slot drives the uop dispatch loop.
executor_iternext advances through the _PyUOpInstruction array and returns NULL
(with no exception) when the trace exits back to the tier-1 loop.
```c
// CPython: Python/optimizer.c:735 executor_iternext
static PyObject *
executor_iternext(_PyExecutorObject *self)
{
    _PyUOpInstruction *pc = self->trace + self->current;
    ...
}
```
gopy notes
- gopy's tier-2 work lives in `compile/flowgraph_passes.go` and `vm/eval_gen.go`. The dispatcher in `vm/eval_gen.go` checks `frame.Executor` before entering the tier-1 loop, mirroring the `RESUME` fast-path in CPython's `ceval.c`.
- `_PyOptimizer_NewUOpOptimizer` has no direct gopy equivalent yet; the optimizer is instantiated inline in `pythonrun/runstring.go` during interpreter setup.
- Type guards emitted by `translate_bytecode_to_trace` correspond to the `GUARD_TYPE` and `GUARD_BOTH_INT` uops listed in `compile/flowgraph.go`.
- The `_PyBloomFilter` dependency tracking has no gopy equivalent; invalidation is currently deferred to a later milestone.
CPython 3.14 changes
- The tier-2 optimizer was marked non-experimental in 3.13; `--enable-experimental-jit` is no longer required to activate it at runtime.
- In 3.14, `translate_bytecode_to_trace` gained loop-unrolling support: a trace may now include two iterations of the hot loop when the loop body is short enough to fit within the buffer limit.
- `_Py_uop_optimize` was split from a single monolithic pass into the two-pass structure described above, making it easier to add further passes (e.g., range analysis) in future releases.
- The `_PyUOpInstruction` struct gained an `opcode_metadata` pointer in 3.14 so that passes can query per-opcode stack effects without a separate table lookup.