
Tier-2 trace projector

PEP 744 describes a second tier of interpretation. The first tier is the bytecode loop you already know. The second tier projects hot loops into a trace of uops (micro-ops), runs analysis passes over it, and executes it on a separate dispatch loop that is friendlier to a JIT backend.

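Tier-2 ships in CPython 3.13 as an experimental opt-in. It must be enabled at build time (--enable-experimental-jit), and on such builds PYTHON_JIT=1 or PYTHON_JIT=0 forces it on or off at startup. It is being hardened toward a default-on JIT.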

Source map

File                                    Role
Python/optimizer.c                      Trace projection, executor lifecycle.
Python/optimizer_analysis.c             The three-phase analysis orchestrator.
Python/optimizer_bytecodes.c            Abstract-interpretation rules for each uop, in the bytecode DSL (the uops themselves are declared in Python/bytecodes.c).
Python/executor_cases.c.h               The generated uop dispatch loop.
Include/internal/pycore_optimizer.h     _PyExecutorObject, side tables.
Python/jit.c                            The JIT backend (when enabled).

The big picture

Tier-1 loop (ceval.c)
    │ hits the JUMP_BACKWARD (back-edge) of a hot loop
    ▼ try_warmup_tier2
trace projector (optimizer.c)
    │ walks Tier-1 bytecode, expands each op into uops
    ▼
uop trace ([_START_EXECUTOR, _MAKE_WARM, ...uops, _JUMP_TO_TOP])
    │
    ▼ analysis passes (optimizer_analysis.c)
    │ remove unneeded _SET_IP and _CHECK_VALIDITY
    │ abstract interpretation
    │ guard removal
    ▼
Executor object
    │
    ▼ ENTER_EXECUTOR / JIT

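Once the executor exists, ENTER_EXECUTOR is installed in place of the JUMP_BACKWARD at the hot loop's back-edge, so subsequent iterations jump directly into the Tier-2 dispatch loop. If the trace deopts, control returns to Tier-1 at the target recorded on the failing uop: the byte offset of the Tier-1 instruction that uop was projected from.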

Trace projection

_Py_uop_translate_bytecode_to_trace walks the Tier-1 bytecode starting from a hot back-edge. For each Tier-1 opcode it consults the expansion table (auto-generated from Python/bytecodes.c, where each instruction is declared as a sequence of uops) to emit one or more uops. Examples:

  • LOAD_FAST oparg -> _LOAD_FAST oparg.
  • BINARY_OP_ADD_INT -> _GUARD_BOTH_INT, _BINARY_OP_ADD_INT.
  • CALL_PY_EXACT_ARGS -> _CHECK_FUNCTION_EXACT_ARGS, _CHECK_STACK_SPACE, _INIT_CALL_PY_EXACT_ARGS, _SAVE_RETURN_OFFSET, _PUSH_FRAME.
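An expansion table is essentially an array indexed by Tier-1 opcode. The sketch below assumes hypothetical IDs (T1_*, U_*) and at most four uops per instruction, and shows the lookup the projector performs; the generated CPython table has a different layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode and uop IDs for this sketch only. */
enum { T1_LOAD_FAST, T1_BINARY_OP_ADD_INT, N_TIER1 };
enum { U_LOAD_FAST, U_GUARD_BOTH_INT, U_BINARY_OP_ADD_INT };

typedef struct { int uop; int oparg; } Uop;

/* One row per Tier-1 opcode: the fixed uop sequence it expands into. */
typedef struct { int n; int uops[4]; } Expansion;

static const Expansion expansion[N_TIER1] = {
    [T1_LOAD_FAST]         = {1, {U_LOAD_FAST}},
    [T1_BINARY_OP_ADD_INT] = {2, {U_GUARD_BOTH_INT, U_BINARY_OP_ADD_INT}},
};

/* Writes the uops for one Tier-1 instruction into out[0..room).
   Returns the number written, or -1 if they do not fit. */
static int expand(int opcode, int oparg, Uop *out, size_t room)
{
    assert(opcode >= 0 && opcode < N_TIER1);
    const Expansion *e = &expansion[opcode];
    if ((size_t)e->n > room) {
        return -1;
    }
    for (int i = 0; i < e->n; i++) {
        out[i].uop = e->uops[i];
        out[i].oparg = oparg;   /* in this sketch every uop inherits the oparg */
    }
    return e->n;
}
```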

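Projection stops at a side exit: an op with no uop expansion, a point where the trace would have to deopt to Tier-1, or the trace-length limit. The trace then ends with a terminator: _JUMP_TO_TOP when it has closed its loop and should repeat, or _EXIT_TRACE when control must fall back to Tier-1.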

Around the trace the projector inserts a prelude (_START_EXECUTOR, _MAKE_WARM) and an epilogue (_JUMP_TO_TOP).
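The stop conditions and the prelude/terminator framing can be sketched together. This toy projector assumes a three-opcode bytecode, a caller-supplied length limit and a toy jump encoding; project and all T1_*/U_* names are hypothetical.

```c
#include <assert.h>

/* Hypothetical toy opcodes and uops; the real projector handles far more. */
enum { T1_LOAD_FAST, T1_ADD_INT, T1_JUMP_BACKWARD, T1_UNSUPPORTED };
enum { U_START_EXECUTOR, U_MAKE_WARM, U_LOAD_FAST, U_GUARD_BOTH_INT,
       U_ADD_INT, U_JUMP_TO_TOP, U_EXIT_TRACE };

typedef struct { int opcode; int oparg; } Instr;
typedef struct { int uop; int oparg; int target; } Uop;  /* target: Tier-1 offset */

/* Projects code[start..] into trace (capacity max, assumed >= 3).
   Toy encoding: T1_JUMP_BACKWARD at pc jumps to pc - oparg.
   Returns the trace length. */
static int project(const Instr *code, int ncode, int start, Uop *trace, int max)
{
    assert(max >= 3);
    int n = 0;
    trace[n++] = (Uop){U_START_EXECUTOR, 0, start};   /* prelude */
    trace[n++] = (Uop){U_MAKE_WARM, 0, start};
    for (int pc = start; pc < ncode; pc++) {
        Instr in = code[pc];
        int need = (in.opcode == T1_ADD_INT) ? 2 : 1;
        /* Stop on an unsupported op, or if this expansion plus a terminator
           would exceed the length limit; Tier-1 resumes at this offset. */
        if (in.opcode == T1_UNSUPPORTED || n + need + 1 > max) {
            trace[n++] = (Uop){U_EXIT_TRACE, 0, pc};
            return n;
        }
        switch (in.opcode) {
        case T1_LOAD_FAST:
            trace[n++] = (Uop){U_LOAD_FAST, in.oparg, pc};
            break;
        case T1_ADD_INT:
            trace[n++] = (Uop){U_GUARD_BOTH_INT, 0, pc};
            trace[n++] = (Uop){U_ADD_INT, 0, pc};
            break;
        case T1_JUMP_BACKWARD:
            if (pc - in.oparg == start) {
                /* Closes the loop we started from: repeat the trace. */
                trace[n++] = (Uop){U_JUMP_TO_TOP, 0, start};
            } else {
                /* Some other loop: let Tier-1 execute the jump. */
                trace[n++] = (Uop){U_EXIT_TRACE, 0, pc};
            }
            return n;
        }
    }
    trace[n++] = (Uop){U_EXIT_TRACE, 0, ncode};   /* ran off the end of the code */
    return n;
}
```

For the loop LOAD_FAST, LOAD_FAST, ADD_INT, JUMP_BACKWARD (oparg 3) starting at offset 0, the trace is the two prelude uops, four body uops and _JUMP_TO_TOP: seven entries.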

Uops

A uop is a (op, oparg, operand, target) tuple. Uops are declared in Python/bytecodes.c alongside the Tier-1 instructions, in the same DSL, with these differences:

  • No inline caches.
  • Every input and output is explicit on the stack signature.
  • Side exits are explicit DEOPT_IF macros.
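A uop record and a DEOPT_IF-style side exit might look like this. The struct layout, the Value tagging and TOY_DEOPT_IF are assumptions made for illustration; CPython's _PyUOpInstruction and its DEOPT_IF macro are defined differently.

```c
#include <assert.h>
#include <stdint.h>

/* A uop as the (op, oparg, operand, target) tuple from the text.
   Field widths here are assumptions, not CPython's layout. */
typedef struct {
    uint16_t op;
    uint16_t oparg;
    uint64_t operand;   /* e.g. a cached type version or pointer */
    uint32_t target;    /* Tier-1 offset at which to resume on a side exit */
} UopInstr;

/* Toy tagged value: tag 0 means "int", anything else is another type. */
typedef struct { int tag; long value; } Value;

/* Hypothetical stand-in for DEOPT_IF: on failure, report the uop's target
   and stop executing the trace. */
#define TOY_DEOPT_IF(cond, uop, resume)         \
    do {                                        \
        if (cond) {                             \
            *(resume) = (int)(uop)->target;     \
            return 0;                           \
        }                                       \
    } while (0)

/* A _GUARD_BOTH_INT-style uop over the top two stack slots.
   Returns 1 to continue the trace, 0 after a side exit. */
static int guard_both_int(const UopInstr *uop, const Value *stack, int sp,
                          int *resume)
{
    assert(sp >= 2);
    TOY_DEOPT_IF(stack[sp - 1].tag != 0, uop, resume);
    TOY_DEOPT_IF(stack[sp - 2].tag != 0, uop, resume);
    return 1;
}
```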

The full uop list lives in pycore_uop_ids.h (one ID per uop) and pycore_uop_metadata.h (per-uop tables: stack effects, escape analysis flags, deopt targets).
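Per-uop metadata lets a pass reason about a trace without executing it. Below, a hand-written table in the spirit of pycore_uop_metadata.h (the IDs, flags and numbers are all hypothetical) is used to compute a trace's peak stack depth and to count uops carrying a given flag.

```c
#include <assert.h>

/* Hypothetical uop IDs and metadata; the real tables are generated. */
enum { U_LOAD_FAST, U_GUARD_BOTH_INT, U_ADD_INT, U_STORE_FAST, U_CALL, N_UOPS };

#define FLAG_ESCAPES (1u << 0)   /* may run arbitrary code */
#define FLAG_DEOPTS  (1u << 1)   /* contains a side exit */

typedef struct { int popped; int pushed; unsigned flags; } UopMeta;

static const UopMeta uop_meta[N_UOPS] = {
    [U_LOAD_FAST]      = {0, 1, 0},
    [U_GUARD_BOTH_INT] = {2, 2, FLAG_DEOPTS},
    [U_ADD_INT]        = {2, 1, 0},
    [U_STORE_FAST]     = {1, 0, 0},
    [U_CALL]           = {2, 1, FLAG_ESCAPES},
};

/* Returns the peak stack depth of trace, or -1 if it would underflow. */
static int max_stack_depth(const int *trace, int n)
{
    int depth = 0, peak = 0;
    for (int i = 0; i < n; i++) {
        assert(trace[i] >= 0 && trace[i] < N_UOPS);
        const UopMeta *m = &uop_meta[trace[i]];
        if (depth < m->popped) {
            return -1;
        }
        depth += m->pushed - m->popped;
        if (depth > peak) {
            peak = depth;
        }
    }
    return peak;
}

/* Counts uops in trace whose metadata has any bit of flag set. */
static int count_flagged(const int *trace, int n, unsigned flag)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (uop_meta[trace[i]].flags & flag) {
            count++;
        }
    }
    return count;
}
```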

Analysis

The analysis pipeline runs three phases:

  1. remove_unneeded_uops. Drops _SET_IP and _CHECK_VALIDITY entries when it can prove that no escaping uop (one that may run arbitrary code) has run since the most recent _START_EXECUTOR. Also collapses _PUSH_NULL followed by _POP_TOP.
  2. Abstract interpretation. Walks the trace with a JitOptContext that assigns each stack slot a lattice value recording its known type, known constant value and known immortality. These facts let the next phase elide redundant guards.
  3. Guard elimination. A guard whose precondition is implied by the lattice is dropped.
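Phase 1 can be sketched as a single forward walk. This version assumes one escapes flag per uop and treats only escaping uops as needing the instruction pointer (a simplification: real traces may need it for other uops too, such as ones that can raise). Dropping each _SET_IP provisionally and restoring it when an escaping uop follows is an illustrative strategy, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { U_NOP, U_START_EXECUTOR, U_SET_IP, U_CHECK_VALIDITY,
       U_LOAD_FAST, U_ADD_INT, U_CALL, U_JUMP_TO_TOP, N_UOPS };

/* Hypothetical flag table: only U_CALL may run arbitrary code. */
static const bool escapes[N_UOPS] = { [U_CALL] = true };

/* Rewrites unneeded _SET_IP / _CHECK_VALIDITY to _NOP in place.
   Returns how many uops end up as _NOP. */
static int remove_unneeded(int *trace, int n)
{
    int last_set_ip = -1;          /* _SET_IP not yet proven necessary */
    bool may_have_escaped = true;
    int removed = 0;
    for (int pc = 0; pc < n; pc++) {
        assert(trace[pc] >= 0 && trace[pc] < N_UOPS);
        switch (trace[pc]) {
        case U_START_EXECUTOR:
            may_have_escaped = false;
            break;
        case U_SET_IP:
            /* Drop it for now; restored if an escaping uop follows. */
            trace[pc] = U_NOP;
            removed++;
            last_set_ip = pc;
            break;
        case U_CHECK_VALIDITY:
            if (may_have_escaped) {
                may_have_escaped = false;   /* this check is needed */
            } else {
                trace[pc] = U_NOP;          /* nothing could have invalidated us */
                removed++;
            }
            break;
        default:
            if (escapes[trace[pc]]) {
                if (last_set_ip >= 0) {
                    trace[last_set_ip] = U_SET_IP;   /* IP is observable here */
                    removed--;
                    last_set_ip = -1;
                }
                may_have_escaped = true;
            }
            break;
        }
    }
    return removed;
}
```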

The lattice values include BOTTOM (unreachable), TYPE(T) (the value is known to be a T), INT_CONST(n), STR_CONST(s), IMMORTAL, and TOP (unknown).
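The lattice and guard elimination can be shown in one abstract-interpretation walk. This sketch models only ints (BOTTOM below INT_CONST(n) below TYPE(int) below TOP), folds phases 2 and 3 into a single loop for brevity, and uses hypothetical names throughout; CPython's symbol representation is richer.

```c
#include <assert.h>

/* A slice of the lattice from the text, int only. */
typedef enum { SYM_BOTTOM, SYM_INT_CONST, SYM_TYPE_INT, SYM_TOP } SymKind;
typedef struct { SymKind kind; long value; } Sym;   /* value: INT_CONST only */

/* Least upper bound, as used where two paths merge. */
static Sym join(Sym a, Sym b)
{
    if (a.kind == SYM_BOTTOM) return b;
    if (b.kind == SYM_BOTTOM) return a;
    if (a.kind == SYM_TOP || b.kind == SYM_TOP) return (Sym){SYM_TOP, 0};
    if (a.kind == SYM_INT_CONST && b.kind == SYM_INT_CONST && a.value == b.value)
        return a;
    return (Sym){SYM_TYPE_INT, 0};   /* both ints, but not one shared constant */
}

static int known_int(Sym s)
{
    return s.kind == SYM_INT_CONST || s.kind == SYM_TYPE_INT;
}

enum { U_NOP, U_LOAD_CONST_INT, U_LOAD_UNKNOWN, U_GUARD_BOTH_INT, U_ADD_INT };
typedef struct { int op; long arg; } Uop;

/* Walks a straight-line trace (max 16 stack slots). Guards already implied
   by the lattice become U_NOP; returns how many were removed. */
static int eliminate_guards(Uop *trace, int n)
{
    Sym stack[16];
    int sp = 0, removed = 0;
    for (int i = 0; i < n; i++) {
        switch (trace[i].op) {
        case U_LOAD_CONST_INT:
            assert(sp < 16);
            stack[sp++] = (Sym){SYM_INT_CONST, trace[i].arg};
            break;
        case U_LOAD_UNKNOWN:
            assert(sp < 16);
            stack[sp++] = (Sym){SYM_TOP, 0};
            break;
        case U_GUARD_BOTH_INT:
            assert(sp >= 2);
            if (known_int(stack[sp - 1]) && known_int(stack[sp - 2])) {
                trace[i].op = U_NOP;              /* provably redundant */
                removed++;
            } else {
                /* Execution only continues past the guard if both are ints. */
                if (!known_int(stack[sp - 1])) stack[sp - 1] = (Sym){SYM_TYPE_INT, 0};
                if (!known_int(stack[sp - 2])) stack[sp - 2] = (Sym){SYM_TYPE_INT, 0};
            }
            break;
        case U_ADD_INT:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = (Sym){SYM_TYPE_INT, 0};   /* int + int is an int */
            break;
        default:
            break;
        }
    }
    return removed;
}
```

The second guard disappears because the first guard, once passed, refines the unknown operand to TYPE(int), and ADD_INT produces TYPE(int).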

Executor

An _PyExecutorObject is a heap object with:

  • The compiled uop trace.
  • A pointer back to the install site in the Tier-1 bytecode.
  • A list of dependent guards (type version, dict keys version).
  • A bloom filter of code object dependencies, for fast invalidation when any of those objects mutates.

Executors are stored in a per-code-object side table (co_executors), allowing many install sites per function.

Lifecycle

Step               What happens
AllocateExecutor   malloc + GC tracking.
ExecutorInit       Copies the trace, registers dependents, hooks watchers.
detach             Disconnects from co_executors. Trace remains live until refs drop.
clear              Empties the trace and zeroes the prelude.
pending deletion   Linked into a thread-local list. Freed after the eval loop returns.

The deferred-free pattern is necessary because an executor can still be running, somewhere up the call stack, at the moment it is invalidated. The eval loop unlinks it eagerly but waits for the call chain to unwind before reclaiming the memory.
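The deferred-free pattern reduces to a pending list that is swept after unwinding. In this sketch a per-executor running count stands in for "still on the stack", and ToyState stands in for the thread-local list; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct ToyExecutor {
    bool valid;
    int running;                      /* frames currently executing this trace */
    struct ToyExecutor *next_pending;
} ToyExecutor;

typedef struct {
    ToyExecutor *pending;             /* invalidated but not yet freed */
    int freed;                        /* bookkeeping for the example */
} ToyState;

static ToyExecutor *toy_new(void)
{
    ToyExecutor *ex = calloc(1, sizeof *ex);
    if (ex != NULL) {
        ex->valid = true;
    }
    return ex;
}

/* Eager step: the executor can no longer be entered, but its memory stays
   valid because a frame may still be running it. */
static void toy_invalidate(ToyState *st, ToyExecutor *ex)
{
    assert(ex != NULL);
    if (!ex->valid) {
        return;                       /* already pending */
    }
    ex->valid = false;
    ex->next_pending = st->pending;
    st->pending = ex;
}

/* Deferred step, run once the call chain has unwound: frees pending
   executors nobody is running and keeps the rest for a later sweep. */
static void toy_sweep(ToyState *st)
{
    ToyExecutor **link = &st->pending;
    while (*link != NULL) {
        ToyExecutor *ex = *link;
        if (ex->running == 0) {
            *link = ex->next_pending;
            free(ex);
            st->freed++;
        } else {
            link = &ex->next_pending;
        }
    }
}
```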

Watchers

Tier-2 reuses the watcher mechanism that drives Tier-1 specialization: type version tags and dict keys versions. An executor that depends on a tag is invalidated when that tag changes.
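Version-tag invalidation can be modelled as "remember the tag you saw, compare after each mutation". ToyType, ToyExecutor, toy_depend_on and toy_mutate_type are hypothetical, and CPython's actual watcher plumbing differs.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPS 4

typedef struct { unsigned version_tag; } ToyType;

typedef struct {
    bool valid;
    int ndeps;
    ToyType *dep_type[MAX_DEPS];
    unsigned dep_version[MAX_DEPS];   /* tag observed when the trace was built */
} ToyExecutor;

/* Records that the trace assumed t keeps its current version tag. */
static bool toy_depend_on(ToyExecutor *ex, ToyType *t)
{
    if (ex->ndeps == MAX_DEPS) {
        return false;
    }
    ex->dep_type[ex->ndeps] = t;
    ex->dep_version[ex->ndeps] = t->version_tag;
    ex->ndeps++;
    return true;
}

/* Simulates a mutation of t: the tag changes, then the watcher callback
   invalidates every executor whose recorded tag no longer matches.
   Returns the number of executors invalidated. */
static int toy_mutate_type(ToyType *t, ToyExecutor **all, int n)
{
    t->version_tag++;
    int invalidated = 0;
    for (int i = 0; i < n; i++) {
        ToyExecutor *ex = all[i];
        for (int d = 0; ex->valid && d < ex->ndeps; d++) {
            if (ex->dep_type[d] == t && ex->dep_version[d] != t->version_tag) {
                ex->valid = false;
                invalidated++;
            }
        }
    }
    return invalidated;
}
```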

JIT

When CPython is built with --enable-experimental-jit, the uop trace is fed to a small JIT in jit.c that emits machine code using the copy-and-patch technique adopted in PEP 744. For each uop it copies a pre-compiled machine-code template (a stencil, generated at build time from the same C code as that uop's interpreter case) and patches the stencil's holes with the trace's operands, jump targets and addresses.
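Copy-and-patch itself is easy to illustrate: concatenate stencil bytes, then overwrite each hole. In this sketch the byte values are placeholders rather than real machine code, each stencil has a single 32-bit hole, and Stencil and copy_and_patch are hypothetical names; real stencils carry several kinds of holes and must be placed in executable memory before they can run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One pre-built stencil per uop: bytes plus the offset of one 32-bit hole. */
typedef struct {
    const uint8_t *code;
    size_t size;
    size_t hole_offset;
} Stencil;

/* Placeholder bytes; 0x00 runs mark the holes. */
static const uint8_t load_bytes[] = {0xAA, 0x00, 0x00, 0x00, 0x00, 0xAB};
static const uint8_t add_bytes[]  = {0xCC, 0x00, 0x00, 0x00, 0x00};

static const Stencil stencils[] = {
    {load_bytes, sizeof load_bytes, 1},   /* uop 0 */
    {add_bytes,  sizeof add_bytes,  1},   /* uop 1 */
};

/* Copies each uop's stencil back to back into out, then patches its hole
   with that uop's operand (little-endian). Returns bytes emitted, or 0 if
   out is too small. */
static size_t copy_and_patch(const int *uops, const uint32_t *operands, int n,
                             uint8_t *out, size_t cap)
{
    size_t pos = 0;
    for (int i = 0; i < n; i++) {
        const Stencil *s = &stencils[uops[i]];
        assert(s->hole_offset + 4 <= s->size);
        if (pos + s->size > cap) {
            return 0;
        }
        memcpy(out + pos, s->code, s->size);              /* copy */
        for (int b = 0; b < 4; b++) {                     /* patch */
            out[pos + s->hole_offset + b] = (uint8_t)(operands[i] >> (8 * b));
        }
        pos += s->size;
    }
    return pos;
}
```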

When the JIT is off, the trace runs on the uop interpreter, whose cases in executor_cases.c.h are produced by the same generator (Tools/cases_generator) that produces Tier-1's dispatch loop, applied to the uop set.
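At its core the uop interpreter is a switch over the trace. This toy version assumes integer locals and a handful of hypothetical uops (U_GUARD_TRUE stands in for a guard that side-exits when a loop condition fails); run on the right trace, it executes a loop equivalent to "while i < 10: i += 1" until the guard exits with its recorded Tier-1 target.

```c
#include <assert.h>

/* Hypothetical uops for a toy integer machine. */
enum { U_START_EXECUTOR, U_LOAD_FAST, U_LOAD_CONST, U_ADD, U_LT,
       U_GUARD_TRUE, U_STORE_FAST, U_JUMP_TO_TOP, U_EXIT_TRACE };

typedef struct { int op; int oparg; int target; } Uop;   /* target: Tier-1 offset */

/* Runs trace over locals until a side exit or _EXIT_TRACE and returns the
   Tier-1 offset to resume at (-1 for an unknown uop). */
static int run_trace(const Uop *trace, long *locals)
{
    long stack[8];
    int sp = 0;
    int pc = 0;
    for (;;) {
        assert(sp >= 0 && sp < 8);
        const Uop *u = &trace[pc++];
        switch (u->op) {
        case U_START_EXECUTOR:
            break;
        case U_LOAD_FAST:
            stack[sp++] = locals[u->oparg];
            break;
        case U_LOAD_CONST:
            stack[sp++] = u->oparg;
            break;
        case U_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case U_LT:
            sp--;
            stack[sp - 1] = stack[sp - 1] < stack[sp];
            break;
        case U_GUARD_TRUE:
            if (!stack[--sp]) {
                return u->target;        /* side exit back to Tier-1 */
            }
            break;
        case U_STORE_FAST:
            locals[u->oparg] = stack[--sp];
            break;
        case U_JUMP_TO_TOP:
            pc = 0;                      /* repeat without leaving Tier-2 */
            break;
        case U_EXIT_TRACE:
            return u->target;
        default:
            return -1;
        }
    }
}
```

_JUMP_TO_TOP never leaves the dispatch loop; only a side exit or _EXIT_TRACE hands control back to Tier-1.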

Reading order

Monitor covers sys.monitoring, which lives on top of both Tier-1 and Tier-2 through the instrumented opcode set.