ceval_macros.h

ceval_macros.h is included by ceval.c and used throughout the opcode bodies in bytecodes.c. It defines the macro layer that sits between the C compiler and the generated opcode bodies: dispatch routing, computed-goto vs. switch vs. tail-call selection, stack pointer arithmetic, instruction pointer movement, and the LLTRACE debug path. No runtime symbols are exported; everything here is expanded away by the preprocessor.

Map

Lines     Symbol                                   Role
50-60     USE_COMPUTED_GOTOS                       Platform capability probe
63-71     INSTRUCTION_STATS                        Per-opcode pair-count accumulator (stats build only)
81-128    TARGET / DISPATCH_GOTO / JUMP_TO_LABEL   Dispatch backend selection
130-136   PRE_DISPATCH_GOTO                        LLTRACE hook before every dispatch
138-151   LLTRACE_RESUME_FRAME                     Full-frame lltrace on frame entry
161-167   DISPATCH                                 Load the next instruction word, then dispatch
169-174   DISPATCH_SAME_OPARG                      Re-dispatch without advancing (specialization)
176-184   DISPATCH_INLINED                         Inline call: swap frames, jump to start_frame
203-208   NEXTOPARG                                Load next opcode + oparg atomically
214       JUMPBY                                   Relative branch by x instructions
217-218   STACK_LEVEL / STACK_SIZE                 Stack depth arithmetic
229-230   LOCALS_ARRAY / GETLOCAL                  Fast local variable access
351-361   LOAD_IP / LOAD_SP / SAVE_SP              Frame pointer sync helpers
366-415   GOTO_TIER_TWO / GOTO_TIER_ONE            Tier-1 / tier-2 switch points

Reading

Dispatch Backends

The file supports three dispatch strategies, selected at compile time:

  1. Tail-call (Py_TAIL_CALL_INTERP): each opcode is a separate C function called with [[clang::musttail]] so the compiler emits a true tail call, keeping the stack flat. Clang 18+ / GCC 15+ only.
  2. Computed goto (USE_COMPUTED_GOTOS): each opcode body ends with an indirect jump through opcode_targets[]. Classic "threaded code" giving ~15-20% speedup over a switch on most CPUs.
  3. Switch (fallback): standard switch (opcode) for platforms without label-as-value support.
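
A condensed view of how the backend choice lands in TARGET and DISPATCH_GOTO, abbreviated from the file (the tail-call variant is sketched under "CPython 3.14 changes" below):

// Condensed sketch of the goto-based backend selection (abbreviated)
#if USE_COMPUTED_GOTOS
#  define TARGET(op)      TARGET_##op:
#  define DISPATCH_GOTO() goto *opcode_targets[opcode]
#else
#  define TARGET(op)      case op: TARGET_##op:
#  define DISPATCH_GOTO() goto dispatch_opcode
#endif

Whichever backend is in effect, every opcode body ends by expanding DISPATCH:
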
// CPython: Python/ceval_macros.h:161 DISPATCH
#define DISPATCH() \
    { \
        assert(frame->stackpointer == NULL); \
        NEXTOPARG(); \
        PRE_DISPATCH_GOTO(); \
        DISPATCH_GOTO(); \
    }

DISPATCH_SAME_OPARG skips NEXTOPARG, reusing the current oparg. Used by specialization stubs when they decide to re-enter the same opcode slot after mutating the inline cache.
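
Its body is short; paraphrased from the file, it reloads opcode from the current instruction word (which the specializer may have just rewritten in place) while keeping the oparg already in hand:

// Paraphrased: re-dispatch on the same instruction word
#define DISPATCH_SAME_OPARG() \
    { \
        opcode = next_instr->op.code; \
        PRE_DISPATCH_GOTO(); \
        DISPATCH_GOTO(); \
    }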

Stack Pointer Convention

The eval loop keeps stack_pointer in a local variable that is not synced with frame->stackpointer except at call boundaries. This avoids a store/load on every push or pop:

// CPython: Python/ceval_macros.h:217 STACK_LEVEL
#define STACK_LEVEL() ((int)(stack_pointer - _PyFrame_Stackbase(frame)))
#define STACK_SIZE() (_PyFrame_GetCode(frame)->co_stacksize)
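
The sync at those call boundaries goes through the LOAD_SP / SAVE_SP helpers listed in the Map. A paraphrased sketch, not verbatim; the detail worth knowing is that in recent sources taking the pointer out of the frame also nulls frame->stackpointer, which is exactly what the assert in DISPATCH() checks:

// Paraphrased sync helpers: ownership of the stack pointer moves
// between frame->stackpointer and the local variable
#define LOAD_SP() (stack_pointer = _PyFrame_GetStackPointer(frame))
#define SAVE_SP() _PyFrame_SetStackPointer(frame, stack_pointer)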

Individual opcodes generated by Tools/cases_generator directly index stack_pointer as stack_pointer[-1], stack_pointer[-2], etc., with explicit --stack_pointer or ++stack_pointer for pops and pushes. No PUSH/POP macros exist in 3.14 — the generator emits the arithmetic directly.
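
As an illustration of that convention, here is a hand-written sketch of a binary-add body in the shape the generator emits, using the older PyObject* view for readability (3.14's generated code actually traffics in _PyStackRef values; the error-label name follows the generated pop_N_error convention):

// Hand-written sketch, not actual generator output
PyObject *right = stack_pointer[-1];
PyObject *left = stack_pointer[-2];
PyObject *res = PyNumber_Add(left, right);
Py_DECREF(left);
Py_DECREF(right);
if (res == NULL) goto pop_2_error;
stack_pointer[-2] = res;   // result replaces the deeper operand
stack_pointer -= 1;        // net effect: two pops, one push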

Instruction Pointer Movement

// CPython: Python/ceval_macros.h:214 JUMPBY
#define JUMPBY(x) (next_instr += (x))

JUMPBY implements relative branches (FOR_ITER's end-of-loop skip, JUMP_FORWARD, and, with a negative operand, JUMP_BACKWARD). NEXTOPARG (line 204) loads the instruction word at next_instr atomically with FT_ATOMIC_LOAD_UINT16_RELAXED so it is safe in free-threaded builds; advancing next_instr past the instruction and its caches happens at the top of each generated opcode body, not here.
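
From memory, the free-threading-aware shape is roughly the following; check the file at the line above for the exact form:

// Approximate shape of NEXTOPARG (paraphrased)
#define NEXTOPARG()  do { \
        _Py_CODEUNIT word = \
            {.cache = FT_ATOMIC_LOAD_UINT16_RELAXED(*(uint16_t *)next_instr)}; \
        opcode = word.op.code; \
        oparg = word.op.arg; \
    } while (0)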

Eval-Breaker Polling

There is no single CHECK_EVAL_BREAKER macro in ceval_macros.h. Instead, opcode bodies in bytecodes.c poll directly:

// CPython: Python/bytecodes.c:158 eval_breaker poll in RESUME
if (_Py_atomic_load_uintptr_relaxed(&tstate->eval_breaker)
        & _PY_EVAL_EVENTS_MASK) {
    int err = _Py_HandlePending(tstate);
    ...
}

tstate->eval_breaker is a uintptr_t bitmask. The lower 8 bits (_PY_EVAL_EVENTS_MASK) hold event flags: GIL drop request, pending signals, async exception, GC scheduled, stop-the-world, and JIT invalidation. The upper bits hold a per-interpreter version counter used for global-variable inline cache validation. The RESUME opcode is the main polling site; SEND and FOR_ITER also poll for async generators.
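
Splitting the two halves of the word, assuming the layout just described (the constant names are the real ones; the split is shown only for exposition):

// Illustration of the eval_breaker layout described above
uintptr_t breaker = _Py_atomic_load_uintptr_relaxed(&tstate->eval_breaker);
uintptr_t pending = breaker & _PY_EVAL_EVENTS_MASK;    // event flags to service
uintptr_t version = breaker & ~_PY_EVAL_EVENTS_MASK;   // inline-cache version tag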

LLTRACE Debug Tracing

// CPython: Python/ceval_macros.h:132 PRE_DISPATCH_GOTO
#ifdef Py_DEBUG
#define PRE_DISPATCH_GOTO() if (frame->lltrace >= 5) { \
    lltrace_instruction(frame, stack_pointer, next_instr, opcode, oparg); }
#else
#define PRE_DISPATCH_GOTO() ((void)0)
#endif

In a Py_DEBUG build, a trace level of 5 or higher, set with the PYTHON_LLTRACE environment variable (or by defining __lltrace__ in the code's globals), triggers lltrace_instruction before every opcode dispatch. The field lives on _PyInterpreterFrame, so each frame carries its own trace level independently.
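
For the frame-entry side (LLTRACE_RESUME_FRAME in the Map), the shape is roughly the following paraphrase; maybe_lltrace_resume_frame and GLOBALS() are the real names in Python/ceval.c and ceval_macros.h, but the body here is abbreviated (the real macro also syncs stack_pointer around the call):

// Hedged paraphrase of LLTRACE_RESUME_FRAME (Py_DEBUG builds only)
#define LLTRACE_RESUME_FRAME() \
    do { \
        int lltrace = maybe_lltrace_resume_frame(frame, GLOBALS()); \
        if (lltrace < 0) { \
            goto exit_unwind; /* the helper raised */ \
        } \
        frame->lltrace = lltrace; \
    } while (0)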

gopy notes

gopy's eval loop in vm/eval_gen.go follows the computed-goto model conceptually: a large switch on opcode inside a for loop, with a continue at the end of each case to return to the top. Go's compiler cannot generate computed gotos, so the 15-20% branch-prediction gain is not available. Future versions may explore an indirect-function-call table via Go function values to approximate threaded code.

DISPATCH_INLINED maps to a recursive evalFrame call in gopy today. A true inline (reusing the same goroutine stack frame) is deferred.

The LLTRACE_RESUME_FRAME path is not ported. gopy provides its own instruction trace via a debug build tag and a TraceFunc hook on the thread state.

CPython 3.14 changes

  • Tail-call dispatch (Py_TAIL_CALL_INTERP) is new in 3.14. It requires Clang with the preserve_none and musttail attributes; see the sketch after this list. The existing computed-goto path remains the default on GCC and MSVC targets.
  • QSBR_QUIESCENT_STATE (line 154) was added for the free-threaded (GIL-disabled) build to advance the quiescent-state-based reclamation epoch at safe points.
  • DISPATCH_SAME_OPARG (line 169) replaces the old DISPATCH_GOTO usage inside specialization stubs, making the re-dispatch intent explicit.
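
A hedged sketch of the tail-call backend's dispatch site, following the naming pattern in the 3.14 file (TAIL_CALL_PARAMS / TAIL_CALL_ARGS bundle frame, stack_pointer, tstate, next_instr, and oparg; abbreviated here, so check ceval_macros.h for the exact signatures):

// Sketch of the tail-call backend (abbreviated)
#define Py_MUSTTAIL         [[clang::musttail]]
#define Py_PRESERVE_NONE_CC __attribute__((preserve_none))

#define TARGET(op) \
    Py_PRESERVE_NONE_CC PyObject *_TAIL_CALL_##op(TAIL_CALL_PARAMS)

#define DISPATCH_GOTO() \
    do { \
        Py_MUSTTAIL return (INSTRUCTION_TABLE[opcode])(TAIL_CALL_ARGS); \
    } while (0)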