ceval_macros.h

ceval_macros.h is included by ceval.c and used throughout the opcode bodies in bytecodes.c. It defines the macro layer that sits between the C compiler and the generated opcode bodies: dispatch routing, computed-goto vs. switch vs. tail-call selection, stack pointer arithmetic, instruction pointer movement, and the LLTRACE debug path. No runtime symbols are exported; everything here is expanded away by the preprocessor.

Map

Lines     Symbol                                   Role
50-60     USE_COMPUTED_GOTOS                       Platform capability probe
63-71     INSTRUCTION_STATS                        Per-opcode pair-count accumulator (stats build only)
81-128    TARGET / DISPATCH_GOTO / JUMP_TO_LABEL   Dispatch backend selection
130-136   PRE_DISPATCH_GOTO                        LLTRACE hook before every dispatch
138-151   LLTRACE_RESUME_FRAME                     Full-frame lltrace on frame entry
161-167   DISPATCH                                 Load the next instruction word, then dispatch
169-174   DISPATCH_SAME_OPARG                      Re-dispatch without advancing (specialization)
176-184   DISPATCH_INLINED                         Inline call: swap frames, jump to start_frame
203-208   NEXTOPARG                                Load next opcode + oparg atomically
214       JUMPBY                                   Relative branch by x instructions
217-218   STACK_LEVEL / STACK_SIZE                 Stack depth arithmetic
229-230   LOCALS_ARRAY / GETLOCAL                  Fast local variable access
351-361   LOAD_IP / LOAD_SP / SAVE_SP              Frame pointer sync helpers
366-415   GOTO_TIER_TWO / GOTO_TIER_ONE            Tier-1 / tier-2 switch points

Reading

Dispatch Backends

The file supports three dispatch strategies, selected at compile time:

  1. Tail-call (Py_TAIL_CALL_INTERP): each opcode is a separate C function called with [[clang::musttail]] so the compiler emits a true tail call, keeping the stack flat. Clang 18+ / GCC 15+ only.
  2. Computed goto (USE_COMPUTED_GOTOS): each opcode body ends with an indirect jump through opcode_targets[]. Classic "threaded code" giving ~15-20% speedup over a switch on most CPUs.
  3. Switch (fallback): standard switch (opcode) for platforms without label-as-value support.
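
A condensed view of how the backend choice lands in TARGET and DISPATCH_GOTO, abbreviated from the file (the tail-call variant is sketched under "CPython 3.14 changes" below):

// Condensed sketch of the goto-based backend selection (abbreviated)
#if USE_COMPUTED_GOTOS
#  define TARGET(op)      TARGET_##op:
#  define DISPATCH_GOTO() goto *opcode_targets[opcode]
#else
#  define TARGET(op)      case op: TARGET_##op:
#  define DISPATCH_GOTO() goto dispatch_opcode
#endif

Whichever backend is in effect, every opcode body ends by expanding DISPATCH:
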
// CPython: Python/ceval_macros.h:161 DISPATCH
#define DISPATCH() \
    { \
        assert(frame->stackpointer == NULL); \
        NEXTOPARG(); \
        PRE_DISPATCH_GOTO(); \
        DISPATCH_GOTO(); \
    }

DISPATCH_SAME_OPARG skips NEXTOPARG, reusing the current oparg. Used by specialization stubs when they decide to re-enter the same opcode slot after mutating the inline cache.
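
Its body is short; paraphrased from the file, it reloads opcode from the current instruction word (which the specializer may have just rewritten in place) while keeping the oparg already in hand:

// Paraphrased: re-dispatch on the same instruction word
#define DISPATCH_SAME_OPARG() \
    { \
        opcode = next_instr->op.code; \
        PRE_DISPATCH_GOTO(); \
        DISPATCH_GOTO(); \
    }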

Stack Pointer Convention

The eval loop keeps stack_pointer in a local variable that is not synced with frame->stackpointer except at call boundaries. This avoids a store/load on every push or pop:

// CPython: Python/ceval_macros.h:217 STACK_LEVEL
#define STACK_LEVEL() ((int)(stack_pointer - _PyFrame_Stackbase(frame)))
#define STACK_SIZE() (_PyFrame_GetCode(frame)->co_stacksize)
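
The sync at those call boundaries goes through the LOAD_SP / SAVE_SP helpers listed in the Map. A paraphrased sketch, not verbatim; the detail worth knowing is that in recent sources taking the pointer out of the frame also nulls frame->stackpointer, which is exactly what the assert in DISPATCH() checks:

// Paraphrased sync helpers: ownership of the stack pointer moves
// between frame->stackpointer and the local variable
#define LOAD_SP() (stack_pointer = _PyFrame_GetStackPointer(frame))
#define SAVE_SP() _PyFrame_SetStackPointer(frame, stack_pointer)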

Individual opcodes generated by Tools/cases_generator directly index stack_pointer as stack_pointer[-1], stack_pointer[-2], etc., with explicit --stack_pointer or ++stack_pointer for pops and pushes. No PUSH/POP macros exist in 3.14 — the generator emits the arithmetic directly.
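
As an illustration of that convention, here is a hand-written sketch of a binary-add body in the shape the generator emits, using the older PyObject* view for readability (3.14's generated code actually traffics in _PyStackRef values; the error-label name follows the generated pop_N_error convention):

// Hand-written sketch, not actual generator output
PyObject *right = stack_pointer[-1];
PyObject *left = stack_pointer[-2];
PyObject *res = PyNumber_Add(left, right);
Py_DECREF(left);
Py_DECREF(right);
if (res == NULL) goto pop_2_error;
stack_pointer[-2] = res;   // result replaces the deeper operand
stack_pointer -= 1;        // net effect: two pops, one push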

Instruction Pointer Movement

// CPython: Python/ceval_macros.h:214 JUMPBY
#define JUMPBY(x) (next_instr += (x))

JUMPBY implements relative branches (FOR_ITER's end-of-loop skip, JUMP_FORWARD, and, with a negative operand, JUMP_BACKWARD). NEXTOPARG (line 204) loads the instruction word at next_instr atomically with FT_ATOMIC_LOAD_UINT16_RELAXED so it is safe in free-threaded builds; advancing next_instr past the instruction and its caches happens at the top of each generated opcode body, not here.
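
From memory, the free-threading-aware shape is roughly the following; check the file at the line above for the exact form:

// Approximate shape of NEXTOPARG (paraphrased)
#define NEXTOPARG()  do { \
        _Py_CODEUNIT word = \
            {.cache = FT_ATOMIC_LOAD_UINT16_RELAXED(*(uint16_t *)next_instr)}; \
        opcode = word.op.code; \
        oparg = word.op.arg; \
    } while (0)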

Eval-Breaker Polling

There is no single CHECK_EVAL_BREAKER macro in ceval_macros.h. Instead, opcode bodies in bytecodes.c poll directly:

// CPython: Python/bytecodes.c:158 eval_breaker poll in RESUME
if (_Py_atomic_load_uintptr_relaxed(&tstate->eval_breaker)
        & _PY_EVAL_EVENTS_MASK) {
    int err = _Py_HandlePending(tstate);
    ...
}

tstate->eval_breaker is a uintptr_t bitmask. The lower 8 bits (_PY_EVAL_EVENTS_MASK) hold event flags: GIL drop request, pending signals, async exception, GC scheduled, stop-the-world, and JIT invalidation. The upper bits hold a per-interpreter version counter used for global-variable inline cache validation. The RESUME opcode is the main polling site; SEND and FOR_ITER also poll for async generators.
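
Splitting the two halves of the word, assuming the layout just described (the constant names are the real ones; the split is shown only for exposition):

// Illustration of the eval_breaker layout described above
uintptr_t breaker = _Py_atomic_load_uintptr_relaxed(&tstate->eval_breaker);
uintptr_t pending = breaker & _PY_EVAL_EVENTS_MASK;    // event flags to service
uintptr_t version = breaker & ~_PY_EVAL_EVENTS_MASK;   // inline-cache version tag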

LLTRACE Debug Tracing

// CPython: Python/ceval_macros.h:132 PRE_DISPATCH_GOTO
#ifdef Py_DEBUG
#define PRE_DISPATCH_GOTO() if (frame->lltrace >= 5) { \
    lltrace_instruction(frame, stack_pointer, next_instr, opcode, oparg); }
#else
#define PRE_DISPATCH_GOTO() ((void)0)
#endif

In a Py_DEBUG build, a trace level of 5 or higher, set with the PYTHON_LLTRACE environment variable (or by defining __lltrace__ in the code's globals), triggers lltrace_instruction before every opcode dispatch. The field lives on _PyInterpreterFrame, so each frame carries its own trace level independently.
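
For the frame-entry side (LLTRACE_RESUME_FRAME in the Map), the shape is roughly the following paraphrase; maybe_lltrace_resume_frame and GLOBALS() are the real names in Python/ceval.c and ceval_macros.h, but the body here is abbreviated (the real macro also syncs stack_pointer around the call):

// Hedged paraphrase of LLTRACE_RESUME_FRAME (Py_DEBUG builds only)
#define LLTRACE_RESUME_FRAME() \
    do { \
        int lltrace = maybe_lltrace_resume_frame(frame, GLOBALS()); \
        if (lltrace < 0) { \
            goto exit_unwind; /* the helper raised */ \
        } \
        frame->lltrace = lltrace; \
    } while (0)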

gopy notes

gopy's eval loop in vm/eval_gen.go follows the computed-goto model conceptually: a large switch on opcode inside a for loop, with a continue at the end of each case to return to the top. Go's compiler cannot generate computed gotos, so the 15-20% branch-prediction gain is not available. Future versions may explore an indirect-function-call table via Go function values to approximate threaded code.

DISPATCH_INLINED maps to a recursive evalFrame call in gopy today. A true inline (reusing the same goroutine stack frame) is deferred.

The LLTRACE_RESUME_FRAME path is not ported. gopy provides its own instruction trace via a debug build tag and a TraceFunc hook on the thread state.

CPython 3.14 changes

  • Tail-call dispatch (Py_TAIL_CALL_INTERP) is new in 3.14. It requires Clang with the preserve_none and musttail attributes; see the sketch after this list. The existing computed-goto path remains the default on GCC and MSVC targets.
  • QSBR_QUIESCENT_STATE (line 154) was added for the free-threaded (GIL-disabled) build to advance the quiescent-state-based reclamation epoch at safe points.
  • DISPATCH_SAME_OPARG (line 169) replaces the old DISPATCH_GOTO usage inside specialization stubs, making the re-dispatch intent explicit.
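
A hedged sketch of the tail-call backend's dispatch site, following the naming pattern in the 3.14 file (TAIL_CALL_PARAMS / TAIL_CALL_ARGS bundle frame, stack_pointer, tstate, next_instr, and oparg; abbreviated here, so check ceval_macros.h for the exact signatures):

// Sketch of the tail-call backend (abbreviated)
#define Py_MUSTTAIL         [[clang::musttail]]
#define Py_PRESERVE_NONE_CC __attribute__((preserve_none))

#define TARGET(op) \
    Py_PRESERVE_NONE_CC PyObject *_TAIL_CALL_##op(TAIL_CALL_PARAMS)

#define DISPATCH_GOTO() \
    do { \
        Py_MUSTTAIL return (INSTRUCTION_TABLE[opcode])(TAIL_CALL_ARGS); \
    } while (0)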