ceval_macros.h
ceval_macros.h is included by both ceval.c and bytecodes.c. It defines the
macro layer that sits between the C compiler and the generated opcode bodies:
dispatch routing, computed-goto vs. switch vs. tail-call selection, stack pointer
arithmetic, instruction pointer movement, and the LLTRACE debug path. No runtime
symbols are exported — everything here is inlined by the preprocessor.
Map
| Lines | Symbol | Role |
|---|---|---|
| 50-60 | USE_COMPUTED_GOTOS | Platform capability probe |
| 63-71 | INSTRUCTION_STATS | Per-opcode pair-count accumulator (stats build only) |
| 81-128 | TARGET / DISPATCH_GOTO / JUMP_TO_LABEL | Dispatch backend selection |
| 130-136 | PRE_DISPATCH_GOTO | LLTRACE hook before every dispatch |
| 138-151 | LLTRACE_RESUME_FRAME | Full-frame lltrace on frame entry |
| 161-167 | DISPATCH | Advance next_instr, then dispatch |
| 169-174 | DISPATCH_SAME_OPARG | Re-dispatch without advancing (specialization) |
| 176-184 | DISPATCH_INLINED | Inline call: swap frames, jump to start_frame |
| 203-208 | NEXTOPARG | Load next opcode + oparg atomically |
| 214 | JUMPBY | Relative branch by x instructions |
| 217-218 | STACK_LEVEL / STACK_SIZE | Stack depth arithmetic |
| 229-230 | LOCALS_ARRAY / GETLOCAL | Fast local variable access |
| 351-361 | LOAD_IP / LOAD_SP / SAVE_SP | Frame pointer sync helpers |
| 366-415 | GOTO_TIER_TWO / GOTO_TIER_ONE | Tier-1 / tier-2 switch points |
Reading
Dispatch Backends
The file supports three dispatch strategies, selected at compile time:
- Tail-call (`Py_TAIL_CALL_INTERP`): each opcode is a separate C function called with `[[clang::musttail]]` so the compiler emits a true tail call, keeping the stack flat. Clang 18+ / GCC 15+ only.
- Computed goto (`USE_COMPUTED_GOTOS`): each opcode body ends with an indirect jump through `opcode_targets[]`. Classic "threaded code", giving a ~15-20% speedup over a switch on most CPUs.
- Switch (fallback): a standard `switch (opcode)` for platforms without label-as-value support.
```c
// CPython: Python/ceval_macros.h:161 DISPATCH
#define DISPATCH() \
    { \
        assert(frame->stackpointer == NULL); \
        NEXTOPARG(); \
        PRE_DISPATCH_GOTO(); \
        DISPATCH_GOTO(); \
    }
```
DISPATCH_SAME_OPARG skips NEXTOPARG, reusing the current oparg. Used by
specialization stubs when they decide to re-enter the same opcode slot after
mutating the inline cache.
Stack Pointer Convention
The eval loop keeps stack_pointer in a local variable that is not synced with
frame->stackpointer except at call boundaries. This avoids a store/load on
every push or pop:
```c
// CPython: Python/ceval_macros.h:217 STACK_LEVEL
#define STACK_LEVEL() ((int)(stack_pointer - _PyFrame_Stackbase(frame)))
#define STACK_SIZE()  (_PyFrame_GetCode(frame)->co_stacksize)
```
Individual opcodes generated by Tools/cases_generator directly index
stack_pointer as stack_pointer[-1], stack_pointer[-2], etc., with explicit
--stack_pointer or ++stack_pointer for pops and pushes. No PUSH/POP macros
exist in 3.14 — the generator emits the arithmetic directly.
Instruction Pointer Movement
```c
// CPython: Python/ceval_macros.h:214 JUMPBY
#define JUMPBY(x) (next_instr += (x))
```
JUMPBY is used for forward branches (FOR_ITER end-of-loop, JUMP_FORWARD, etc.).
NEXTOPARG (line 204) advances next_instr by 1 and loads the word atomically
with FT_ATOMIC_LOAD_UINT16_RELAXED to be safe in free-threaded builds.
Eval-Breaker Polling
There is no single CHECK_EVAL_BREAKER macro in ceval_macros.h. Instead,
opcode bodies in bytecodes.c poll directly:
```c
// CPython: Python/bytecodes.c:158 eval_breaker poll in RESUME
if (_Py_atomic_load_uintptr_relaxed(&tstate->eval_breaker)
    & _PY_EVAL_EVENTS_MASK) {
    int err = _Py_HandlePending(tstate);
    ...
}
```
tstate->eval_breaker is a uintptr_t bitmask. The lower 8 bits
(_PY_EVAL_EVENTS_MASK) hold event flags: GIL drop request, pending signals,
async exception, GC scheduled, stop-the-world, and JIT invalidation. The upper
bits hold a per-interpreter version counter used for global-variable inline cache
validation. The RESUME opcode is the main polling site; SEND and FOR_ITER also
poll for async generators.
LLTRACE Debug Tracing
```c
// CPython: Python/ceval_macros.h:132 PRE_DISPATCH_GOTO
#ifdef Py_DEBUG
#define PRE_DISPATCH_GOTO() if (frame->lltrace >= 5) { \
    lltrace_instruction(frame, stack_pointer, next_instr, opcode, oparg); }
#else
#define PRE_DISPATCH_GOTO() ((void)0)
#endif
```
In a Py_DEBUG build, setting the lltrace level to 5 or higher (via the PYTHON_LLTRACE environment variable, or by defining __lltrace__ in a module's globals) triggers lltrace_instruction before every opcode dispatch. The lltrace field lives on _PyInterpreterFrame, so each frame carries its own trace level independently.
gopy notes
gopy's eval loop in vm/eval_gen.go follows the computed-goto model conceptually:
a large switch on opcode inside a for loop, with a continue at the end of
each case to return to the top. Go's compiler cannot generate computed gotos, so
the 15-20% branch-prediction gain is not available. Future versions may explore
an indirect-function-call table via Go function values to approximate threaded
code.
DISPATCH_INLINED maps to a recursive evalFrame call in gopy today. A true
inline (reusing the same goroutine stack frame) is deferred.
The LLTRACE_RESUME_FRAME path is not ported. gopy provides its own instruction
trace via a debug build tag and a TraceFunc hook on the thread state.
CPython 3.14 changes
- Tail-call dispatch (`Py_TAIL_CALL_INTERP`) is new in 3.14. It requires Clang with the `preserve_none` and `musttail` attributes. The existing computed-goto path remains the default on GCC and MSVC targets.
- `QSBR_QUIESCENT_STATE` (line 154) was added for the free-threaded (GIL-disabled) build to advance the quiescent-state-based reclamation epoch at safe points.
- `DISPATCH_SAME_OPARG` (line 169) replaces the old `DISPATCH_GOTO` usage inside specialization stubs, making the re-dispatch intent explicit.