Compiler
The compiler walks a validated AST and emits one instruction sequence per scope. Each sequence is a flat array of pseudo-instructions plus side tables: constants, names, free variables, cell variables, exception ranges. The result feeds the flow-graph pass.
Source map
| File | Role |
|---|---|
| `Python/compile.c` | The codegen pass. |
| `Python/instruction_sequence.c` | The growable list of pseudo-instructions. |
| `Include/internal/pycore_compile.h` | Public-internal API. |
| `Python/bytecodes.c` | The opcode semantics (used by the generator). |
| `Python/opcode_ids.h` | Numeric opcode IDs. |
Entry point
```c
PyCodeObject *
_PyAST_Compile(mod_ty mod, PyObject *filename,
               PyCompilerFlags *flags, int optimize,
               PyArena *arena);
```
Inside, the work splits into:
1. `_PySymtable_Build`: name resolution.
2. `compiler_init`: allocate the per-scope compile state.
3. `compiler_codegen`: the recursive walk.
4. `_PyCfg_OptimizeCodeUnit`: the flow-graph pass.
5. `_PyAssemble_MakeCodeObject`: the assembler.
This page covers stage 3. The flow graph and the assembler have their own pages.
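The stages are not individually scriptable, but their combined result is easy to inspect. A minimal sketch using the builtin `compile()` and the `dis` module (the source string is illustrative):

```python
import dis

# compile() runs the whole pipeline: parse, symbol table, codegen,
# flow-graph optimization, assembly.
code = compile("x = 1\nprint(x + 2)", "<demo>", "exec")

print(code.co_consts)   # constants pool filled in during codegen
print(code.co_names)    # names table (x and print are globals here)
dis.dis(code)           # the assembled Tier-1 bytecode
```

The side tables described above surface as `co_consts`, `co_names`, `co_varnames`, `co_cellvars`, `co_freevars`, and `co_exceptiontable` on the finished code object.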
Compile state
struct compiler carries everything that survives one scope:
- The current `_PyCompile_CodeUnit`, which holds the instruction sequence, the constants list, the names list, the variable lists, the cell/free lists, the source line table builder, and the exception block stack.
- A pointer to the symbol table.
- The compile flags (the `__future__` set, the optimization level).
- A stack of nested scopes for re-entrancy.
compiler_enter_scope pushes a new code unit; compiler_exit_scope
pops it and emits the inner code object as a constant into the
outer scope's constants list.
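This nesting is directly visible from Python: each finished inner code object lands in the enclosing scope's `co_consts`. A quick check, with an illustrative source string:

```python
import types

src = """
def outer():
    def inner():
        return 1
    return inner
"""
module = compile(src, "<demo>", "exec")

# compiler_exit_scope emitted outer's finished code object into the
# module's constants pool, and inner's into outer's.
outer_code = next(c for c in module.co_consts if isinstance(c, types.CodeType))
inner_code = next(c for c in outer_code.co_consts if isinstance(c, types.CodeType))
print(outer_code.co_name, inner_code.co_name)
```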
The recursive walk
compiler_visit_stmt(c, stmt) switches on stmt->kind and
dispatches to one of:
- `compiler_function`, `compiler_async_function`, `compiler_class` for definitions (`compiler_lambda` is the expression-side counterpart, reached through `compiler_visit_expr`).
- `compiler_if`, `compiler_for`, `compiler_async_for`, `compiler_while`, `compiler_try`, `compiler_with`, `compiler_async_with`, `compiler_match` for control flow.
- `compiler_assign`, `compiler_augassign`, `compiler_aug_target`, `compiler_delete` for assignment and deletion.
- `compiler_return`, `compiler_raise`, `compiler_assert`, `compiler_break`, `compiler_continue`, `compiler_pass`, `compiler_import`, `compiler_global`, `compiler_nonlocal` for simple statements.
- `compiler_visit_expr` when the statement is an expression statement (whose value is popped, except in interactive mode, where it is printed).
compiler_visit_expr(c, expr) switches on expr->kind and emits
the bytecode that leaves the result on the stack. Examples:
- `BinOp(L, op, R)` emits `compile(L); compile(R); BINARY_OP op`.
- `Call(f, args, kw)` emits `PUSH_NULL; compile(f); compile(args...); CALL n` (or `CALL_KW n` when keyword arguments are present).
- `Name(n)` emits `LOAD_FAST`, `LOAD_DEREF`, `LOAD_GLOBAL`, or `LOAD_NAME`, based on the symbol table's classification.
Emitting instructions
compiler_addop(c, opcode) appends an instruction with no
argument. compiler_addop_i(c, opcode, oparg) appends one with an
inline argument. Larger arguments are encoded as EXTENDED_ARG
prefixes; the compiler emits the prefix automatically.
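One way to see the automatic prefixing, assuming CPython 3.11+ (the 300-assignment source is just a device to force opargs above 255):

```python
import dis

# 300 assignments produce 300 distinct constants and names, so some
# LOAD_CONST / STORE_NAME opargs exceed one byte and the compiler
# prefixes them with EXTENDED_ARG.
src = "\n".join(f"v{i} = {i}" for i in range(300))
code = compile(src, "<demo>", "exec")

ops = [i.opname for i in dis.get_instructions(code)]
args = [i.arg for i in dis.get_instructions(code) if i.arg is not None]
print("EXTENDED_ARG" in ops, max(args))
```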
Each emitted instruction also gets a location: the file, line, column, end line, end column. These come from the AST node being visited and feed the PEP 657 location table.
Control-flow scaffolding: f-blocks
CPython tracks the control-flow regions enclosing the code being generated on a per-scope stack of "f-blocks". Each entry describes how unwinding must behave when control exits the corresponding region. Entry kinds:
| Kind | Created by | Purpose |
|---|---|---|
| LOOP | for / while | Targets for break / continue jumps. |
| TRY_EXCEPT | try: | Routes unwinding into the except arms. |
| FINALLY | try: finally: | Ensures the finally body runs. |
| WITH | with / async with | Runs __exit__ on cleanup. |
| HANDLER_CLEANUP | inside except | Clears the bound exception. |
| EXCEPTION_GROUP_HANDLER | inside except* | Reconstructs the group on re-raise. |
break, continue, return, raise, and exception unwinding all consult the f-block stack so that cleanup code is emitted in the right order.
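The effect is observable in plain Python: a break inside try/finally runs the finally body before leaving the loop, because the compiler consulted the FINALLY entry when emitting the break path.

```python
log = []
for i in range(5):
    try:
        if i == 2:
            break          # unwinds through the FINALLY f-block
    finally:
        log.append(i)      # also emitted along the break path
print(log)   # [0, 1, 2]
```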
Naming the opcodes
The opcode catalogue is described in Python/bytecodes.c in a DSL
that the generator (Tools/cases_generator/) consumes. The
compiler refers to opcodes by their numeric IDs from
opcode_ids.h. The opcodes are the same Tier-1 opcodes the eval
loop sees. The assembler is in charge of laying out their inline
caches.
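The same numeric IDs are mirrored into the Python-level `opcode` module, which makes a quick sanity check possible:

```python
import opcode

# opcode.opmap maps names to the numeric IDs generated into
# opcode_ids.h; opcode.opname is the reverse table.
num = opcode.opmap["RETURN_VALUE"]
print(num, opcode.opname[num])
```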
Pseudo-opcodes
The compiler emits a small set of pseudo-opcodes that the flow-graph pass will resolve away:
- `JUMP`, `JUMP_NO_INTERRUPT`, and the conditional jump pseudos (`POP_JUMP_IF_*`) are emitted with symbolic jump targets: basic-block IDs, not byte offsets.
- `SETUP_FINALLY` / `SETUP_CLEANUP` / `SETUP_WITH` create exception-table entries; they do not exist as real bytecode.
- `LOAD_CLOSURE`, `MAKE_CELL`, `COPY_FREE_VARS` mark cell-binding intent that the assembler will lay out.
Output
The output of stage 3 is the populated _PyCompile_CodeUnit for
the current scope, plus zero or more child code objects already
attached to its constants pool. Stage 4 (the flow graph) takes
over from here.