Compiler

The compiler walks a validated AST and emits one instruction sequence per scope. Each sequence is a flat array of pseudo-instructions plus side tables: constants, names, free variables, cell variables, exception ranges. The result feeds the flow-graph pass.

Source map

File                                Role
Python/compile.c                    The codegen pass.
Python/instruction_sequence.c       The growable list of pseudo-instructions.
Include/internal/pycore_compile.h   Public-internal API.
Python/bytecodes.c                  The opcode semantics (used by the generator).
Python/opcode_ids.h                 Numeric opcode IDs.

Entry point

PyCodeObject *
_PyAST_Compile(mod_ty mod, PyObject *filename,
               PyCompilerFlags *flags, int optimize,
               PyArena *arena);

Inside, the work splits into:

  1. _PySymtable_Build -- name resolution.
  2. compiler_init -- allocate the per-scope compile state.
  3. compiler_codegen -- the recursive walk.
  4. _PyCfg_OptimizeCodeUnit -- the flow-graph pass.
  5. _PyAssemble_MakeCodeObject -- the assembler.

This page covers stage 3. The flow graph and the assembler have their own pages.
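The same pipeline is reachable from Python through the compile() builtin, which ends up in _PyAST_Compile and returns the finished code object produced by stage 5. A minimal sketch:

```python
# compile() drives the full pipeline; what comes back is the
# assembled code object, with the side tables filled in.
code = compile("a + b * 2", "<demo>", "eval")

print(type(code).__name__)   # code
print(code.co_names)         # ('a', 'b') -- names collected during stage 3
```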

Compile state

struct compiler carries everything that survives one scope:

  • The current _PyCompile_CodeUnit, which holds the instruction sequence, the constants list, the names list, the variable lists, the cell / free lists, the source line table builder, and the exception block stack.
  • A pointer to the symbol table.
  • The compile flags (the __future__ set, the optimization level).
  • A stack of nested scopes for re-entrancy.

compiler_enter_scope pushes a new code unit; compiler_exit_scope pops it and emits the inner code object as a constant into the outer scope's constants list.
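That handoff is visible from Python: a nested function's code object lands in the enclosing code object's co_consts. A small check:

```python
import types

def outer():
    def inner():
        return 1
    return inner

# compiler_exit_scope stored inner's code object as a constant of outer.
nested = [c for c in outer.__code__.co_consts
          if isinstance(c, types.CodeType)]
print([c.co_name for c in nested])   # ['inner']
```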

The recursive walk

compiler_visit_stmt(c, stmt) switches on stmt->kind and dispatches to one of:

  • compiler_function, compiler_async_function, compiler_class, compiler_lambda for declarations.
  • compiler_if, compiler_for, compiler_async_for, compiler_while, compiler_try, compiler_with, compiler_async_with, compiler_match for control flow.
  • compiler_assign, compiler_augassign, compiler_aug_target, compiler_delete for assignment and deletion.
  • compiler_return, compiler_raise, compiler_assert, compiler_break, compiler_continue, compiler_pass, compiler_import, compiler_global, compiler_nonlocal for simple statements.
  • compiler_visit_expr when the statement is an expression statement (for which the value is popped, except in REPL mode).

compiler_visit_expr(c, expr) switches on expr->kind and emits the bytecode that leaves the result on the stack. Examples:

  • BinOp(L, op, R) emits compile(L); compile(R); BINARY_OP op.
  • Call(f, args, kw) emits PUSH_NULL; compile(f); compile(args...); CALL n (or CALL_KW n when keywords are present).
  • Name(n) emits LOAD_FAST / LOAD_DEREF / LOAD_GLOBAL / LOAD_NAME based on the symbol table's classification.

Emitting instructions

compiler_addop(c, opcode) appends an instruction with no argument. compiler_addop_i(c, opcode, oparg) appends one with an inline argument. Larger arguments are encoded as EXTENDED_ARG prefixes; the compiler emits the prefix automatically.
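The EXTENDED_ARG prefix can be forced by compiling a scope with more than 256 constants, so some opargs no longer fit in one byte. A sketch, assuming the 2-byte instruction format of modern CPython (opcodes at even offsets in co_code):

```python
import dis

# Build a function body with 300 distinct float constants.
src = "def f():\n" + "".join(f"    v{i} = {i}.5\n" for i in range(300))
ns = {}
exec(src, ns)
code = ns["f"].__code__

# Opcodes sit at even offsets in the instruction stream; the compiler
# inserted EXTENDED_ARG prefixes for the opargs above 255.
print(dis.opmap["EXTENDED_ARG"] in code.co_code[::2])   # True
```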

Each emitted instruction also gets a location: the file, line, column, end line, end column. These come from the AST node being visited and feed the PEP 657 location table.

Control-flow scaffolding: f-blocks

CPython tracks active control-flow regions on a per-scope stack of frame blocks, or f-blocks (struct fblockinfo in compile.c, pushed and popped by compiler_push_fblock / compiler_pop_fblock). An entry on the stack describes how unwinding should behave when control exits the corresponding region. Entry kinds:

Kind                      Created by          Purpose
LOOP                      for / while         break / continue jumps.
TRY_EXCEPT                try:                Routes exceptions to the except arms.
FINALLY                   try: finally:       Ensures the finally body runs.
WITH                      with / async with   Runs __exit__ on cleanup.
HANDLER_CLEANUP           inside except       Clears the bound exception.
EXCEPTION_GROUP_HANDLER   inside except*      Reconstructs the group on re-raise.

break, continue, return, raise, and unwind from an exception consult the f-block stack to emit cleanup code in the right order.
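The ordering is observable: when break leaves a loop body wrapped in try/finally, the compiler inlines the finally body before the jump out, so cleanup runs even on the exiting iteration. A behavioral sketch:

```python
events = []

def drain():
    for i in range(10):
        try:
            if i == 2:
                break            # unwinds through the FINALLY f-block
            events.append(i)
        finally:
            events.append("cleanup")

drain()
print(events)   # [0, 'cleanup', 1, 'cleanup', 'cleanup']
```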

Naming the opcodes

The opcode catalogue is described in Python/bytecodes.c in a DSL that the generator (Tools/cases_generator/) consumes. The compiler refers to opcodes by their numeric IDs from opcode_ids.h. The opcodes are the same Tier-1 opcodes the eval loop sees. The assembler is in charge of laying out their inline caches.

Pseudo-opcodes

The compiler emits a small set of pseudo-opcodes that the flow-graph pass will resolve away:

  • JUMP, JUMP_NO_INTERRUPT, and the conditional jump pseudos (POP_JUMP_IF_*) are emitted with symbolic jump targets. Targets are basic-block IDs, not byte offsets.
  • SETUP_FINALLY / SETUP_CLEANUP / SETUP_WITH create exception-table entries; they do not exist as real bytecode.
  • LOAD_CLOSURE, MAKE_CELL, COPY_FREE_VARS mark cell-binding intent that the assembler will lay out.

Output

The output of stage 3 is the populated _PyCompile_CodeUnit for the current scope, plus zero or more child code objects already attached to its constants pool. Stage 4 (the flow graph) takes over from here.

Reading order

Flowgraph is next. Then Assembler.