
1626. gopy codegen

Port of cpython/Python/codegen.c (6483 lines) to gopy/compile/codegen.go. This spec is the detailed source-of-truth for section 6 of 1620. The 1620 file keeps the cross-cutting view; the per-visitor and per-opcode detail lives here.

What codegen does

Codegen takes a fully-resolved scope (a symtable.Entry) plus the AST nodes that belong to that scope, and produces an instrseq.Sequence of labelled instructions. It does not produce a finished code object; that is the assembler's job (1628). It does not run any optimisation passes; that is the flowgraph's job (1627).

Boundary contract:

Input:
    ast.Mod plus per-scope ast nodes (FunctionDef, ClassDef, Lambda, ...)
    *symtable.Table (so each Name lookup resolves to LOCAL/CELL/FREE/GLOBAL)
    *future.Features (annotations PEP 649 and division flags)
    *compileContext (filename, optimisation level, c_flags)

Output (per scope):
    *instrseq.Sequence (labelled, no jump deltas yet)
    unitState: name, qualname, scope kind, argcount, posonly, kwonly,
               firstlineno, fblock stack at end, free-vars list,
               cell-vars list, fasthidden bitset, deferred-annotation list

The driver (1620 / compile/compiler.go) walks the symtable top-down and calls codegen once per Entry, collecting one Sequence per scope.
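The top-down walk can be pictured with a minimal sketch. Entry, Sequence, codegenScope, and compileAll are hypothetical stand-ins invented for illustration, not the real symtable.Entry, instrseq.Sequence, or driver API; the only behaviour taken from the text above is "parent first, one Sequence per scope".

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for symtable.Entry: a scope plus its
// nested child scopes.
type Entry struct {
	Name     string
	Children []*Entry
}

// Sequence is a hypothetical stand-in for instrseq.Sequence.
type Sequence struct {
	Scope string
}

// codegenScope is a placeholder for the real per-scope codegen call.
func codegenScope(e *Entry) (*Sequence, error) {
	if e == nil {
		return nil, fmt.Errorf("nil symtable entry")
	}
	return &Sequence{Scope: e.Name}, nil
}

// compileAll visits the entry tree top-down (parent before children) and
// collects one Sequence per scope.
func compileAll(root *Entry) ([]*Sequence, error) {
	seq, err := codegenScope(root)
	if err != nil {
		return nil, err
	}
	out := []*Sequence{seq}
	for _, child := range root.Children {
		sub, err := compileAll(child)
		if err != nil {
			return nil, err
		}
		out = append(out, sub...)
	}
	return out, nil
}

func main() {
	root := &Entry{Name: "<module>", Children: []*Entry{
		{Name: "f", Children: []*Entry{{Name: "<lambda>"}}},
		{Name: "C"},
	}}
	seqs, err := compileAll(root)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range seqs {
		fmt.Println(s.Scope)
	}
}
```

The pre-order traversal keeps the module's Sequence first, which is convenient when the assembler later resolves nested code objects against their parents.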

File layout

compile/codegen.go is large enough to be split into focused sibling files. Mirror the symtable/build_*.go pattern:

| Go file | CPython lines | Contents |
| --- | --- | --- |
| codegen.go | 1-250, 4933+ | unit struct, compiler struct, entry/leave scope, helper addops, public entry points |
| codegen_stmt.go | 2991-3110 | visitStmt plus the simple statement visitors (Pass / Break / Continue / Delete / Assert / etc.) |
| codegen_stmt_funclike.go | 1311-1727 | Function / Lambda / Class / TypeAlias bodies; closure construction; type-param bodies |
| codegen_stmt_control.go | 2043-2289 | If / For / AsyncFor / While / Return |
| codegen_stmt_try.go | 2293-2792 | Try / TryStar / Finally / Except / unwind_fblock_stack |
| codegen_stmt_with.go | 4940-5172 | With / AsyncWith / with_except_finish state machine |
| codegen_stmt_match.go | 5736-6473 | Match plus all 11 pattern visitors (or, as, mapping, sequence, class, singleton, value, etc.) |
| codegen_stmt_import.go | 2793-2931 | Import / ImportFrom / star-import emit |
| codegen_expr.go | 5172-5345 | visitExpr dispatch |
| codegen_expr_simple.go | 3290-3623 | BoolOp / BinOp / UnaryOp / List / Tuple / Set / Dict / Compare / IfExp |
| codegen_expr_call.go | 4017-4769 | Call / Keyword / kwargs splat / star-args panel |
| codegen_expr_str.go | 4061-4196 | JoinedStr / TemplateStr / FormattedValue / Interpolation |
| codegen_expr_name.go | 3179-3289 | nameop (LOAD_FAST / STORE_NAME / DELETE_DEREF / LOAD_GLOBAL / etc.) |
| codegen_expr_ann.go | 5420-5546 | Annotation expressions, AnnAssign, deferred panel |
| codegen_expr_sub.go | 5547-5669 | Subscript / Slice / two-part slice |
| codegen_comp.go | 4770-4932 | sync/async comprehension generators, genexp/listcomp/setcomp/dictcomp drivers |
| codegen_aug.go | 5345-5419 | AugAssign panel |
| codegen_fblock.go | 518-647 | Frame block stack: types, push/pop, unwind |
| codegen_addop.go | 254-461 | addop_i / addop_j / addop_name / addop_o / addop_load_const |
| codegen_anno.go | 666-846 | Annotation scope setup / leave / deferred body / process deferred |
| codegen_helpers.go | 1810-1978 | jump_if, addcompare, check_compare, infer_type |
| codegen_pattern.go | 5728-6353 | Pattern helpers and dispatch |

Each file gets a header: // Port of cpython/Python/codegen.c L<a>-L<b>. Every exported and unexported function gets the standard // CPython: codegen.c:L<n> codegen_<name> citation.

Public surface

package compile

// Codegen drives the per-scope visitor. The caller (1620 driver) walks
// the symtable and invokes Codegen once per entry. The returned Unit
// carries everything the assembler needs in 1628.
func Codegen(c *Compiler, sc *symtable.Entry, mod ast.Mod) (*Unit, error)

// Unit is the per-scope handoff. Equivalent to CPython's compiler_unit
// minus the bookkeeping the flowgraph and assemble stages own.
type Unit struct {
	Name                string
	Qualname            string
	ScopeType           symtable.BlockType
	Argcount            int
	PosOnlyArgCount     int
	KwOnlyArgCount      int
	FirstLineno         int
	Flags               uint32 // CO_OPTIMIZED, CO_NEWLOCALS, CO_VARARGS, ...
	Seq                 *instrseq.Sequence
	Consts              []any // ordered, deduped by EqualConst
	Names               []string
	VarNames            []string
	FreeVars            []string
	CellVars            []string
	FastHidden          map[string]bool
	DeferredAnnotations []deferredAnnotation
}

// CompileFlags re-exports the bits the codegen needs from co_flags.
// Values match CPython's Include/cpython/code.h.
const (
	CO_OPTIMIZED          = 0x0001
	CO_NEWLOCALS          = 0x0002
	CO_VARARGS            = 0x0004
	CO_VARKEYWORDS        = 0x0008
	CO_NESTED             = 0x0010
	CO_GENERATOR          = 0x0020
	CO_COROUTINE          = 0x0080
	CO_ITERABLE_COROUTINE = 0x0100
	CO_ASYNC_GENERATOR    = 0x0200
	CO_HAS_DOCSTRING      = 0x4000000
	CO_METHOD             = 0x8000000
)

Compiler is the long-lived driver state (filename, optimisation level, future flags, the symtable). unit is per-scope and stacked: a nested function pushes a fresh unit, emits its body, pops, and the outer scope receives a MAKE_FUNCTION referencing the new code object slot.
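The stacking behaviour described above can be sketched as follows. The compiler and unit types here are deliberately tiny hypothetical stand-ins (instructions are plain strings), and enterScope, leaveScope, and emit are illustrative names rather than the real gopy helpers.

```go
package main

import "fmt"

// unit is a stand-in for the per-scope state; only the fields needed to
// show stacking are included (hypothetical shape).
type unit struct {
	name   string
	instrs []string
}

// compiler holds the long-lived state plus the stack of open units.
type compiler struct {
	units []*unit
}

// enterScope pushes a fresh unit for a nested scope.
func (c *compiler) enterScope(name string) {
	c.units = append(c.units, &unit{name: name})
}

// leaveScope pops the innermost unit and hands it back to the caller.
func (c *compiler) leaveScope() *unit {
	top := c.units[len(c.units)-1]
	c.units = c.units[:len(c.units)-1]
	return top
}

// emit appends an instruction to the innermost open unit.
func (c *compiler) emit(instr string) {
	u := c.units[len(c.units)-1]
	u.instrs = append(u.instrs, instr)
}

// compileNestedFunction shows the order: push, emit body, pop, then
// MAKE_FUNCTION in the outer scope.
func compileNestedFunction() (outer, inner *unit) {
	c := &compiler{}
	c.enterScope("<module>")
	c.enterScope("f")
	c.emit("RESUME 0")
	c.emit("LOAD_CONST None")
	c.emit("RETURN_VALUE")
	inner = c.leaveScope()
	c.emit("LOAD_CONST <code f>") // placeholder for the code object slot
	c.emit("MAKE_FUNCTION")
	c.emit("STORE_NAME f")
	outer = c.leaveScope()
	return outer, inner
}

func main() {
	outer, inner := compileNestedFunction()
	fmt.Println(outer.name, outer.instrs)
	fmt.Println(inner.name, inner.instrs)
}
```

Because emit always targets the top of the stack, visitors never need to know which scope they are writing into.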

Frame block stack

CPython tracks unwinding for break / continue / return / try through a stack of fblockinfo. Each entry tags the kind of frame (LOOP, TRY, FINALLY, WITH, ASYNC_WITH, EXCEPTION_HANDLER, EXCEPTION_GROUP_HANDLER, HANDLER_CLEANUP, POP_VALUE) and carries jump targets and a datum slot for the original AST node.

Go form:

type fblockKind int

const (
	fblockWhileLoop fblockKind = iota + 1
	fblockForLoop
	fblockTryExcept
	fblockFinallyTry
	fblockFinallyEnd
	fblockWith
	fblockAsyncWith
	fblockHandlerCleanup
	fblockPopValue
	fblockExceptionHandler
	fblockExceptionGroupHandler
	fblockAsyncComprehensionGenerator
	fblockStopIteration
)

type fblock struct {
	Kind      fblockKind
	Block     instrseq.Label // entry label
	Exit      instrseq.Label // exit label
	Datum     ast.Node       // original AST node, for codegen_unwind_fblock
	Generator bool           // for-loop hoists DELETE_FAST of iter var
}

type unit struct {
	// ...
	fblocks []fblock
}

Push and pop never reorder; unwindFblockStack walks fblocks from the top down, emitting POP_BLOCK / POP_TOP / POP_EXCEPT as appropriate for each frame kind. (END_FINALLY no longer exists; it was removed in CPython 3.9.) Mirror codegen.c:518-647 line-for-line.
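A minimal sketch of the push / pop / unwind discipline, reduced to three frame kinds. The type and method names are hypothetical, instructions are plain strings, and the cleanup emitted per frame is a simplification of what codegen_unwind_fblock does (the exact opcode used to drop a for-loop iterator is an assumption here).

```go
package main

import (
	"errors"
	"fmt"
)

type fblockKind int

const (
	fblockWhileLoop fblockKind = iota + 1
	fblockForLoop
	fblockTryExcept
)

type fblock struct{ kind fblockKind }

// unit holds only what this sketch needs: the frame stack and the output.
type unit struct {
	fblocks []fblock
	out     []string
}

// push opens a new frame block on top of the stack.
func (u *unit) push(k fblockKind) {
	u.fblocks = append(u.fblocks, fblock{kind: k})
}

// pop closes the top frame block and checks that it has the expected kind.
func (u *unit) pop(k fblockKind) error {
	n := len(u.fblocks)
	if n == 0 || u.fblocks[n-1].kind != k {
		return errors.New("fblock stack mismatch")
	}
	u.fblocks = u.fblocks[:n-1]
	return nil
}

// unwindForBreak walks the stack from the top down without popping it,
// emitting cleanup for every frame until the innermost loop is reached.
func (u *unit) unwindForBreak() error {
	for i := len(u.fblocks) - 1; i >= 0; i-- {
		switch u.fblocks[i].kind {
		case fblockTryExcept:
			u.out = append(u.out, "POP_BLOCK")
		case fblockForLoop:
			u.out = append(u.out, "POP_TOP") // drop the iterator (simplified)
			return nil
		case fblockWhileLoop:
			return nil
		}
	}
	return errors.New("'break' outside loop")
}

func main() {
	u := &unit{}
	u.push(fblockForLoop)
	u.push(fblockTryExcept)
	if err := u.unwindForBreak(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u.out)
}
```

The unwind deliberately leaves the stack untouched: the statements after the break in the same body still compile inside the same frames.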

Statement visitor coverage

Every ast.Stmt kind must dispatch through visitStmt. The list:

  • FunctionDef (1390)
  • AsyncFunctionDef (1390 with is_async=1)
  • ClassDef (1623)
  • TypeAlias (1727)
  • Return (2191)
  • Delete (2880 path inside visit_stmt)
  • Assign (in visit_stmt 3060+)
  • AugAssign (5346)
  • AnnAssign (5476)
  • For (2071)
  • AsyncFor (2117)
  • While (2165)
  • If (2043)
  • With (5167)
  • AsyncWith (5070)
  • Match (6459)
  • Raise (codegen_raise inside visit_stmt)
  • Try (2774)
  • TryStar (2782)
  • Assert (2932)
  • Import (2835)
  • ImportFrom (2881)
  • Global (no-op at codegen)
  • Nonlocal (no-op at codegen)
  • ExprStmt (2962)
  • Pass (no-op)
  • Break (2232)
  • Continue (2248)

The dispatch in visitStmt is a switch on the concrete type of ast.Stmt. Match the order in codegen.c:2991-3166 for cite-friendly diffs. Global and Nonlocal are no-ops in codegen because symtable already lifted them; document this with a // CPython: codegen.c:Lxxxx no-op, scope already resolved by symtable line.
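As an illustration of the dispatch shape, here is a minimal sketch with a made-up three-node AST. Stmt, ExprStmt, Global, Nonlocal, Unsupported, and emitter are hypothetical types for this example only; the real visitor switches over the full gopy ast package.

```go
package main

import "fmt"

// Stmt is a hypothetical stand-in for ast.Stmt.
type Stmt interface{ stmtNode() }

type ExprStmt struct{ Name string }
type Global struct{ Names []string }
type Nonlocal struct{ Names []string }
type Unsupported struct{}

func (*ExprStmt) stmtNode()    {}
func (*Global) stmtNode()      {}
func (*Nonlocal) stmtNode()    {}
func (*Unsupported) stmtNode() {}

// emitter collects instructions as plain strings.
type emitter struct{ out []string }

// visitStmt switches on the concrete statement type.
func (e *emitter) visitStmt(s Stmt) error {
	switch n := s.(type) {
	case *ExprStmt:
		e.out = append(e.out, "LOAD_NAME "+n.Name, "POP_TOP")
		return nil
	case *Global, *Nonlocal:
		// CPython: codegen.c:Lxxxx no-op, scope already resolved by symtable
		return nil
	default:
		return fmt.Errorf("visitStmt: unhandled statement type %T", n)
	}
}

func main() {
	e := &emitter{}
	for _, s := range []Stmt{&Global{Names: []string{"x"}}, &ExprStmt{Name: "x"}} {
		if err := e.visitStmt(s); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(e.out)
}
```

Returning an error from the default arm makes a missing case show up in tests instead of silently emitting nothing.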

Expression visitor coverage

visitExpr switches on ast.Expr. The list:

  • BoolOp (3290)
  • NamedExpr (walrus, in visit_expr 5174)
  • BinOp (in visit_expr 5180+)
  • UnaryOp (in visit_expr)
  • Lambda (1999)
  • IfExp (1979)
  • Dict (3497)
  • Set (3467)
  • ListComp (4901)
  • SetComp (4911)
  • DictComp (4922)
  • GeneratorExp (4891)
  • Await (in visit_expr)
  • Yield (visit_expr + addop_yield 3168)
  • YieldFrom (visit_expr + add_yield_from 472)
  • Compare (3552)
  • Call (4036)
  • FormattedValue (4165)
  • Interpolation (4133)
  • JoinedStr (4104)
  • TemplateStr (4061)
  • Constant (in visit_expr)
  • Attribute (in visit_expr; LOAD_ATTR / LOAD_METHOD selection)
  • Subscript (5548)
  • Starred (only in target context; raises in load context)
  • Name (3186)
  • List (3431)
  • Tuple (3449)
  • Slice (5609)

Special panels

LOAD/STORE name selection (codegen_nameop 3186)

The hottest helper. Matches CPython exactly:

| symtable scope | ctx=Load | ctx=Store | ctx=Del |
| --- | --- | --- | --- |
| Local (function) | LOAD_FAST | STORE_FAST | DELETE_FAST |
| Local (module) | LOAD_NAME | STORE_NAME | DELETE_NAME |
| Local (class) | LOAD_NAME | STORE_NAME | DELETE_NAME |
| Cell | LOAD_DEREF | STORE_DEREF | DELETE_DEREF |
| Free | LOAD_DEREF | STORE_DEREF | DELETE_DEREF |
| GlobalImplicit | LOAD_GLOBAL | STORE_GLOBAL | DELETE_GLOBAL |
| GlobalExplicit | LOAD_GLOBAL | STORE_GLOBAL | DELETE_GLOBAL |
Class-mediated free var (__class__, __classdict__, __conditional_annotations__) takes a special path that emits LOAD_DEREF plus LOAD_FROM_DICT_OR_DEREF. See codegen.c:3179-3287.
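The table can be read as a two-step choice: pick an opcode family from the scope, then a prefix from the context. The sketch below encodes only the table rows; it leaves out the class-mediated path and any other special case in codegen_nameop, and the type and constant names are hypothetical.

```go
package main

import "fmt"

type scope int

const (
	scopeLocal scope = iota
	scopeCell
	scopeFree
	scopeGlobalImplicit
	scopeGlobalExplicit
)

type blockKind int

const (
	blockFunction blockKind = iota
	blockModule
	blockClass
)

type exprCtx int

const (
	ctxLoad exprCtx = iota
	ctxStore
	ctxDel
)

// nameOp returns the opcode name for a (scope, block kind, context) triple,
// following the rows of the table only.
func nameOp(sc scope, bk blockKind, ctx exprCtx) string {
	var family string
	switch sc {
	case scopeLocal:
		if bk == blockFunction {
			family = "FAST"
		} else {
			family = "NAME"
		}
	case scopeCell, scopeFree:
		family = "DEREF"
	default: // both global scopes
		family = "GLOBAL"
	}
	switch ctx {
	case ctxStore:
		return "STORE_" + family
	case ctxDel:
		return "DELETE_" + family
	default:
		return "LOAD_" + family
	}
}

func main() {
	fmt.Println(nameOp(scopeLocal, blockFunction, ctxLoad))
	fmt.Println(nameOp(scopeLocal, blockClass, ctxStore))
	fmt.Println(nameOp(scopeFree, blockFunction, ctxDel))
}
```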

Super-instructions

CPython picks fused opcodes when arg shape matches:

  • LOAD_FAST_LOAD_FAST after LOAD_FAST then LOAD_FAST
  • STORE_FAST_LOAD_FAST after STORE_FAST then LOAD_FAST
  • STORE_FAST_STORE_FAST after two STORE_FAST in a row
  • LOAD_CONST_IMMORTAL for cached small ints / interned strings
  • LOAD_FAST_BORROW, LOAD_FAST_BORROW_LOAD_FAST_BORROW for known-non-escape reads

The selection happens in instr_sequence write-back, not in codegen visitors. Defer the implementation to the flowgraph (1627) but document the contract so codegen does not pre-fuse.
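To make the contract concrete, here is a minimal sketch of what a later fusing pass could look like over a flat instruction list. It assumes, as CPython's flowgraph does to the best of my understanding, that a pair is fused only when both opargs fit in four bits and that the fused oparg packs them as (first << 4) | second. The instr type and fuse function are hypothetical, and the sketch ignores line-number and jump-target checks that a real pass needs.

```go
package main

import "fmt"

// instr is a hypothetical flattened instruction: opcode name plus oparg.
type instr struct {
	op  string
	arg int
}

// fusable maps an adjacent opcode pair to its fused form.
var fusable = map[[2]string]string{
	{"LOAD_FAST", "LOAD_FAST"}:   "LOAD_FAST_LOAD_FAST",
	{"STORE_FAST", "LOAD_FAST"}:  "STORE_FAST_LOAD_FAST",
	{"STORE_FAST", "STORE_FAST"}: "STORE_FAST_STORE_FAST",
}

// fuse replaces eligible adjacent pairs with a single fused instruction.
// Codegen visitors never call this; it models the later pass.
func fuse(in []instr) []instr {
	out := make([]instr, 0, len(in))
	for i := 0; i < len(in); i++ {
		if i+1 < len(in) {
			a, b := in[i], in[i+1]
			name, ok := fusable[[2]string{a.op, b.op}]
			if ok && a.arg < 16 && b.arg < 16 {
				out = append(out, instr{op: name, arg: a.arg<<4 | b.arg})
				i++ // skip the second half of the pair
				continue
			}
		}
		out = append(out, in[i])
	}
	return out
}

func main() {
	seq := []instr{
		{"LOAD_FAST", 1}, {"LOAD_FAST", 2},
		{"STORE_FAST", 20}, {"LOAD_FAST", 0},
	}
	for _, ins := range fuse(seq) {
		fmt.Println(ins.op, ins.arg)
	}
}
```

Keeping codegen output unfused means the golden instruction sequences in the unit tests stay independent of whichever fusion rules the flowgraph applies.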

MAKE_FUNCTION / closure construction (codegen_make_closure 923)

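MAKE_FUNCTION takes the code object on the stack. Since CPython 3.13 it no longer carries a flags oparg: each optional attribute is attached afterwards by its own SET_FUNCTION_ATTRIBUTE, whose oparg is one of the flag bits below. The visitor still evaluates defaults, kwdefaults, annotations, and the closure tuple before loading the code object, and records which ones are present in a flags mask:

  • bit 0x01: defaults tuple
  • bit 0x02: kwonly defaults dict
  • bit 0x04: annotations tuple (legacy eager form)
  • bit 0x08: closure cell tuple
  • bit 0x10: annotate function (PEP 649, new in 3.14)

Closure construction:

  1. For each name in co_freevars of the inner code, push the matching cell from the outer scope with LOAD_CLOSURE, the pseudo-instruction that is later lowered to LOAD_FAST on the cell slot. (LOAD_DEREF would push the cell's contents rather than the cell itself.)
  2. BUILD_TUPLE with that count.
  3. LOAD_CONST the code object, MAKE_FUNCTION, then SET_FUNCTION_ATTRIBUTE 0x08.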

CPython: codegen.c:923-961.
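A minimal sketch of how the flags mask and the closure tuple size could be derived for one function, using only the 0x01 to 0x08 bits listed above. funcShape and closurePlan are hypothetical names for this example; the real visitor accumulates the mask while it emits the corresponding values.

```go
package main

import "fmt"

// Flag bits as listed in this section.
const (
	flagDefaults    = 0x01
	flagKwDefaults  = 0x02
	flagAnnotations = 0x04
	flagClosure     = 0x08
)

// funcShape is a hypothetical summary of what the function definition has.
type funcShape struct {
	HasDefaults    bool
	HasKwDefaults  bool
	HasAnnotations bool
	FreeVars       []string // co_freevars of the inner code object
}

// closurePlan returns the flags mask and the BUILD_TUPLE count for the
// closure cell tuple (zero when there are no free variables).
func closurePlan(s funcShape) (flags, tupleSize int) {
	if s.HasDefaults {
		flags |= flagDefaults
	}
	if s.HasKwDefaults {
		flags |= flagKwDefaults
	}
	if s.HasAnnotations {
		flags |= flagAnnotations
	}
	if len(s.FreeVars) > 0 {
		flags |= flagClosure
		tupleSize = len(s.FreeVars)
	}
	return flags, tupleSize
}

func main() {
	flags, n := closurePlan(funcShape{
		HasDefaults: true,
		FreeVars:    []string{"x", "y"},
	})
	fmt.Printf("flags=%#x closure tuple size=%d\n", flags, n)
}
```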

Cell / free-var prologue (MAKE_CELL / COPY_FREE_VARS)

At function entry, before the first user instruction:

  • For each name in co_cellvars whose flag has DEF_PARAM, emit MAKE_CELL to box the parameter slot.
  • If co_freevars is non-empty, emit COPY_FREE_VARS n to copy the cell tuple from the function object into the local cells.

This is emitted by codegen_function_body after RESUME 0 and before the docstring / first body statement. CPython: codegen.c:1311-1389.

Match panel

Match plus the eleven Pattern* kinds. Use a patternContext struct to track:

  • stores: names bound by patterns in the current alternative (so or branches can verify identical names).
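  • allowIrrefutable: true only where an irrefutable pattern is legal (the last case, any guarded case, and the last alternative of an or-pattern in such a position); false elsewhere.
  • failPop: one jump label per stack depth; jumping to failPop[i] pops i values pushed by sub-patterns before reaching the failure path.
  • onTop: number of values on top of the stack that must stay in place until the current pattern is done.

type patternContext struct {
	Stores           []string
	AllowIrrefutable bool
	FailPop          []instrseq.Label
	OnTop            int
}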

Each pattern visitor (codegen_pattern_*) is mechanical. The hard ones:

  • PatternMatchOr: alternatives with shared bindings, fail-pop unification
  • PatternMatchClass: positional / keyword via MATCH_CLASS and __match_args__
  • PatternMatchMapping: MATCH_MAPPING plus MATCH_KEYS, rest ** capture
  • PatternMatchSequence: MATCH_SEQUENCE, length check, slice-out star

CPython: codegen.c:5728-6473. Tests under compile/codegen_match_test.go.
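The role of stores in an or-pattern can be shown with a minimal sketch: every alternative must bind the same set of names, in any order. checkOrBindings and nameSet are hypothetical helpers, and the error text is the message CPython reports for this case as far as I know; the real visitor also reorders the bound values on the stack, which this sketch skips.

```go
package main

import (
	"errors"
	"fmt"
)

var errDifferentNames = errors.New("alternative patterns bind different names")

// nameSet turns a list of bound names into a set.
func nameSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

// checkOrBindings verifies that every alternative binds exactly the names
// bound by the first alternative. Each inner slice plays the role of the
// Stores list collected while visiting one alternative.
func checkOrBindings(alternatives [][]string) error {
	if len(alternatives) == 0 {
		return nil
	}
	want := nameSet(alternatives[0])
	for _, alt := range alternatives[1:] {
		got := nameSet(alt)
		if len(got) != len(want) {
			return errDifferentNames
		}
		for name := range got {
			if !want[name] {
				return errDifferentNames
			}
		}
	}
	return nil
}

func main() {
	// case (x, y) | [y, x]: both alternatives bind x and y.
	fmt.Println(checkOrBindings([][]string{{"x", "y"}, {"y", "x"}}))
	// case x | y: the alternatives bind different names.
	fmt.Println(checkOrBindings([][]string{{"x"}, {"y"}}))
}
```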

With statement state machine

with and async with both lower to:

SETUP_WITH -> push exit fblock
<context expr>
CALL on __enter__
<body>
LOAD_CONST None x3
CALL on __exit__
... or on exception path:
WITH_EXCEPT_START
... etc.

The state machine tracks how many context managers are open and how many to clean up on exception. Both with and async-with use codegen_with_inner recursively per item. CPython: codegen.c:4940-5172.

Deferred annotations (PEP 649)

When from __future__ import annotations is not in effect (the 3.14 default), function and class annotations compile to a separate inner code object that runs lazily on __annotations__ access. The visitor records these in unit.DeferredAnnotations; the driver emits one __annotate__ function per outer scope at end-of-block. CPython: codegen.c:666-846, 1081-1145, 5476-5546.

PEP 695 type parameters

class C[T, *Ts, **P]: and def f[T](...): and type X[T] = ... each create an extra inner scope (TypeParametersBlock) that:

  1. Builds the TypeVar / TypeVarTuple / ParamSpec objects.
  2. BUILD_TUPLE of the type-param objects.
  3. Calls into the actual function or class body with the tuple as a first positional arg.
  4. The outer code receives the tuple via LOAD_FAST .type_params.

CPython: codegen.c:1195-1310, 1505-1622, 1700-1810.

Comprehensive test plan

Tests live in compile/codegen_*_test.go. Each test file mirrors the visitor file. Tests at this layer use hand-built AST inputs and assert the produced instruction sequence against a golden form printed by a disassembler that we ship alongside (compile/dis.go, task #51).

Three test layers:

Layer 1: Unit (no CPython dependency)

For every visitor, two tests minimum:

  • a happy-path AST that exercises every branch of the visitor
  • a syntax-error input that asserts the exact error message string

func TestCodegenForLoop(t *testing.T) {
	// for i in [1, 2]: pass
	src := module(forStmt(...))
	code := compileMust(t, src)
	want := []string{
		"RESUME 0",
		"LOAD_CONST (1, 2)",
		"GET_ITER",
		"L0: FOR_ITER L1", // loop head; JUMP L0 below targets this label
		"STORE_NAME i",    // module scope, so the name-op table picks STORE_NAME
		"JUMP L0",
		"L1: END_FOR",
		"L2: LOAD_CONST None",
		"RETURN_VALUE",
	}
	assertDis(t, code, want)
}
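The comparison at the heart of a helper like assertDis can be sketched in a few lines. firstMismatch is a hypothetical function for illustration; the real assertDis would first render the code through compile/dis.go and then report through *testing.T.

```go
package main

import "fmt"

// firstMismatch compares two disassembly listings line by line and
// describes the first difference, or returns "" when they are equal.
func firstMismatch(got, want []string) string {
	n := len(got)
	if len(want) < n {
		n = len(want)
	}
	for i := 0; i < n; i++ {
		if got[i] != want[i] {
			return fmt.Sprintf("line %d: got %q, want %q", i, got[i], want[i])
		}
	}
	if len(got) != len(want) {
		return fmt.Sprintf("length mismatch: got %d lines, want %d", len(got), len(want))
	}
	return ""
}

func main() {
	got := []string{"RESUME 0", "LOAD_CONST None", "RETURN_VALUE"}
	want := []string{"RESUME 0", "LOAD_CONST 1", "RETURN_VALUE"}
	fmt.Println(firstMismatch(got, want))
}
```

Reporting only the first differing line keeps failures readable when one wrong instruction shifts every label after it.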

Coverage table (every checkbox below maps one visitor + one test):

| Visitor | Test file | Cases |
| --- | --- | --- |
| visit_stmt FunctionDef | codegen_func_test.go | empty body, single return, generator, async, decorators, defaults, kwonly, posonly, varargs, varkw, type-params |
| visit_stmt ClassDef | codegen_class_test.go | empty class, with bases, with kwargs, with decorators, with type-params, with init_subclass |
| visit_stmt TypeAlias | codegen_typealias_test.go | simple alias, with type-params, defaults, bound |
| visit_stmt Return | codegen_return_test.go | bare return, return value, return in generator (raises) |
| visit_stmt Assign | codegen_assign_test.go | single, multi-target, tuple-unpack, starred-unpack, attribute, subscript |
| visit_stmt AugAssign | codegen_aug_test.go | name, attr, subscript, every binop |
| visit_stmt AnnAssign | codegen_annassign_test.go | with value, without value, simple vs not, deferred |
| visit_stmt For | codegen_for_test.go | tuple unpack target, with else, with break, with continue, nested |
| visit_stmt AsyncFor | codegen_asyncfor_test.go | basic, with else, in async function |
| visit_stmt While | codegen_while_test.go | basic, with else, with break, with continue |
| visit_stmt If | codegen_if_test.go | if-only, if-else, elif chain, constant fold |
| visit_stmt With | codegen_with_test.go | single ctx, multiple, with as, nested |
| visit_stmt AsyncWith | codegen_asyncwith_test.go | single, multiple, exception path |
| visit_stmt Match | codegen_match_test.go | one per pattern kind, or-pattern, guard, capture, irrefutable check |
| visit_stmt Raise | codegen_raise_test.go | bare, exc, exc from |
| visit_stmt Try | codegen_try_test.go | try-except, try-finally, try-except-finally, try-except-else, multiple handlers, bare except |
| visit_stmt TryStar | codegen_trystar_test.go | basic, multiple handlers, with finally |
| visit_stmt Assert | codegen_assert_test.go | bare, with msg, optimised away under -O |
| visit_stmt Import | codegen_import_test.go | simple, dotted, as |
| visit_stmt ImportFrom | codegen_importfrom_test.go | name, dotted, star, as |
| visit_stmt Break | codegen_break_test.go | in loop, in nested loop, in try-finally |
| visit_stmt Continue | codegen_continue_test.go | in loop, in nested loop, in try-finally |
| visit_stmt ExprStmt | codegen_exprstmt_test.go | docstring placement, plain expr (POP_TOP), interactive (PRINT_EXPR) |
| visit_expr BoolOp | codegen_boolop_test.go | and / or, two operands, three operands, short-circuit |
| visit_expr NamedExpr (walrus) | codegen_walrus_test.go | basic, in comprehension, in lambda, retarget |
| visit_expr BinOp | codegen_binop_test.go | every op, type-inferred specialization |
| visit_expr UnaryOp | codegen_unaryop_test.go | not / - / ~ / + |
| visit_expr Lambda | codegen_lambda_test.go | empty, with args, with defaults |
| visit_expr IfExp | codegen_ifexp_test.go | basic, nested |
| visit_expr Dict | codegen_dict_test.go | empty, simple, with double-star, mixed |
| visit_expr Set | codegen_set_test.go | empty (raises, list comp instead), simple, with star |
| visit_expr ListComp | codegen_listcomp_test.go | basic, with if, with multiple for, with walrus |
| visit_expr SetComp | codegen_setcomp_test.go | basic, with if, nested |
| visit_expr DictComp | codegen_dictcomp_test.go | basic, with if |
| visit_expr GeneratorExp | codegen_genexp_test.go | basic, with if, async |
| visit_expr Await | codegen_await_test.go | in async, error in sync |
| visit_expr Yield | codegen_yield_test.go | bare, with value, in async generator |
| visit_expr YieldFrom | codegen_yieldfrom_test.go | basic, in coroutine raises |
| visit_expr Compare | codegen_compare_test.go | one op, chained, every cmpop |
| visit_expr Call | codegen_call_test.go | positional, keyword, star, doublestar, method, super, every CALL_INTRINSIC variant |
| visit_expr FormattedValue | codegen_fstring_test.go | conv flags, format spec, in joinedstr |
| visit_expr Interpolation | codegen_interp_test.go | new in 3.14: t-string |
| visit_expr JoinedStr | codegen_joinedstr_test.go | empty, single, multi, mixed const+formatted |
| visit_expr TemplateStr | codegen_templatestr_test.go | t-string PEP 750 panel |
| visit_expr Constant | codegen_const_test.go | int, float, str, bytes, None, True, False, complex |
| visit_expr Attribute | codegen_attribute_test.go | LOAD_ATTR vs LOAD_METHOD, store, del |
| visit_expr Subscript | codegen_subscript_test.go | int, slice, tuple, store, del |
| visit_expr Starred | codegen_starred_test.go | in target, in call, in load context (raises) |
| visit_expr Name | codegen_name_test.go | LOCAL/CELL/FREE/GLOBAL_EXPLICIT/GLOBAL_IMPLICIT, class scope mediation, class free var |
| visit_expr List | codegen_list_test.go | empty, single, with star |
| visit_expr Tuple | codegen_tuple_test.go | empty, single, all-const folds to LOAD_CONST tuple |
| visit_expr Slice | codegen_slice_test.go | one part, two parts, three parts |

Layer 2: Cross-check vs CPython

Build the same AST in CPython via ast.parse, compile with compile(), and assert dis.dis text equality with our output. Tagged //go:build cpython. One driver test per visitor coverage row.

Layer 3: Marshal parity

marshal.dumps(code) byte-equal to gopy/marshal.Dumps(unit.Code) for ~50 hand-picked source snippets covering every opcode emitted in 3.14. This layer crosses the assemble boundary (1628), so it lives in compile/marshal_parity_test.go and depends on tasks #47, #48, #49.

Lint, complexity, and refactor budget

Same rules as symtable: cognitive ≤30, cyclomatic ≤20. The largest CPython visitors that overflow if ported as-is:

  • codegen_visit_stmt (170 lines, 27 cases): split by category matching symtable/build_visit.go (visitStmtDef, visitStmtControl, visitStmtSimple).
  • codegen_visit_expr (170 lines, 28 cases): split into visitExprComp, visitExprBuild, visitExprLeaf.
  • codegen_pattern_class (60+ lines, deeply nested): extract patternClassPositional, patternClassKeyword, patternClassFinish.
  • codegen_try_except (180 lines): extract tryExceptHandlerEntry, tryExceptHandlerBody, tryExceptCleanup.
  • codegen_with_inner and codegen_async_with_inner: extract the exit-handler emit into a helper used by both.

Use the same helper-function naming as symtable/build_visit.go: verbs that describe the work, not the position in the file.

Citation policy

Every Go function carries // CPython: codegen.c:L<n> codegen_<name>. Helpers extracted to satisfy lint carry the citation of the function they were split from plus an (extracted helper) marker: // CPython: codegen.c:L<n> codegen_<name> (extracted helper), so a reader can trace back to the un-split source.

Order of work

  1. Skeleton: Codegen entry, Unit struct, Compiler driver, addop helpers, name-op dispatch (LOAD_FAST / LOAD_DEREF / LOAD_GLOBAL / LOAD_NAME by symtable scope). Tests assert the error path so the harness compiles. Landed as compile/codegen.go, compile/codegen_addop.go, compile/codegen_stmt.go, compile/codegen_expr.go, compile/codegen_expr_name.go.
  2. Pass, ExprStmt, Constant, Name (LOAD/STORE), Return, Assign. Smallest module compiles end-to-end. compile/codegen_test.go covers empty module, pass, expr-stmt pop, module-level assign, name load, const dedup. fblock stack stub will land alongside step 3.
  3. Control flow: If, For, While, Break, Continue. Landed as compile/codegen_fblock.go, compile/codegen_stmt_control.go, compile/codegen_control_test.go. Break / continue out-of-loop error paths covered.
  4. Functions: FunctionDef, AsyncFunctionDef, Lambda, defaults, kwonly defaults, varargs / varkeyword flags, decorator chain, closure (free / cell vars). Inner code object held as a *Unit const placeholder; assemble (1628) translates it to a real code object. Landed as compile/codegen_stmt_funclike.go, compile/codegen_funclike_test.go.
  5. Expression panel: BoolOp, BinOp, UnaryOp, Compare, IfExp, List, Tuple, Set, Dict, Attribute, Subscript, Slice, Call. Landed as compile/codegen_expr_op.go, compile/codegen_expr_container.go, compile/codegen_expr_call.go, compile/codegen_expr_test.go.
  6. Misc statements: Delete, AugAssign, AnnAssign, Raise, Assert, Import, ImportFrom. Landed as compile/codegen_stmt_misc.go and compile/codegen_stmt_misc_test.go.
  7. Misc expressions: NamedExpr, Yield, YieldFrom, Await, JoinedStr, FormattedValue. Landed as compile/codegen_expr_misc.go and compile/codegen_expr_misc_test.go.
  8. Assignment targets: Attribute, Subscript, Tuple / List unpack (UNPACK_SEQUENCE), Tuple / List with *rest (UNPACK_EX). Landed as compile/codegen_assign_test.go plus the extension to assignTo in compile/codegen_stmt.go.
  9. Classes: ClassDef, bases, keyword args (metaclass), decorator chain. Inner body opens with name/module + qualname prologue; full PEP 695 / classcell / static-attributes panels land alongside super(). Landed as compile/codegen_class.go and compile/codegen_class_test.go.
  10. Comprehensions: ListComp, SetComp, DictComp, GeneratorExp.
  11. With / Try / TryStar: full unwind panel.
  12. Match: pattern visitors.
  13. PEP 695 type parameters.
  14. Deferred annotations (PEP 649).
  15. Super-instruction emission contract with the flowgraph.

Each step lands as one PR with the matching test row from the table above ticked. The codegen package is not "done" until every checkbox in the visitor coverage table is green and Layer 2 cross-check passes.