1620. gopy compile pipeline

What we are porting

Twelve files from cpython/Python/ (~39k lines) form the compiler that turns an AST into a code object:

| C source | Lines | Go target |
| --- | --- | --- |
| asdl.c | 6 | ast/asdl.go |
| Python-ast.c (generated) | 18485 | ast/nodes_gen.go |
| ast.c | 1091 | ast/validate.go |
| ast_preprocess.c | 990 | ast/preprocess.go |
| ast_unparse.c | 1029 | ast/unparse.go |
| future.c | 119 | future/future.go |
| symtable.c | 3266 | symtable/symtable.go |
| instruction_sequence.c | 483 | compile/instrseq.go |
| codegen.c | 6483 | compile/codegen.go |
| flowgraph.c | 4165 | compile/flowgraph.go |
| assemble.c | 802 | compile/assemble.go |
| compile.c | 1753 | compile/compiler.go |

The 1620 series runs after the parser hands us an AST and before the VM runs the bytecode. It is the longest spec series in the project and the one with the strongest source-shape parity requirement: the dis.dis output of a compiled module must match CPython 3.14 byte-for-byte.

The parser itself (PEG, in cpython/Parser/) is a separate spec series. v0.5 assumes parser.Parse(src) (ast.Mod, error) already exists.

Layering

parser.Parse (separate spec)
        │
        ▼
ast.Mod (asdl-typed tree)
        │
        ▼
ast.Validate, ast.Preprocess
        │
        ▼
future.FromAST ───► future flags bitmask
        │
        ▼
symtable.Build ───► per-scope symbol tables
        │
        ▼
compile.Compile
  ├─ codegen (per-scope instruction sequence)
  ├─ flowgraph (CFG, peephole optimizations)
  └─ assemble (bytecode + line table + exception table)
        │
        ▼
*objects.Code

Each stage is independently testable. The gate test at the v0.5 boundary parses "a = 1 + 2" with parser.Parse, passes the resulting ast.Mod to compile.Compile, and asserts the disassembly matches CPython.

ast package

asdl.go

asdl.c is six lines: macro invocations that expand to the _Py_asdl_*_seq_new constructors for generic, identifier, and int sequences. In Go this collapses to a single generic slice type:

package ast

// Seq is the asdl_seq* equivalent. CPython stores sequence length and
// elements inline; we use a Go slice.
type Seq[T any] []T

No dedicated allocator. The arena lives in arena/ (v0.1).
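A minimal sketch of how the slice-backed Seq could carry the accessor surface that the v0.5 checklist names for asdl.go (NewSeq, Len, Get, Set). The method bodies and the NewSeq signature are assumptions made for illustration; only the names come from this spec.

```go
package main

import "fmt"

// Seq is the asdl_seq* equivalent backed by a Go slice.
type Seq[T any] []T

// NewSeq allocates a sequence of n zero-valued elements.
// The signature is an assumption for this sketch.
func NewSeq[T any](n int) Seq[T] { return make(Seq[T], n) }

// Len is safe on a nil Seq, matching asdl_seq_LEN on a NULL sequence.
func (s Seq[T]) Len() int { return len(s) }

// Get returns element i; it panics on an out-of-range index.
func (s Seq[T]) Get(i int) T { return s[i] }

// Set stores v at index i in the shared backing array.
func (s Seq[T]) Set(i int, v T) { s[i] = v }

func main() {
	var empty Seq[string] // nil sequence
	names := NewSeq[string](2)
	names.Set(0, "a")
	names.Set(1, "b")
	fmt.Println(empty.Len(), names.Len(), names.Get(1))
}
```

Because Seq is a slice type, Set through a value receiver still mutates the caller's elements, so no pointer receiver is needed.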

nodes_gen.go

Python-ast.c is generated by Parser/asdl_c.py from Parser/Python.asdl. We do not port the C generator; we write a Go generator (tools/asdl_go) that consumes the same .asdl file and emits Go structs. Each AST node is a struct embedding a position. Sum types (mod, stmt, expr, expr_context, etc.) become interfaces with a sealed marker.

Examples:

type Pos struct{ Lineno, ColOffset, EndLineno, EndColOffset int }

type Mod interface{ isMod() }
type Module struct {
	Pos
	Body        Seq[Stmt]
	TypeIgnores Seq[TypeIgnore]
}
func (*Module) isMod() {}

type Stmt interface{ isStmt() }
type Assign struct {
	Pos
	Targets     Seq[Expr]
	Value       Expr
	TypeComment string
}
func (*Assign) isStmt() {}

type ExprContext int
const (
	Load ExprContext = iota + 1
	Store
	Del
)

Constructors mirror _PyAST_*: ast.NewModule(body, typeIgnores, pos), ast.NewAssign(targets, value, typeComment, pos), etc.

The full enumeration of mod/stmt/expr/pattern/type_param/excepthandler sums plus the product types (arguments, arg, keyword, alias, withitem, match_case, type_ignore) lives in the generator output. Enumerating them here would duplicate the asdl source; treat Parser/Python.asdl as authoritative.
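To illustrate how consumers work with the sealed-interface encoding, here is a small sketch that dispatches over a sum type with a type switch. Pass, Return, and describe are stand-ins invented for this example; the generated package defines the real node set.

```go
package main

import "fmt"

// Stmt is a sealed sum type: the unexported marker method means only
// types declared in this package can satisfy it.
type Stmt interface{ isStmt() }

// Pass and Return are simplified stand-ins for generated nodes.
type Pass struct{}
type Return struct{ HasValue bool }

func (*Pass) isStmt()   {}
func (*Return) isStmt() {}

// describe shows the visitor shape: one case per constructor plus a
// default arm that catches node kinds added later.
func describe(s Stmt) string {
	switch n := s.(type) {
	case *Pass:
		return "pass"
	case *Return:
		if n.HasValue {
			return "return <value>"
		}
		return "return"
	default:
		return fmt.Sprintf("unhandled %T", s)
	}
}

func main() {
	for _, s := range []Stmt{&Pass{}, &Return{HasValue: true}} {
		fmt.Println(describe(s))
	}
}
```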

validate.go (_PyAST_Validate)

Pure tree walk that rejects malformed nodes:

  • Position sanity. lineno <= end_lineno. Column ranges within the surrounding line. No negative offsets except in the canonical "no position" marker (-1, -1, -1, -1).
  • Forbidden identifiers. None, True, False cannot appear as Name, arg.arg, keyword.arg, alias.name (except as the module name in from None import which is itself rejected elsewhere).
  • Constant.value constrained to the marshallable subset (int, float, complex, str, bytes, bool, None, Ellipsis, tuple, frozenset of the same).
  • Comprehensions have at least one generator.
  • expr_context consistency. Targets in Store, deletions in Del, reads in Load. Starred in Load only as a top-level Call arg or as an iterable element.
  • Match patterns shape: MatchClass.cls is a Name or Attribute, MatchMapping keys are constants or attribute lookups, etc.
  • Type parameter constraints (PEP 695): TypeVar.bound cannot be a starred expression.

Returns error (CPython sets PyErr; we return).
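The position-sanity rule can be sketched as a standalone check. This is a simplified illustration, not the ported validator: the error strings are placeholders rather than CPython's, and the check against the surrounding line's length is left out because it needs source text.

```go
package main

import (
	"errors"
	"fmt"
)

type Pos struct{ Lineno, ColOffset, EndLineno, EndColOffset int }

// NoPos is the canonical "no position" marker.
var NoPos = Pos{-1, -1, -1, -1}

// checkPos rejects inverted or negative ranges. Messages are
// placeholders for this sketch, not CPython's exact strings.
func checkPos(p Pos) error {
	if p == NoPos {
		return nil
	}
	if p.Lineno < 0 || p.ColOffset < 0 || p.EndLineno < 0 || p.EndColOffset < 0 {
		return errors.New("negative offset on a located node")
	}
	if p.Lineno > p.EndLineno {
		return fmt.Errorf("line %d is after end line %d", p.Lineno, p.EndLineno)
	}
	if p.Lineno == p.EndLineno && p.ColOffset > p.EndColOffset {
		return fmt.Errorf("column %d is after end column %d", p.ColOffset, p.EndColOffset)
	}
	return nil
}

func main() {
	fmt.Println(checkPos(Pos{1, 0, 1, 5})) // well-formed range
	fmt.Println(checkPos(Pos{3, 0, 2, 0})) // inverted line range
}
```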

preprocess.go (_PyAST_Preprocess)

Three passes fused into one walk that mirrors astfold_* in ast_preprocess.c for CPython 3.14 (note: arithmetic constant folding on BinOp / UnaryOp / Compare / BoolOp of constants moved to flowgraph.c LOAD_CONST chain folding in 3.14; do NOT fold those here):

  1. Targeted constant folding. The 3.14 surface is small:
    • string % tuple printf-style format collapses to a JoinedStr when the format spec is supported (%s, %r, %a with optional width/precision) and the tuple has no Starred.
    • Name("__debug__") in Load context becomes Constant(!optimize).
    • MatchValue / MatchMapping keys allow -N and Const ± Const so user code like case -1 or case 1+2j still works after folding.
    • __debug__ and the format-fold respect syntax_check_only (no mutation when only running validation).
  2. PEP 765 finally checks. Walk every Try node and warn when a finally block contains return, break, or continue. Emitted as a SyntaxWarning via the v0.7 warnings module; v0.5 records the warnings on the compiler context.
  3. Body cleanup.
    • Under optimize >= 2, the leading docstring is dropped (replaced with Pass if it was the only statement).
    • If a fold produced a string-typed expression at body position 0, wrap it in a single-value JoinedStr so the docstring detector does not re-treat it as a docstring.

Annotation expressions under CO_FUTURE_ANNOTATIONS (PEP 563) are skipped: their text is captured by the unparser separately, the constant fold has no business descending.
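The PEP 765 pass is mostly context tracking, which a sketch over a toy tree makes concrete. Node, Kind, and the warning text below are hypothetical simplifications; the real walker operates on the generated ast types. A return warns anywhere inside a finally block unless a nested function intervenes, while break and continue warn only when they would leave the finally block, that is, when no loop was opened inside it.

```go
package main

import "fmt"

type Kind int

const (
	Other Kind = iota
	Try
	Loop
	FuncDef
	Return
	Break
	Continue
)

// Node is a toy statement node, not the generated ast type.
type Node struct {
	Kind    Kind
	Line    int
	Body    []*Node
	Finally []*Node // used only when Kind == Try
}

// ctx records whether we are inside a finally block and whether a loop
// has been opened since entering it.
type ctx struct{ inFinally, inLoop bool }

func check(nodes []*Node, c ctx, warns *[]string) {
	for _, n := range nodes {
		switch n.Kind {
		case Return:
			if c.inFinally {
				*warns = append(*warns, fmt.Sprintf("line %d: return in finally", n.Line))
			}
		case Break, Continue:
			if c.inFinally && !c.inLoop {
				*warns = append(*warns, fmt.Sprintf("line %d: loop exit in finally", n.Line))
			}
		case Loop:
			check(n.Body, ctx{c.inFinally, true}, warns)
		case FuncDef:
			check(n.Body, ctx{}, warns) // a nested function resets the context
		case Try:
			check(n.Body, c, warns)
			check(n.Finally, ctx{inFinally: true}, warns)
		default:
			check(n.Body, c, warns)
		}
	}
}

func main() {
	tree := []*Node{{Kind: Loop, Body: []*Node{{
		Kind:    Try,
		Finally: []*Node{{Kind: Break, Line: 4}},
	}}}}
	var warns []string
	check(tree, ctx{}, &warns)
	fmt.Println(warns)
}
```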

unparse.go (_PyAST_Unparse)

Produces a Python source representation of an expression. Used for stringized annotations under from __future__ import annotations (PEP 563): a def f(x: T) retains the unparsed annotation source string for the __annotations__ dict. Operator precedence drives parenthesization. Float infinity renders as 1e309 per CPython. Format strings, interpolations, and template strings round-trip back to f-string / t-string syntax.

This is a literal port; the precedence table and per-node visitor order match ast_unparse.c 1:1.
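As an illustration of precedence-driven parenthesization, the sketch below handles left-associative binary operators only. The two-level precedence table and the node types are invented for the example; the real table comes from ast_unparse.c. Each operand is rendered at a minimum level, and an operand whose own precedence falls below that level gets parentheses.

```go
package main

import "fmt"

type Expr interface{}

type Name struct{ ID string }

type BinOp struct {
	Left  Expr
	Op    string
	Right Expr
}

// prec is a toy table for this sketch, not the CPython table.
var prec = map[string]int{"+": 1, "-": 1, "*": 2, "/": 2}

// unparse renders e, adding parentheses when e binds more loosely than
// the surrounding context requires.
func unparse(e Expr, level int) string {
	switch n := e.(type) {
	case *Name:
		return n.ID
	case *BinOp:
		p := prec[n.Op]
		// Left-associative: the right operand needs strictly higher
		// precedence, so a - (b - c) keeps its parentheses.
		s := unparse(n.Left, p) + " " + n.Op + " " + unparse(n.Right, p+1)
		if p < level {
			return "(" + s + ")"
		}
		return s
	}
	return "?"
}

func main() {
	a, b, c := &Name{"a"}, &Name{"b"}, &Name{"c"}
	fmt.Println(unparse(&BinOp{&BinOp{a, "+", b}, "*", c}, 0))
	fmt.Println(unparse(&BinOp{a, "-", &BinOp{b, "-", c}}, 0))
}
```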

future package

future.go ports future.c. Walks the module body past the docstring collecting from __future__ import x statements. Produces:

package future

type Features struct {
	Bits     uint32
	Location ast.Pos
}

// Values match the CO_FUTURE_* flags in Include/cpython/code.h.
// generators and nested_scopes are accepted but carry no flag bit.
const (
	Division        uint32 = 0x20000 << iota // CO_FUTURE_DIVISION
	AbsoluteImport                           // CO_FUTURE_ABSOLUTE_IMPORT
	WithStatement                            // CO_FUTURE_WITH_STATEMENT
	PrintFunction                            // CO_FUTURE_PRINT_FUNCTION
	UnicodeLiterals                          // CO_FUTURE_UNICODE_LITERALS
	BarryAsBdfl                              // CO_FUTURE_BARRY_AS_BDFL
	GeneratorStop                            // CO_FUTURE_GENERATOR_STOP (legacy, ignored)
	Annotations                              // CO_FUTURE_ANNOTATIONS (PEP 563)
)

func FromAST(mod ast.Mod) (*Features, error)

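Errors mirror CPython: from __future__ import braces raises SyntaxError with the exact same string ("not a chance"). A future import after the first non-docstring/non-future statement raises SyntaxError; FromAST itself stops scanning at the first non-future statement, and the late import is reported through the symtable's msgFutureLate. Unknown feature names raise SyntaxError with the unknown name interpolated ("future feature %.100s is not defined").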
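The scan itself is short. The sketch below runs it over a toy statement list; Stmt, its Kind strings, and the two flag constants are placeholders for this example (the flag values here are not the CO_FUTURE_* numbers), and plain errors stand in for the SyntaxError type.

```go
package main

import (
	"errors"
	"fmt"
)

// Stmt is a toy stand-in for a module-body statement.
type Stmt struct {
	Kind  string   // "docstring", "future", or anything else
	Names []string // imported feature names when Kind == "future"
}

// Placeholder bit values for this sketch only.
const (
	flagBarry uint32 = 1 << iota
	flagAnnotations
)

// scanFutures skips a leading docstring, then reads future imports
// until the first statement of any other kind.
func scanFutures(body []Stmt) (uint32, error) {
	var bits uint32
	i := 0
	if len(body) > 0 && body[0].Kind == "docstring" {
		i = 1
	}
	for ; i < len(body) && body[i].Kind == "future"; i++ {
		for _, name := range body[i].Names {
			switch name {
			case "nested_scopes", "generators", "division", "absolute_import",
				"with_statement", "print_function", "unicode_literals", "generator_stop":
				// Accepted for compatibility; no flag bit is set.
			case "barry_as_FLUFL":
				bits |= flagBarry
			case "annotations":
				bits |= flagAnnotations
			case "braces":
				return 0, errors.New("not a chance")
			default:
				return 0, fmt.Errorf("future feature %.100s is not defined", name)
			}
		}
	}
	return bits, nil
}

func main() {
	bits, err := scanFutures([]Stmt{
		{Kind: "docstring"},
		{Kind: "future", Names: []string{"annotations", "division"}},
		{Kind: "assign"},
	})
	fmt.Println(bits == flagAnnotations, err)
}
```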

symtable package

Port of symtable.c. _PySymtable_Build becomes symtable.Build.

package symtable

type Block int
const (
	ModuleBlock Block = iota
	FunctionBlock
	ClassBlock
	AnnotationBlock
	TypeAliasBlock
	TypeParametersBlock
	TypeVariableBlock
)

type Scope int
const (
	Local Scope = iota + 1
	GlobalExplicit
	GlobalImplicit
	Free
	Cell
)

// SymbolFlags packs the DEF_* and USE bits plus the resolved Scope.
type SymbolFlags uint32

const (
	DefGlobal SymbolFlags = 1 << iota
	DefLocal
	DefParam
	DefNonlocal
	Use
	DefFreeClass
	DefImport
	DefAnnot
	DefCompIter
	DefTypeParam
	DefCompCell
)

type Entry struct {
	Type     Block
	Name     string
	Symbols  map[string]SymbolFlags
	Children []*Entry
	Varnames []string

	// Per-block attributes mirroring ste_*.
	Nested            bool
	Generator         bool
	Coroutine         bool
	Comprehension     bool
	Varargs           bool
	Varkeywords       bool
	ReturnsValue      bool
	NeedsClassClosure bool
	NeedsClassDict    bool

	Loc ast.Pos
}

type Table struct {
	Top    *Entry
	Blocks map[ast.Node]*Entry
	Future *future.Features
}

func Build(mod ast.Mod, filename string, ff *future.Features) (*Table, error)

Two-phase algorithm:

  1. Block visit. Walk the AST building a tree of Entry. Each function/class/comprehension/lambda creates a child block. Every name use or definition records a flag bit on the enclosing block.
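  2. Analysis. Bottom-up over the block tree, resolve each name to one of Local, GlobalExplicit, GlobalImplicit, Free, Cell. A name that is free in a nested scope marks the slot in its defining scope as Cell. Class scopes are special-cased: names bound in a class body are not visible to nested functions, so a method's free names skip the class scope and resolve in the enclosing function scopes, while the class block can still supply a Cell for its methods (for example the implicit __class__ cell, see NeedsClassClosure).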

Error parity is non-negotiable: the exact SyntaxError strings ("name 'x' is used prior to global declaration", "no binding for nonlocal 'x' found", "name 'x' is assigned to before global declaration", etc.) must match.
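The Free / Cell relationship and the class-scope skip can be shown with a deliberately simplified resolver. Note the swap: this sketch walks the parent chain top-down per lookup instead of running CPython's bottom-up pass, it ignores global / nonlocal directives, and it omits the class pass-through (DEF_FREE_CLASS). Block, Kind, and newBlock are invented for the example.

```go
package main

import "fmt"

type Scope int

const (
	Local Scope = iota + 1
	GlobalExplicit
	GlobalImplicit
	Free
	Cell
)

type Kind int

const (
	Module Kind = iota
	Function
	Class
)

// Block is a toy scope record, not the symtable Entry.
type Block struct {
	Kind     Kind
	Parent   *Block
	Defs     map[string]bool
	Resolved map[string]Scope
}

func newBlock(kind Kind, parent *Block, defs ...string) *Block {
	b := &Block{Kind: kind, Parent: parent, Defs: map[string]bool{}, Resolved: map[string]Scope{}}
	for _, d := range defs {
		b.Defs[d] = true
	}
	return b
}

// Resolve classifies name as seen from b and records the side effects
// on enclosing blocks (Cell on the definer, Free on functions between).
func (b *Block) Resolve(name string) Scope {
	if b.Defs[name] {
		if b.Resolved[name] != Cell {
			b.Resolved[name] = Local
		}
		return b.Resolved[name]
	}
	var passed []*Block
	for p := b.Parent; p != nil && p.Kind != Module; p = p.Parent {
		if p.Kind == Class {
			continue // class bodies are invisible to nested functions
		}
		if p.Defs[name] {
			p.Resolved[name] = Cell
			for _, q := range passed {
				q.Resolved[name] = Free
			}
			b.Resolved[name] = Free
			return Free
		}
		passed = append(passed, p)
	}
	b.Resolved[name] = GlobalImplicit
	return GlobalImplicit
}

func main() {
	mod := newBlock(Module, nil, "x")
	outer := newBlock(Function, mod, "y")
	cls := newBlock(Class, outer, "y") // class attribute y shadows nothing for methods
	method := newBlock(Function, cls)
	fmt.Println(method.Resolve("y") == Free, outer.Resolve("y") == Cell)
}
```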

compile package

instrseq.go

Port of instruction_sequence.c. The pre-CFG instruction stream:

package compile

type Label int // -1 means unbound.

type Instr struct {
	Op  Opcode
	Arg int32
	Loc ast.Pos
	// Exception handler info: index into the exc-handler stack at
	// emit time, set during codegen, finalized during assemble.
	ExcDepth int32
}

type Sequence struct {
	Instrs   []Instr
	LabelMap []int // label id to instr index, len == NewLabel calls.
	Nested   []*Sequence
}

func (s *Sequence) NewLabel() Label
func (s *Sequence) UseLabel(l Label)
func (s *Sequence) Addop(op Opcode, arg int32, loc ast.Pos)
func (s *Sequence) Insert(idx int, in Instr)
func (s *Sequence) ApplyLabelMap()
func (s *Sequence) AddNested(child *Sequence)

Opcode is generated from CPython's Lib/_opcode_metadata.py together with Include/internal/pycore_opcode_metadata.h. The opcode set is shared between v0.5 and v0.6; the generator lives in tools/opcodes_go and emits compile/opcodes_gen.go plus vm/opcodes_gen.go.
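A minimal sketch of the label life cycle: jumps carry a label id until ApplyLabelMap rewrites them to instruction indices. To stay self-contained it uses string opcodes and an IsJump field, both assumptions for this example; the checklist notes that the real ApplyLabelMap takes a hasTarget predicate instead.

```go
package main

import "fmt"

// Instr is simplified: string opcodes and an explicit jump marker.
type Instr struct {
	Op     string
	Arg    int
	IsJump bool
}

type Sequence struct {
	Instrs   []Instr
	LabelMap []int // label id -> instruction index, -1 while unbound
}

// NewLabel reserves a fresh, unbound label id.
func (s *Sequence) NewLabel() int {
	s.LabelMap = append(s.LabelMap, -1)
	return len(s.LabelMap) - 1
}

// UseLabel binds l to the next instruction to be emitted.
func (s *Sequence) UseLabel(l int) { s.LabelMap[l] = len(s.Instrs) }

func (s *Sequence) Addop(op string, arg int, isJump bool) {
	s.Instrs = append(s.Instrs, Instr{Op: op, Arg: arg, IsJump: isJump})
}

// ApplyLabelMap replaces label ids with instruction indices, then drops
// the map so a second call is a no-op.
func (s *Sequence) ApplyLabelMap() {
	if s.LabelMap == nil {
		return
	}
	for i := range s.Instrs {
		if s.Instrs[i].IsJump {
			s.Instrs[i].Arg = s.LabelMap[s.Instrs[i].Arg]
		}
	}
	s.LabelMap = nil
}

func main() {
	s := &Sequence{}
	end := s.NewLabel()
	s.Addop("JUMP", end, true) // forward jump to a not-yet-bound label
	s.Addop("NOP", 0, false)
	s.UseLabel(end)
	s.Addop("RETURN_VALUE", 0, false)
	s.ApplyLabelMap()
	fmt.Println(s.Instrs[0].Arg)
}
```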

codegen.go

Detailed source-of-truth: 1626_gopy_codegen.md. The summary below covers the cross-cutting picture; per-visitor file split, fblock stack, match panel, with-statement state machine, deferred annotations, PEP 695 type-parameter codegen, and the comprehensive per-visitor test plan live in 1626.

Port of codegen.c (~6500 lines). Walks each scope's AST and emits into a Sequence. Per-node visitors mirror codegen_visit_*. The naming convention is genXxx for the visitor of Xxx:

func (c *compiler) genStmt(s ast.Stmt)
func (c *compiler) genExpr(e ast.Expr)
func (c *compiler) genCall(call *ast.Call)
// ... one per AST node kind.

Macro shorthands from CPython (ADDOP, ADDOP_I, ADDOP_LOAD_CONST, ADDOP_NAME, ADDOP_JUMP, ADDOP_COMPARE) become methods on *compiler that wrap Sequence.Addop.

Name resolution piggybacks on the symtable: the codegen looks up each Name in the current Entry.Symbols to pick LOAD_FAST / LOAD_DEREF / LOAD_GLOBAL / LOAD_NAME. Comprehensions inline into the parent stream when they share a scope, and emit a nested sequence otherwise.
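The load-opcode choice reduces to a small decision table over the resolved scope and the kind of block being compiled. The sketch below covers loads only and returns opcode names as strings; BlockKind and loadOp are illustrative names, and class-body special cases (such as dictionary-first lookups for free names) are left out.

```go
package main

import "fmt"

type Scope int

const (
	Local Scope = iota + 1
	GlobalExplicit
	GlobalImplicit
	Free
	Cell
)

type BlockKind int

const (
	ModuleBlock BlockKind = iota
	FunctionBlock
	ClassBlock
)

// loadOp picks the load instruction for a name with the given resolved
// scope inside a block of the given kind.
func loadOp(block BlockKind, scope Scope) string {
	switch scope {
	case Free, Cell:
		return "LOAD_DEREF"
	case Local:
		if block == FunctionBlock {
			return "LOAD_FAST" // function locals live in fast slots
		}
		return "LOAD_NAME" // module and class locals live in a dict
	case GlobalImplicit:
		if block == FunctionBlock {
			return "LOAD_GLOBAL"
		}
		return "LOAD_NAME"
	case GlobalExplicit:
		return "LOAD_GLOBAL"
	}
	return "LOAD_NAME"
}

func main() {
	fmt.Println(loadOp(FunctionBlock, Local), loadOp(ModuleBlock, Local), loadOp(FunctionBlock, Free))
}
```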

The control-flow constructs (try/except/finally, with, async with, match, comprehensions) are line-for-line ports. Each block-leaving construct (return, break, continue, raise) honors the unwind protocol by emitting the right POP_BLOCK / RERAISE sequence; this is the trickiest part of codegen and is gated by a dedicated subset of the CPython test suite (test_compile, test_dis).

flowgraph.go

Detailed source-of-truth: 1627_gopy_flowgraph.md. The summary below lists the visible types and the optimisation set; per-pass file split, exact pass ordering, const-cache structure, super-instruction contract, optimizeLoadFast ref-stack panel, and the layered test plan live in 1627.

Port of flowgraph.c. Converts an instruction stream into a CFG of basic blocks, runs peephole optimizations, then converts back.

type Block struct {
	Instrs     []Instr
	Succ       []*Block
	Pred       []*Block
	StackEntry int32
	Reachable  bool
}

type CFG struct {
	Blocks []*Block
	Entry  *Block
}

func FromSequence(seq *Sequence) *CFG
func (g *CFG) Optimize()
func (g *CFG) ToSequence() *Sequence
func (g *CFG) ResolveJumps() // labels -> offsets

Optimizations (every one a CPython parity test target):

  • Constant folding on LOAD_CONST chains.
  • Jump threading (JUMP -> JUMP -> X becomes JUMP X).
  • Conditional jump propagation (POP_JUMP_IF_TRUE over a block that ends in another conditional jump).
  • Dead-code elimination (unreachable blocks and instructions after an unconditional terminator).
  • Stack-effect verification.
  • Push RESUME at function entry.
  • Insert LOAD_CONST None, RETURN_VALUE for fall-through returns.

The order and exact predicates are taken from flowgraph.c. Lints (gocognit, gocyclo) are exempted on the optimizer entry point because the CPython source structure is the contract.
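Jump threading is the easiest of these passes to show in isolation. The sketch below works on a flat instruction list with index targets rather than on basic blocks, which is a simplification of the real pass; the Instr shape and string opcodes are assumptions for the example. A visited set guards against jump cycles.

```go
package main

import "fmt"

// Instr is a flat-list simplification; Target is an instruction index
// and is meaningful only when Op == "JUMP".
type Instr struct {
	Op     string
	Target int
}

// threadJumps retargets every JUMP whose destination is itself a JUMP
// to the end of that chain.
func threadJumps(ins []Instr) {
	for i := range ins {
		if ins[i].Op != "JUMP" {
			continue
		}
		seen := map[int]bool{i: true} // stops on cyclic jump chains
		t := ins[i].Target
		for ins[t].Op == "JUMP" && !seen[t] {
			seen[t] = true
			t = ins[t].Target
		}
		ins[i].Target = t
	}
}

func main() {
	ins := []Instr{
		{"JUMP", 2},
		{"NOP", 0},
		{"JUMP", 4},
		{"NOP", 0},
		{"RETURN_VALUE", 0},
	}
	threadJumps(ins)
	fmt.Println(ins[0].Target, ins[2].Target)
}
```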

assemble.go

Detailed source-of-truth: 1628_gopy_assemble.md. The summary below shows the public type and the four-step plan; the location-table format dispatcher (PEP 626 line numbers, PEP 657 column ranges), the exception-table varint encoding, the localsplus / fastlocalskinds layout, and the marshal-parity test plan live in 1628.

Port of assemble.c. Lays out the final code object:

type Assembler struct {
	Code      []byte
	LineTable []byte
	ExcTable  []byte
	Consts    []objects.Object
	Names     []string
}

func Assemble(scope *symtable.Entry, seq *Sequence, ff *future.Features) (*objects.Code, error)

Steps:

  1. Lay out instructions in order, expanding wide args via EXTENDED_ARG.
  2. Emit the location table in the adaptive varint format (line numbers per PEP 626, column ranges per PEP 657): short, one-line, long, none.
  3. Emit the exception table (the zero-cost exception handling table introduced in CPython 3.11): start, end, target, stack_depth, lasti as varint deltas.
  4. Allocate constants, names, varnames, freevars, cellvars from the symtable.

The byte format is fixed by CPython 3.14 marshal, so the assembler is testable against dis.dis output and against marshal.loads of a CPython-produced .pyc.
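Step 1 can be sketched on its own. Each code unit is two bytes (opcode, low byte of the oparg); an oparg above 255 is prefixed by EXTENDED_ARG units carrying the higher bytes, most significant first. The emit helper and the opcode numbers in main are placeholders for this example, not the 3.14 opcode values.

```go
package main

import "fmt"

// emit encodes one instruction, prefixing EXTENDED_ARG units for every
// byte of arg above the low eight bits. The extendedArg opcode number
// is passed in because this sketch does not know the generated table.
func emit(op, extendedArg byte, arg uint32) []byte {
	var out []byte
	for shift := 24; shift >= 8; shift -= 8 {
		// Once a higher byte is present, all lower prefixes are emitted
		// too, even when their own byte is zero.
		if arg>>shift != 0 {
			out = append(out, extendedArg, byte(arg>>shift))
		}
	}
	return append(out, op, byte(arg))
}

func main() {
	const (
		opLoadConst   = 0x10 // placeholder opcode numbers
		opExtendedArg = 0x20
	)
	fmt.Printf("% x\n", emit(opLoadConst, opExtendedArg, 5))
	fmt.Printf("% x\n", emit(opLoadConst, opExtendedArg, 0x12345))
}
```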

compiler.go

Top-level compile.Compile:

func Compile(mod ast.Mod, filename string, optimize int) (*objects.Code, error)

Driver: ast.Validate -> ast.Preprocess -> future.FromAST -> symtable.Build -> per-scope codegen -> flowgraph.Optimize -> assemble.Assemble. Errors propagate as Go errors carrying the SyntaxError string and position.

Phasing inside v0.5

Because v0.5 is the largest single phase in the project, we split it into commits sized to land independently. Each commit ships with matched unit tests and is lint-clean. The cross-cut gate at v05test/gate_test.go exercises the full pipeline and is the last to land.

Full v0.5 checklist

Status legend: [x] done, [ ] pending, [~] partial / scaffold, [n] not applicable (handled in another layer).

1. ast package

asdl runtime

  • ast/asdl.go: Seq[T], Len, Get, Set, NewSeq. Mirrors _Py_asdl_*_seq_new and asdl_seq_LEN/GET/SET.
  • ast/asdl_test.go: empty seq, get/set round-trip, nil Len.

Hand-written node skeleton (foundation for early ports)

  • ast/nodes.go: Pos, NoPos, sealed Mod/Stmt/Expr interfaces, Module, Interactive, Expression, FunctionType, TypeIgnore, ImportFrom, Alias, ExprStmt, Constant, IsDocString (mirrors _PyAST_GetDocString).

asdl-driven Go generator

  • tools/asdl_go/: parser for cpython/Parser/Python.asdl.
  • tools/asdl_go/: emitter for Go structs and sealed-interface marker methods. Collision rule renames stmt.Expr to ExprStmt and single-ctor sums (type_ignore.TypeIgnore) to <name>Node.
  • tools/asdl_go/main.go: CLI with -input, -output.
  • tools/asdl_go/main_test.go: small-fixture parse and emit shape tests plus a real-asdl smoke test.

Generated nodes (output of the generator)

  • ast/nodes_gen.go: full mod/stmt/expr/pattern/excepthandler sums, type_param sum (PEP 695), product types (arguments, arg, keyword, alias, withitem, match_case), expr_context and operator enums folded in (CPython collapses these into the same asdl file). Hand-written stubs in nodes.go retired; only Pos, NoPos, and IsDocString remain there.

Validation

  • ast/validate.go plus ast/validate_panel.go: Validate(mod ast.Mod) error mirroring _PyAST_Validate.
    • Position sanity (lineno ordering, no negative offsets except NoPos sentinel).
    • Forbidden identifier names (None, True, False) via validateName.
    • Constant.Value constrained to marshal-allowed kinds.
    • Comprehension non-empty generators (validateComprehension).
    • expr_context consistency via validateExprCtx (Name/Attribute/Subscript/Starred/List/Tuple ctx slots).
    • Starred allowed only as Call arg or iterable element (default validateExpr rejects; Call and validateLoadElts permit in Load context).
    • Match pattern shape rules (validatePattern, validatePatternSeq).
    • PEP 695 type-parameter constraints (validateTypeParams).
  • ast/validate_test.go: foundation panel (positions, ImportFrom level, constant kinds, nil rejection). Expands with later validators.

Preprocess

  • ast/preprocess.go: Preprocess(mod, opts) []Warning. Mirrors _PyAST_Preprocess for CPython 3.14: a single tree walk that does PEP 765 plus the small set of folds 3.14 keeps in the AST layer.
    • Full astfold walker descending every stmt / expr / pattern / type_param kind (parity with astfold_* cases in ast_preprocess.c).
    • PEP 765 finally-block control-flow checks (return, break, continue).
    • string % tuple printf-format fold to JoinedStr (via fold_binop).
    • Name("__debug__") substitution by Constant(!optimize) in Load context.
    • MatchValue / MatchMapping const folding (USub of Constant, Add/Sub of all-Constant operands).
    • Docstring removal under -OO (replaces sole-stmt docstring with Pass per remove_docstring).
    • Body re-wrap when a fold produces a leading string expr.
    • PEP 563 annotation skip: Arg.annotation, FunctionDef.returns, AsyncFunctionDef.returns, AnnAssign.annotation not visited when CO_FUTURE_ANNOTATIONS is set in the future bits.
    • syntax_check_only flag suppresses mutating folds.
    • [n] Arithmetic / Compare / BoolOp / tuple / frozenset constant folding lives in flowgraph.c LOAD_CONST chain folding for 3.14; not part of ast_preprocess.c. See section 7 below.
  • ast/preprocess_test.go: PEP 765 panel plus fold panels for format folding, __debug__, MatchValue numerics, docstring removal, PEP 563 annotation skip, syntax-check-only mode.

Unparse

  • ast/unparse.go: Unparse(expr Expr) (string, error).
    • Operator precedence parenthesization parity.
    • Float infinity rendered as 1e309.
    • f-string / t-string round-trip.
    • FormattedValue, Interpolation, JoinedStr, TemplateStr.
  • ast/unparse_test.go: golden-string panel matching CPython's ast.unparse output.

2. future package

  • future/future.go: Features, CO_FUTURE_* numeric flags matching Include/cpython/code.h, FromAST, SyntaxError error type, "not a chance" string parity for from __future__ import braces, "future feature %.100s is not defined" parity.
  • future/future_test.go: annotations, barry_as_FLUFL, no-op features, braces rejection, unknown rejection, docstring skip, stop-at-first-non-future, relative-import ignored, location tracking, Expression mode.

3. compile package: instruction sequence

  • compile/instrseq.go: Sequence, Instr, ExceptHandlerInfo, JumpTargetLabel, Opcode, MaxOpcode=511, MaxOparg=1<<30, NewLabel, UseLabel, Addop, Insert, AddNested, SetAnnotationsCode, ApplyLabelMap. Takes hasTarget predicate callback so it does not depend on opcode metadata yet.
  • compile/instrseq_test.go: label IDs, addop+UseLabel, idempotent ApplyLabelMap, non-jump preservation, Insert label shift, AddNested, except-handler resolution, SetAnnotationsCode panic, opcode/oparg range panics.

4. opcode generator and metadata

  • tools/opcodes_go/: parser for cpython/Lib/_opcode_metadata.py and Include/internal/pycore_opcode_metadata.h.
  • tools/opcodes_go/main.go: CLI emitting compile/opcodes_gen.go. vm/opcodes_gen.go waits on the v0.6 vm port.
  • compile/opcodes_gen.go: typed Opcode constants for the full 3.14 opmap; HasArg, HasConst, HasName, HasJump, HasFree, HasLocal, HasEvalBreak, HasDeopt, HasError, HasEscapes, HasExit, HasPure, HasPassthrough, HasOpargAnd1, HasErrorNoPop, HasNoSaveIp predicates; Name() lookup; HasTarget shorthand for the label resolver.
  • [~] compile/opcodes_metadata_gen.go: per-opcode stack-effect table (push/pop counts) and cache-line sizes. v0.5 hand-codes the stack-effect table inside flowgraph_stackdepth.go; the generated file lands when the bytecodes.c DSL generator does in v0.6.
  • compile/opcodes_test.go: opcode number cross-check, flag predicates, HasTarget panel, out-of-range Name.
  • Wire Sequence.ApplyLabelMap callers to pass compile.HasTarget.

5. symtable package

Data model

  • symtable/types.go: Block, Scope, SymbolFlags constants byte-equal to CPython (DEF_GLOBAL=1, DEF_LOCAL=2, DEF_PARAM=4, DEF_NONLOCAL=8, USE=0x10, etc., per Include/internal/pycore_symtable.h). Includes DEF_FREE_CLASS, DEF_IMPORT, DEF_ANNOT, DEF_COMP_ITER, DEF_TYPE_PARAM, DEF_COMP_CELL, SCOPE_OFFSET, ScopeMask. Block enum order matches _Py_block_ty. ComprehensionType enum byte-equal.
  • symtable/entry.go: Entry struct mirroring PySTEntryObject field-for-field, including the PEP 649 AnnotationBlock, PEP 695 MangledNames, CanSeeClassScope, HasConditionalAnnotations, InConditionalBlock, InUnevaluatedAnnotation, CompIterExpr, Method, HasDocstring. Directive {Name, Loc} records for global / nonlocal locations. Methods: IsFunctionLike, GetSymbol, GetScope.
  • symtable/table.go: Table struct (Top, Blocks, Future, Filename) with Lookup(key any). Map key is any because no ast.Node interface exists.
  • symtable/mangle.go: Mangle(private, name) / MaybeMangle(private, ste, name) PEP 8 private name mangling.
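A minimal sketch of the mangling rule behind Mangle(private, name), following Python's documented behaviour: only names that start with two underscores, do not end with two underscores, and contain no dot are rewritten, and leading underscores are stripped from the class name first. The body is an assumption about how the port could look, not a copy of symtable/mangle.go.

```go
package main

import (
	"fmt"
	"strings"
)

// Mangle applies private-name mangling for an identifier that appears
// inside the class named by private. An empty private means "not inside
// a class".
func Mangle(private, name string) string {
	if private == "" ||
		!strings.HasPrefix(name, "__") ||
		strings.HasSuffix(name, "__") ||
		strings.Contains(name, ".") {
		return name
	}
	cls := strings.TrimLeft(private, "_")
	if cls == "" {
		return name // class name made only of underscores: no mangling
	}
	return "_" + cls + name
}

func main() {
	fmt.Println(Mangle("Widget", "__cache"))
	fmt.Println(Mangle("_Widget", "__cache"))
	fmt.Println(Mangle("Widget", "__init__"))
}
```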

Build pass (block visit)

  • symtable/build.go: Build(mod ast.Mod, filename string, ff *future.Features) (*Table, error). Driver dispatches Module / Interactive / Expression bodies; FunctionType returns an explicit error to match CPython. builder carries table, cur, stack, private, filename, future, nextID. Helpers: enterBlock, enterExisting, exitBlock, addDef, addDefCtx, addDefHelper, checkName, lookup, lookupEntry, recordDirective, allowsTopLevelAwait (stub returns false), isAsyncDef.
  • symtable/build_visit.go: visitStmt plus per-kind helpers (visitFunctionLike, visitClassDef, visitTypeAlias, visitReturn, visitAssign, visitAnnAssign, visitForLike, visitWhile, visitIf, visitMatch, visitRaise, visitTryLike, visitAssert, visitWithLike, visitGlobal, visitNonlocal, checkImportFrom, maybeSetCoroutineForModule). Split into visitStmtDef / visitStmtControl / visitStmtSimple to satisfy gocyclo. funcLike struct + unpackFuncLike factor the FunctionDef / AsyncFunctionDef variants. annotationKey {parent, index} distinguishes adjacent annotations.
  • symtable/build_expr.go: visitExpr split into visitExprComp / visitExprUnary / visitExprLeaf. Per-kind visitName, visitYield, visitAwait, visitCall, visitLambda, raiseIfAnnotationBlock, raiseIfComprehensionBlock, visitFormattedValue, visitSliceParts. visitName recognises super and pulls __class__ into scope.
  • symtable/build_helpers.go: visitArguments, visitParams, visitArgAnnotations, visitAnnotations, visitAnnotation, visitAlias, visitExceptHandler, visitWithItem, visitMatchCase, visitPattern, visitPatternSeq, checkKwdPatterns, visitKeyword, visitTypeParam, visitTypeParamSubexpr, enterTypeParamBlock. Synthetic .type_params, .generic_base, .defaults, .kwdefaults slots match CPython.
  • symtable/build_comp.go: handleNamedExpr walrus retarget, extendNamedExprScope walking the stack reverse with the class / type-block rejections, handleComprehension / runComprehensionBody / comprehensionTypeFor for list / set / dict / generator forms, visitComprehensionIter, implicitArg for the .0 parameter.

Analysis pass

  • symtable/analyze.go: top-level analyze plus analyzeBlock, analyzeChildBlock, analyzeName, analyzeCells, dropClassFree, updateSymbols, inlineComprehension, isFreeInAnyChild, errorAtDirective. Splits the original 200-line analyze_block into prepareClassPreambleSets, finalizeChildSets, analyzeChildren, pickClassEntry, spliceInlinedChildren to satisfy gocognit. nameSet is a small map[string]struct{} helper that mirrors PySet operations (add / discard / contains / clone / union).

Errors

  • symtable/errors.go: SyntaxError {Msg, Filename, Pos} implementing the error interface. CPython error strings preserved verbatim in package-level constants:
    • msgGlobalParam, msgNonlocalParam, msgGlobalAfterAssign, msgNonlocalAfterAss, msgGlobalAfterUse, msgNonlocalAfterUse, msgGlobalAnnot, msgNonlocalAnnot, msgImportStar.
    • msgNamedExprComp, msgNamedExprBound, msgNamedExprAlias, msgNamedExprParam, msgNamedExprConflict, msgNamedExprInner, msgNamedExprIterExpr.
    • msgAnnotationNotAllow, msgNotAllowedInTypVar, msgNotAllowedInAlias, msgNotAllowedInParams, msgDupTypeParam, msgDupArgument.
    • msgAsyncWithOutside, msgAsyncForOutside, msgAsyncCompOutside, msgAwaitOutsideFunc, msgAwaitOutsideAsync.
    • msgAssignDebug, msgDeleteDebug, msgFutureLate.
    • msgNonlocalAtModule, msgNoBindingNonlocal, msgNonlocalGlobal, msgNonlocalTypeParam.
    • errorf(filename, loc, format, args...) constructor.

Tests

  • symtable/symtable_test.go: module / function / class / nested closure / global directive / nonlocal-at-module / nonlocal-no-binding / global-after-use / __debug__ rejection / class method flag / private-name mangling in class / list comprehension / import binding / star-import outside module / duplicate argument / SyntaxError shape.
  • symtable/mangle_test.go: full panel for Mangle and MaybeMangle allow-list.

6. compile package: codegen

Detailed source-of-truth for this section: 1626_gopy_codegen.md. The checklist below is the cross-cutting view. The per-visitor file split, function citations, fblock-stack types, super-instruction contract, with-statement state machine, deferred-annotation panel, PEP 695 type-parameter codegen, and the layered test plan all live in 1626. Tick a box here only after the matching detailed checklist in 1626 is also ticked.

Driver

  • compile/codegen.go: (*Compiler).Codegen(scope *symtable.Entry, mod ast.Mod) (*Unit, error).
  • compile/codegen.go: Compiler struct (current scope stack, unit stack, future flags, filename, source, optimize level, symtable, const cache).
  • compile/codegen_addop.go: addOp / addOpI / addOpJump / loadConst / addOpName / useLabel helpers as methods on *Compiler. Macros land as Go methods rather than a separate macros.go.

Statement visitors

  • [~] FunctionDef / AsyncFunctionDef (decorators, posonly, kwonly, defaults, kwonly defaults, varargs, varkw, closure cells landed. PEP 695 type parameters and PEP 649 deferred annotations are separate panels later in 1626).
  • ClassDef (basic shape: bases, keyword args, decorators). Type parameters / classcell / static-attributes panels land alongside PEP 695 + super().
  • Return, Delete, Assign, AugAssign, AnnAssign (full target panel including attr / subscript / tuple / star unpack).
  • For / AsyncFor (with break/continue unwind).
  • While.
  • If (skeleton: constant-condition specialization pending).
  • With / AsyncWith (with-statement state machine).
  • Match / MatchValue / MatchSingleton / MatchSequence / MatchMapping / MatchClass / MatchStar / MatchAs / MatchOr.
  • Raise (with from).
  • Try / TryStar (PEP 654).
  • Assert (with __debug__ short-circuit).
  • Import / ImportFrom (with star).
  • Global / Nonlocal (no-op at codegen, already in symtable).
  • Expr (top-level expression: CALL_INTRINSIC_1 INTRINSIC_PRINT in interactive mode).
  • Pass / Break / Continue (loop fblock walk; full unwind through with / try lands alongside those visitors).
  • Delete / AugAssign / AnnAssign.
  • TypeAlias (PEP 695, lowered as LOAD_CONST <name>, LOAD_CONST None, <value>, CALL_INTRINSIC_1 INTRINSIC_TYPEALIAS=12, STORE_NAME).

Expression visitors

  • BoolOp (short-circuit jumps).
  • NamedExpr (walrus).
  • BinOp.
  • UnaryOp.
  • Lambda.
  • IfExp.
  • Dict, Set, List, Tuple displays.
  • ListComp, SetComp, DictComp, GeneratorExp.
  • Await, Yield, YieldFrom.
  • Compare (chained compares).
  • Call (with star/starstar args, keyword args).
  • FormattedValue, JoinedStr.
  • [~] Interpolation, TemplateStr (PEP 750 t-strings; deferred).
  • Constant (LOAD_CONST + co_consts allocation).
  • Attribute (LOAD_ATTR + super-instruction LOAD_SUPER_ATTR).
  • Subscript (LOAD/STORE/DELETE).
  • Starred (in target / arg positions).
  • Name (LOAD_FAST / LOAD_DEREF / LOAD_GLOBAL / LOAD_NAME by scope).
  • Slice.

Block / unwind machinery

  • Frame block stack (pushFblock, popFblock, unwind helpers in codegen_fblock.go).
  • Try/except/finally exception table emission.
  • With unwinding (call exit on path).
  • Async-with unwinding (await aexit).
  • [~] Generator return-value handling (RETURN_GENERATOR + RESUME prologue landed; full StopIteration packaging refined in v0.6 vm).
  • [~] Coroutine close handling (CoCoroutine flag set; close protocol lands in the v0.6 vm).

Tests

  • compile/codegen_test.go: per-statement smoke tests against a parser-stub feeding hand-built ASTs.
  • Comprehension scope test (free var capture).
  • Match-statement panel (one test per pattern kind).
  • TypeAlias codegen test pinned via the type_alias golden (v05test/testdata/golden/type_alias.golden).
  • [~] PEP 695 type-parameter codegen test for TypeVar / ParamSpec / TypeVarTuple bodies. Validator path covered by validateTypeParams; full codegen golden lands alongside the generic-class panel in v0.6.

7. compile package: flowgraph

Detailed source-of-truth for this section: 1627_gopy_flowgraph.md. The checklist below is the cross-cutting view. The per-pass file split, exact pass ordering inside OptimizeCodeUnit, const-cache structure, optimizeLoadFast ref-stack panel, and the layered test plan live in 1627.

  • [~] compile/flowgraph.go:
    • BasicBlock struct (Instrs, Next, Label, StartDepth, Predecessors, Visited, Cold, Warm, Reachable).
    • Builder struct (Head, Tail, labelMap).
    • FromSequence(*Sequence) (*Builder, error).
    • [~] Optimize(*Sequence, *[]any, nlocals, firstLineno) (*Info, error): runs the v0.5 subset of _PyCfg_OptimizeCodeUnit against the flat sequence. CFG-driven passes are queued for the follow-on.
    • (*Builder).ToSequence() (*Sequence, error).
    • Jump-label resolution via Sequence.ApplyLabelMap(HasTarget).

Optimization passes (each is a parity target)

  • LOAD_CONST chain folding (multi-step). Optimize now drives foldBinaryIntConst plus eliminateDeadCodeAfterTerminator to a fixed point so a fold that exposes a new triple gets caught in the same pass.
  • Int-int BINARY_OP folding (foldBinaryIntConst).
  • Jump threading (JUMP -> JUMP -> X) via threadJumps.
  • Conditional-jump propagation (POP_JUMP_IF_TRUE over a tail-conditional block) via propagateConditionalJumps.
  • Unreachable block elimination via removeUnreachableBlocks (DFS reachability with handler labels pinned as roots; length preserving so the label map and jump opargs stay valid).
  • Dead-code elimination after unconditional terminators.
  • [~] Stack-effect verification (forward linear scan via calculateStackdepth; CFG-based variant pending).
  • RESUME insertion at entry (codegen prologue).
  • Implicit LOAD_CONST None / RETURN_VALUE for fall-through function returns.
  • EXTENDED_ARG insertion for wide opargs (assemble-time).
  • Push-null fix-up for super-instruction call sites.
  • Redundant-NOP compaction (removeRedundantNops).

Tests

  • compile/flowgraph_test.go and flowgraph_passes_test.go: hand-built sequences exercising fold / dead-code / NOP compact / label resolution / stack depth.

8. compile package: assemble

Detailed source-of-truth for this section: 1628_gopy_assemble.md. The checklist below is the cross-cutting view. The location-table dispatcher (PEP 626 line numbers, PEP 657 column ranges), the exception-table varint encoding, the co_localsplus / fastlocalskinds layout, and the marshal-parity test plan live in 1628.

  • compile/assemble.go:
    • Assembler internals (Code, LineTable, ExceptionTable, Consts, Names, Varnames, Freevars, Cellvars).
    • Assemble(seq *Sequence, info *Info, unit *Unit, filename string) (*Code, error).
  • EXTENDED_ARG wide-oparg expansion.
  • PEP 626 location table (short / one-line / long / no-location / no-column varint forms in assemble_locations.go).
  • PEP 657 exception table (start, end, target, depth, lasti as 6-bit varint deltas in assemble_exceptions.go).
  • Constants table allocation (de-dup by (typeTag, value) with float bit-pattern keying for NaN-safety in codegen_addop.go).
  • Names / Varnames / Freevars / Cellvars population from symtable.
  • co_flags assembly (CoOptimized, CoNewLocals, CoVarargs, CoVarkeywords, CoNested, CoGenerator, CoNoFree, CoCoroutine, CoMethod). CoIterableCoroutine and CoAsyncGenerator land alongside the generator/coroutine state machines.
  • co_qualname build-up via buildQualname walking the unit stack (top-level / class parent / function <locals> parent).
  • co_code byte emission.
  • compile/assemble_test.go plus assemble_locations_test.go, assemble_exceptions_test.go, assemble_flags_test.go:
    • EXTENDED_ARG widening.
    • Location table varint round-trip.
    • Exception table varint round-trip.
    • Const dedup parity with CPython (type-keyed).
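
The varint round-trip tests listed above can be pictured with a small encoder / decoder pair. This is a sketch under stated assumptions: six payload bits per byte, bit 6 (0x40) as the continuation flag, most-significant group first. The start-of-entry marker bit and the per-field layout are omitted, and the function names are hypothetical; 1628 remains the source of truth for the actual byte layout.

```go
package main

import "fmt"

// appendVarint appends v as 6-bit groups, most significant group first.
// Every byte except the last carries the continuation flag 0x40.
func appendVarint(buf []byte, v uint32) []byte {
	shift := 0
	for v>>(shift+6) != 0 {
		shift += 6
	}
	for ; shift > 0; shift -= 6 {
		buf = append(buf, byte(v>>shift)&0x3f|0x40)
	}
	return append(buf, byte(v)&0x3f)
}

// readVarint decodes one value from the front of buf and returns it with
// the number of bytes consumed.
func readVarint(buf []byte) (v uint32, n int) {
	for n < len(buf) {
		b := buf[n]
		n++
		v = v<<6 | uint32(b&0x3f)
		if b&0x40 == 0 {
			break
		}
	}
	return v, n
}

func main() {
	for _, v := range []uint32{0, 63, 64, 4095, 4096, 1 << 20} {
		enc := appendVarint(nil, v)
		dec, n := readVarint(enc)
		fmt.Printf("%d -> % x -> %d (%d bytes)\n", v, enc, dec, n)
	}
}
```

Values up to 63 fit in one byte; each further six bits of magnitude costs one more byte.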

9. compile package: driver

  • compile/compiler.go:
    • Compile(mod ast.Mod, filename string, optimize int) (*Code, error).
    • Pipeline order: future.FromAST -> symtable.Build -> Codegen (per-scope) -> Optimize -> Assemble. Validate / Preprocess wire in alongside the parser handover; the v0.5 entry point is fed by a parser stub.
    • [~] optimize levels -1, 0, 1, 2 mirror CPython (level threaded through; level-2 docstring removal lands in preprocess; full -O panel beyond docstring/assert lands in v0.6).
  • compile/compiler_test.go + the v05test gate package: end-to-end on hand-built ASTs.
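
The pipeline order above can be read as a chain of stages that stops at the first error. The sketch below only illustrates that control flow: the stage type, runPipeline, and the failing symtable stage are hypothetical stand-ins, and the real driver passes typed values (flags, symbol tables, sequences) between stages rather than running closures.

```go
package main

import (
	"errors"
	"fmt"
)

// stage is a hypothetical stand-in for one pipeline step.
type stage struct {
	name string
	run  func() error
}

// errSymtable is a made-up failure used to show short-circuiting.
var errSymtable = errors.New("duplicate parameter")

// ok is a stage body that always succeeds.
func ok() error { return nil }

// runPipeline runs the stages in order and stops at the first error. It
// returns the names of the stages that completed.
func runPipeline(stages []stage) ([]string, error) {
	var done []string
	for _, s := range stages {
		if err := s.run(); err != nil {
			return done, fmt.Errorf("%s: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

func main() {
	stages := []stage{
		{"future.FromAST", ok},
		{"symtable.Build", func() error { return errSymtable }},
		{"Codegen", ok},
		{"Optimize", ok},
		{"Assemble", ok},
	}
	done, err := runPipeline(stages)
	fmt.Println(done, err)
}
```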

10. tokenize package skeleton (1665)

  • tokenize/types.go: hand-written Type declaration plus String() lookup. The numeric constants live in the generated file.
  • tools/tokens_go/: generator from Grammar/Tokens plus Include/internal/pycore_token.h plus Lib/token.py.
  • tokenize/types_gen.go: generator output. 69 token kinds (ENDMARKER=0..ENCODING=68) plus tokenNames table.
  • [~] tokenize/tokenize.go: skeleton (Iter, Token, New, NewReadline, Next) lands with the v0.9 lexer port; v0.5 ships the type table only since the gate uses hand-built ASTs.
  • tokenize/tokenize_test.go: type numeric pinning plus Type.String lookup. Iterator contract tests land in v0.9.
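
The split between the hand-written Type declaration and the generated constants plus tokenNames table might look like the sketch below. Only the first four kinds are shown, with the numbering CPython's token module uses for them; the fallback format for unknown values is an assumption of the sketch, not something the spec fixes.

```go
package main

import "fmt"

// Type is the token kind. In the layout described above this declaration
// is hand-written and the constants below are generator output.
type Type int

const (
	ENDMARKER Type = iota // 0
	NAME                  // 1
	NUMBER                // 2
	STRING                // 3
)

// tokenNames maps a kind to its display name (truncated for the sketch).
var tokenNames = [...]string{
	ENDMARKER: "ENDMARKER",
	NAME:      "NAME",
	NUMBER:    "NUMBER",
	STRING:    "STRING",
}

// String returns the table entry, or a numeric fallback for values outside
// the table.
func (t Type) String() string {
	if t >= 0 && int(t) < len(tokenNames) {
		return tokenNames[t]
	}
	return fmt.Sprintf("Type(%d)", int(t))
}

func main() {
	fmt.Println(NAME, int(STRING), Type(99))
}
```

Pinning tests then assert both the numeric value and the String() result for each kind, so a generator change that reorders kinds fails loudly.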

11. dis-equivalent disassembler (gate support)

  • compile/dis.go: Disassemble(co *Code) string rendering bytecode plus recursive headers for nested code objects. Used by the v05test gate for structural assertions.
  • compile/dis_test.go: opcode rendering, oparg display, EXTENDED_ARG recombination, nested-code header. Byte-equal comparison against a CPython golden capture lands with the marshal package.
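
EXTENDED_ARG widening and its recombination in the disassembler are two halves of the same encoding: an oparg wider than one byte is emitted as up to three EXTENDED_ARG prefixes carrying the high bytes, most significant first. The sketch below assumes two-byte code units and ignores inline cache entries; the extendedArg value is a placeholder, not the real opcode number, and emit / decode are hypothetical names.

```go
package main

import "fmt"

// extendedArg is a placeholder opcode number for this sketch; the real
// value comes from the generated opcode table.
const extendedArg byte = 0xFE

// Instr is a decoded instruction with its full-width oparg.
type Instr struct {
	Op  byte
	Arg uint32
}

// emit appends op with arg, prefixing EXTENDED_ARG units for every byte of
// arg above the low one. Interior zero bytes are still emitted.
func emit(code []byte, op byte, arg uint32) []byte {
	for shift := 24; shift > 0; shift -= 8 {
		if arg>>shift != 0 {
			code = append(code, extendedArg, byte(arg>>shift))
		}
	}
	return append(code, op, byte(arg))
}

// decode walks two-byte units and folds EXTENDED_ARG prefixes back into the
// oparg of the instruction that follows them.
func decode(code []byte) []Instr {
	var out []Instr
	var ext uint32
	for i := 0; i+1 < len(code); i += 2 {
		op := code[i]
		arg := ext<<8 | uint32(code[i+1])
		if op == extendedArg {
			ext = arg
			continue
		}
		out = append(out, Instr{op, arg})
		ext = 0
	}
	return out
}

func main() {
	code := emit(nil, 0x10, 0x12345)
	fmt.Printf("% x\n", code)
	fmt.Println(decode(code))
}
```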

12. v05test cross-cut gate

  • [~] v05test/gate_test.go: structural panel landed; the byte-equal marshal-roundtrip variant waits on the marshal package and golden corpus.
    • TestGateEmptyModule: empty module returns None.
    • TestGateSimpleAssign: x = 1.
    • TestGateBinaryAdd: a = 1 + 2 (asserts the int-int fold).
    • TestGateLoadAfterStore: x = 1; x.
    • TestGateIfWhile: if/while panel (asserts POP_JUMP_IF_FALSE; the JUMP back-edge is asserted once pseudo-op lowering lands).
    • [~] TestGateTryExcept: wired but t.Skip'd pending CFG-based stack-depth (handler entry seeding).
    • TestGateDef: def f(x): return x + 1.
    • [~] TestGateComprehension: wired but t.Skip'd pending CFG back-edge stack-depth.
    • TestGateAsyncFunction: async def f(): pass (asserts CoCoroutine on the inner code object).
    • [n] TestGateMarshalRoundtrip: marshal-byte parity against CPython is deferred to v0.8 with the import system. Until then the disassembly-text golden corpus (1629) is the gate. The v0.8 follow-on adds a code-object marshal arm and a byte-for-byte panel against host CPython.
  • v05test/testdata/golden/: ten checked-in .golden disassembly snapshots (empty_module, simple_assign, binary_add, load_after_store, if_pass, while_pass, def_add_one, async_def_pass, class_pass, type_alias). Refresh contract via go test ./v05test/ -update -run TestGolden; see 1629 for the corpus rules.
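
The refresh contract for the golden corpus can be sketched as a compare-or-rewrite helper. This is an illustration only: checkGolden and demo are hypothetical names, the update switch is passed as a parameter here (in a test file it would typically be bound to the -update flag), and the snapshot text is placeholder content, not real disassembly output. The corpus rules themselves live in 1629.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkGolden compares got with the snapshot at path. With update set it
// rewrites the snapshot instead of comparing.
func checkGolden(path, got string, update bool) error {
	if update {
		return os.WriteFile(path, []byte(got), 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	if string(want) != got {
		return fmt.Errorf("golden mismatch for %s:\n--- want\n%s--- got\n%s", path, want, got)
	}
	return nil
}

// demo writes a snapshot in a temp dir, then checks a matching and a
// non-matching rendering against it.
func demo() (match, mismatch error) {
	dir, err := os.MkdirTemp("", "golden")
	if err != nil {
		return err, nil
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "binary_add.golden")
	if err := checkGolden(path, "placeholder snapshot A\n", true); err != nil {
		return err, nil
	}
	match = checkGolden(path, "placeholder snapshot A\n", false)
	mismatch = checkGolden(path, "placeholder snapshot B\n", false)
	return match, mismatch
}

func main() {
	match, mismatch := demo()
	fmt.Println(match == nil, mismatch != nil)
}
```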

13. Release plumbing for v0.5.0

  • CHANGELOG: v0.5.0 entry.
  • changelog/v0.5.0.md: full release notes.
  • build/version.go: bump to 0.5.0 for the release commit.
  • PR with all-green CI (lint + test on macOS, Linux, Windows).
  • Tag v0.5.0 and create GitHub release. Pending explicit release go-ahead.
  • Bump main to 0.6.0-dev post-release.

14. Docs and side artifacts

  • Update 1602_gopy_filemap.md with the v0.5 entries.
  • Update 1603_gopy_roadmap.md to reflect what landed for v0.5: CFG-driven optimization passes, validate panel, TypeAlias codegen, disassembly golden corpus (1629). The "shipped" marker flips with the v0.5.0 tag.
  • [n] Cross-reference 1690_gopy_quirks.md when codegen or assemble decisions diverge from a literal C port. 1690 is reserved (v0.5 port has no codegen / assemble divergences worth recording).

Working notes (carry forward)

  • The ast/nodes.go hand-written file is a temporary scaffold so that future and other early-v0.5 ports compile. It must shrink once nodes_gen.go lands; only Pos, NoPos, and IsDocString should remain in non-generated files.
  • compile.Sequence.ApplyLabelMap takes a hasTarget predicate callback to keep instrseq.go independent of the opcode metadata. Once compile/opcodes_gen.go lands, callers should pass compile.HasTarget directly (a generated helper), and a thin wrapper (*Sequence).ApplyLabels() may be added for ergonomics.
  • The asdl generator and the opcode generator share style: both read a CPython-source-tree input file at go generate time and emit a checked-in _gen.go. Generators live under tools/ and are not compiled into the runtime binary.
  • The dis-equivalent in compile/dis.go is a tool, not part of the runtime API surface; it lives in the compile package only because it needs the opcode metadata. The Python-visible dis module port comes much later (stdlib effort).
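
The predicate-callback shape of ApplyLabelMap noted above can be sketched as follows. The Sequence fields, the opcode numbers, and the isJump helper are assumptions made for the sketch; the point is only that the sequence code never consults opcode metadata itself and relies on the caller's predicate to decide which opargs are labels.

```go
package main

import "fmt"

// Hypothetical opcode numbers for the sketch.
const (
	opNop  = 0
	opLoad = 1
	opJump = 2
)

// Instr holds an opcode and an oparg. For jump-like opcodes the oparg is a
// label id until ApplyLabelMap rewrites it to an instruction index.
type Instr struct {
	Op  int
	Arg int
}

// Sequence is a simplified instruction sequence. LabelMap maps a label id
// to the index of the instruction it marks.
type Sequence struct {
	Instrs   []Instr
	LabelMap []int
}

// ApplyLabelMap resolves label ids to instruction indexes for every opcode
// the predicate accepts. Other opargs are left untouched.
func (s *Sequence) ApplyLabelMap(hasTarget func(op int) bool) error {
	for i := range s.Instrs {
		in := &s.Instrs[i]
		if !hasTarget(in.Op) {
			continue
		}
		if in.Arg < 0 || in.Arg >= len(s.LabelMap) {
			return fmt.Errorf("instruction %d: unknown label %d", i, in.Arg)
		}
		in.Arg = s.LabelMap[in.Arg]
	}
	return nil
}

// isJump stands in for the generated HasTarget helper.
func isJump(op int) bool { return op == opJump }

func main() {
	s := &Sequence{
		Instrs:   []Instr{{opJump, 0}, {opNop, 0}, {opLoad, 0}},
		LabelMap: []int{2}, // label 0 marks instruction 2
	}
	fmt.Println(s.ApplyLabelMap(isJump), s.Instrs)
}
```

A thin wrapper such as the (*Sequence).ApplyLabels() mentioned above would just call ApplyLabelMap with the generated predicate.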

Gate

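src := "a = 1 + 2"
mod, err := parser.Parse(src) // returns (ast.Mod, error)
if err != nil { t.Fatal(err) }
code, err := compile.Compile(mod, "<gate>", 0) // optimize level 0
if err != nil { t.Fatal(err) }
got := compile.Disassemble(code)
want := "<output captured from CPython 3.14 dis.dis>"
if got != want { t.Fatalf("disassembly mismatch:\ngot:\n%s\nwant:\n%s", got, want) }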

In addition, the gate runs byte-equal checks on co_code, co_linetable, and co_exceptiontable for a small panel of programs (assignment, if, while, try/except, def, comprehension, async function).

Out of scope for v0.5

  • PEG parser (cpython/Parser/). Separate spec series.
  • Tier-2 optimizer (optimizer*.c). Lands in v0.12.
  • Specialization (specialize.c). Lands in v0.11.
  • Instrumentation hooks. Lands in v0.11.
  • JIT (jit.c). Indefinitely deferred.