1620. gopy compile pipeline
What we are porting
Twelve files from cpython/Python/ (~39k lines) form the compiler that
turns an AST into a code object:
| C source | Lines | Go target |
|---|---|---|
| asdl.c | 6 | ast/asdl.go |
| Python-ast.c (generated) | 18485 | ast/nodes_gen.go |
| ast.c | 1091 | ast/validate.go |
| ast_preprocess.c | 990 | ast/preprocess.go |
| ast_unparse.c | 1029 | ast/unparse.go |
| future.c | 119 | future/future.go |
| symtable.c | 3266 | symtable/symtable.go |
| instruction_sequence.c | 483 | compile/instrseq.go |
| codegen.c | 6483 | compile/codegen.go |
| flowgraph.c | 4165 | compile/flowgraph.go |
| assemble.c | 802 | compile/assemble.go |
| compile.c | 1753 | compile/compiler.go |
The 1620 series runs after the parser hands us an AST and before the VM
runs the bytecode. It is the longest spec series in the project and the
one with the strongest source-shape parity requirement: the dis.dis
output of a compiled module must match CPython 3.14 byte-for-byte.
The parser itself (PEG, in cpython/Parser/) is a separate spec series.
v0.5 assumes parser.Parse(src) (ast.Mod, error) already exists.
Layering
    parser.Parse (separate spec)
          │
          ▼
    ast.Mod (asdl-typed tree)
          │
    ast.Validate, ast.Preprocess
          │
          ▼
    future.FromAST ───► future flags bitmask
          │
          ▼
    symtable.Build ───► per-scope symbol tables
          │
          ▼
    compile.Compile
       ├─ codegen   (per-scope instruction sequence)
       ├─ flowgraph (CFG, peephole optimizations)
       └─ assemble  (bytecode + line table + exception table)
          │
          ▼
    *objects.Code
Each stage is independently testable. The gate test at the v0.5
boundary calls compile.Compile(parser.Parse("a = 1 + 2")) and asserts
the disassembly matches CPython.
ast package
asdl.go
asdl.c is six lines: the macro-expanded _PyAsdl_Sequence_New for
generic, identifier, and int sequences. In Go this is just a type
parameter:
package ast
// Seq is the asdl_seq* equivalent. CPython stores sequence length and
// elements inline; we use a Go slice.
type Seq[T any] []T
No dedicated allocator. The arena lives in arena/ (v0.1).
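As a minimal sketch of what the slice-backed sequence could look like in use: the helper names Len, Get, Set, and NewSeq are taken from the checklist later in this document, but their signatures here are assumptions, not the ported API.

```go
package main

import "fmt"

// Seq is the slice-backed stand-in for asdl_seq*.
type Seq[T any] []T

// NewSeq returns a sequence of n zero-valued elements (signature assumed).
func NewSeq[T any](n int) Seq[T] { return make(Seq[T], n) }

// Len reports the element count; a nil Seq has length 0.
func (s Seq[T]) Len() int { return len(s) }

// Get returns element i and panics if i is out of range.
func (s Seq[T]) Get(i int) T { return s[i] }

// Set overwrites element i in place (the slice shares its backing array).
func (s Seq[T]) Set(i int, v T) { s[i] = v }

func main() {
	var empty Seq[string] // nil, like a NULL asdl_seq*
	fmt.Println(empty.Len())

	ids := NewSeq[string](2)
	ids.Set(0, "a")
	ids.Set(1, "b")
	fmt.Println(ids.Get(0), ids.Get(1), ids.Len())
}
```

Because a nil slice already has length 0, the NULL check that asdl_seq_LEN performs in C needs no special case here.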
nodes_gen.go
Python-ast.c is generated by Parser/asdl_c.py from
Parser/Python.asdl. We do not port the C generator; we write a Go
generator (tools/asdl_go) that consumes the same .asdl file and
emits Go structs. Each AST node is a struct embedding a position. Sum
types (mod, stmt, expr, expr_context, etc.) become interfaces with a
sealed marker.
Examples:
type Pos struct{ Lineno, ColOffset, EndLineno, EndColOffset int }

type Mod interface{ isMod() }

type Module struct {
	Pos
	Body        Seq[Stmt]
	TypeIgnores Seq[TypeIgnore]
}

func (*Module) isMod() {}

type Stmt interface{ isStmt() }

type Assign struct {
	Pos
	Targets     Seq[Expr]
	Value       Expr
	TypeComment string
}

func (*Assign) isStmt() {}

type ExprContext int

const (
	Load ExprContext = iota + 1
	Store
	Del
)
Constructors mirror _PyAST_*: ast.NewModule(body, typeIgnores, pos),
ast.NewAssign(targets, value, typeComment, pos), etc.
The full enumeration of mod/stmt/expr/pattern/type_param/excepthandler
sums plus the product types (arguments, arg, keyword, alias, withitem,
match_case, type_ignore) lives in the generator output. Enumerating
them here would duplicate the asdl source; treat Parser/Python.asdl
as authoritative.
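To illustrate why the sealed marker matters to consumers, here is a small sketch of a type switch over a sum type. Pass and Return are simplified stand-ins (the generated Return carries a Value expression, not a bool), and describe is a hypothetical helper.

```go
package main

import "fmt"

// Stmt is a sealed sum: only types in this package can implement isStmt.
type Stmt interface{ isStmt() }

// Simplified stand-ins for two generated statement nodes.
type Pass struct{}
type Return struct{ HasValue bool }

func (*Pass) isStmt()   {}
func (*Return) isStmt() {}

// describe dispatches on the concrete node kind, the way a visitor would.
func describe(s Stmt) string {
	switch n := s.(type) {
	case *Pass:
		return "pass"
	case *Return:
		if n.HasValue {
			return "return <value>"
		}
		return "return"
	default:
		// Reached only if a new node kind is added without updating the switch.
		return fmt.Sprintf("unhandled %T", s)
	}
}

func main() {
	for _, s := range []Stmt{&Pass{}, &Return{HasValue: true}} {
		fmt.Println(describe(s))
	}
}
```

The unexported marker method keeps the set of implementations closed to the package, so a switch like this can be checked against the generator output.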
validate.go (_PyAST_Validate)
Pure tree walk that rejects malformed nodes:
- Position sanity. lineno <= end_lineno. Column ranges within the surrounding line. No negative offsets unless the canonical "no position" marker (-1, -1, -1, -1).
- Forbidden identifiers. None, True, False cannot appear as Name, arg.arg, keyword.arg, alias.name (except as the module name in from None import which is itself rejected elsewhere).
- Constant.value constrained to the marshallable subset (int, float, complex, str, bytes, bool, None, Ellipsis, tuple, frozenset of the same).
- Comprehensions have at least one generator.
- expr_context consistency. Targets in Store, deletions in Del, reads in Load.
- Starred in Load only as a top-level Call arg or as an iterable element.
- Match patterns shape: MatchClass.cls is a Name or Attribute, MatchMapping keys are constants or attribute lookups, etc.
- Type parameter constraints (PEP 695): TypeVar.bound cannot be a starred expression.
Returns error (CPython sets PyErr; we return).
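The position-sanity rule is the easiest one to show in isolation. The sketch below assumes a hypothetical validatePos helper and placeholder error strings; the port has to reproduce CPython's exact messages, and the "within the surrounding line" check is omitted because it needs the source text.

```go
package main

import (
	"errors"
	"fmt"
)

type Pos struct{ Lineno, ColOffset, EndLineno, EndColOffset int }

// NoPos is the canonical "no position" marker.
var NoPos = Pos{-1, -1, -1, -1}

// validatePos applies the ordering and sign rules to one node position.
// The messages are placeholders, not CPython's strings.
func validatePos(p Pos) error {
	if p == NoPos {
		return nil
	}
	if p.Lineno < 0 || p.ColOffset < 0 || p.EndLineno < 0 || p.EndColOffset < 0 {
		return errors.New("negative position offset")
	}
	if p.Lineno > p.EndLineno {
		return fmt.Errorf("line range (%d, %d) is not valid", p.Lineno, p.EndLineno)
	}
	if p.Lineno == p.EndLineno && p.ColOffset > p.EndColOffset {
		return fmt.Errorf("line %d, column %d-%d is not a valid range",
			p.Lineno, p.ColOffset, p.EndColOffset)
	}
	return nil
}

func main() {
	fmt.Println(validatePos(Pos{1, 0, 1, 5})) // well-formed
	fmt.Println(validatePos(Pos{3, 0, 2, 0})) // ends before it starts
	fmt.Println(validatePos(NoPos))           // sentinel is accepted
}
```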
preprocess.go (_PyAST_Preprocess)
Three passes fused into one walk that mirrors astfold_* in
ast_preprocess.c for CPython 3.14 (note: arithmetic constant folding
on BinOp / UnaryOp / Compare / BoolOp of constants moved to
flowgraph.c LOAD_CONST chain folding in 3.14; do NOT fold those here):
- Targeted constant folding. The 3.14 surface is small:
  - string % tuple printf-style format collapses to a JoinedStr when the format spec is supported (%s, %r, %a with optional width/precision) and the tuple has no Starred.
  - Name("__debug__") in Load context becomes Constant(!optimize).
  - MatchValue / MatchMapping keys allow -N and Const ± Const so user code like case -1 or case 1+2j still works after folding.
  - __debug__ and the format-fold respect syntax_check_only (no mutation when only running validation).
- PEP 765 finally checks. Walk every Try node and warn when a finally block contains return, break, or continue. Emitted as a SyntaxWarning via the v0.7 warnings module; v0.5 records the warnings on the compiler context.
- Body cleanup.
  - Under optimize >= 2, the leading docstring is dropped (replaced with Pass if it was the only statement).
  - If a fold produced a string-typed expression at body position 0, wrap it in a single-value JoinedStr so the docstring detector does not re-treat it as a docstring.
Annotation expressions under CO_FUTURE_ANNOTATIONS (PEP 563) are
skipped: their text is captured by the unparser separately, the
constant fold has no business descending.
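The PEP 765 check depends on context: a break or continue only counts when it would leave the finally block, and a nested function body resets everything. The following is a sketch of that rule on a toy tree, not the astfold port; Node, ctx, check, and the warning text are all illustrative assumptions.

```go
package main

import "fmt"

// Node is a toy statement: Kind is one of "try", "for", "while", "def",
// "return", "break", "continue", or anything else for plain statements.
type Node struct {
	Kind    string
	Body    []*Node
	Finally []*Node // only used when Kind == "try"
}

// ctx tracks whether we are inside a finally block, and whether a loop
// has been entered since that finally block started.
type ctx struct{ inFinally, inLoop bool }

func check(nodes []*Node, c ctx, warnings *[]string) {
	for _, n := range nodes {
		switch n.Kind {
		case "return":
			if c.inFinally {
				*warnings = append(*warnings, "'return' in a 'finally' block")
			}
		case "break", "continue":
			// Harmless when the loop it targets is itself inside the finally.
			if c.inFinally && !c.inLoop {
				*warnings = append(*warnings, "'"+n.Kind+"' in a 'finally' block")
			}
		case "for", "while":
			check(n.Body, ctx{inFinally: c.inFinally, inLoop: true}, warnings)
		case "def":
			check(n.Body, ctx{}, warnings) // a new function resets the context
		case "try":
			check(n.Body, c, warnings)
			check(n.Finally, ctx{inFinally: true}, warnings)
		default:
			check(n.Body, c, warnings)
		}
	}
}

func main() {
	// try: ...
	// finally:
	//     for ...: break   (fine: the loop lives inside the finally)
	//     return           (flagged)
	tree := []*Node{{
		Kind: "try",
		Finally: []*Node{
			{Kind: "for", Body: []*Node{{Kind: "break"}}},
			{Kind: "return"},
		},
	}}
	var warnings []string
	check(tree, ctx{}, &warnings)
	fmt.Println(warnings)
}
```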
unparse.go (_PyAST_Unparse)
Produces a Python source representation of an expression. Used for
stringized annotations under from __future__ import annotations
(PEP 563): a def f(x: T) stores the unparsed annotation source string
in the __annotations__ dict.
Operator precedence drives parenthesization. Float infinity renders
as 1e309 per CPython. Format strings, interpolations, template
strings round-trip back to f-string / t-string syntax.
This is a literal port; the precedence table and per-node visitor
order match ast_unparse.c 1:1.
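The core of precedence-driven parenthesization fits in a few lines. This sketch uses a two-level toy precedence table and left-associative operators only; the names prec and unparse and the level numbering are assumptions, and the real table in ast_unparse.c is much larger (and handles right-associative ** separately).

```go
package main

import "fmt"

type Expr interface{ isExpr() }

type Name struct{ ID string }
type BinOp struct {
	Left  Expr
	Op    string
	Right Expr
}

func (*Name) isExpr()  {}
func (*BinOp) isExpr() {}

// prec is a toy table: multiplicative binds tighter than additive.
func prec(op string) int {
	switch op {
	case "*", "/":
		return 2
	default: // "+", "-"
		return 1
	}
}

// unparse renders e, adding parentheses when e binds looser than level.
func unparse(e Expr, level int) string {
	switch n := e.(type) {
	case *Name:
		return n.ID
	case *BinOp:
		p := prec(n.Op)
		// Left-associative: the right operand must bind strictly tighter.
		s := unparse(n.Left, p) + " " + n.Op + " " + unparse(n.Right, p+1)
		if p < level {
			return "(" + s + ")"
		}
		return s
	}
	return "<unsupported>"
}

func main() {
	a, b, c := &Name{"a"}, &Name{"b"}, &Name{"c"}
	fmt.Println(unparse(&BinOp{&BinOp{a, "+", b}, "*", c}, 0)) // (a + b) * c
	fmt.Println(unparse(&BinOp{a, "-", &BinOp{b, "-", c}}, 0)) // a - (b - c)
	fmt.Println(unparse(&BinOp{&BinOp{a, "-", b}, "-", c}, 0)) // a - b - c
}
```

Passing p to the left operand and p+1 to the right is what keeps a - b - c free of parentheses while a - (b - c) retains them.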
future package
future.go ports future.c. Walks the module body past the docstring
collecting from __future__ import x statements. Produces:
package future

type Features struct {
	Bits     uint32
	Location ast.Pos
}

const (
	Generators      = 1 << iota // CO_FUTURE_GENERATORS (legacy, always on)
	Division                    // CO_FUTURE_DIVISION
	AbsoluteImport              // CO_FUTURE_ABSOLUTE_IMPORT
	WithStatement               // CO_FUTURE_WITH_STATEMENT
	PrintFunction               // CO_FUTURE_PRINT_FUNCTION
	UnicodeLiterals             // CO_FUTURE_UNICODE_LITERALS
	BarryAsBdfl                 // CO_FUTURE_BARRY_AS_BDFL
	GeneratorStop               // legacy, ignored
	Annotations                 // CO_FUTURE_ANNOTATIONS (PEP 563)
)

func FromAST(mod ast.Mod) (*Features, error)
Errors mirror CPython: from __future__ import braces raises
SyntaxError with the exact same string. Future imports after the
first non-docstring/non-future statement raise SyntaxError. Unknown
feature names raise SyntaxError with the unknown name interpolated.
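The scan described above can be sketched on a toy statement list. Stmt, known, and fromBody are hypothetical, and the bit values are illustrative only. The first two error strings are the ones this spec quotes; the late-import message is recalled from CPython and should be checked against the source before it is relied on.

```go
package main

import (
	"errors"
	"fmt"
)

// Stmt is a toy module-level statement.
type Stmt struct {
	Kind  string   // "docstring", "future", or "other"
	Names []string // feature names when Kind == "future"
}

// Illustrative bit values only; the port must use CPython's CO_FUTURE_* values.
var known = map[string]uint32{
	"division":    1 << 0,
	"annotations": 1 << 1,
}

// fromBody collects feature bits from the leading run of future imports.
func fromBody(body []Stmt) (uint32, error) {
	var bits uint32
	i := 0
	if len(body) > 0 && body[0].Kind == "docstring" {
		i = 1 // skip the module docstring
	}
	for ; i < len(body) && body[i].Kind == "future"; i++ {
		for _, name := range body[i].Names {
			if name == "braces" {
				return 0, errors.New("not a chance")
			}
			bit, ok := known[name]
			if !ok {
				return 0, fmt.Errorf("future feature %.100s is not defined", name)
			}
			bits |= bit
		}
	}
	// Any future import after the first ordinary statement is too late.
	for ; i < len(body); i++ {
		if body[i].Kind == "future" {
			return 0, errors.New("from __future__ imports must occur at the beginning of the file")
		}
	}
	return bits, nil
}

func main() {
	bits, err := fromBody([]Stmt{
		{Kind: "docstring"},
		{Kind: "future", Names: []string{"annotations"}},
		{Kind: "other"},
	})
	fmt.Println(bits, err)

	_, err = fromBody([]Stmt{{Kind: "future", Names: []string{"braces"}}})
	fmt.Println(err)
}
```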
symtable package
Port of symtable.c. _PySymtable_Build becomes symtable.Build.
package symtable

type Block int

const (
	ModuleBlock Block = iota
	FunctionBlock
	ClassBlock
	AnnotationBlock
	TypeAliasBlock
	TypeParametersBlock
	TypeVariableBlock
)

type Scope int

const (
	Local Scope = iota + 1
	GlobalExplicit
	GlobalImplicit
	Free
	Cell
)

// SymbolFlags packs the DEF_* and USE bits plus the resolved Scope.
type SymbolFlags uint32

const (
	DefGlobal SymbolFlags = 1 << iota
	DefLocal
	DefParam
	DefNonlocal
	Use
	DefFreeClass
	DefImport
	DefAnnot
	DefCompIter
	DefTypeParam
	DefCompCell
)

type Entry struct {
	Type     Block
	Name     string
	Symbols  map[string]SymbolFlags
	Children []*Entry
	Varnames []string
	// Per-block attributes mirroring ste_*.
	Nested            bool
	Generator         bool
	Coroutine         bool
	Comprehension     bool
	Varargs           bool
	Varkeywords       bool
	ReturnsValue      bool
	NeedsClassClosure bool
	NeedsClassDict    bool
	Loc               ast.Pos
}

type Table struct {
	Top    *Entry
	Blocks map[ast.Node]*Entry
	Future *future.Features
}

func Build(mod ast.Mod, filename string, ff *future.Features) (*Table, error)
Two-phase algorithm:
- Block visit. Walk the AST building a tree of Entry. Each function/class/comprehension/lambda creates a child block. Every name use or definition records a flag bit on the enclosing block.
- Analysis. Bottom-up over the block tree, resolve each name to one of Local, GlobalExplicit, GlobalImplicit, Free, Cell. Free-vars in nested scopes mark their defining scope's slot as Cell. Class scopes are special: names bound in a class body are invisible to nested function scopes, so a method's free names resolve past the class to the enclosing function; the class itself contributes cells only for the implicit names that methods close over (see NeedsClassClosure and NeedsClassDict).
Error parity is non-negotiable: the exact SyntaxError strings ("name
'x' is used prior to global declaration", "no binding for nonlocal
'x' found", "name 'x' is assigned to before global declaration", etc.)
must match.
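The analysis phase described above can be illustrated on a deliberately reduced model: module and function blocks only, with no classes, nonlocal, or comprehensions. Block, has, and analyze here are toy stand-ins rather than the ported Entry and analyzeBlock, so treat this as a sketch of the Free/Cell bookkeeping and nothing more.

```go
package main

import "fmt"

type Scope int

const (
	Local Scope = iota + 1
	GlobalExplicit
	GlobalImplicit
	Free
	Cell
)

// Block is a toy scope produced by the block-visit phase.
type Block struct {
	IsFunction bool
	Defs       []string // names bound here
	Uses       []string // names read here
	Globals    []string // names declared global here
	Children   []*Block
	Scopes     map[string]Scope // filled in by analyze
}

func has(list []string, name string) bool {
	for _, n := range list {
		if n == name {
			return true
		}
	}
	return false
}

// analyze resolves every name in b and returns the names b needs from
// enclosing function scopes. bound holds the names visible from them.
func analyze(b *Block, bound map[string]bool) map[string]bool {
	b.Scopes = map[string]Scope{}
	free := map[string]bool{}
	inner := map[string]bool{} // what b's children may capture
	for n := range bound {
		inner[n] = true
	}
	var names []string
	names = append(names, b.Globals...)
	names = append(names, b.Defs...)
	names = append(names, b.Uses...)
	for _, n := range names {
		switch {
		case has(b.Globals, n):
			b.Scopes[n] = GlobalExplicit
			delete(inner, n) // children see the global, not an outer local
		case has(b.Defs, n):
			b.Scopes[n] = Local
			if b.IsFunction {
				inner[n] = true // module-level names stay globals
			}
		case bound[n]:
			b.Scopes[n] = Free
			free[n] = true
		default:
			b.Scopes[n] = GlobalImplicit
		}
	}
	for _, child := range b.Children {
		for n := range analyze(child, inner) {
			if s := b.Scopes[n]; s == Local || s == Cell {
				b.Scopes[n] = Cell // b owns n and a closure captures it
			} else {
				b.Scopes[n] = Free // pass-through to an outer scope
				free[n] = true
			}
		}
	}
	return free
}

func main() {
	// def f():
	//     x = 1
	//     def g(): return print(x)
	g := &Block{IsFunction: true, Uses: []string{"x", "print"}}
	f := &Block{IsFunction: true, Defs: []string{"x", "g"}, Children: []*Block{g}}
	mod := &Block{Defs: []string{"f"}, Children: []*Block{f}}
	analyze(mod, map[string]bool{})
	fmt.Println(f.Scopes["x"], g.Scopes["x"], g.Scopes["print"]) // 5 4 3
}
```

The printed numbers are Cell, Free, and GlobalImplicit: x becomes a cell in f because g captures it, while print falls through to the implicit-global lookup.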
compile package
instrseq.go
Port of instruction_sequence.c. The pre-CFG instruction stream:
package compile

type Label int // -1 means unbound.

type Instr struct {
	Op  Opcode
	Arg int32
	Loc ast.Pos
	// Exception handler info: index into the exc-handler stack at
	// emit time, set during codegen, finalized during assemble.
	ExcDepth int32
}

type Sequence struct {
	Instrs   []Instr
	LabelMap []int // label id to instr index, len == NewLabel calls.
	Nested   []*Sequence
}

func (s *Sequence) NewLabel() Label
func (s *Sequence) UseLabel(l Label)
func (s *Sequence) Addop(op Opcode, arg int32, loc ast.Pos)
func (s *Sequence) Insert(idx int, in Instr)
func (s *Sequence) ApplyLabelMap()
func (s *Sequence) AddNested(child *Sequence)
Opcode is generated from CPython's Lib/opcode.py (or, equivalently,
Include/internal/pycore_opcode.h). The opcode set is shared between
v0.5 and v0.6; the generator lives in tools/opcodes_go and emits
compile/opcodes_gen.go plus vm/opcodes_gen.go.
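To make the label mechanics of the Sequence API concrete, here is a reduced sketch. Opcodes are plain strings, and an IsJump field stands in for the HasTarget opcode predicate; both are simplifications for illustration, not the ported types.

```go
package main

import "fmt"

type Label int

type Instr struct {
	Op     string
	Arg    int
	IsJump bool // stands in for the HasTarget opcode predicate
}

type Sequence struct {
	Instrs   []Instr
	LabelMap []int // label id -> instruction index, -1 while unbound
}

// NewLabel allocates a fresh, unbound label.
func (s *Sequence) NewLabel() Label {
	s.LabelMap = append(s.LabelMap, -1)
	return Label(len(s.LabelMap) - 1)
}

// UseLabel binds l to the position of the next instruction to be emitted.
func (s *Sequence) UseLabel(l Label) { s.LabelMap[l] = len(s.Instrs) }

func (s *Sequence) Addop(op string, arg int, isJump bool) {
	s.Instrs = append(s.Instrs, Instr{op, arg, isJump})
}

// ApplyLabelMap rewrites jump args from label ids to instruction indexes.
// Dropping the map afterwards makes a second call a no-op.
func (s *Sequence) ApplyLabelMap() {
	if s.LabelMap == nil {
		return
	}
	for i := range s.Instrs {
		if s.Instrs[i].IsJump {
			s.Instrs[i].Arg = s.LabelMap[s.Instrs[i].Arg]
		}
	}
	s.LabelMap = nil
}

func main() {
	var s Sequence
	end := s.NewLabel()
	s.Addop("POP_JUMP_IF_FALSE", int(end), true) // forward jump, target unknown yet
	s.Addop("LOAD_CONST", 0, false)
	s.UseLabel(end) // binds to index 2
	s.Addop("RETURN_VALUE", 0, false)
	s.ApplyLabelMap()
	fmt.Println(s.Instrs[0].Arg) // 2
}
```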
codegen.go
Detailed source-of-truth: 1626_gopy_codegen.md. The summary below covers the cross-cutting picture; per-visitor file split, fblock stack, match panel, with-statement state machine, deferred annotations, PEP 695 type-parameter codegen, and the comprehensive per-visitor test plan live in 1626.
Port of codegen.c (~6500 lines). Walks each scope's AST and emits
into a Sequence. Per-node visitors mirror compiler_visit_*. The
naming convention is genXxx for the visitor of Xxx:
func (c *compiler) genStmt(s ast.Stmt)
func (c *compiler) genExpr(e ast.Expr)
func (c *compiler) genCall(call *ast.Call)
// ... one per AST node kind.
Macro shorthands from CPython (ADDOP, ADDOP_I, ADDOP_LOAD_CONST,
ADDOP_NAME, ADDOP_JUMP, ADDOP_COMPARE) become methods on
*compiler that wrap Sequence.Addop.
Name resolution piggybacks on the symtable: the codegen looks up each
Name in the current Entry.Symbols to pick LOAD_FAST /
LOAD_DEREF / LOAD_GLOBAL / LOAD_NAME. Comprehensions inline into
the parent stream when they share a scope, and emit a nested sequence
otherwise.
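The load-opcode choice for a Name can be sketched as a small decision function. This covers only the common cases; class-scope special cases and inlined comprehensions are left out, and loadOp and BlockKind are hypothetical names for illustration.

```go
package main

import "fmt"

type Scope int

const (
	Local Scope = iota + 1
	GlobalExplicit
	GlobalImplicit
	Free
	Cell
)

type BlockKind int

const (
	ModuleBlock BlockKind = iota
	FunctionBlock
	ClassBlock
)

// loadOp picks the load opcode for a name with the given resolved scope,
// as seen from a block of the given kind.
func loadOp(scope Scope, block BlockKind) string {
	switch scope {
	case Free, Cell:
		return "LOAD_DEREF"
	case Local:
		if block == FunctionBlock {
			return "LOAD_FAST" // function locals live in fast slots
		}
	case GlobalExplicit:
		return "LOAD_GLOBAL"
	case GlobalImplicit:
		if block == FunctionBlock {
			return "LOAD_GLOBAL"
		}
	}
	// Module and class bodies look names up through the namespace dict.
	return "LOAD_NAME"
}

func main() {
	fmt.Println(loadOp(Local, FunctionBlock))        // LOAD_FAST
	fmt.Println(loadOp(Local, ModuleBlock))          // LOAD_NAME
	fmt.Println(loadOp(GlobalImplicit, FunctionBlock)) // LOAD_GLOBAL
	fmt.Println(loadOp(Free, FunctionBlock))         // LOAD_DEREF
}
```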
The control-flow constructs (try/except/finally, with, async with,
match, comprehensions) are line-for-line ports. Each block-leaving
construct (return, break, continue, raise) honors the unwind protocol
by emitting the right POP_BLOCK / RERAISE sequence; this is the
trickiest part of codegen and is gated by a dedicated subset of the
CPython test suite (test_compile, test_dis).
flowgraph.go
Detailed source-of-truth: 1627_gopy_flowgraph.md. The summary below lists the visible types and the optimisation set; per-pass file split, exact pass ordering, const-cache structure, super-instruction contract, optimizeLoadFast ref-stack panel, and the layered test plan live in 1627.
Port of flowgraph.c. Converts an instruction stream into a CFG of
basic blocks, runs peephole optimizations, then converts back.
type Block struct {
	Instrs     []Instr
	Succ       []*Block
	Pred       []*Block
	StackEntry int32
	Reachable  bool
}

type CFG struct {
	Blocks []*Block
	Entry  *Block
}

func FromSequence(seq *Sequence) *CFG
func (g *CFG) Optimize()
func (g *CFG) ToSequence() *Sequence
func (g *CFG) ResolveJumps() // labels -> offsets
Optimizations (every one a CPython parity test target):
- Constant folding on LOAD_CONST chains.
- Jump threading (JUMP -> JUMP -> X becomes JUMP X).
- Conditional jump propagation (POP_JUMP_IF_TRUE over a block that ends in another conditional jump).
- Dead-code elimination (unreachable blocks and instructions after an unconditional terminator).
- Stack-effect verification.
- Push RESUME at function entry.
- Insert LOAD_CONST None, RETURN_VALUE for fall-through returns.
The order and exact predicates are taken from flowgraph.c. Lints
(gocognit, gocyclo) are exempted on the optimizer entry point
because the CPython source structure is the contract.
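Jump threading is the simplest of these passes to show in isolation. The sketch below works on a flat instruction slice with index targets rather than on the CFG, and it guards against jump cycles; the exact predicates in flowgraph.c differ, so read it as an illustration of the idea only.

```go
package main

import "fmt"

type Instr struct {
	Op     string
	Target int // index of the jump target; meaningful only when Op == "JUMP"
}

// threadJumps retargets every JUMP that lands on another JUMP so that it
// goes straight to the final destination.
func threadJumps(code []Instr) {
	for i := range code {
		if code[i].Op != "JUMP" {
			continue
		}
		seen := map[int]bool{i: true} // stops us looping on JUMP cycles
		t := code[i].Target
		for code[t].Op == "JUMP" && !seen[t] {
			seen[t] = true
			t = code[t].Target
		}
		code[i].Target = t
	}
}

func main() {
	code := []Instr{
		{Op: "JUMP", Target: 2},
		{Op: "NOP"},
		{Op: "JUMP", Target: 4},
		{Op: "NOP"},
		{Op: "RETURN_VALUE"},
	}
	threadJumps(code)
	fmt.Println(code[0].Target) // 4
}
```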
assemble.go
Detailed source-of-truth: 1628_gopy_assemble.md. The summary below shows the public type and the four-step plan; the location-table format dispatcher (PEP 626 line numbers, PEP 657 column ranges), the exception-table varint encoding, the localsplus / fastlocalskinds layout, and the marshal-parity test plan live in 1628.
Port of assemble.c. Lays out the final code object:
type Assembler struct {
	Code      []byte
	LineTable []byte
	ExcTable  []byte
	Consts    []objects.Object
	Names     []string
}

func Assemble(scope *symtable.Entry, seq *Sequence, ff *future.Features) (*objects.Code, error)
Steps:
- Lay out instructions in order, expanding wide args via EXTENDED_ARG.
- Emit the line/column table in the adaptive varint format (PEP 626 line numbers, PEP 657 column ranges): short, one-line, long, none.
- Emit the exception table (the zero-cost exception handling table, which has no PEP of its own): start, end, target, stack_depth, lasti as varint deltas.
- Allocate constants, names, varnames, freevars, cellvars from the symtable.
The byte format is fixed by CPython 3.14 marshal, so the assembler is
testable against dis.dis output and against marshal.loads of a
CPython-produced .pyc.
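The exception table is built from a 6-bit varint. The sketch below shows one plausible shape for the encoder and a matching decoder, based on a reading of assemble.c: big-endian 6-bit chunks, a continuation bit on every byte but the last, and an optional marker bit on the first byte. The function names are assumptions, and how the entry fields are packed is left to 1628.

```go
package main

import "fmt"

const continuationBit = 0x40 // set on every byte except the last of a value

// appendVarint writes value (which must be < 1<<30) in big-endian 6-bit
// chunks. msb (0 or 0x80) is OR-ed into the first byte only, which lets
// the first value of an entry carry a start-of-entry marker.
func appendVarint(buf []byte, value uint32, msb byte) []byte {
	for shift := 24; shift > 0; shift -= 6 {
		if value >= 1<<shift {
			buf = append(buf, byte(value>>shift)&0x3f|continuationBit|msb)
			msb = 0
		}
	}
	return append(buf, byte(value)&0x3f|msb)
}

// readVarint decodes one value and reports how many bytes it consumed.
// The marker bit, if present, is ignored.
func readVarint(buf []byte) (value uint32, n int) {
	b := buf[0]
	value = uint32(b & 0x3f)
	n = 1
	for b&continuationBit != 0 {
		b = buf[n]
		n++
		value = value<<6 | uint32(b&0x3f)
	}
	return value, n
}

func main() {
	for _, v := range []uint32{0, 63, 64, 1000, 1 << 29} {
		enc := appendVarint(nil, v, 0x80)
		dec, n := readVarint(enc)
		fmt.Printf("%d -> % x -> %d (%d bytes)\n", v, enc, dec, n)
	}
}
```

A round-trip helper like readVarint is also what the exception-table tests in the checklist would need.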
compiler.go
Top-level compile.Compile:
func Compile(mod ast.Mod, filename string, optimize int) (*objects.Code, error)
Driver: ast.Validate -> ast.Preprocess -> future.FromAST ->
symtable.Build -> per-scope codegen -> flowgraph.Optimize ->
assemble.Assemble. Errors propagate as Go errors carrying the
SyntaxError string and position.
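One way to carry the SyntaxError string and position through the stages is an error type that callers recover with errors.As. The field layout follows the SyntaxError {Msg, Filename, Pos} shape listed in the checklist; buildSymtable and compileModule are hypothetical stand-ins for pipeline stages, and the formatted Error() text is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

type Pos struct{ Lineno, ColOffset int }

// SyntaxError carries the CPython message plus where it occurred.
type SyntaxError struct {
	Msg      string
	Filename string
	Pos      Pos
}

func (e *SyntaxError) Error() string {
	return fmt.Sprintf("%s:%d:%d: SyntaxError: %s",
		e.Filename, e.Pos.Lineno, e.Pos.ColOffset, e.Msg)
}

// buildSymtable is a hypothetical stage that always fails, for illustration.
func buildSymtable(filename string) error {
	return &SyntaxError{
		Msg:      "no binding for nonlocal 'x' found",
		Filename: filename,
		Pos:      Pos{Lineno: 3, ColOffset: 4},
	}
}

// compileModule wraps stage errors with %w so the SyntaxError stays reachable.
func compileModule(filename string) error {
	if err := buildSymtable(filename); err != nil {
		return fmt.Errorf("symtable: %w", err)
	}
	return nil
}

func main() {
	err := compileModule("demo.py")
	var se *SyntaxError
	if errors.As(err, &se) {
		// Tests can compare se.Msg against the CPython string verbatim.
		fmt.Println(se.Msg, se.Pos.Lineno)
	}
}
```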
Phasing inside v0.5
Because v0.5 is the largest single phase in the project, we split it
into commits sized to land independently. Each commit ships with
matched unit tests and is lint-clean. The cross-cut gate at
v05test/gate_test.go exercises the full pipeline and is the last to
land.
Full v0.5 checklist
Status legend: [x] done, [ ] pending, [~] partial / scaffold.
1. ast package
asdl runtime
- ast/asdl.go: Seq[T], Len, Get, Set, NewSeq. Mirrors _Py_asdl_*_seq_new and asdl_seq_LEN/GET/SET.
- ast/asdl_test.go: empty seq, get/set round-trip, nil Len.
Hand-written node skeleton (foundation for early ports)
- ast/nodes.go: Pos, NoPos, sealed Mod/Stmt/Expr interfaces, Module, Interactive, Expression, FunctionType, TypeIgnore, ImportFrom, Alias, ExprStmt, Constant, IsDocString (mirrors _PyAST_GetDocString).
asdl-driven Go generator
- tools/asdl_go/: parser for cpython/Parser/Python.asdl.
- tools/asdl_go/: emitter for Go structs and sealed-interface marker methods. Collision rule renames stmt.Expr to ExprStmt and single-ctor sums (type_ignore.TypeIgnore) to <name>Node.
- tools/asdl_go/main.go: CLI with -input, -output.
- tools/asdl_go/main_test.go: small-fixture parse and emit shape tests plus a real-asdl smoke test.
Generated nodes (output of the generator)
- ast/nodes_gen.go: full mod/stmt/expr/pattern/excepthandler sums, type_param sum (PEP 695), product types (arguments, arg, keyword, alias, withitem, match_case), expr_context and operator enums folded in (CPython collapses these into the same asdl file). Hand-written stubs in nodes.go retired; only Pos, NoPos, and IsDocString remain there.
Validation
- ast/validate.go plus ast/validate_panel.go: Validate(mod ast.Mod) error mirroring _PyAST_Validate.
  - Position sanity (lineno ordering, no negative offsets except NoPos sentinel).
  - Forbidden identifier names (None, True, False) via validateName.
  - Constant.Value constrained to marshal-allowed kinds.
  - Comprehension non-empty generators (validateComprehension).
  - expr_context consistency via validateExprCtx (Name/Attribute/Subscript/Starred/List/Tuple ctx slots).
  - Starred allowed only as Call arg or iterable element (default validateExpr rejects; Call and validateLoadElts permit in Load context).
  - Match pattern shape rules (validatePattern, validatePatternSeq).
  - PEP 695 type-parameter constraints (validateTypeParams).
- ast/validate_test.go: foundation panel (positions, ImportFrom level, constant kinds, nil rejection). Expands with later validators.
Preprocess
- ast/preprocess.go: Preprocess(mod, opts) []Warning. Mirrors _PyAST_Preprocess for CPython 3.14: a single tree walk that does PEP 765 plus the small set of folds 3.14 keeps in the AST layer.
  - Full astfold walker descending every stmt / expr / pattern / type_param kind (parity with astfold_* cases in ast_preprocess.c).
  - PEP 765 finally-block control-flow checks (return, break, continue).
  - string % tuple printf-format fold to JoinedStr (via fold_binop).
  - Name("__debug__") substitution by Constant(!optimize) in Load context.
  - MatchValue / MatchMapping const folding (USub of Constant, Add/Sub of all-Constant operands).
  - Docstring removal under -OO (replaces sole-stmt docstring with Pass per remove_docstring).
  - Body re-wrap when a fold produces a leading string expr.
  - PEP 563 annotation skip: Arg.annotation, FunctionDef.returns, AsyncFunctionDef.returns, AnnAssign.annotation not visited when CO_FUTURE_ANNOTATIONS is set in the future bits.
  - syntax_check_only flag suppresses mutating folds.
  - [n] Arithmetic / Compare / BoolOp / tuple / frozenset constant folding lives in flowgraph.c LOAD_CONST chain folding for 3.14; not part of ast_preprocess.c. See section 7 below.
- ast/preprocess_test.go: PEP 765 panel plus fold panels for format folding, __debug__, MatchValue numerics, docstring removal, PEP 563 annotation skip, syntax-check-only mode.
Unparse
- ast/unparse.go: Unparse(expr Expr) (string, error).
  - Operator precedence parenthesization parity.
  - Float infinity rendered as 1e309.
  - f-string / t-string round-trip.
  - FormattedValue, Interpolation, JoinedStr, TemplateStr.
- ast/unparse_test.go: golden-string panel matching CPython's ast.unparse output.
2. future package
- future/future.go: Features, CO_FUTURE_* numeric flags matching Include/cpython/code.h, FromAST, SyntaxError error type, "not a chance" string parity for from __future__ import braces, "future feature %.100s is not defined" parity.
- future/future_test.go: annotations, barry_as_FLUFL, no-op features, braces rejection, unknown rejection, docstring skip, stop-at-first-non-future, relative-import ignored, location tracking, Expression mode.
3. compile package: instruction sequence
- compile/instrseq.go: Sequence, Instr, ExceptHandlerInfo, JumpTargetLabel, Opcode, MaxOpcode=511, MaxOparg=1<<30, NewLabel, UseLabel, Addop, Insert, AddNested, SetAnnotationsCode, ApplyLabelMap. Takes hasTarget predicate callback so it does not depend on opcode metadata yet.
- compile/instrseq_test.go: label IDs, addop+UseLabel, idempotent ApplyLabelMap, non-jump preservation, Insert label shift, AddNested, except-handler resolution, SetAnnotationsCode panic, opcode/oparg range panics.
4. opcode generator and metadata
- tools/opcodes_go/: parser for cpython/Lib/_opcode_metadata.py and Include/internal/pycore_opcode_metadata.h.
- tools/opcodes_go/main.go: CLI emitting compile/opcodes_gen.go. vm/opcodes_gen.go waits on the v0.6 vm port.
- compile/opcodes_gen.go: typed Opcode constants for the full 3.14 opmap; HasArg, HasConst, HasName, HasJump, HasFree, HasLocal, HasEvalBreak, HasDeopt, HasError, HasEscapes, HasExit, HasPure, HasPassthrough, HasOpargAnd1, HasErrorNoPop, HasNoSaveIp predicates; Name() lookup; HasTarget shorthand for the label resolver.
- [~] compile/opcodes_metadata_gen.go: per-opcode stack-effect table (push/pop counts) and cache-line sizes. v0.5 hand-codes the stack-effect table inside flowgraph_stackdepth.go; the generated file lands when the bytecodes.c DSL generator does in v0.6.
- compile/opcodes_test.go: opcode number cross-check, flag predicates, HasTarget panel, out-of-range Name.
- Wire Sequence.ApplyLabelMap callers to pass compile.HasTarget.
5. symtable package
Data model
- symtable/types.go: Block, Scope, SymbolFlags constants byte-equal to CPython (DEF_GLOBAL=1, DEF_LOCAL=2, DEF_PARAM=4, DEF_NONLOCAL=8, USE=0x10, etc., per Include/internal/pycore_symtable.h). Includes DEF_FREE_CLASS, DEF_IMPORT, DEF_ANNOT, DEF_COMP_ITER, DEF_TYPE_PARAM, DEF_COMP_CELL, SCOPE_OFFSET, ScopeMask. Block enum order matches _Py_block_ty. ComprehensionType enum byte-equal.
- symtable/entry.go: Entry struct mirroring PySTEntryObject field-for-field, including the PEP 649 AnnotationBlock, PEP 695 MangledNames, CanSeeClassScope, HasConditionalAnnotations, InConditionalBlock, InUnevaluatedAnnotation, CompIterExpr, Method, HasDocstring. Directive {Name, Loc} records for global / nonlocal locations. Methods: IsFunctionLike, GetSymbol, GetScope.
- symtable/table.go: Table struct (Top, Blocks, Future, Filename) with Lookup(key any). Map key is any because no ast.Node interface exists.
- symtable/mangle.go: Mangle(private, name) / MaybeMangle(private, ste, name) PEP 8 private name mangling.
Build pass (block visit)
- symtable/build.go: Build(mod ast.Mod, filename string, ff *future.Features) (*Table, error). Driver dispatches Module / Interactive / Expression bodies; FunctionType returns an explicit error to match CPython. builder carries table, cur, stack, private, filename, future, nextID. Helpers: enterBlock, enterExisting, exitBlock, addDef, addDefCtx, addDefHelper, checkName, lookup, lookupEntry, recordDirective, allowsTopLevelAwait (stub returns false), isAsyncDef.
- symtable/build_visit.go: visitStmt plus per-kind helpers (visitFunctionLike, visitClassDef, visitTypeAlias, visitReturn, visitAssign, visitAnnAssign, visitForLike, visitWhile, visitIf, visitMatch, visitRaise, visitTryLike, visitAssert, visitWithLike, visitGlobal, visitNonlocal, checkImportFrom, maybeSetCoroutineForModule). Split into visitStmtDef / visitStmtControl / visitStmtSimple to satisfy gocyclo. funcLike struct + unpackFuncLike factor the FunctionDef / AsyncFunctionDef variants. annotationKey {parent, index} distinguishes adjacent annotations.
- symtable/build_expr.go: visitExpr split into visitExprComp / visitExprUnary / visitExprLeaf. Per-kind visitName, visitYield, visitAwait, visitCall, visitLambda, raiseIfAnnotationBlock, raiseIfComprehensionBlock, visitFormattedValue, visitSliceParts. visitName recognises super and pulls __class__ into scope.
- symtable/build_helpers.go: visitArguments, visitParams, visitArgAnnotations, visitAnnotations, visitAnnotation, visitAlias, visitExceptHandler, visitWithItem, visitMatchCase, visitPattern, visitPatternSeq, checkKwdPatterns, visitKeyword, visitTypeParam, visitTypeParamSubexpr, enterTypeParamBlock. Synthetic .type_params, .generic_base, .defaults, .kwdefaults slots match CPython.
- symtable/build_comp.go: handleNamedExpr walrus retarget, extendNamedExprScope walking the stack reverse with the class / type-block rejections, handleComprehension / runComprehensionBody / comprehensionTypeFor for list / set / dict / generator forms, visitComprehensionIter, implicitArg for the .0 parameter.
Analysis pass
- symtable/analyze.go: top-level analyze plus analyzeBlock, analyzeChildBlock, analyzeName, analyzeCells, dropClassFree, updateSymbols, inlineComprehension, isFreeInAnyChild, errorAtDirective. Splits the original 200-line analyze_block into prepareClassPreambleSets, finalizeChildSets, analyzeChildren, pickClassEntry, spliceInlinedChildren to satisfy gocognit. nameSet is a small map[string]struct{} helper that mirrors PySet operations (add / discard / contains / clone / union).
Errors
- symtable/errors.go: SyntaxError {Msg, Filename, Pos} implementing the error interface. CPython error strings preserved verbatim in package-level constants:
  - msgGlobalParam, msgNonlocalParam, msgGlobalAfterAssign, msgNonlocalAfterAss, msgGlobalAfterUse, msgNonlocalAfterUse, msgGlobalAnnot, msgNonlocalAnnot, msgImportStar.
  - msgNamedExprComp, msgNamedExprBound, msgNamedExprAlias, msgNamedExprParam, msgNamedExprConflict, msgNamedExprInner, msgNamedExprIterExpr.
  - msgAnnotationNotAllow, msgNotAllowedInTypVar, msgNotAllowedInAlias, msgNotAllowedInParams, msgDupTypeParam, msgDupArgument.
  - msgAsyncWithOutside, msgAsyncForOutside, msgAsyncCompOutside, msgAwaitOutsideFunc, msgAwaitOutsideAsync.
  - msgAssignDebug, msgDeleteDebug, msgFutureLate.
  - msgNonlocalAtModule, msgNoBindingNonlocal, msgNonlocalGlobal, msgNonlocalTypeParam.
  - errorf(filename, loc, format, args...) constructor.
Tests
- symtable/symtable_test.go: module / function / class / nested closure / global directive / nonlocal-at-module / nonlocal-no-binding / global-after-use / __debug__ rejection / class method flag / private-name mangling in class / list comprehension / import binding / star-import outside module / duplicate argument / SyntaxError shape.
- symtable/mangle_test.go: full panel for Mangle and MaybeMangle allow-list.
6. compile package: codegen
Detailed source-of-truth for this section: 1626_gopy_codegen.md. The checklist below is the cross-cutting view. The per-visitor file split, function citations, fblock-stack types, super-instruction contract, with-statement state machine, deferred-annotation panel, PEP 695 type-parameter codegen, and the layered test plan all live in 1626. Tick a box here only after the matching detailed checklist in 1626 is also ticked.
Driver
- compile/codegen.go: (*Compiler).Codegen(scope *symtable.Entry, mod ast.Mod) (*Unit, error).
- compile/codegen.go: Compiler struct (current scope stack, unit stack, future flags, filename, source, optimize level, symtable, const cache).
- compile/codegen_addop.go: addOp / addOpI / addOpJump / loadConst / addOpName / useLabel helpers as methods on *Compiler. Macros land as Go methods rather than a separate macros.go.
Statement visitors
- [~] FunctionDef / AsyncFunctionDef (decorators, posonly, kwonly, defaults, kwonly defaults, varargs, varkw, closure cells landed. PEP 695 type parameters and PEP 649 deferred annotations are separate panels later in 1626).
- ClassDef (basic shape: bases, keyword args, decorators). Type parameters / classcell / static-attributes panels land alongside PEP 695 + super().
- Return, Delete, Assign, AugAssign, AnnAssign (full target panel including attr / subscript / tuple / star unpack).
- For / AsyncFor (with break/continue unwind).
- While.
- If (skeleton: constant-condition specialization pending).
- With / AsyncWith (with-statement state machine).
- Match / MatchValue / MatchSingleton / MatchSequence / MatchMapping / MatchClass / MatchStar / MatchAs / MatchOr.
- Raise (with from).
- Try / TryStar (PEP 654).
- Assert (with __debug__ short-circuit).
- Import / ImportFrom (with star).
- Global / Nonlocal (no-op at codegen, already in symtable).
- Expr (top-level expression: CALL_INTRINSIC_1 INTRINSIC_PRINT in interactive mode).
- Pass / Break / Continue (loop fblock walk; full unwind through with / try lands alongside those visitors).
- Delete / AugAssign / AnnAssign.
- TypeAlias (PEP 695, lowered as LOAD_CONST <name>, LOAD_CONST None, <value>, CALL_INTRINSIC_1 INTRINSIC_TYPEALIAS=11, STORE_NAME).
Expression visitors
- BoolOp (short-circuit jumps).
- NamedExpr (walrus).
- BinOp.
- UnaryOp.
- Lambda.
- IfExp.
- Dict, Set, List, Tuple displays.
- ListComp, SetComp, DictComp, GeneratorExp.
- Await, Yield, YieldFrom.
- Compare (chained compares).
- Call (with star/starstar args, keyword args).
- FormattedValue, JoinedStr.
- [~] Interpolation, TemplateStr (PEP 750 t-strings; deferred).
- Constant (LOAD_CONST + co_consts allocation).
- Attribute (LOAD_ATTR + super-instruction LOAD_SUPER_ATTR).
- Subscript (LOAD/STORE/DELETE).
- Starred (in target / arg positions).
- Name (LOAD_FAST / LOAD_DEREF / LOAD_GLOBAL / LOAD_NAME by scope).
- Slice.
Block / unwind machinery
- Frame block stack (pushFblock, popFblock, unwind helpers in codegen_fblock.go).
- Try/except/finally exception table emission.
- With unwinding (call __exit__ on path).
- Async-with unwinding (await __aexit__).
- [~] Generator return-value handling (RETURN_GENERATOR + RESUME prologue landed; full StopIteration packaging refined in v0.6 vm).
- [~] Coroutine close handling (CoCoroutine flag set; close protocol landed in v0.6 vm).
Tests
- compile/codegen_test.go: per-statement smoke tests against a parser-stub feeding hand-built ASTs.
- Comprehension scope test (free var capture).
- Match-statement panel (one test per pattern kind).
- TypeAlias codegen test pinned via the type_alias golden (v05test/testdata/golden/type_alias.golden).
- [~] PEP 695 type-parameter codegen test for TypeVar / ParamSpec / TypeVarTuple bodies. Validator path covered by validateTypeParams; full codegen golden lands alongside the generic-class panel in v0.6.
7. compile package: flowgraph
Detailed source-of-truth for this section: 1627_gopy_flowgraph.md. The checklist below is the cross-cutting view. The per-pass file split, exact pass ordering inside OptimizeCodeUnit, const-cache structure, optimizeLoadFast ref-stack panel, and the layered test plan live in 1627.
- [~] compile/flowgraph.go:
  - BasicBlock struct (Instrs, Next, Label, StartDepth, Predecessors, Visited, Cold, Warm, Reachable).
  - Builder struct (Head, Tail, labelMap).
  - FromSequence(*Sequence) (*Builder, error).
  - [~] Optimize(*Sequence, *[]any, nlocals, firstLineno) (*Info, error): runs the v0.5 subset of _PyCfg_OptimizeCodeUnit against the flat sequence. CFG-driven passes are queued for the follow-on.
  - (*Builder).ToSequence() (*Sequence, error).
  - Jump-label resolution via Sequence.ApplyLabelMap(HasTarget).
Optimization passes (each is a parity target)
- LOAD_CONST chain folding (multi-step). Optimize now drives foldBinaryIntConst plus eliminateDeadCodeAfterTerminator to a fixed point so a fold that exposes a new triple gets caught in the same pass.
- Int-int BINARY_OP folding (foldBinaryIntConst).
- Jump threading (JUMP -> JUMP -> X) via threadJumps.
- Conditional-jump propagation (POP_JUMP_IF_TRUE over a tail-conditional block) via propagateConditionalJumps.
- Unreachable block elimination via removeUnreachableBlocks (DFS reachability with handler labels pinned as roots; length preserving so the label map and jump opargs stay valid).
- Dead-code elimination after unconditional terminators.
- [~] Stack-effect verification (forward linear scan via calculateStackdepth; CFG-based variant pending).
- RESUME insertion at entry (codegen prologue).
- Implicit LOAD_CONST None / RETURN_VALUE for fall-through function returns.
- EXTENDED_ARG insertion for wide opargs (assemble-time).
- Push-null fix-up for super-instruction call sites.
- Redundant-NOP compaction (removeRedundantNops).
Tests
- compile/flowgraph_test.go and flowgraph_passes_test.go: hand-built sequences exercising fold / dead-code / NOP compact / label resolution / stack depth.
8. compile package: assemble
Detailed source-of-truth for this section: 1628_gopy_assemble.md. The checklist below is the cross-cutting view. The location-table dispatcher (PEP 626 line numbers, PEP 657 column ranges), the exception-table varint encoding, the co_localsplus / fastlocalskinds layout, and the marshal-parity test plan live in 1628.
- compile/assemble.go:
  - Assembler internals (Code, LineTable, ExceptionTable, Consts, Names, Varnames, Freevars, Cellvars).
  - Assemble(seq *Sequence, info *Info, unit *Unit, filename string) (*Code, error).
  - EXTENDED_ARG wide-oparg expansion.
  - Location table, PEP 626 lines plus PEP 657 columns (short / one-line / long / no-location / no-column varint forms in assemble_locations.go).
  - Exception table (start, end, target, depth, lasti as 6-bit varint deltas in assemble_exceptions.go).
  - Constants table allocation (de-dup by (typeTag, value) with float bit-pattern keying for NaN-safety in codegen_addop.go).
  - Names / Varnames / Freevars / Cellvars population from symtable.
  - co_flags assembly (CoOptimized, CoNewLocals, CoVarargs, CoVarkeywords, CoNested, CoGenerator, CoNoFree, CoCoroutine, CoMethod). CoIterableCoroutine and CoAsyncGenerator land alongside the generator/coroutine state machines.
  - co_qualname build-up via buildQualname walking the unit stack (top-level / class parent / function <locals> parent).
  - co_code byte emission.
- compile/assemble_test.go plus assemble_locations_test.go, assemble_exceptions_test.go, assemble_flags_test.go:
  - EXTENDED_ARG widening.
  - Location table varint round-trip.
  - Exception table varint round-trip.
  - Const dedup parity with CPython (type-keyed).
9. compile package: driver
- compile/compiler.go:
  - Compile(mod ast.Mod, filename string, optimize int) (*Code, error).
  - Pipeline order: future.FromAST -> symtable.Build -> Codegen (per-scope) -> Optimize -> Assemble. Validate / Preprocess wire in alongside the parser handover; the v0.5 entry is parser-stub fed.
  - [~] optimize levels -1, 0, 1, 2 mirror CPython (level threaded through; level-2 docstring removal lands in preprocess; full -O panel beyond docstring/assert lands in v0.6).
- compile/compiler_test.go + the v05test gate package: end-to-end on hand-built ASTs.
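The meaning of the four optimize levels can be sketched as a small resolver. It follows the documented behaviour of CPython's compile(): -1 defers to the interpreter's own level, 1 drops asserts (and makes __debug__ false), 2 additionally drops docstrings, and explicit values outside -1..2 are rejected. The optPlan type, the resolveOptimize name and the interpreterLevel parameter are hypothetical; how gopy threads the level through is not shown here.

```go
package main

import "fmt"

// optPlan is a hypothetical summary of what a resolved level strips.
type optPlan struct {
	Level           int
	StripAsserts    bool // also the point where __debug__ becomes false
	StripDocstrings bool
}

// resolveOptimize validates the explicit argument, then substitutes the
// interpreter-wide level when the caller passed -1.
func resolveOptimize(level, interpreterLevel int) (optPlan, error) {
	if level < -1 || level > 2 {
		return optPlan{}, fmt.Errorf("invalid optimize value %d", level)
	}
	if level == -1 {
		level = interpreterLevel
	}
	return optPlan{
		Level:           level,
		StripAsserts:    level >= 1,
		StripDocstrings: level >= 2,
	}, nil
}

func main() {
	for _, lvl := range []int{-1, 0, 1, 2} {
		plan, err := resolveOptimize(lvl, 0)
		fmt.Printf("%d -> %+v %v\n", lvl, plan, err)
	}
}
```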
10. tokenize package skeleton (1665)
- tokenize/types.go: hand-written Type declaration plus String() lookup. The numeric constants live in the generated file.
- tools/tokens_go/: generator from Grammar/Tokens plus Include/internal/pycore_token.h plus Lib/token.py.
- tokenize/types_gen.go: generator output. 69 token kinds (ENDMARKER=0..ENCODING=68) plus tokenNames table.
- [~] tokenize/tokenize.go: skeleton (Iter, Token, New, NewReadline, Next) lands with the v0.9 lexer port; v0.5 ships the type table only since the gate uses hand-built ASTs.
- tokenize/tokenize_test.go: type numeric pinning plus Type.String lookup. Iterator contract tests land in v0.9.
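The split between the hand-written Type and the generated table can be pictured as follows. This is a sketch with only the first seven kinds, whose numbers match CPython's token module; the fallback format for unknown values and the single-file layout are assumptions. In the real package the constants and tokenNames would come from types_gen.go.

```go
package main

import "fmt"

// Type is the token kind. In the layout described above this
// declaration is hand-written and the constants are generated.
type Type int

// First few kinds only; the numeric values are pinned explicitly so a
// reordering in the generator input shows up as a test failure.
const (
	ENDMARKER Type = 0
	NAME      Type = 1
	NUMBER    Type = 2
	STRING    Type = 3
	NEWLINE   Type = 4
	INDENT    Type = 5
	DEDENT    Type = 6
)

// tokenNames stands in for the generated name table.
var tokenNames = [...]string{
	ENDMARKER: "ENDMARKER",
	NAME:      "NAME",
	NUMBER:    "NUMBER",
	STRING:    "STRING",
	NEWLINE:   "NEWLINE",
	INDENT:    "INDENT",
	DEDENT:    "DEDENT",
}

// String looks the name up in the table and falls back to a numeric
// form for values outside it (the fallback format is an assumption).
func (t Type) String() string {
	if t >= 0 && int(t) < len(tokenNames) {
		return tokenNames[t]
	}
	return fmt.Sprintf("Type(%d)", int(t))
}

func main() {
	fmt.Println(NAME, DEDENT, Type(999))
}
```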
11. dis-equivalent disassembler (gate support)
- compile/dis.go: Disassemble(co *Code) string rendering bytecode plus recursive headers for nested code objects. Used by the v05test gate for structural assertions.
- compile/dis_test.go: opcode rendering, oparg display, EXTENDED_ARG recombination, nested-code header. Byte-equal comparison against a CPython golden capture lands with the marshal package.
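EXTENDED_ARG widening on the assemble side and recombination on the disassemble side are inverses, which a short sketch makes concrete. The opcode numbers below are placeholders, inline cache entries are ignored, and emit / decode are hypothetical names; only the byte layout (one opcode byte plus one oparg byte per code unit, high bytes carried by EXTENDED_ARG prefixes) follows CPython.

```go
package main

import "fmt"

// Placeholder opcode numbers for the sketch only; real values come from
// the generated opcode table.
const (
	opExtendedArg byte = 0xF0
	opLoadConst   byte = 0x10
)

// emit appends one instruction. Once oparg needs a given high byte,
// that byte and every lower one get their own EXTENDED_ARG prefix,
// even when an intermediate byte is zero.
func emit(code []byte, op byte, oparg uint32) []byte {
	for shift := 24; shift > 0; shift -= 8 {
		if oparg >= 1<<shift {
			code = append(code, opExtendedArg, byte(oparg>>shift))
		}
	}
	return append(code, op, byte(oparg))
}

type decoded struct {
	op  byte
	arg uint32
}

// decode walks two-byte code units, folding EXTENDED_ARG prefixes into
// the oparg of the instruction that follows them.
func decode(code []byte) []decoded {
	var out []decoded
	var ext uint32
	for i := 0; i+1 < len(code); i += 2 {
		op, b := code[i], uint32(code[i+1])
		if op == opExtendedArg {
			ext = (ext | b) << 8
			continue
		}
		out = append(out, decoded{op, ext | b})
		ext = 0
	}
	return out
}

func main() {
	code := emit(nil, opLoadConst, 0x12345)
	fmt.Printf("% x\n", code)
	fmt.Printf("%#x\n", decode(code)[0].arg)
}
```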
12. v05test cross-cut gate
- [~] v05test/gate_test.go: structural panel landed; the byte-equal marshal-roundtrip variant waits on the marshal package and golden corpus.
  - TestGateEmptyModule: empty module returns None.
  - TestGateSimpleAssign: x = 1.
  - TestGateBinaryAdd: a = 1 + 2 (asserts the int-int fold).
  - TestGateLoadAfterStore: x = 1; x.
  - TestGateIfWhile: if/while panel (asserts POP_JUMP_IF_FALSE; the JUMP back-edge is asserted once pseudo-op lowering lands).
  - [~] TestGateTryExcept: wired but t.Skip'd pending CFG-based stack-depth (handler entry seeding).
  - TestGateDef: def f(x): return x + 1.
  - [~] TestGateComprehension: wired but t.Skip'd pending CFG back-edge stack-depth.
  - TestGateAsyncFunction: async def f(): pass (asserts CoCoroutine on the inner code object).
  - [n] TestGateMarshalRoundtrip: marshal-byte parity against CPython is deferred to v0.8 with the import system. Until then the disassembly-text golden corpus (1629) is the gate. The v0.8 follow-on adds a code-object marshal arm and a byte-for-byte panel against host CPython.
- v05test/testdata/golden/: ten checked-in .golden disassembly snapshots (empty_module, simple_assign, binary_add, load_after_store, if_pass, while_pass, def_add_one, async_def_pass, class_pass, type_alias). Refresh contract via go test ./v05test/ -update -run TestGolden; see 1629 for the corpus rules.
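The refresh contract is the usual Go golden-file pattern: a test-binary flag switches the helper from compare mode to rewrite mode. A minimal sketch under assumptions follows; checkGolden and demo are hypothetical names, and in a real _test.go file the update argument would come from a package-level flag.Bool("update", ...) rather than being passed by hand.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkGolden compares got against the snapshot at path. With update
// set it rewrites the snapshot instead, which is what a run with the
// -update flag would do.
func checkGolden(path, got string, update bool) error {
	if update {
		return os.WriteFile(path, []byte(got), 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	if string(want) != got {
		return fmt.Errorf("%s: disassembly differs from golden\ngot:\n%s\nwant:\n%s", path, got, want)
	}
	return nil
}

// demo exercises the three cases in a temporary directory: refresh,
// matching compare, and mismatching compare.
func demo() error {
	dir, err := os.MkdirTemp("", "golden")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "simple_assign.golden")
	if err := checkGolden(path, "listing v1\n", true); err != nil {
		return err
	}
	if err := checkGolden(path, "listing v1\n", false); err != nil {
		return err
	}
	if err := checkGolden(path, "listing v2\n", false); err == nil {
		return fmt.Errorf("expected a mismatch against the stored snapshot")
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```

Keeping the rewrite behind an explicit flag means a plain go test run can never silently accept changed disassembly.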
13. Release plumbing for v0.5.0
- CHANGELOG: v0.5.0 entry.
- changelog/v0.5.0.md: full release notes.
- build/version.go: bump to 0.5.0 for the release commit.
- PR with all-green CI (lint + test on macOS, Linux, Windows).
- Tag v0.5.0 and create GitHub release. Pending explicit release go-ahead.
- Bump main to 0.6.0-dev post-release.
14. Docs and side artifacts
- Update 1602_gopy_filemap.md with the v0.5 entries.
- Update 1603_gopy_roadmap.md to reflect what landed for v0.5: CFG-driven optimisation passes, validate panel, TypeAlias codegen, disassembly golden corpus (1629). The "shipped" marker flips with the v0.5.0 tag.
- [n] Cross-reference 1690_gopy_quirks.md when codegen or assemble decisions diverge from a literal C port. 1690 is reserved (v0.5 port has no codegen / assemble divergences worth recording).
Working notes (carry forward)
- The ast/nodes.go hand-written file is a temporary scaffold so that future and other early-v0.5 ports compile. It must shrink once nodes_gen.go lands; only Pos, NoPos, and IsDocString should remain in non-generated files.
- compile.Sequence.ApplyLabelMap takes a hasTarget predicate callback to keep instrseq.go independent of the opcode metadata. Once compile/opcodes_gen.go lands, callers should pass compile.HasTarget directly (a generated helper), and a thin wrapper (*Sequence).ApplyLabels() may be added for ergonomics.
- The asdl generator and the opcode generator share style: both read a CPython-source-tree input file at go generate time and emit a checked-in _gen.go. Generators live under tools/ and are not compiled into the runtime binary.
- The dis-equivalent in compile/dis.go is a tool, not part of the runtime API surface; it lives in the compile package only because it needs the opcode metadata. The Python-visible dis module port comes much later (stdlib effort).
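The predicate-callback shape described for ApplyLabelMap can be sketched as below. The field layout of Instr and Sequence, the error return and the opcode numbers are assumptions made for the illustration; only the idea of passing hasTarget in, so the sequence code needs no opcode table, comes from the note above.

```go
package main

import "fmt"

// Instr is a simplified instruction. For jump-like opcodes Arg holds a
// label ID before resolution and an instruction index afterwards.
type Instr struct {
	Op  int
	Arg int
}

// Sequence holds the instructions plus the label table, where
// LabelMap[id] is the index of the instruction the label points at.
type Sequence struct {
	Instrs   []Instr
	LabelMap []int
}

// ApplyLabelMap rewrites label IDs to instruction indices. Which
// opcodes carry a label is decided by the caller-supplied predicate.
func (s *Sequence) ApplyLabelMap(hasTarget func(op int) bool) error {
	for i := range s.Instrs {
		in := &s.Instrs[i]
		if !hasTarget(in.Op) {
			continue
		}
		if in.Arg < 0 || in.Arg >= len(s.LabelMap) {
			return fmt.Errorf("instruction %d: unknown label %d", i, in.Arg)
		}
		in.Arg = s.LabelMap[in.Arg]
	}
	return nil
}

func main() {
	const opNop, opJump = 0, 1 // placeholder opcode numbers
	seq := &Sequence{
		Instrs:   []Instr{{opJump, 0}, {opNop, 0}, {opNop, 5}},
		LabelMap: []int{2}, // label 0 points at instruction 2
	}
	err := seq.ApplyLabelMap(func(op int) bool { return op == opJump })
	fmt.Println(seq.Instrs, err)
}
```

A wrapper such as the ApplyLabels() mentioned above would then be a one-line call passing the generated predicate.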
Gate
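src := "a = 1 + 2"
// Parse and Compile both return (value, error), so the calls cannot be nested.
mod, err := parser.Parse(src)
if err != nil { t.Fatal(err) }
code, err := compile.Compile(mod, "<gate>", 0)
if err != nil { t.Fatal(err) }
// Disassemble is the helper from compile/dis.go (section 11).
got := compile.Disassemble(code)
want := "<output captured from CPython 3.14 dis.dis>"
if got != want { t.Fatalf("got:\n%s\nwant:\n%s", got, want) }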
Plus byte-equal checks on co_code, co_linetable, and
co_exceptiontable for a small panel of programs (assignment, if,
while, try/except, def, comprehension, async function).
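A byte-equal check is more useful when it reports where the first difference sits. The helper below is a sketch of that idea; codeBytes is a hypothetical view holding just the three byte fields and is not the layout of the real code object type.

```go
package main

import "fmt"

// firstDiff returns the index of the first differing byte, the shorter
// length if one slice is a strict prefix of the other, or -1 if equal.
func firstDiff(got, want []byte) int {
	n := len(got)
	if len(want) < n {
		n = len(want)
	}
	for i := 0; i < n; i++ {
		if got[i] != want[i] {
			return i
		}
	}
	if len(got) != len(want) {
		return n
	}
	return -1
}

// codeBytes is a hypothetical view of the three byte fields compared.
type codeBytes struct {
	Code, LineTable, ExceptionTable []byte
}

// compareCode checks the fields in order and names the first mismatch.
func compareCode(got, want codeBytes) error {
	fields := []struct {
		name      string
		got, want []byte
	}{
		{"co_code", got.Code, want.Code},
		{"co_linetable", got.LineTable, want.LineTable},
		{"co_exceptiontable", got.ExceptionTable, want.ExceptionTable},
	}
	for _, f := range fields {
		if i := firstDiff(f.got, f.want); i >= 0 {
			return fmt.Errorf("%s differs at byte %d (got %d bytes, want %d)",
				f.name, i, len(f.got), len(f.want))
		}
	}
	return nil
}

func main() {
	got := codeBytes{Code: []byte{1, 2, 3}, LineTable: []byte{9}}
	want := codeBytes{Code: []byte{1, 2, 4}, LineTable: []byte{9}}
	fmt.Println(compareCode(got, want))
}
```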
Out of scope for v0.5
- PEG parser (cpython/Parser/). Separate spec series.
- Tier-2 optimizer (optimizer*.c). Lands in v0.12.
- Specialization (specialize.c). Lands in v0.11.
- Instrumentation hooks. Lands in v0.11.
- JIT (jit.c). Indefinitely deferred.