1628. gopy assemble
Port of cpython/Python/assemble.c (802 lines) to
gopy/compile/assemble.go. This spec is the detailed source-of-truth
for section 8 of 1620.
What assemble does
Assemble takes the optimised instrseq.Sequence from flowgraph (1627)
plus the per-unit metadata from codegen (1626) and produces a final
*objects.Code ready to hand to the VM (1630) or to marshal to a
.pyc file (1623).
The work is mostly serialisation. There are no further optimisations. Three byte streams come out:
- co_code: instruction stream packed as _Py_CODEUNIT (one byte opcode + one byte oparg, with EXTENDED_ARG widening for args > 255).
- co_linetable: location table (PEP 626 line numbers, PEP 657 column ranges) in the 3.14 varint encoding.
- co_exceptiontable: exception handler table in varint form.
Plus the materialised pools:
- co_consts: tuple of constant objects.
- co_names: tuple of identifier strings (LOAD_GLOBAL / LOAD_ATTR targets).
- co_varnames: tuple of local variable names (parameters first).
- co_freevars: tuple of names captured from outer scopes.
- co_cellvars: tuple of names that are cells in this scope.
- co_localsplusnames and co_localspluskinds: 3.11+ flat layout.
- co_qualname, co_name, co_filename, co_firstlineno, co_argcount, co_posonlyargcount, co_kwonlyargcount, co_stacksize, co_flags, co_nlocals.
Boundary contract:
Input:
    *instrseq.Sequence   (post-flowgraph)
    *flowgraph.Info      (max stackdepth, exception table entries)
    *codegen.Unit        (consts, names, varnames, freevars, cellvars, flags)
    filename string
Output:
    *objects.Code        (ready for marshal or VM)
File layout
The port of assemble.c is small enough to fit in four files under compile/:
| Go file | CPython lines | Contents |
|---|---|---|
| assemble.go | 1-200, 460-779 | Assembler struct, init, Assemble public entry, code-stream emit, code object construction |
| assemble_locations.go | 196-368 | location-info varint emit (PEP 626 4-bit panel) |
| assemble_exceptions.go | 89-194 | exception-table varint emit |
| assemble_consts.go | 457-573 | co_consts dedup, co_names / co_varnames materialisation |
Each function carries // CPython: assemble.c:L<n> <name>.
Public surface
package compile

// Assemble builds a final Code object from the post-flowgraph Sequence
// plus per-unit metadata.
func Assemble(seq *instrseq.Sequence, fg *flowgraph.Info, unit *Unit, filename string) (*objects.Code, error)

// Assembler is the per-call state. Public only so tests can drive
// individual phases.
type Assembler struct {
    Filename       string
    FirstLineno    int
    InstrSize      int    // byte length of the code stream
    Code           []byte // co_code
    LineTable      []byte // co_linetable
    ExceptionTable []byte // co_exceptiontable

    lineCursor int // line of the previous line-table entry; starts at FirstLineno
}
The Unit type is the same struct codegen produced (1626 spec). It
already carries every list assemble needs.
Code stream emission
emitInstr, the port of write_instr (assemble.c:369-411), packs each
instruction into the _Py_CODEUNIT format. CPython 3.14:
| opcode (1 byte) | oparg (1 byte) | inline cache (variable) |
Opargs > 255 widen via EXTENDED_ARG prefixes; up to three prefixes
give a 32-bit oparg. The number of inline cache entries is fixed
per opcode (the table is in compile/opcodes_gen.go from the 1620
section 4 generator).
Algorithm:
func (a *Assembler) emitInstr(instr *Instr) error {
    arg := instr.Oparg
    // EXTENDED_ARG prefixes carry the high bytes, most significant
    // first; the instruction itself carries the low byte.
    var prefixes []byte
    for hi := arg >> 8; hi > 0; hi >>= 8 {
        prefixes = append([]byte{byte(hi)}, prefixes...)
    }
    for _, p := range prefixes {
        a.Code = append(a.Code, EXTENDED_ARG, p)
    }
    a.Code = append(a.Code, byte(instr.Opcode), byte(arg))
    // pad inline caches with zero bytes
    caches := opmeta.CacheCount(instr.Opcode)
    for i := 0; i < caches; i++ {
        a.Code = append(a.Code, 0, 0) // each cache entry is one CODEUNIT
    }
    return nil
}
Line and exception entries reference offsets into co_code (counted in
code units, as CPython does, which byte parity requires), so they are
emitted after the code stream is finalised.
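As a quick check of the widening rule, the sketch below encodes an oparg and decodes it back. It is illustrative only: the opcode numbers are placeholders (not real 3.14 values), the helper names `encode` and `decode` are hypothetical, and inline caches are left out.

```go
package main

import "fmt"

// Placeholder opcode numbers for illustration only; the real values
// come from the generated opcode table.
const (
	extendedArg byte = 0xfe
	someOp      byte = 0x01
)

// encode packs one instruction, emitting EXTENDED_ARG prefixes for the
// high bytes of oparg (most significant first) and the low byte last.
func encode(op byte, oparg uint32) []byte {
	var out []byte
	for shift := 24; shift >= 8; shift -= 8 {
		if oparg>>shift != 0 {
			out = append(out, extendedArg, byte(oparg>>shift))
		}
	}
	return append(out, op, byte(oparg))
}

// decode reads one instruction back, folding each prefix byte into the
// oparg before the real opcode is reached.
func decode(code []byte) (op byte, oparg uint32) {
	for i := 0; i+1 < len(code); i += 2 {
		oparg = oparg<<8 | uint32(code[i+1])
		if code[i] != extendedArg {
			return code[i], oparg
		}
	}
	return 0, oparg
}

func main() {
	for _, arg := range []uint32{0x12, 0x1234, 0x123456, 0x12345678} {
		b := encode(someOp, arg)
		_, back := decode(b)
		fmt.Printf("%#x -> % x -> %#x\n", arg, b, back)
	}
}
```

With these placeholder values, an oparg of 0x1234 becomes the four bytes `fe 12 01 34`: the prefix carries 0x12 and the instruction itself carries 0x34.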
Line table (PEP 626)
The location table is a varint stream with a 4-bit "code" field selecting one of five record formats. CPython: assemble.c:196-336.
Record formats (3.14):
- Code 0-9: short form, start and end on the current line, fits in two bytes.
- Code 10-12: one-line form, line delta is code - 10 (0 to 2), start and end columns in the next two bytes.
- Code 13: no-column form (line delta as a signed varint, no columns).
- Code 14: long form (line delta as a signed varint, then end-line delta, start column + 1 and end column + 1 as unsigned varints).
- Code 15: no-location.
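To make the 4-bit code concrete, here is a minimal sketch of the first byte of an entry and of the two-byte short form. It assumes the layout CPython documents for 3.11+ location tables (high bit set, four code bits, three bits for the length in code units minus one); `firstByte` and `shortForm` are illustrative names, not part of this spec.

```go
package main

import "fmt"

// firstByte builds the leading byte of a location entry:
// bit 7 marks the start of an entry, bits 3-6 hold the form code,
// bits 0-2 hold the instruction length in code units minus one.
func firstByte(code, isize int) byte {
	return 0x80 | byte(code<<3) | byte(isize-1)
}

// shortForm encodes a same-line entry with column < 80 and a span < 16.
// The column's high bits select the code (0-9); its low three bits and
// the span share the second byte.
func shortForm(isize, col, endCol int) [2]byte {
	return [2]byte{
		firstByte(col>>3, isize),
		byte((col&7)<<4 | (endCol - col)),
	}
}

func main() {
	e := shortForm(1, 4, 9)
	fmt.Printf("% x\n", e[:]) // prints "80 45"
}
```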
Encoding helpers:
func writeLocByte(a *Assembler, b byte) { a.LineTable = append(a.LineTable, b) }

func writeVarint(a *Assembler, v uint32) {
    for v >= 0x40 {
        writeLocByte(a, byte(v)&0x3f|0x40)
        v >>= 6
    }
    writeLocByte(a, byte(v)&0x3f)
}

func writeSignedVarint(a *Assembler, v int32) {
    if v < 0 {
        writeVarint(a, uint32(-v)<<1|1)
    } else {
        writeVarint(a, uint32(v)<<1)
    }
}
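For the hand-computed test bytes, a stand-alone version of the two helpers plus a decoder can be handy. The following is a sketch only: it works on plain byte slices rather than an Assembler, and `putVarint`, `putSignedVarint` and `readVarint` are hypothetical names.

```go
package main

import "fmt"

// putVarint appends v in 6-bit little-endian chunks; bit 6 (0x40)
// flags that another chunk follows.
func putVarint(dst []byte, v uint32) []byte {
	for v >= 0x40 {
		dst = append(dst, byte(v&0x3f)|0x40)
		v >>= 6
	}
	return append(dst, byte(v))
}

// putSignedVarint stores the sign in bit 0 and the magnitude above it.
func putSignedVarint(dst []byte, v int32) []byte {
	if v < 0 {
		return putVarint(dst, uint32(-int64(v))<<1|1)
	}
	return putVarint(dst, uint32(v)<<1)
}

// readVarint is the inverse of putVarint; it returns the value and the
// number of bytes consumed.
func readVarint(src []byte) (uint32, int) {
	var v uint32
	for i, b := range src {
		v |= uint32(b&0x3f) << (6 * i)
		if b&0x40 == 0 {
			return v, i + 1
		}
	}
	return v, len(src)
}

func main() {
	for _, v := range []uint32{0, 63, 64, 200} {
		fmt.Printf("%d -> % x\n", v, putVarint(nil, v))
	}
	fmt.Printf("-5 -> % x\n", putSignedVarint(nil, -5))
}
```

Worked by hand: 63 fits in one byte (`3f`), 64 needs two (`40 01`), and 200 becomes `48 03` because 200 = 8 + 3*64.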
writeLocInfoEntry, the port of write_location_info_entry
(assemble.c:286-322), picks the smallest form:
func writeLocInfoEntry(a *Assembler, loc ast.Pos, isize int) {
    // isize must already be in 1..8: the first byte has three length bits.
    if loc.Lineno < 0 {
        writeLocInfoNone(a, isize)
        return
    }
    lineDelta := loc.Lineno - a.lineCursor
    col, endCol := loc.ColOffset, loc.EndColOffset
    sameLine := loc.EndLineno == loc.Lineno
    if sameLine && col >= 0 && endCol >= col &&
        col < 80 && endCol-col < 16 && lineDelta == 0 {
        writeLocInfoShortForm(a, isize, col, endCol)
    } else if sameLine && col >= 0 && endCol >= 0 &&
        col < 128 && endCol < 128 && lineDelta >= 0 && lineDelta < 3 {
        // the one-line codes 10-12 can only express deltas 0, 1 and 2
        writeLocInfoOnelineForm(a, isize, lineDelta, col, endCol)
    } else if (col < 0 || endCol < 0) && (sameLine || loc.EndLineno < 0) {
        writeLocInfoNoColumn(a, isize, lineDelta)
    } else {
        writeLocInfoLongForm(a, loc, isize)
    }
    a.lineCursor = loc.Lineno
}
Test panel: compile/assemble_loc_test.go. Required cases:
- all-same-line range (short form)
- one-line delta (oneline form)
- no-column (no-column form)
- full long form
- none / -1 sentinel
- every record at a code-stream offset that crosses an isize boundary
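The last case matters because the length field of an entry is only three bits wide, so a run longer than 8 code units has to be written as several entries with the same location. A minimal sketch of that split follows, assuming the 8-unit limit CPython uses; `splitRun` is a hypothetical helper name.

```go
package main

import "fmt"

// splitRun breaks a run of n code units that share one location into
// entry lengths of at most 8, the largest value the 3-bit length field
// of a location entry can express.
func splitRun(n int) []int {
	var parts []int
	for n > 8 {
		parts = append(parts, 8)
		n -= 8
	}
	if n > 0 {
		parts = append(parts, n)
	}
	return parts
}

func main() {
	fmt.Println(splitRun(8), splitRun(9), splitRun(19)) // prints "[8] [8 1] [8 8 3]"
}
```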
Exception table
Varint records: start_offset / size (end_offset - start_offset) /
target_offset / depth_lasti, where depth_lasti is (depth << 1) | lasti.
CPython: assemble.c:106-194.
type ExceptHandler struct {
    Start  uint32 // offset into co_code (code units), inclusive
    End    uint32 // offset into co_code (code units), exclusive
    Target uint32 // offset of the handler entry (code units)
    Depth  uint8  // stack depth
    Lasti  bool   // push-lasti bit
}
Byte 0 of each entry has the MSB set; subsequent bytes do not (so a
parser can find entry boundaries).
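The byte layout can be sketched as follows. This assumes CPython's encoding as I understand it: each field is written as big-endian 6-bit chunks with 0x40 as the continuation bit, the second field is the range length (end minus start), and depth and lasti are packed into one value. `putItem` and `putEntry` are illustrative names.

```go
package main

import "fmt"

// putItem appends value as big-endian 6-bit chunks. Bit 6 (0x40) marks
// "more chunks follow"; startBit (0x80 or 0) is OR'd into the first
// byte only, so only the first field of an entry carries the MSB.
func putItem(dst []byte, value uint32, startBit byte) []byte {
	for shift := 24; shift >= 6; shift -= 6 {
		if value >= 1<<shift {
			dst = append(dst, byte(value>>shift)&0x3f|0x40|startBit)
			startBit = 0
		}
	}
	return append(dst, byte(value)&0x3f|startBit)
}

// putEntry writes start, size, target and depth_lasti for one handler
// range. Offsets are in code units in this sketch.
func putEntry(dst []byte, start, end, target uint32, depth uint8, lasti bool) []byte {
	dl := uint32(depth) << 1
	if lasti {
		dl |= 1
	}
	dst = putItem(dst, start, 0x80)
	dst = putItem(dst, end-start, 0)
	dst = putItem(dst, target, 0)
	return putItem(dst, dl, 0)
}

func main() {
	fmt.Printf("% x\n", putEntry(nil, 2, 10, 20, 1, false))
	fmt.Printf("% x\n", putItem(nil, 100, 0x80))
}
```

For the first call the four fields each fit in one byte, giving `82 08 14 02`; the value 100 needs two bytes, `c1 24`, and only the first of them has the MSB set.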
assembleExceptionTableEntry collapses adjacent runs that share
target / depth / lasti into a single entry. The ordering is by start
offset ascending.
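The collapsing step can be illustrated with a small stand-alone sketch. The types here (`handler`, `instr`, `entry`) are hypothetical simplifications of what flowgraph hands over, and offsets are counted in code units.

```go
package main

import "fmt"

// handler identifies where control goes on an exception; ok=false
// means the instruction is not covered by any handler.
type handler struct {
	target uint32
	depth  uint8
	lasti  bool
	ok     bool
}

type instr struct {
	size uint32 // length in code units, caches included
	h    handler
}

type entry struct {
	start, end uint32
	h          handler
}

// collapse walks instructions in offset order and emits one entry per
// maximal run of consecutive instructions with an identical handler.
// Because it walks forward, entries come out sorted by start offset.
func collapse(instrs []instr) []entry {
	var out []entry
	var cur handler
	var start, off uint32
	for _, in := range instrs {
		if in.h != cur {
			if cur.ok {
				out = append(out, entry{start, off, cur})
			}
			cur, start = in.h, off
		}
		off += in.size
	}
	if cur.ok {
		out = append(out, entry{start, off, cur})
	}
	return out
}

func main() {
	h := handler{target: 20, depth: 1, ok: true}
	seq := []instr{{1, handler{}}, {2, h}, {1, h}, {1, handler{}}, {3, h}}
	for _, e := range collapse(seq) {
		fmt.Printf("[%d,%d) -> %d\n", e.start, e.end, e.h.target)
	}
}
```

In this example the two adjacent covered instructions merge into one range, while the uncovered instruction in the middle forces a second entry for the same target.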
Tests in compile/assemble_except_test.go:
- single entry (try-except)
- two entries with shared target (or-pattern)
- nested try (entries with same handler but different depth)
- try in generator (lasti=true)
- empty table (function with no exception handlers)
Pool materialisation
co_consts
Unit.Consts is already deduped by codegen (1626) and trimmed by
flowgraph's removeUnusedConsts (1627). Assemble just freezes it into
a tuple. CPython freezes via PyTuple_Pack; we use a []any and let
the marshal package (1623) handle wire encoding.
co_names / co_varnames / co_freevars / co_cellvars
Each is a deduplicated insertion-ordered string list. Unit already
carries them in the right order (codegen built them). Assemble copies
the slice into the Code object.
co_localsplusnames and co_localspluskinds
Flat layout (3.11+). For a function with argcount=2, kwonly=0,
varargs=0, varkw=0, two other locals, two cells, one free var:
co_localsplusnames = [arg0, arg1, local0, local1, cell0, cell1, free0]
co_localspluskinds = [
    FAST_LOCAL, // arg0
    FAST_LOCAL, // arg1
    FAST_LOCAL, // local0
    FAST_LOCAL, // local1
    FAST_CELL,  // cell0
    FAST_CELL,  // cell1
    FAST_FREE,  // free0
]
The FAST_HIDDEN bit is OR'd in for locals that must stay out of
locals(), which in CPython are the variables of an inlined
comprehension (PEP 709). The flag values match CPython's
pycore_code.h:
const (
    FastLocal  uint8 = 0x20
    FastCell   uint8 = 0x40
    FastFree   uint8 = 0x80
    FastHidden uint8 = 0x10
)
CPython: assemble.c:483-572.
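A minimal sketch of the flat layout follows. It assumes, as CPython does for captured arguments, that a name which is both a local and a cell keeps its local slot and carries both kind bits; `buildLocalsPlus` is a hypothetical helper and the hidden bit is left out.

```go
package main

import "fmt"

const (
	fastLocal uint8 = 0x20
	fastCell  uint8 = 0x40
	fastFree  uint8 = 0x80
)

// buildLocalsPlus lays out varnames first, then cells that are not
// already locals, then free variables. A name that is both a local and
// a cell keeps its local slot and gets both kind bits.
func buildLocalsPlus(varnames, cellvars, freevars []string) ([]string, []uint8) {
	isCell := make(map[string]bool, len(cellvars))
	for _, c := range cellvars {
		isCell[c] = true
	}
	var names []string
	var kinds []uint8
	seen := make(map[string]bool, len(varnames))
	for _, v := range varnames {
		k := fastLocal
		if isCell[v] {
			k |= fastCell
		}
		names, kinds = append(names, v), append(kinds, k)
		seen[v] = true
	}
	for _, c := range cellvars {
		if !seen[c] {
			names, kinds = append(names, c), append(kinds, fastCell)
		}
	}
	for _, f := range freevars {
		names, kinds = append(names, f), append(kinds, fastFree)
	}
	return names, kinds
}

func main() {
	names, kinds := buildLocalsPlus(
		[]string{"arg0", "arg1", "local0", "local1"},
		[]string{"cell0", "cell1"},
		[]string{"free0"},
	)
	for i := range names {
		fmt.Printf("%-7s %#02x\n", names[i], kinds[i])
	}
}
```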
Code object construction
AssembleCodeObject (assemble.c:574-672) packs everything into a
PyCodeObject:
return &objects.Code{
    Argcount:        unit.Argcount,
    PosOnlyArgCount: unit.PosOnlyArgCount,
    KwOnlyArgCount:  unit.KwOnlyArgCount,
    NLocals:         len(unit.VarNames),
    Stacksize:       fg.MaxStackDepth,
    Flags:           unit.Flags,
    Code:            a.Code,
    Consts:          unit.Consts,
    Names:           unit.Names,
    LocalsPlusNames: localsplusnames,
    LocalsPlusKinds: localspluskinds,
    Filename:        filename,
    Name:            unit.Name,
    Qualname:        unit.Qualname,
    Firstlineno:     unit.FirstLineno,
    Linetable:       a.LineTable,
    ExceptionTable:  a.ExceptionTable,
}
objects.Code is owned by the 1700-series spec (objects). For v0.5
we only need a shape that 1623 marshal can serialise. Mark the
objects.Code struct as a forward declaration in the 1620 spec; the
full port lands when objects/codeobject.c is in scope.
Comprehensive test plan
Layer 1: Per-pass unit
For every function in this spec, one unit test under
compile/assemble_*_test.go:
| Function | Test file | Cases |
|---|---|---|
| assembleInit | assemble_test.go | initial state, firstlineno propagation |
| emitInstr | assemble_emit_test.go | each oparg width (0/1/2/3 EXTENDED_ARG), every opcode family |
| writeVarint / writeSignedVarint | assemble_varint_test.go | zero, small, max-1-byte, max-2-byte, negative, max int32 |
| writeLocInfoShortForm | assemble_loc_test.go | columns 0-79, isize 1-8 |
| writeLocInfoOnelineForm | assemble_loc_test.go | line-delta 0-2, columns 0-127 |
| writeLocInfoLongForm | assemble_loc_test.go | every signed range |
| writeLocInfoNoColumn | assemble_loc_test.go | line-delta only |
| writeLocInfoNone | assemble_loc_test.go | -1 lineno |
| writeLocInfoEntry dispatcher | assemble_loc_test.go | every form picked correctly |
| assembleEmitExceptionTable | assemble_except_test.go | single, multiple, nested, lasti, empty |
| assembleEmitExceptionTableItem | assemble_except_test.go | varint MSB encoding |
| assembleConsts | assemble_consts_test.go | dedup int, float, str, bytes, tuple, frozenset, complex |
| assembleLocalsplus | assemble_localsplus_test.go | function, method, class, generator, comprehension scope |
| Assemble driver | assemble_driver_test.go | end-to-end on a hand-built Unit |
Layer 2: Round-trip vs CPython linetable
compile/assemble_loc_parity_test.go (//go:build cpython): for the
parity corpus (~120 snippets), assert co_linetable byte-equal.
Layer 3: Round-trip vs CPython exception table
compile/assemble_except_parity_test.go (//go:build cpython): for
every snippet involving try/except/finally/with/match in the parity
corpus, assert co_exceptiontable byte-equal.
Layer 4: Marshal parity (full)
compile/marshal_parity_test.go (//go:build cpython): for the parity
corpus, marshal both CPython and gopy code objects and assert
marshal.dumps(co) byte-equal. This is the gate test for v0.5.
The corpus is shared with 1626 and 1627 (single source of truth at
compile/testdata/parity_corpus/*.py).
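When a byte-equal assertion fails, the index of the first mismatch is usually more useful than a full dump. A small helper along these lines could back all three parity layers; `firstDiff` is a hypothetical name, not something this spec defines.

```go
package main

import "fmt"

// firstDiff returns the index of the first byte where got and want
// differ, or -1 when the slices are identical. A length mismatch is
// reported at the end of the shorter slice.
func firstDiff(got, want []byte) int {
	n := len(got)
	if len(want) < n {
		n = len(want)
	}
	for i := 0; i < n; i++ {
		if got[i] != want[i] {
			return i
		}
	}
	if len(got) != len(want) {
		return n
	}
	return -1
}

func main() {
	fmt.Println(firstDiff([]byte{0x80, 0x45}, []byte{0x80, 0x46})) // prints 1
}
```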
Lint and refactor budget
writeLocInfoEntry is the only function over the cyclomatic
limit. Split into:
- pickLocFormat(loc, isize, lineCursor) locFormat
- writeLocByForm(a, form, loc, isize)
Everything else is straight-line varint emission and stays under the limit naturally.
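One possible shape for the pure half of that split is sketched below. The `pos` struct stands in for ast.Pos, the `locFormat` constants are hypothetical, and the thresholds (short form: column < 80, span < 16; one-line form: delta 0 to 2, columns < 128) are taken from CPython's write_location_info_entry as I read it, so they should be checked against the C source during the port.

```go
package main

import "fmt"

// pos is a stand-in for ast.Pos; -1 means "unknown".
type pos struct {
	Lineno, EndLineno, ColOffset, EndColOffset int
}

type locFormat int

const (
	formNone locFormat = iota
	formShort
	formOneline
	formNoColumn
	formLong
)

// pickLocFormat chooses the most compact record form without writing
// anything, which keeps the branching in one small pure function.
// isize is accepted to match the proposed signature; it does not
// affect the choice in this sketch.
func pickLocFormat(loc pos, isize, lineCursor int) locFormat {
	if loc.Lineno < 0 {
		return formNone
	}
	delta := loc.Lineno - lineCursor
	col, end := loc.ColOffset, loc.EndColOffset
	oneLine := loc.EndLineno == loc.Lineno
	switch {
	case col < 0 || end < 0:
		if oneLine || loc.EndLineno < 0 {
			return formNoColumn
		}
	case oneLine && delta == 0 && col < 80 && end >= col && end-col < 16:
		return formShort
	case oneLine && delta >= 0 && delta < 3 && col < 128 && end < 128:
		return formOneline
	}
	return formLong
}

func main() {
	fmt.Println(pickLocFormat(pos{5, 5, 4, 9}, 1, 5) == formShort)
	fmt.Println(pickLocFormat(pos{6, 6, 4, 9}, 1, 5) == formOneline)
}
```

Keeping the choice separate from the writes also makes the dispatcher row of the test table a plain table-driven test over `pos` values.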
Citation policy
Same as 1626 and 1627. Every Go function gets // CPython: assemble.c:L<n> <name>. Helpers split out to satisfy lint carry the note (extracted helper).
Order of work
- Skeleton: Assembler struct, Assemble public entry that returns nil. Tests for the empty path.
- emitInstr: one opcode at a time, with EXTENDED_ARG widening, verified by golden bytes.
- Varint helpers: stand-alone tests against hand-computed bytes.
- Line table: short / oneline / long / no-column / none, in that order. Layer 2 parity gates this.
- Exception table: single, then multi, then nested. Layer 3 parity gates this.
- Pool materialisation: consts, names, localsplus.
- Final Assemble driver wires all pieces.
- Cross-check with 1626 + 1627 outputs on the parity corpus.
Each step lands as one PR. Assemble is "done" when Layer 4 marshal parity passes for the entire corpus.