
1628. gopy assemble

Port of cpython/Python/assemble.c (802 lines) to gopy/compile/assemble.go. This spec is the detailed source-of-truth for section 8 of 1620.

What assemble does

Assemble takes the optimised instrseq.Sequence from flowgraph (1627) plus the per-unit metadata from codegen (1626) and produces a final *objects.Code ready to hand to the VM (1630) or to marshal to a .pyc file (1623).

The work is mostly serialisation. There are no further optimisations. Three byte streams come out:

  1. co_code: instruction stream packed as _Py_CODEUNIT (one byte opcode + one byte oparg, with EXTENDED_ARG widening for args > 255).
  2. co_linetable: PEP 626 location table in the 3.14 varint encoding.
  3. co_exceptiontable: exception handler table (the zero-cost exception scheme introduced in 3.11) in varint form.

Plus the materialised pools:

  • co_consts: tuple of constant objects.
  • co_names: tuple of identifier strings (LOAD_GLOBAL / LOAD_ATTR targets).
  • co_varnames: tuple of local variable names (parameters first).
  • co_freevars: tuple of names captured from outer scopes.
  • co_cellvars: tuple of names that are cells in this scope.
  • co_localsplusnames and co_localspluskinds: 3.11+ flat layout.
  • co_qualname, co_name, co_filename, co_firstlineno, co_argcount, co_posonlyargcount, co_kwonlyargcount, co_stacksize, co_flags, co_nlocals.

Boundary contract:

Input:
*instrseq.Sequence (post-flowgraph)
*flowgraph.Info (max stackdepth, exception table entries)
*codegen.Unit (consts, names, varnames, freevars, cellvars, flags)
filename string

Output:
*objects.Code (ready for marshal or VM)

File layout

The port is small enough to fit in four files:

| Go file | CPython lines | Contents |
| --- | --- | --- |
| assemble.go | 1-200, 460-779 | Assembler struct, init, Assemble public entry, code-stream emit, code object construction |
| assemble_locations.go | 196-368 | location-info varint emit (PEP 626 4-bit panel) |
| assemble_exceptions.go | 89-194 | exception-table varint emit (zero-cost exception handling panel) |
| assemble_consts.go | 457-573 | co_consts dedup, co_names / co_varnames materialisation |

Each function carries // CPython: assemble.c:L<n> <name>.

Public surface

package compile

// Assemble builds a final Code object from the post-flowgraph Sequence
// plus per-unit metadata.
func Assemble(seq *instrseq.Sequence, fg *flowgraph.Info, unit *Unit, filename string) (*objects.Code, error)

// Assembler is the per-call state. Public only so tests can drive
// individual phases.
type Assembler struct {
	Filename       string
	FirstLineno    int
	InstrSize      int    // byte length of the code stream
	Code           []byte // co_code
	LineTable      []byte // co_linetable
	ExceptionTable []byte // co_exceptiontable

	lineCursor int // last line number written to LineTable (a_lineno in assemble.c)
}

The Unit type is the same struct codegen produced (1626 spec). It already carries every list assemble needs.

Code stream emission

writeInstr (assemble.c:369-411) packs each instruction into the _Py_CODEUNIT format. CPython 3.14:

| opcode (1 byte) | oparg (1 byte) | inline cache (variable) |

Opargs > 255 widen via EXTENDED_ARG prefixes, each carrying one higher-order byte of the oparg, most significant first; up to three prefixes give a 32-bit oparg. The number of inline cache entries is fixed per opcode (the table is in compile/opcodes_gen.go from the 1620 section 4 generator).

Algorithm:

func (a *Assembler) emitInstr(instr *Instr) error {
	arg := uint32(instr.Oparg)
	// EXTENDED_ARG prefixes carry the high-order bytes, most significant first.
	for shift := 24; shift >= 8; shift -= 8 {
		if arg>>shift != 0 {
			a.Code = append(a.Code, EXTENDED_ARG, byte(arg>>shift))
		}
	}
	// The real opcode carries the low byte of the oparg.
	a.Code = append(a.Code, byte(instr.Opcode), byte(arg))
	// Pad inline caches with zero bytes; each cache entry is one code unit.
	caches := opmeta.CacheCount(instr.Opcode)
	for i := 0; i < caches; i++ {
		a.Code = append(a.Code, 0, 0)
	}
	return nil
}

Line-table entries carry instruction lengths and exception-table entries carry offsets into co_code, both counted in code units (two bytes each), so both tables are emitted after the code stream is finalised.

Line table (PEP 626)

The location table is a varint stream with a 4-bit "code" field selecting one of five record formats. CPython: assemble.c:196-336.

Record formats (3.14):

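  • Code 0-9: short form. Same line as the previous entry, start column < 80, span < 16 columns. The code is the column group (column >> 3) and one data byte follows, so the entry fits in two bytes.
  • Code 10-12: one-line form. The line delta (0, 1, or 2) is code - 10; start column and end column follow as one byte each.
  • Code 13: no-column form. Line delta as a signed varint, no columns.
  • Code 14: long form. Line delta as a signed varint, then end-line delta, column + 1, and end column + 1 as unsigned varints.
  • Code 15: no-location.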

Encoding helpers:

func writeLocByte(a *Assembler, b byte) { a.LineTable = append(a.LineTable, b) }

func writeVarint(a *Assembler, v uint32) {
	for v >= 0x40 {
		writeLocByte(a, byte(v)&0x3f|0x40)
		v >>= 6
	}
	writeLocByte(a, byte(v)&0x3f)
}

func writeSignedVarint(a *Assembler, v int32) {
	if v < 0 {
		writeVarint(a, uint32(-v)<<1|1)
	} else {
		writeVarint(a, uint32(v)<<1)
	}
}

writeLocationInfoEntry (assemble.c:286-322) picks the smallest form:

func writeLocInfoEntry(a *Assembler, loc ast.Pos, isize int) {
	// isize is in code units and is at most 8; the caller splits longer instructions.
	if loc.Lineno < 0 {
		writeLocInfoNone(a, isize)
		return
	}
	lineDelta := loc.Lineno - a.lineCursor
	col, endCol := loc.ColOffset, loc.EndColOffset
	sameLine := loc.EndLineno == loc.Lineno
	switch {
	case col < 0 || endCol < 0:
		// Missing columns: no-column form unless the span covers several lines.
		if sameLine || loc.EndLineno < 0 {
			writeLocInfoNoColumn(a, isize, lineDelta)
		} else {
			writeLocInfoLongForm(a, loc, isize)
		}
	case sameLine && lineDelta == 0 && col < 80 && endCol >= col && endCol-col < 16:
		writeLocInfoShortForm(a, isize, col, endCol)
	case sameLine && lineDelta >= 0 && lineDelta < 3 && col < 128 && endCol < 128:
		writeLocInfoOnelineForm(a, isize, lineDelta, col, endCol)
	default:
		writeLocInfoLongForm(a, loc, isize)
	}
	a.lineCursor = loc.Lineno
}

Test panel: compile/assemble_loc_test.go. Required cases:

  • all-same-line range (short form)
  • one-line delta (oneline form)
  • no-column (no-column form)
  • full long form
  • none / -1 sentinel
  • every record at a code-stream offset that crosses an isize boundary

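Exception table

Varint records: start_offset / size / target_offset / depth_lasti, where size = end_offset - start_offset and depth_lasti = (depth << 1) | lasti. Offsets and sizes count code units, not bytes. CPython: assemble.c:106-194.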

type ExceptHandler struct {
	Start  uint32 // code-unit offset into co_code, inclusive
	End    uint32 // code-unit offset into co_code, exclusive
	Target uint32 // code-unit offset of the handler entry
	Depth  uint8  // stack depth
	Lasti  bool   // push-lasti bit: push the raising instruction's offset before the exception
}

Byte 0 of each entry has the MSB set; subsequent bytes do not, so a parser can find entry boundaries.

assembleExceptionTableEntry collapses adjacent runs that share target / depth / lasti into a single entry. The ordering is by start offset ascending.

Tests in compile/assemble_except_test.go:

  • single entry (try-except)
  • two entries with shared target (or-pattern)
  • nested try (entries with same handler but different depth)
  • try in generator (lasti=true)
  • empty table (function with no exception handlers)

Pool materialisation

co_consts

Unit.Consts is already deduped by codegen (1626) and trimmed by flowgraph's removeUnusedConsts (1627). Assemble just freezes it into a tuple. CPython freezes via PyTuple_Pack; we use a []any and let the marshal package (1623) handle wire encoding.

co_names / co_varnames / co_freevars / co_cellvars

Each is a deduplicated insertion-ordered string list. Unit already carries them in the right order (codegen built them). Assemble copies the slice into the Code object.

co_localsplusnames and co_localspluskinds

Flat layout (3.11+). For a function with argcount=2, kwonly=0, varargs=0, varkw=0, two plain locals, two cells, one free var:

co_localsplusnames = [arg0, arg1, local0, local1, cell0, cell1, free0]
co_localspluskinds = [
	FAST_LOCAL, // arg0
	FAST_LOCAL, // arg1
	FAST_LOCAL, // local0
	FAST_LOCAL, // local1
	FAST_CELL,  // cell0
	FAST_CELL,  // cell1
	FAST_FREE,  // free0
]

FAST_HIDDEN bit is OR'd in for locals that must stay invisible to locals(); in CPython these are the iteration variables of inlined comprehensions (PEP 709). The flag values match CPython's pycore_code.h:

const (
	FastLocal  uint8 = 0x20
	FastCell   uint8 = 0x40
	FastFree   uint8 = 0x80
	FastHidden uint8 = 0x10
)

CPython: assemble.c:483-572.

Code object construction

AssembleCodeObject (assemble.c:574-672) packs everything into a PyCodeObject:

return &objects.Code{
	Argcount:        unit.Argcount,
	PosOnlyArgCount: unit.PosOnlyArgCount,
	KwOnlyArgCount:  unit.KwOnlyArgCount,
	NLocals:         len(unit.VarNames),
	Stacksize:       fg.MaxStackDepth,
	Flags:           unit.Flags,
	Code:            a.Code,
	Consts:          unit.Consts,
	Names:           unit.Names,
	LocalsPlusNames: localsplusnames,
	LocalsPlusKinds: localspluskinds,
	Filename:        filename,
	Name:            unit.Name,
	Qualname:        unit.Qualname,
	Firstlineno:     unit.FirstLineno,
	Linetable:       a.LineTable,
	ExceptionTable:  a.ExceptionTable,
}

objects.Code is owned by the 1700-series spec (objects). For v0.5 we only need a shape that 1623 marshal can serialise. Mark the objects.Code struct as a forward declaration in the 1620 spec; the full port lands when objects/codeobject.c is in scope.

Comprehensive test plan

Layer 1: Per-pass unit

For every function in this spec, one unit test under compile/assemble_*_test.go:

| Function | Test file | Cases |
| --- | --- | --- |
| assembleInit | assemble_test.go | initial state, firstlineno propagation |
| emitInstr | assemble_emit_test.go | each oparg width (0/1/2/3 EXTENDED_ARG), every opcode family |
| writeVarint / writeSignedVarint | assemble_varint_test.go | zero, small, max-1-byte, max-2-byte, negative, max int32 |
| writeLocInfoShortForm | assemble_loc_test.go | columns 0-79, isize 1-8 |
| writeLocInfoOnelineForm | assemble_loc_test.go | line-delta 0-2, columns |
| writeLocInfoLongForm | assemble_loc_test.go | every signed range |
| writeLocInfoNoColumn | assemble_loc_test.go | line-delta only |
| writeLocInfoNone | assemble_loc_test.go | -1 lineno |
| writeLocInfoEntry dispatcher | assemble_loc_test.go | every form picked correctly |
| assembleEmitExceptionTable | assemble_except_test.go | single, multiple, nested, lasti, empty |
| assembleEmitExceptionTableItem | assemble_except_test.go | varint MSB encoding |
| assembleConsts | assemble_consts_test.go | dedup int, float, str, bytes, tuple, frozenset, complex |
| assembleLocalsplus | assemble_localsplus_test.go | function, method, class, generator, comprehension scope |
| Assemble driver | assemble_driver_test.go | end-to-end on a hand-built Unit |

Layer 2: Round-trip vs CPython linetable

compile/assemble_loc_parity_test.go (//go:build cpython): for the parity corpus (~120 snippets), assert co_linetable byte-equal.

Layer 3: Round-trip vs CPython exception table

compile/assemble_except_parity_test.go (//go:build cpython): for every snippet involving try/except/finally/with/match in the parity corpus, assert co_exceptiontable byte-equal.

Layer 4: Marshal parity (full)

compile/marshal_parity_test.go (//go:build cpython): for the parity corpus, marshal both CPython and gopy code objects and assert marshal.dumps(co) byte-equal. This is the gate test for v0.5.

The corpus is shared with 1626 and 1627 (single source of truth at compile/testdata/parity_corpus/*.py).

Lint and refactor budget

writeLocInfoEntry (write_location_info_entry in CPython) is the only function over the cyclomatic limit. Split into:

  • pickLocFormat(loc, isize, lineCursor) locFormat
  • writeLocByForm(a, form, loc, isize)

Everything else is straight-line varint emission and stays under the limit naturally.
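One possible shape for the pure half of that split is sketched below. The pos struct is a stand-in for the real location type, the locFormat constants are illustrative names, and the thresholds (short form: column < 80 and span < 16; one-line form: line delta 0 to 2 and columns < 128) are taken from a reading of CPython's write_location_info_entry, so treat them as assumptions to check against the cited lines. isize is accepted only to match the signature above; the selection here assumes the caller has already capped it at 8.

```go
package main

import "fmt"

type locFormat int

const (
	formNone locFormat = iota
	formShort
	formOneline
	formNoColumn
	formLong
)

// pos is a hypothetical stand-in for the location type; -1 means "unknown".
type pos struct{ Lineno, EndLineno, ColOffset, EndColOffset int }

// pickLocFormat chooses the smallest record form without writing anything,
// which keeps it trivially table-testable.
func pickLocFormat(loc pos, isize, lineCursor int) locFormat {
	if loc.Lineno < 0 {
		return formNone
	}
	delta := loc.Lineno - lineCursor
	col, endCol := loc.ColOffset, loc.EndColOffset
	sameLine := loc.EndLineno == loc.Lineno
	switch {
	case col < 0 || endCol < 0:
		if sameLine || loc.EndLineno < 0 {
			return formNoColumn
		}
		return formLong
	case sameLine && delta == 0 && col < 80 && endCol >= col && endCol-col < 16:
		return formShort
	case sameLine && delta >= 0 && delta < 3 && col < 128 && endCol < 128:
		return formOneline
	}
	return formLong
}

func main() {
	fmt.Println(pickLocFormat(pos{10, 10, 4, 9}, 1, 10) == formShort)
	fmt.Println(pickLocFormat(pos{11, 11, 4, 9}, 1, 10) == formOneline)
	fmt.Println(pickLocFormat(pos{20, 20, 4, 9}, 1, 10) == formLong)
}
```

With the decision isolated like this, writeLocByForm reduces to a switch over locFormat that calls the matching emitter and updates the line cursor.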

Citation policy

Same as 1626 and 1627. Every Go function gets // CPython: assemble.c:L<n> <name>. Helpers split out to satisfy lint carry the same citation with an (extracted helper) suffix.

Order of work

  1. Skeleton: Assembler struct, Assemble public entry that returns nil. Tests for the empty path.
  2. emitInstr: one opcode at a time, with EXTENDED_ARG widening, verified by golden bytes.
  3. Varint helpers: stand-alone tests against hand-computed bytes.
  4. Line table: short / oneline / long / no-column / none, in that order. Layer 2 parity gates this.
  5. Exception table: single, then multi, then nested. Layer 3 parity gates this.
  6. Pool materialisation: consts, names, localsplus.
  7. Final Assemble driver wires all pieces.
  8. Cross-check with 1626 + 1627 outputs on the parity corpus.

Each step lands as one PR. Assemble is "done" when Layer 4 marshal parity passes for the entire corpus.