1621. gopy bytecodes DSL

What we are porting

CPython's bytecode interpreter is generated. The source of truth is Python/bytecodes.c, a C file written in a small DSL. The upstream generator lives in Tools/cases_generator/ (Python), walks the DSL, and emits five C headers:

| Generated file | Consumer |
| --- | --- |
| Python/generated_cases.c.h | Python/ceval.c |
| Python/executor_cases.c.h | Python/optimizer.c (Tier-2) |
| Python/optimizer_cases.c.h | Python/optimizer.c (analysis) |
| Include/internal/pycore_opcode_metadata.h | shared metadata |
| Include/internal/pycore_uop_metadata.h | Tier-2 micro-op metadata |

For v0.6 we need just two: the Tier-1 dispatch handlers and the shared metadata. The Tier-2 outputs ship in v0.12.

Strategy

Same shape as 1642 (parser_gen): hand-port the DSL parser to Go and write a Go-emitting backend. We do not wrap the upstream Python generator. Two reasons:

  1. The output we want (Go switch arms calling typed object helpers) is structurally different from the C output (computed gotos with macro-expanded stack ops). A backend swap is more surgical than wrapping plus translating.
  2. CPython updates bytecodes.c nearly every release. Owning the parser end-to-end means a CPython rebase is one regeneration plus a drift check, not a transitive Python toolchain dep.

The DSL itself is small, well-documented in Tools/cases_generator/parsing.py, and stable across recent releases.

DSL surface

A bytecodes.c entry looks like:

inst(BINARY_OP_ADD_INT, (left, right -- res)) {
    DEOPT_IF(!PyLong_CheckExact(left));
    DEOPT_IF(!PyLong_CheckExact(right));
    STAT_INC(BINARY_OP, hit);
    res = _PyLong_Add((PyLongObject *)left, (PyLongObject *)right);
    DECREF_INPUTS();
    ERROR_IF(res == NULL, error);
}

Key forms:

  • inst(NAME, (inputs -- outputs)) { body }: a real instruction.
  • op(NAME, (inputs -- outputs)) { body }: a fragment, composed by macro into instructions.
  • macro(NAME) = OP1 + OP2;: composition.
  • pseudo(NAME, ...): lowered before assembly; never executes.
  • family(NAME, COUNTER) = { BASE, ADAPTIVE_1, ADAPTIVE_2 };: specialization grouping.
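
As a rough illustration of the macro form, here is one way the net stack effect of `OP1 + OP2` could be derived from the effects of its fragments. This is a minimal sketch, not the analysis in analyze.go: the `Effect` type and `compose` function are hypothetical names, and the sketch tracks only counts, not named bindings.

```go
package main

import "fmt"

// Effect is a hypothetical (pops, pushes) pair for one op fragment.
type Effect struct {
	Pops, Pushes int
}

// compose folds the effects of OP1 + OP2 + ... into the net effect of the
// macro. It tracks the running stack depth relative to the start and the
// lowest depth reached; the lowest depth is how many entry values get consumed.
func compose(parts []Effect) Effect {
	depth, lowest := 0, 0
	for _, p := range parts {
		depth -= p.Pops
		if depth < lowest {
			lowest = depth
		}
		depth += p.Pushes
	}
	return Effect{Pops: -lowest, Pushes: depth - lowest}
}

func main() {
	guard := Effect{Pops: 2, Pushes: 2} // shape: (left, right -- left, right)
	add := Effect{Pops: 2, Pushes: 1}   // shape: (left, right -- res)
	fmt.Println(compose([]Effect{guard, add})) // prints {2 1}
}
```

A guard fragment that re-pushes its inputs therefore contributes nothing to the net effect, which matches the intuition that the macro as a whole still pops two and pushes one.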

Bodies use control macros: DEOPT_IF, ERROR_IF, EXIT_IF, DECREF_INPUTS, INPUTS_DEAD, GOTO_ERROR, JUMPBY, STACK_GROW, STACK_SHRINK. The generator translates these to target-language equivalents.

Stack effects are declared in the signature: (left, right -- res) means "pop two, push one". The generator computes n_pushed and n_popped from the signature alone.
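
To make the signature rule concrete, the sketch below splits a signature on the `--` separator and counts the names on each side. It is an illustration only: `StackEffect`, `parseEffect` and `splitNames` are hypothetical names, and the sketch assumes plain comma-separated names with no sized, typed or conditional slots.

```go
package main

import (
	"fmt"
	"strings"
)

// StackEffect is a hypothetical result type: the names bound on each side
// of the "--" separator.
type StackEffect struct {
	Inputs  []string
	Outputs []string
}

// parseEffect splits a signature such as "(left, right -- res)".
func parseEffect(sig string) (StackEffect, error) {
	sig = strings.TrimSpace(sig)
	if !strings.HasPrefix(sig, "(") || !strings.HasSuffix(sig, ")") {
		return StackEffect{}, fmt.Errorf("signature %q is not parenthesized", sig)
	}
	inner := sig[1 : len(sig)-1]
	in, out, ok := strings.Cut(inner, "--")
	if !ok {
		return StackEffect{}, fmt.Errorf("signature %q has no -- separator", sig)
	}
	return StackEffect{Inputs: splitNames(in), Outputs: splitNames(out)}, nil
}

// splitNames returns the non-empty, trimmed, comma-separated names in s.
func splitNames(s string) []string {
	var names []string
	for _, part := range strings.Split(s, ",") {
		if name := strings.TrimSpace(part); name != "" {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	eff, err := parseEffect("(left, right -- res)")
	if err != nil {
		panic(err)
	}
	// prints: n_popped=2 n_pushed=1
	fmt.Printf("n_popped=%d n_pushed=%d\n", len(eff.Inputs), len(eff.Outputs))
}
```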

Go translation strategy

The generated Go file has one switch arm per real instruction:

// generated by tools/bytecodes_gen; DO NOT EDIT
// bytecodes-sha256: <hash of bytecodes.c at generation time>

func (e *evalState) dispatch(op opcode.Op, oparg uint32) (next int, err error) {
	switch op {
	// ...
	case opcode.BINARY_OP_ADD_INT:
		left := e.peek(2)
		right := e.peek(1)
		if !object.LongCheckExact(left) {
			return e.deoptHere()
		}
		if !object.LongCheckExact(right) {
			return e.deoptHere()
		}
		res, err := object.LongAdd(left.(*object.Long), right.(*object.Long))
		e.decrefInputs(2)
		if err != nil {
			return 0, err
		}
		e.replace(2, res)
		return e.advance(1), nil
	// ...
	}
}

The control macros translate as follows:

| C macro | Go target |
| --- | --- |
| DEOPT_IF(cond) | if cond { return e.deoptHere() } |
| ERROR_IF(cond, lbl) | if cond { return 0, e.error(lbl) } |
| EXIT_IF(cond) | if cond { return e.exitTrace() } (Tier-2 only) |
| DECREF_INPUTS() | e.decrefInputs(n_popped) |
| INPUTS_DEAD() | (no-op in refcount-only path) |
| GOTO_ERROR(lbl) | return 0, e.error(lbl) |
| JUMPBY(n) | return e.advance(int(n)), nil |
| STACK_GROW(n) | e.grow(n) |
| STACK_SHRINK(n) | e.shrink(n) |
| INSTRUCTION_SIZE | constant, computed from oparg width plus inline cache |

The translator is opportunistic, mirroring the parser_gen action translator (1642). Anything it cannot type lands as a panic-stub arm so the generated file always compiles, and gets filled in as the helper surface (object/*) gains the typed methods the translator needs.
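
The table and the fallback rule can be sketched as a small lookup that emits Go source text, with a panic stub for anything unrecognized. This is an assumption-laden illustration rather than the code in action.go: `translateMacro` and `panicStub` are hypothetical names, the stub message is invented, and macro arguments are copied through as text without translating the C expressions inside them.

```go
package main

import "fmt"

// translateMacro maps one control-macro call to Go source text, following
// the translation table. ok is false when the macro is not recognized.
// nPopped is the input count taken from the instruction's signature.
func translateMacro(name string, args []string, nPopped int) (code string, ok bool) {
	switch {
	case name == "DEOPT_IF" && len(args) == 1:
		return fmt.Sprintf("if %s { return e.deoptHere() }", args[0]), true
	case name == "ERROR_IF" && len(args) == 2:
		return fmt.Sprintf("if %s { return 0, e.error(%s) }", args[0], args[1]), true
	case name == "GOTO_ERROR" && len(args) == 1:
		return fmt.Sprintf("return 0, e.error(%s)", args[0]), true
	case name == "DECREF_INPUTS" && len(args) == 0:
		return fmt.Sprintf("e.decrefInputs(%d)", nPopped), true
	case name == "JUMPBY" && len(args) == 1:
		return fmt.Sprintf("return e.advance(int(%s)), nil", args[0]), true
	case name == "STACK_GROW" && len(args) == 1:
		return fmt.Sprintf("e.grow(%s)", args[0]), true
	case name == "STACK_SHRINK" && len(args) == 1:
		return fmt.Sprintf("e.shrink(%s)", args[0]), true
	}
	return "", false
}

// panicStub is the fallback body: it keeps the generated file compiling
// while marking the arm as untranslated. The message text is hypothetical.
func panicStub(opname string) string {
	return fmt.Sprintf("panic(%q)", "bytecodes_gen: untranslated body for "+opname)
}

func main() {
	if code, ok := translateMacro("DECREF_INPUTS", nil, 2); ok {
		fmt.Println(code) // prints: e.decrefInputs(2)
	}
	if _, ok := translateMacro("SOME_UNKNOWN_MACRO", nil, 0); !ok {
		fmt.Println(panicStub("BINARY_OP_ADD_INT"))
	}
}
```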

Generator pipeline

Eight milestones, mirroring 1642:

  • B1 DSL lexer and parser. Tokenize bytecodes.c, produce a typed AST of inst / op / macro / family / pseudo.
  • B2 Stack-effect analysis. Walk the signature, infer n_popped, n_pushed, named bindings.
  • B3 Per-instruction emitter. One switch arm per inst, oparg decode, stack push/pop, body translation.
  • B4 Macro expansion. Inline op fragments into their composing macro declaration before emitting.
  • B5 Specialization family wiring. Adaptive variants in the same family fall back to the base instruction in v0.6 (the specializer ships in v0.11).
  • B6 Action body translator. Same opportunistic shape as the parser_gen action translator: identifiers bound to stack inputs or outputs pass through, _Py* calls map to typed object helpers, and anything with member access or unknown identifiers falls back to a panic-stub arm.
  • B7 Metadata emitter. Stack effects, oparg widths, cache layout, instruction names lifted to compile/opcodes_gen.go for the assembler.
  • B8 Drift check. SHA256 of bytecodes.c recorded in the generated preamble; bytecodes_gen -check-drift fails CI when the recorded hash does not match the current source.
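
The drift check in B8 amounts to comparing a recorded hash with a freshly computed one. The sketch below shows that idea using the standard crypto/sha256 package. The function names are lowercase stand-ins, not the signatures of HashFile, MarkerLine, ExtractMarker or CheckDrift in drift.go, and the sketch works on in-memory data instead of files so it stays self-contained.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// markerPrefix matches the preamble comment shown in the generated file.
const markerPrefix = "// bytecodes-sha256: "

// hashBytes returns the hex-encoded SHA256 of the DSL source.
func hashBytes(src []byte) string {
	sum := sha256.Sum256(src)
	return hex.EncodeToString(sum[:])
}

// markerLine builds the preamble line recorded at generation time.
func markerLine(src []byte) string {
	return markerPrefix + hashBytes(src)
}

// extractMarker finds the recorded hash in generated output, if present.
func extractMarker(generated string) (string, bool) {
	for _, line := range strings.Split(generated, "\n") {
		if strings.HasPrefix(line, markerPrefix) {
			return strings.TrimSpace(strings.TrimPrefix(line, markerPrefix)), true
		}
	}
	return "", false
}

// checkDrift reports an error when the recorded hash is missing or stale.
func checkDrift(generated string, src []byte) error {
	recorded, ok := extractMarker(generated)
	if !ok {
		return errors.New("no bytecodes-sha256 marker in generated file")
	}
	if current := hashBytes(src); recorded != current {
		return fmt.Errorf("drift: recorded %s, current %s", recorded, current)
	}
	return nil
}

func main() {
	src := []byte("inst(NOP, (--)) {\n}\n")
	generated := "// generated by tools/bytecodes_gen; DO NOT EDIT\n" + markerLine(src) + "\n"
	fmt.Println(checkDrift(generated, src) == nil)                     // prints: true
	fmt.Println(checkDrift(generated, []byte("edited source")) == nil) // prints: false
}
```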

File mapping

| C / DSL source | Go target |
| --- | --- |
| Python/bytecodes.c | (input) |
| Python/generated_cases.c.h | vm/opcodes_gen.go (generated) |
| Python/opcode_targets.h | vm/opcode_targets_gen.go (generated) |
| Include/internal/pycore_opcode_metadata.h | compile/opcodes_gen.go (generated) |
| Tools/cases_generator/parsing.py | tools/bytecodes_gen/dsl_parser.go |
| Tools/cases_generator/analysis.py | tools/bytecodes_gen/analyze.go |
| Tools/cases_generator/tier1_generator.py | tools/bytecodes_gen/emit_tier1.go |
| Tools/cases_generator/generators_common.py | tools/bytecodes_gen/emit_common.go |
| Tools/cases_generator/stack.py | tools/bytecodes_gen/stack.go |

Checklist

Status legend: [x] shipped, [ ] pending, [~] partial / scaffold, [n] deferred / not in scope this phase.

Files

  • tools/bytecodes_gen/main.go: CLI with -emit-tier1, -emit-metadata, -check-drift flags.
  • tools/bytecodes_gen/dsl_tok.go: tokenizer for the DSL subset (C tokens plus the -- stack-effect separator).
  • tools/bytecodes_gen/dsl_parser.go: parser producing a typed AST of Inst, Op, Macro, Family, Pseudo.
  • tools/bytecodes_gen/analyze.go: stack-effect analysis, binding scope, macro expansion order.
  • tools/bytecodes_gen/stack.go: push/pop sequence builder. Implemented as part of analyze.go since the binding view and the push/pop sequence share the same walk; no separate stack.go file.
  • [n] tools/bytecodes_gen/emit_common.go: collapsed into emit_tier1.go and emit_metadata.go. The two emitters don't share enough surface to warrant a third file in v0.6; revisit if the metadata emitter grows custom oparg shapes.
  • [~] tools/bytecodes_gen/emit_tier1.go: Tier-1 switch-arm emitter. Skeleton only: each arm pops inputs into named locals and emits a panic-stub body until B6 fills it in.
  • tools/bytecodes_gen/emit_metadata.go: stack-effect / cache-size / has-oparg / family tables. Skips op fragments; variadic stack slots emit as count = -1 ("compute at runtime") to mirror CPython. Round-tripped in emit_metadata_test.go.
  • [~] tools/bytecodes_gen/action.go: C body to Go expression translator; opportunistic, falls back to panic-stub. Today understands the control-macro panel (DEOPT_IF, ERROR_IF, EXIT_IF, DECREF_INPUTS, INPUTS_DEAD, STAT_INC/DEC); _Py* helper calls and member-access shapes still bail to the panic-stub.
  • tools/bytecodes_gen/drift.go: SHA256 record / check (HashFile, MarkerLine, ExtractMarker, CheckDrift). Round-tripped in drift_test.go.
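
The "count = -1" convention for variadic slots mentioned under emit_metadata.go can be illustrated as follows. This is a guess at the shape, not the emitter itself: the `MetadataEntry` here is a trimmed-down stand-in with only Name, Pops and Pushes, `countSlots` is a hypothetical helper, and treating any slot written as `name[size]` as variadic is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// MetadataEntry is a reduced stand-in for the real table row.
type MetadataEntry struct {
	Name   string
	Pops   int
	Pushes int
}

// computeAtRuntime marks a count that depends on oparg.
const computeAtRuntime = -1

// countSlots returns the number of slots on one side of a signature, or
// computeAtRuntime when any slot is sized (for example "values[oparg]").
func countSlots(slots []string) int {
	for _, s := range slots {
		if strings.Contains(s, "[") {
			return computeAtRuntime
		}
	}
	return len(slots)
}

func main() {
	fixed := MetadataEntry{
		Name:   "BINARY_OP_ADD_INT",
		Pops:   countSlots([]string{"left", "right"}),
		Pushes: countSlots([]string{"res"}),
	}
	variadic := MetadataEntry{
		Name:   "BUILD_LIST",
		Pops:   countSlots([]string{"values[oparg]"}),
		Pushes: countSlots([]string{"list"}),
	}
	fmt.Printf("%+v\n%+v\n", fixed, variadic)
}
```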

Generator output panel

  • [~] vm/opcodes_gen.go: switch dispatch over every Tier-1 opcode in bytecodes.c. Adaptive variants reduce to their base case for v0.6. Generated end-to-end against cpython-314; arm bodies are panic-stubs pending B6 expansion of the action translator.
  • [n] vm/opcode_targets_gen.go: opcode kind table. The Tier-1 loop classifies via compile.Opcode directly; no separate targets table needed until the specializer in v0.11.
  • compile/opcodes_gen.go: opcode constants, mnemonic table, oparg widths. Generated and consumed by the v0.5 assembler.

Surface guarantees

  • Generator round-trips against the upstream Python/bytecodes.c for 3.14.0. Pinned by the SHA256 in the generated preamble (driven by drift.go).
  • Each inst body emits a switch arm with bound stack inputs, translated control macros, and either a typed action or a panic-stub fallback (B6 fills more arms as it grows).
  • Adaptive variants (*_INT, *_STR, *_INSTANCE_VALUE, ...) compile via the FamilyMap reduction to their base case for v0.6. The specializer (v0.11) is what makes the adaptive paths actually fire.
  • Metadata table matches CPython for opcode number, name, oparg width. Numeric values pinned by compile/opcodes_gen.go (generated against _opcode_metadata.py). Push / pop counts emitted as MetadataEntry.Pushes / MetadataEntry.Pops per instruction; round-tripped in emit_metadata_test.go.
  • Cache layout sizes (CacheSize field in MetadataEntry) emit per instruction, including macro-expanded specializable opcodes (BINARY_OP, CALL, LOAD_ATTR, ...). Pinned byte-for-byte against Include/internal/pycore_code.h cache structs by tools/bytecodes_gen/cache_layout_test.go (skips when the CPYTHON env var is unset).
  • Drift check: bytecodes_gen -check-drift fails when the recorded bytecodes-sha256 does not match the current source. Pinned by tools/bytecodes_gen/drift_test.go.
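
The family reduction in the adaptive-variant guarantee can be pictured as a map from each specialized opcode to its base. The sketch below is illustrative only: the document names a FamilyMap but does not show its shape, so the `Family` struct, `buildFamilyMap` and `reduce` are hypothetical.

```go
package main

import "fmt"

// Family mirrors a family(...) declaration: one base instruction plus its
// adaptive variants.
type Family struct {
	Base    string
	Members []string
}

// buildFamilyMap maps every adaptive variant to its base instruction.
func buildFamilyMap(families []Family) map[string]string {
	m := make(map[string]string)
	for _, f := range families {
		for _, member := range f.Members {
			m[member] = f.Base
		}
	}
	return m
}

// reduce returns the base for an adaptive variant and leaves every other
// opcode name unchanged.
func reduce(m map[string]string, op string) string {
	if base, ok := m[op]; ok {
		return base
	}
	return op
}

func main() {
	m := buildFamilyMap([]Family{
		{Base: "BINARY_OP", Members: []string{"BINARY_OP_ADD_INT", "BINARY_OP_ADD_FLOAT"}},
	})
	fmt.Println(reduce(m, "BINARY_OP_ADD_INT")) // prints: BINARY_OP
	fmt.Println(reduce(m, "NOP"))               // prints: NOP
}
```

With a map like this, the emitter can route an adaptive variant's arm to the base arm until the specializer exists.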

Action translator panel

  • _Py*_Check, _Py*_CheckExact predicate calls. Bail to panic-stub today.
  • _Py*_Add, _Py*_Subtract, ... numeric helpers. The hand-written panel in vm/eval_simple.go covers the v0.6 arithmetic surface; the translator still bails on these.
  • STAT_INC, STAT_DEC translate to no-op. Pinned by the control-macro panel in tools/bytecodes_gen/action.go.
  • Py_INCREF, Py_DECREF, Py_NewRef, Py_XDECREF translate to e.incref / e.decref / e.newref. Refcount ops are no-ops on the GIL build (Go's GC owns lifetime); they stay structural so the panel is readable against the C side.
  • Py_TYPE, Py_SIZE direct field access.
  • Member-access expressions (obj->something) bail to panic-stub. Fill in lazily as the typed object surface lands.
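
The panel above describes a statement-by-statement decision: translate, drop, or bail. A minimal sketch of that decision, under several assumptions, follows. `translateStmt` and `refcountHelpers` are hypothetical names, mapping Py_XDECREF to e.decref is a guess (the panel lists four macros and three targets), and the statement handling is plain string matching, not a real C parser.

```go
package main

import (
	"fmt"
	"strings"
)

// refcountHelpers maps C refcount macros to the structural Go calls named
// in the panel. The Py_XDECREF entry is an assumption.
var refcountHelpers = map[string]string{
	"Py_INCREF":  "e.incref",
	"Py_DECREF":  "e.decref",
	"Py_XDECREF": "e.decref",
	"Py_NewRef":  "e.newref",
}

// translateStmt handles one C statement of the form NAME(args);.
// It returns ok=false when the statement must fall back to a panic stub,
// and an empty string with ok=true when the statement is a no-op.
func translateStmt(stmt string) (code string, ok bool) {
	stmt = strings.TrimSuffix(strings.TrimSpace(stmt), ";")
	if strings.Contains(stmt, "->") {
		return "", false // member access: bail
	}
	open := strings.Index(stmt, "(")
	if open < 0 || !strings.HasSuffix(stmt, ")") {
		return "", false
	}
	name, args := stmt[:open], stmt[open+1:len(stmt)-1]
	switch name {
	case "STAT_INC", "STAT_DEC":
		return "", true // statistics macros translate to nothing
	}
	if target, found := refcountHelpers[name]; found {
		return target + "(" + args + ")", true
	}
	return "", false // unknown call shape: bail
}

func main() {
	for _, s := range []string{
		"Py_DECREF(left);",
		"STAT_INC(BINARY_OP, hit);",
		"res = _PyLong_Add(a, b);",
		"x = obj->ob_type;",
	} {
		code, ok := translateStmt(s)
		fmt.Printf("%-28s ok=%-5v code=%q\n", s, ok, code)
	}
}
```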

Out of scope for v0.6

  • vm/executor_gen.go (Tier-2 micro-op cases). Lands in v0.12.
  • optimizer/cases_gen.go (Tier-2 abstract-interp cases). Lands in v0.12.
  • vm/uop_metadata_gen.go. Lands in v0.12.

Cross-references

  • Eval loop that consumes the dispatch table: 1636.
  • Frame layout the dispatch table reads: 1637.
  • Tagged stack values: 1638.
  • Assembler that consumes the metadata table: 1628.