1621. gopy bytecodes DSL
What we are porting
CPython's bytecode interpreter is generated. The source of truth
is Python/bytecodes.c, a C file written in a small DSL. The
upstream generator lives in Tools/cases_generator/ (Python),
walks the DSL, and emits five C headers:
| Generated file | Consumer |
|---|---|
| Python/generated_cases.c.h | Python/ceval.c |
| Python/executor_cases.c.h | Python/optimizer.c (Tier-2) |
| Python/optimizer_cases.c.h | Python/optimizer.c (analysis) |
| Include/internal/pycore_opcode_metadata.h | shared metadata |
| Include/internal/pycore_uop_metadata.h | Tier-2 micro-op metadata |
For v0.6 we need just two: the Tier-1 dispatch handlers and the shared metadata. The Tier-2 outputs ship in v0.12.
Strategy
Same shape as 1642 (parser_gen): hand-port the DSL parser to Go and write a Go-emitting backend. We do not wrap the upstream Python generator. Two reasons:
- The output we want (Go switch arms calling typed object helpers) is structurally different from the C output (computed gotos with macro-expanded stack ops). A backend swap is more surgical than wrapping plus translating.
- CPython updates bytecodes.c nearly every release. Owning the parser end-to-end means a CPython rebase is one regeneration plus a drift check, not a transitive Python toolchain dependency.
The DSL itself is small, well-documented in
Tools/cases_generator/parsing.py, and stable across recent
releases.
DSL surface
A bytecodes.c entry looks like:
    inst(BINARY_OP_ADD_INT, (left, right -- res)) {
        DEOPT_IF(!PyLong_CheckExact(left));
        DEOPT_IF(!PyLong_CheckExact(right));
        STAT_INC(BINARY_OP, hit);
        res = _PyLong_Add((PyLongObject *)left, (PyLongObject *)right);
        DECREF_INPUTS();
        ERROR_IF(res == NULL, error);
    }
Key forms:
- inst(NAME, (inputs -- outputs)) { body }: a real instruction.
- op(NAME, (inputs -- outputs)) { body }: a fragment, composed by macro into instructions.
- macro(NAME) = OP1 + OP2;: composition.
- pseudo(NAME, ...): lowered before assembly; never executes.
- family(NAME, COUNTER) = { BASE, ADAPTIVE_1, ADAPTIVE_2 };: specialization grouping.
Body uses control macros: DEOPT_IF, ERROR_IF, EXIT_IF,
DECREF_INPUTS, INPUTS_DEAD, GOTO_ERROR, JUMPBY,
STACK_GROW, STACK_SHRINK. The generator translates these to
target-language equivalents.
Stack effects are declared in the signature: (left, right -- res)
means "pop two, push one". The generator computes n_pushed and
n_popped from the signature alone.
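As a rough illustration of computing the counts from the signature alone, here is a minimal Go sketch. The names StackEffect, parseSignature, and splitNames are hypothetical and not taken from the generator. It handles plain names only, so sized or variadic slots are out of scope for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// StackEffect is a hypothetical record of what one signature declares.
type StackEffect struct {
	Inputs  []string // names popped, in the order the DSL lists them
	Outputs []string // names pushed, in the order the DSL lists them
}

// NPopped and NPushed follow directly from the declared names.
func (s StackEffect) NPopped() int { return len(s.Inputs) }
func (s StackEffect) NPushed() int { return len(s.Outputs) }

// parseSignature parses a signature such as "(left, right -- res)".
// Assumption: every slot is a plain name; sized slots are not handled.
func parseSignature(sig string) (StackEffect, error) {
	sig = strings.TrimSpace(sig)
	if !strings.HasPrefix(sig, "(") || !strings.HasSuffix(sig, ")") {
		return StackEffect{}, fmt.Errorf("signature %q is not parenthesized", sig)
	}
	before, after, ok := strings.Cut(sig[1:len(sig)-1], "--")
	if !ok {
		return StackEffect{}, fmt.Errorf("signature %q has no -- separator", sig)
	}
	return StackEffect{Inputs: splitNames(before), Outputs: splitNames(after)}, nil
}

// splitNames splits a comma-separated name list and drops empty entries.
func splitNames(s string) []string {
	var names []string
	for _, part := range strings.Split(s, ",") {
		if name := strings.TrimSpace(part); name != "" {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	eff, err := parseSignature("(left, right -- res)")
	if err != nil {
		panic(err)
	}
	fmt.Println(eff.NPopped(), eff.NPushed()) // 2 1
}
```

Keeping the names rather than only the counts is deliberate in this sketch: the emitter needs them later to bind stack slots to locals.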
Go translation strategy
The generated Go file has one switch arm per real instruction:
    // generated by tools/bytecodes_gen; DO NOT EDIT
    // bytecodes-sha256: <hash of bytecodes.c at generation time>
    func (e *evalState) dispatch(op opcode.Op, oparg uint32) (next int, err error) {
        switch op {
        // ...
        case opcode.BINARY_OP_ADD_INT:
            left := e.peek(2)
            right := e.peek(1)
            if !object.LongCheckExact(left) {
                return e.deoptHere()
            }
            if !object.LongCheckExact(right) {
                return e.deoptHere()
            }
            res, err := object.LongAdd(left.(*object.Long), right.(*object.Long))
            e.decrefInputs(2)
            if err != nil {
                return 0, err
            }
            e.replace(2, res)
            return e.advance(1), nil
        // ...
        }
    }
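The arm above relies on stack helpers that this section does not define. The following is a minimal sketch of how peek and replace might behave. It assumes that peek(n) counts from the top with peek(1) being the top of stack, and that replace(n, v) drops n values and pushes v. The evalState here is a stand-in holding only a value stack, not the real type.

```go
package main

import "fmt"

// evalState is a stand-in for illustration; the real struct is not shown
// in this document and certainly carries more than a value stack.
type evalState struct {
	stack []any
}

// peek returns the n-th value from the top; peek(1) is the top of stack.
// Assumed semantics, inferred from left := e.peek(2), right := e.peek(1).
func (e *evalState) peek(n int) any { return e.stack[len(e.stack)-n] }

// replace drops the top n values and pushes v in their place.
// Assumed semantics, inferred from e.replace(2, res).
func (e *evalState) replace(n int, v any) {
	e.stack = append(e.stack[:len(e.stack)-n], v)
}

func main() {
	e := &evalState{stack: []any{"below", 2, 3}}
	left, right := e.peek(2).(int), e.peek(1).(int)
	e.replace(2, left+right)
	fmt.Println(e.stack) // [below 5]
}
```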
The control macros translate as follows:
| C macro | Go target |
|---|---|
| DEOPT_IF(cond) | if cond { return e.deoptHere() } |
| ERROR_IF(cond, lbl) | if cond { return 0, e.error(lbl) } |
| EXIT_IF(cond) | if cond { return e.exitTrace() } (Tier-2 only) |
| DECREF_INPUTS() | e.decrefInputs(n_popped) |
| INPUTS_DEAD() | (no-op in refcount-only path) |
| GOTO_ERROR(lbl) | return 0, e.error(lbl) |
| JUMPBY(n) | return e.advance(int(n)), nil |
| STACK_GROW(n) | e.grow(n) |
| STACK_SHRINK(n) | e.shrink(n) |
| INSTRUCTION_SIZE | constant, computed from oparg width plus inline cache |
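A table like this can be applied statement by statement. The sketch below shows one possible shape for that step, covering four of the rows. translateMacro and splitArgs are hypothetical names, and the condition text is passed through verbatim, so a real translator would still have to rewrite the C expression inside it.

```go
package main

import (
	"fmt"
	"strings"
)

// translateMacro rewrites one control-macro statement into Go source text,
// following the table above. nPopped feeds DECREF_INPUTS. It returns
// ok=false for anything it does not recognize.
// Assumption: arguments are copied as-is; no C expression rewriting happens.
func translateMacro(stmt string, nPopped int) (string, bool) {
	stmt = strings.TrimSuffix(strings.TrimSpace(stmt), ";")
	open := strings.Index(stmt, "(")
	if open < 0 || !strings.HasSuffix(stmt, ")") {
		return "", false
	}
	name := stmt[:open]
	args := splitArgs(stmt[open+1 : len(stmt)-1])
	switch {
	case name == "DEOPT_IF" && len(args) == 1:
		return fmt.Sprintf("if %s { return e.deoptHere() }", args[0]), true
	case name == "ERROR_IF" && len(args) == 2:
		return fmt.Sprintf("if %s { return 0, e.error(%s) }", args[0], args[1]), true
	case name == "DECREF_INPUTS" && len(args) == 0:
		return fmt.Sprintf("e.decrefInputs(%d)", nPopped), true
	case name == "JUMPBY" && len(args) == 1:
		return fmt.Sprintf("return e.advance(int(%s)), nil", args[0]), true
	}
	return "", false
}

// splitArgs splits on commas that sit outside nested parentheses.
func splitArgs(s string) []string {
	var args []string
	depth, start := 0, 0
	for i, r := range s {
		switch r {
		case '(':
			depth++
		case ')':
			depth--
		case ',':
			if depth == 0 {
				args = append(args, strings.TrimSpace(s[start:i]))
				start = i + 1
			}
		}
	}
	if last := strings.TrimSpace(s[start:]); last != "" {
		args = append(args, last)
	}
	return args
}

func main() {
	for _, stmt := range []string{
		"DEOPT_IF(!PyLong_CheckExact(left));",
		"DECREF_INPUTS();",
		"ERROR_IF(res == NULL, error);",
	} {
		out, ok := translateMacro(stmt, 2)
		fmt.Println(ok, out)
	}
}
```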
The translator is opportunistic, mirroring the parser_gen action
translator (1642). Anything it cannot type lands as a panic-stub
arm so the generated file always compiles, and gets filled in as
the helper surface (object/*) gains the typed methods the
translator needs.
Generator pipeline
Eight milestones, mirroring 1642:
- B1 DSL lexer and parser. Tokenize bytecodes.c, produce a typed AST of inst / op / macro / family / pseudo.
- B2 Stack-effect analysis. Walk the signature, infer n_popped, n_pushed, named bindings.
- B3 Per-instruction emitter. One switch arm per inst, oparg decode, stack push/pop, body translation.
- B4 Macro expansion. Inline op fragments into their composing macro declaration before emitting.
- B5 Specialization family wiring. Adaptive variants in the same family fall back to the base instruction in v0.6 (the specializer ships in v0.11).
- B6 Action body translator. Same opportunistic shape as the parser_gen action translator: identifier-bound idents pass through, _Py* calls map to typed object helpers, anything with member access or unknown identifiers falls back to a panic-stub arm.
- B7 Metadata emitter. Stack effects, oparg widths, cache layout, instruction names lifted to compile/opcodes_gen.go for the assembler.
- B8 Drift check. SHA256 of bytecodes.c recorded in the generated preamble; bytecodes_gen -check-drift fails CI when the recorded hash does not match the current source.
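To make the B4 step concrete, here is a small sketch of inlining op fragments into a macro by concatenating their bodies in declaration order. Op, expandMacro, and the fragment names are hypothetical. The sketch ignores two things a full implementation has to deal with: merging the stack effects of consecutive fragments, and macro parts that are not op names.

```go
package main

import "fmt"

// Op is a hypothetical parsed op(...) fragment.
type Op struct {
	Name string
	Body []string // one entry per statement
}

// expandMacro resolves the right-hand side of "macro(NAME) = OP1 + OP2;"
// by concatenating the fragment bodies in the order they are listed.
// Assumption: every part names an op; stack effects are not merged here.
func expandMacro(parts []string, ops map[string]Op) ([]string, error) {
	var body []string
	for _, p := range parts {
		op, ok := ops[p]
		if !ok {
			return nil, fmt.Errorf("macro references unknown op %q", p)
		}
		body = append(body, op.Body...)
	}
	return body, nil
}

func main() {
	ops := map[string]Op{
		"_GUARD_EXAMPLE": {Name: "_GUARD_EXAMPLE", Body: []string{"DEOPT_IF(!ok(left));"}},
		"_DO_EXAMPLE":    {Name: "_DO_EXAMPLE", Body: []string{"res = work(left);", "DECREF_INPUTS();"}},
	}
	body, err := expandMacro([]string{"_GUARD_EXAMPLE", "_DO_EXAMPLE"}, ops)
	if err != nil {
		panic(err)
	}
	for _, stmt := range body {
		fmt.Println(stmt)
	}
}
```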
File mapping
| C / DSL source | Go target |
|---|---|
| Python/bytecodes.c | (input) |
| Python/generated_cases.c.h | vm/opcodes_gen.go (generated) |
| Python/opcode_targets.h | vm/opcode_targets_gen.go (generated) |
| Include/internal/pycore_opcode_metadata.h | compile/opcodes_gen.go (generated) |
| Tools/cases_generator/parsing.py | tools/bytecodes_gen/dsl_parser.go |
| Tools/cases_generator/analysis.py | tools/bytecodes_gen/analyze.go |
| Tools/cases_generator/tier1_generator.py | tools/bytecodes_gen/emit_tier1.go |
| Tools/cases_generator/generators_common.py | tools/bytecodes_gen/emit_common.go |
| Tools/cases_generator/stack.py | tools/bytecodes_gen/stack.go |
Checklist
Status legend: [x] shipped, [ ] pending, [~] partial / scaffold,
[n] deferred / not in scope this phase.
Files
- tools/bytecodes_gen/main.go: CLI with -emit-tier1, -emit-metadata, -check-drift flags.
- tools/bytecodes_gen/dsl_tok.go: tokenizer for the DSL subset (C tokens plus the -- stack-effect separator).
- tools/bytecodes_gen/dsl_parser.go: parser producing a typed AST of Inst, Op, Macro, Family, Pseudo.
- tools/bytecodes_gen/analyze.go: stack-effect analysis, binding scope, macro expansion order.
- tools/bytecodes_gen/stack.go: push/pop sequence builder. Implemented as part of analyze.go since the binding view and the push/pop sequence share the same walk; no separate stack.go file.
- [n] tools/bytecodes_gen/emit_common.go: collapsed into emit_tier1.go and emit_metadata.go. The two emitters don't share enough surface to warrant a third file in v0.6; revisit if the metadata emitter grows custom oparg shapes.
- [~] tools/bytecodes_gen/emit_tier1.go: Tier-1 switch-arm emitter. Skeleton only: each arm pops inputs into named locals and emits a panic-stub body until B6 fills it in.
- tools/bytecodes_gen/emit_metadata.go: stack-effect / cache-size / has-oparg / family tables. Skips op fragments; variadic stack slots emit as count = -1 ("compute at runtime") to mirror CPython. Round-tripped in emit_metadata_test.go.
- [~] tools/bytecodes_gen/action.go: C body to Go expression translator; opportunistic, falls back to panic-stub. Today understands the control-macro panel (DEOPT_IF, ERROR_IF, EXIT_IF, DECREF_INPUTS, INPUTS_DEAD, STAT_INC/DEC); _Py* helper calls and member-access shapes still bail to the panic-stub.
- tools/bytecodes_gen/drift.go: SHA256 record / check (HashFile, MarkerLine, ExtractMarker, CheckDrift). Round-tripped in drift_test.go.
Generator output panel
- [~] vm/opcodes_gen.go: switch dispatch over every Tier-1 opcode in bytecodes.c. Adaptive variants reduce to their base case for v0.6. Generated end-to-end against cpython-314; arm bodies are panic-stubs pending B6 expansion of the action translator.
- [n] vm/opcode_targets_gen.go: opcode kind table. The Tier-1 loop classifies via compile.Opcode directly; no separate targets table needed until the specializer in v0.11.
- compile/opcodes_gen.go: opcode constants, mnemonic table, oparg widths. Generated and consumed by the v0.5 assembler.
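The reduction of adaptive variants to their base case can be pictured as a lookup table built from the family declarations. The sketch below is one hypothetical way to build and use such a table; Family, buildFamilyMap, and reduce are illustrative names, and the struct keeps the base and its variants in separate fields so that it makes no assumption about how the DSL orders them.

```go
package main

import "fmt"

// Family is a hypothetical view of one family(...) declaration.
type Family struct {
	Base     string   // the unspecialized instruction
	Variants []string // adaptive variants that belong to it
}

// buildFamilyMap maps every adaptive variant to its base opcode and
// rejects a variant that is claimed by two families.
func buildFamilyMap(families []Family) (map[string]string, error) {
	m := make(map[string]string)
	for _, f := range families {
		for _, v := range f.Variants {
			if prev, dup := m[v]; dup {
				return nil, fmt.Errorf("%s belongs to both %s and %s", v, prev, f.Base)
			}
			m[v] = f.Base
		}
	}
	return m, nil
}

// reduce returns the opcode whose arm should run while no specializer
// exists: the base for a variant, the opcode itself otherwise.
func reduce(m map[string]string, op string) string {
	if base, ok := m[op]; ok {
		return base
	}
	return op
}

func main() {
	m, err := buildFamilyMap([]Family{
		{Base: "BINARY_OP", Variants: []string{"BINARY_OP_ADD_INT"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(reduce(m, "BINARY_OP_ADD_INT")) // BINARY_OP
	fmt.Println(reduce(m, "NOP"))               // NOP
}
```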
Surface guarantees
- Generator round-trips against the upstream Python/bytecodes.c for 3.14.0. Pinned by the SHA256 in the generated preamble (driven by drift.go).
- Each inst body emits a switch arm with bound stack inputs, translated control macros, and either a typed action or a panic-stub fallback (B6 fills more arms as it grows).
- Adaptive variants (*_INT, *_STR, *_INSTANCE_VALUE, ...) compile via the FamilyMap reduction to their base case for v0.6. The specializer (v0.11) is what makes the adaptive paths actually fire.
- Metadata table matches CPython for opcode number, name, oparg width. Numeric values pinned by compile/opcodes_gen.go (generated against _opcode_metadata.py). Push / pop counts emitted as MetadataEntry.Pushes / MetadataEntry.Pops per instruction; round-tripped in emit_metadata_test.go.
- Cache layout sizes (CacheSize field in MetadataEntry) emit per instruction, including macro-expanded specializable opcodes (BINARY_OP, CALL, LOAD_ATTR, ...). Pinned byte-for-byte against Include/internal/pycore_code.h cache structs by tools/bytecodes_gen/cache_layout_test.go (skips when the CPYTHON env var is unset).
- Drift check: bytecodes_gen -check-drift fails when the recorded bytecodes-sha256 does not match the current source. Pinned by tools/bytecodes_gen/drift_test.go.
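The drift check in the last item amounts to comparing a recorded hash with a freshly computed one. The following sketch shows that comparison on in-memory data. The lowercase function names are hypothetical and only loosely echo the ones listed for drift.go, whose real signatures are not given in this document.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// markerPrefix matches the preamble line shown in the generated file.
const markerPrefix = "// bytecodes-sha256: "

// hashBytes returns the hex-encoded SHA256 of the DSL source.
func hashBytes(src []byte) string {
	sum := sha256.Sum256(src)
	return hex.EncodeToString(sum[:])
}

// markerLine renders the line that would be written at generation time.
func markerLine(src []byte) string { return markerPrefix + hashBytes(src) }

// extractMarker finds the recorded hash in generated output, if any.
func extractMarker(generated string) (string, bool) {
	for _, line := range strings.Split(generated, "\n") {
		if strings.HasPrefix(line, markerPrefix) {
			return strings.TrimSpace(strings.TrimPrefix(line, markerPrefix)), true
		}
	}
	return "", false
}

// checkDrift reports an error when the generated file has no marker or
// when the marker does not match the hash of the current source.
func checkDrift(src []byte, generated string) error {
	recorded, ok := extractMarker(generated)
	if !ok {
		return fmt.Errorf("no bytecodes-sha256 marker in generated file")
	}
	if current := hashBytes(src); current != recorded {
		return fmt.Errorf("source drifted: recorded %s, current %s", recorded, current)
	}
	return nil
}

func main() {
	src := []byte("inst(NOP, (--)) {\n}\n")
	generated := "// generated by tools/bytecodes_gen; DO NOT EDIT\n" +
		markerLine(src) + "\npackage vm\n"

	fmt.Println(checkDrift(src, generated) == nil)                   // true
	fmt.Println(checkDrift([]byte("edited upstream"), generated) == nil) // false
}
```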
Action translator panel
- _Py*_Check, _Py*_CheckExact predicate calls. Bail to panic-stub today.
- _Py*_Add, _Py*_Subtract, ... numeric helpers. The hand-written panel in vm/eval_simple.go covers the v0.6 arithmetic surface; the translator still bails on these.
- STAT_INC, STAT_DEC translate to no-op. Pinned by the control-macro panel in tools/bytecodes_gen/action.go.
- Py_INCREF, Py_DECREF, Py_NewRef, Py_XDECREF translate to e.incref / e.decref / e.newref. Refcount ops are no-ops on the GIL build (Go's GC owns lifetime); they stay structural so the panel is readable against the C side.
- Py_TYPE, Py_SIZE direct field access.
- Member-access expressions (obj->something) bail to panic-stub. Fill in lazily as the typed object surface lands.
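A few rows of this panel can be sketched as a single statement-level function. The example below is illustrative only: translateHelper and isIdent are hypothetical names, the mapping of Py_XDECREF onto e.decref is an assumption, and Py_NewRef is left out because it appears as an expression rather than a standalone statement.

```go
package main

import (
	"fmt"
	"strings"
)

// helperMap covers statement-shaped refcount helpers only.
// Assumption: Py_XDECREF shares e.decref with Py_DECREF.
var helperMap = map[string]string{
	"Py_INCREF":  "e.incref",
	"Py_DECREF":  "e.decref",
	"Py_XDECREF": "e.decref",
}

// translateHelper handles statements of the form NAME(arg);. It returns
// ("", true) for stat counters, which translate to nothing, and ok=false
// whenever the statement should fall back to a panic-stub.
func translateHelper(stmt string) (string, bool) {
	stmt = strings.TrimSuffix(strings.TrimSpace(stmt), ";")
	if strings.Contains(stmt, "->") {
		return "", false // member access: bail
	}
	open := strings.Index(stmt, "(")
	if open < 0 || !strings.HasSuffix(stmt, ")") {
		return "", false
	}
	name, arg := stmt[:open], strings.TrimSpace(stmt[open+1:len(stmt)-1])
	if name == "STAT_INC" || name == "STAT_DEC" {
		return "", true
	}
	goName, known := helperMap[name]
	if !known || !isIdent(arg) {
		return "", false
	}
	return fmt.Sprintf("%s(%s)", goName, arg), true
}

// isIdent reports whether s is a plain C identifier.
func isIdent(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		letter := r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
		digit := r >= '0' && r <= '9'
		if !letter && !(digit && i > 0) {
			return false
		}
	}
	return true
}

func main() {
	for _, stmt := range []string{
		"Py_DECREF(left);",
		"STAT_INC(BINARY_OP, hit);",
		"Py_DECREF(frame->f_locals);",
	} {
		out, ok := translateHelper(stmt)
		fmt.Printf("%-30s ok=%v out=%q\n", stmt, ok, out)
	}
}
```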
Out of scope for v0.6
- vm/executor_gen.go (Tier-2 micro-op cases). Lands in v0.12.
- optimizer/cases_gen.go (Tier-2 abstract-interp cases). Lands in v0.12.
- vm/uop_metadata_gen.go. Lands in v0.12.
Cross-references
- Eval loop that consumes the dispatch table: 1636.
- Frame layout the dispatch table reads: 1637.
- Tagged stack values: 1638.
- Assembler that consumes the metadata table: 1628.