
v0.9.0 - VM tail

Pre-release.

By the time you reach v0.9 in a Python port, you've shipped the easy bytecodes. LOAD_CONST, STORE_NAME, BINARY_OP, COMPARE_OP, CALL, RETURN_VALUE, JUMP, POP_JUMP_IF_FALSE. Those carry a runner that can execute every straight-line and branching program a tutorial throws at it. They are also less than half of the bytecodes CPython actually emits.

The bytecodes you haven't shipped yet are the ones that make real Python feel like Python. Generators, for example. The moment a function body contains yield, Python rewrites its shape. The function returns a generator object on call. The generator suspends and resumes at every yield. The bytecode for all of that (RETURN_GENERATOR, YIELD_VALUE, SEND, GET_YIELD_FROM_ITER, CLEANUP_THROW) is the runtime representation of a coroutine. You can implement large parts of Python without those bytecodes; you cannot implement generators without them.

Pattern matching is the same. match / case looks like a switch statement, but its semantics are recursive structural decomposition with type checks, attribute extraction, and star-binding. The dispatch arms (MATCH_MAPPING, MATCH_SEQUENCE, MATCH_KEYS, MATCH_CLASS) are how the pattern shape lowers to executable code. Without them, match x: case Point(x, y): is a NotImplementedError.

from module import * is the third. Most imports use IMPORT_NAME and IMPORT_FROM, both of which v0.8 wired. The star form takes a different path, through CALL_INTRINSIC_1, because the import has to walk the source module's __all__ (or its globals if __all__ is absent) and write each name into the calling frame's locals dict.
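The observable rules, sketched against CPython with a throwaway module built at runtime (the demo name is illustrative):

```python
import sys
import types

# Build a module on the fly so the star import has something to walk.
mod = types.ModuleType("demo")
mod.pub = 1
mod._priv = 2
sys.modules["demo"] = mod

# No __all__: the walk falls back to globals and skips underscore names.
ns = {}
exec("from demo import *", ns)
assert "pub" in ns and "_priv" not in ns

# With __all__: the listed names are imported, underscore rule or not.
mod.__all__ = ["_priv"]
ns2 = {}
exec("from demo import *", ns2)
assert "_priv" in ns2 and "pub" not in ns2
```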

v0.9.0 is the VM tail. After this release the Tier-1 dispatch panel is complete: generator yield / send / throw, pattern matching, set builders, from x import *, WITH_EXCEPT_START, and the async-iter / awaitable stubs all run through the eval loop. The contextvars stack lands on top of an immutable HAMT. pytime becomes the runtime's nanosecond clock layer and powers the GIL switch interval. tokenize.Iter graduates from a hand-rolled splitter to the real lexer state machine. The small-runtime helpers getopt and hashtable close out the infrastructure shelf.

Highlights

Three pieces of work define this release.

Generators on goroutines

When a Python function body contains yield, CPython doesn't actually suspend the calling thread. It allocates a generator object that stores the frame state, and __next__ resumes the frame inline on the calling thread. The thread runs the generator body up to the next yield, captures the yielded value, and returns it to the caller. Single-threaded coroutine scheduling.

Go gives us a richer primitive. We don't have to multiplex generator suspension and resumption on the calling thread; we can put each generator on its own goroutine and use an unbuffered channel as the synchronization primitive.

def squares(n):
    for i in range(n):
        yield i * i

for x in squares(5):
    print(x)
# 0 1 4 9 16

RETURN_GENERATOR detaches the current frame from the chunk arena and spawns the goroutine. YIELD_VALUE writes to the channel and parks until the next __next__ / send. SEND unparks the goroutine with a value to deliver into the resume point. CLEANUP_THROW is the unwinder for exceptions injected through throw.

The channel is unbuffered (capacity 0), so a generator that yields without a consumer parks until the consumer arrives. That matches CPython's "step in lockstep" semantics: the consumer drives the producer, not the other way around.
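That lockstep is observable from the Python side: creating a generator runs none of its body, and each step runs exactly up to the next yield.

```python
log = []

def gen():
    log.append("started")
    yield 1

g = gen()                  # allocates the generator; runs no body code
assert log == []           # nothing has executed yet
assert next(g) == 1        # consumer pulls; producer runs to the yield
assert log == ["started"]
```

Whatever the suspension mechanism underneath (saved frame in CPython, parked goroutine here), this is the contract both have to honor.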

Contextvars on a HAMT

contextvars are how Python 3.7+ propagates per-task state through async and threaded code. A ContextVar is a key. A Context is a mapping from keys to values. The PEP 567 contract is that Context.run(callable, *args) runs the callable with a captured copy-on-write context, and any ContextVar.set inside that callable mutates the copy rather than the caller's context.

The natural data structure for that contract is a hash array mapped trie (HAMT). The HAMT is immutable: every Set returns a new root that shares the unchanged subtrees with the old root. Context.run's "copy" is the old root pointer plus the guarantee that anyone holding it sees a consistent snapshot.

import contextvars

user = contextvars.ContextVar('user')
token = user.set('alice')
try:
    print(user.get())  # 'alice'
finally:
    user.reset(token)

We ported Python/hamt.c to hamt/hamt.go byte for byte. The trie supports Set, Get, Delete, Len, and iteration. Each mutation produces a fresh root. The benchmarks against a copy-on-write Go map say the HAMT is faster past about 32 keys and slower below 8 keys. Most contexts are smaller than 8, but the bigger ones (HTTP request scopes with a couple dozen correlation IDs) are where the speed matters.
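For reference, the snapshot contract the HAMT serves, as visible through the stdlib surface (standard PEP 567 semantics):

```python
import contextvars

var = contextvars.ContextVar("var", default="outer")

def task():
    var.set("inner")       # mutates the copied context, not the caller's
    return var.get()

ctx = contextvars.copy_context()   # O(1) with structural sharing: a root pointer
assert ctx.run(task) == "inner"    # the write is visible inside the run
assert var.get() == "outer"        # the caller's context is untouched
assert ctx[var] == "inner"         # the copied context kept the write
```

With the HAMT underneath, copy_context is the pointer grab and each set inside run builds a fresh root that shares everything it didn't touch.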

Real tokenizer

Through v0.8 our tokenizer was a hand-rolled splitter that did "good enough" for the cases we'd seen. It got string prefixes right enough to read string literals. It got indentation right enough to compile def bodies. It did not get f-string expressions right. It did not get type alias statements right. It did not get continuation lines exactly right.

We ported Parser/tokenizer/tokenizer.c end to end. Iter now drives the real lexer state machine. Token positions, indentation tracking, and string-prefix recognition match _PyTokenizer_Get byte for byte. Source files that the hand-rolled splitter ate quietly with subtle position errors now match CPython's tokenization exactly.

What's new

The full feature breakdown, grouped by where it landed.

vm/

The eval loop completes its Tier-1 panel.

  • eval_gen.go. Generator opcodes (RETURN_GENERATOR, YIELD_VALUE, SEND, GET_YIELD_FROM_ITER, CLEANUP_THROW) plus WITH_EXCEPT_START plus the async-iter / awaitable stubs that return errors mentioning v0.10. Each generator runs on its own goroutine and synchronizes with the caller through an unbuffered channel; RETURN_GENERATOR detaches the current frame from the chunk arena and spawns the goroutine. Ports Python/bytecodes.c RETURN_GENERATOR / YIELD_VALUE / SEND / GET_YIELD_FROM_ITER / CLEANUP_THROW / WITH_EXCEPT_START. The async-iter and awaitable arms are stubs that error out with a pointer to the v0.10 milestone; the full conformance lands alongside the type-system port.
  • eval_match.go. Pattern-match opcodes (MATCH_MAPPING, MATCH_SEQUENCE, MATCH_KEYS, MATCH_CLASS). Class match runs the isinstance gate, extracts positional attrs via __match_args__, and extracts keyword attrs via the names tuple. Ports Python/bytecodes.c MATCH_MAPPING / MATCH_SEQUENCE / MATCH_KEYS / MATCH_CLASS. The __match_args__ walk handles one level of Bases lookup; the full MRO walk waits for the type-system port in a later release.
  • eval_simple.go. BUILD_SET, SET_ADD, SET_UPDATE arms now build real objects.Set values; IMPORT_STAR is special-cased inside CALL_INTRINSIC_1 so the helper can see the current frame's locals. Ports Python/bytecodes.c BUILD_SET / SET_ADD / SET_UPDATE plus Python/intrinsics.c:124 import_star. The IMPORT_STAR placement (inside CALL_INTRINSIC_1 rather than as its own opcode) matches CPython 3.12+; older Python emitted IMPORT_STAR directly.
  • eval_gil.go. Per-thread gilSwitchTimer reads pytime.Monotonic and arms BreakerGILDropRequest once a drop has been requested and the configured switch interval has elapsed. Wired through the eval loop's per-iteration poll via SetGIL. Ports Python/ceval_gil.c take_gil interval-wait branch. The actual contention path (where two interpreter threads fight for the GIL) doesn't fire until sub-interpreters land, but the timer wiring is ready.
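The interval the timer waits out corresponds to the knob CPython exposes as sys.setswitchinterval; for orientation, the CPython-side surface (whether gopy's CLI exposes the same knob yet is a separate question):

```python
import sys

prev = sys.getswitchinterval()   # CPython's default is 0.005 (5 ms)
sys.setswitchinterval(0.001)     # request a 1 ms GIL drop interval
assert abs(sys.getswitchinterval() - 0.001) < 1e-9
sys.setswitchinterval(prev)      # restore the previous interval
```

A thread holding the GIL keeps running until a drop has been requested and this much time has elapsed, which is exactly the condition gilSwitchTimer checks against pytime.Monotonic.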

objects/

Generators get a real object. Mapping and sequence flags get real bits.

  • generator.go. Generator type, GenMsg, Send, Close, genIterNext, genRepr. GeneratorType registered. Ports Objects/genobject.c. The send / throw / close surface matches PEP 342. __next__ calls Send(None) under the hood; send(value) delivers value as the resume value at the yield point; throw(exc) injects an exception at the yield point; close() injects GeneratorExit and treats clean return or GeneratorExit propagation as success.
  • type.go. TpFlags uint64 plus TpFlagMapping and TpFlagSequence constants (mirror Py_TPFLAGS_MAPPING and Py_TPFLAGS_SEQUENCE). dict.go, list.go, tuple.go set the flag in their type init. The flags drive MATCH_MAPPING / MATCH_SEQUENCE: a class is a mapping if the flag bit is set, and we deliberately don't walk attributes to make that determination because pattern matching has to be fast.

pytime/

The nanosecond clock layer.

  • pytime.go. Time typed int64 (mirrors PyTime_t), FromSeconds, FromNanoseconds, AsSecondsDouble, the four rounding modes (Floor, Ceiling, HalfEven, Up) byte for byte with CPython, Deadline, DeadlineFromObject, ErrOverflow. Ports Python/pytime.c. The typed int64 representation matters: at nanosecond resolution, an int64 covers about 292 years, which is more than enough for any wall-clock deadline, and the typed wrapper prevents accidental mixing with seconds or microseconds.
  • clocks.go. Time_ / Monotonic / PerfCounter plus the WithInfo variants. Per-platform info_*.go files name the syscall backing each clock. Ports Python/pytime.c py_get_system_clock / py_get_monotonic_clock. Time_ is wall-clock (Go's time.Now()). Monotonic is what you measure intervals with (Go's monotonic field on time.Now()). PerfCounter is the highest-resolution clock the platform offers (runtime.nanotime under the hood).
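For orientation, the three clocks line up with CPython's time module surface:

```python
import time

wall = time.time_ns()          # Time_: system clock; can jump on NTP adjustment
mono = time.monotonic_ns()     # Monotonic: interval clock; never goes backwards
perf = time.perf_counter_ns()  # PerfCounter: highest resolution available

# The monotonic guarantee is the one the GIL timer depends on.
a = time.monotonic_ns()
b = time.monotonic_ns()
assert b >= a
```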

hamt/

The persistent map for contextvars.

  • hamt.go. Immutable hash-array mapped trie. New, Set, Get, Delete, Len, and iteration. Each mutation returns a fresh root sharing structure with the old one; the trie is the storage layer for contextvars.Context. Ports Python/hamt.c. The implementation walks 5-bit slices of the key hash through a 32-way trie with bitmap-compressed nodes. Branching factor 32 was picked to balance pointer-chase depth against per-node overhead; CPython picked the same number for the same reason.
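The bit-slicing is easy to see in miniature (an illustrative sketch of the addressing scheme, not the hamt package's API):

```python
def trie_path(h, levels=7):
    """Child index chosen at each trie level: 5 bits of the hash per
    level, giving a 32-way branch (0x1F masks off 5 bits)."""
    return [(h >> (5 * level)) & 0x1F for level in range(levels)]

# Hand-built hash whose bottom three 5-bit groups are 0, 1, 2:
h = (2 << 10) | (1 << 5) | 0
assert trie_path(h)[:3] == [0, 1, 2]
```

A 32-bit hash consumed 5 bits at a time bottoms out in at most 7 levels, which is the pointer-chase depth the branching factor trades against node size.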

contextvar/

The PEP 567 surface.

  • contextvar.go. ContextVar, Context, Token. PEP 567 set and reset semantics with the HAMT carrying the var to value mapping. Run-context lookup and copy-on-write. Ports Python/context.c. set returns a Token that captures the previous value (or absence); reset(token) restores the captured state. The token is opaque so the caller can't forge state without going through set.
  • module.go. The _contextvars built-in module registration with copy_context and the ContextVar constructor. Ports Modules/_contextvarsmodule.c.

getopt/

The POSIX argv walker.

  • getopt.go. _PyOS_GetOpt POSIX-style argv walker plus the long-option panel. The CLI in cmd/gopy parses through this instead of an ad-hoc switch. Ports Python/getopt.c. We pulled this in because the long-option support landed incomplete in v0.7's cmdline.go, and rather than patch the partial port we replaced it with the real one.

hashtable/

The generic hash table CPython uses outside of dict.

  • hashtable.go. Generic _Py_hashtable_t with caller-supplied hash and compare callbacks plus a default _Py_hashtable_hash_ptr. Backbone for runtime infrastructure that needs a typed hash table outside the dict object. Ports Python/hashtable.c. The interner, the freelist registries, and a handful of internal caches use this rather than dict because they're keyed by pointer identity rather than __hash__ / __eq__.
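The pointer-identity distinction is easy to demonstrate from the Python side (illustrative only; the Go package keys on raw pointers, not id()):

```python
class Key:
    def __hash__(self):
        return 0
    def __eq__(self, other):
        return True          # every instance compares equal to every other

a, b = Key(), Key()

by_value = {a: "first"}
by_value[b] = "second"       # b "equals" a, so dict overwrites the entry
assert len(by_value) == 1

by_identity = {id(a): "first", id(b): "second"}
assert len(by_identity) == 2  # identity keying keeps distinct objects distinct
```

Structures like the interner need the second behavior, and they need it without ever calling back into Python-level __hash__ / __eq__, which is why they can't sit on top of the dict object.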

tokenize/

The real lexer.

  • tokenize.go. Iter now drives the real lexer state machine instead of a hand-rolled splitter. Token positions, indentation tracking, and string-prefix recognition match CPython's _PyTokenizer_Get. Ports Parser/tokenizer/tokenizer.c. The state machine handles the corner cases the hand-rolled splitter missed: nested parenthesis tracking for implicit line continuation, string-prefix recognition for the full (r, R, b, B, f, F, u, U) cross product, triple-quoted strings, and the FSTRING_START / FSTRING_MIDDLE / FSTRING_END trio for f-string interpolation.
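The token stream the state machine has to reproduce is observable through CPython's own tokenize module; note how indentation and line continuation become explicit tokens:

```python
import io
import tokenize

src = "def f(n):\n    return (n *\n            n)\n"
toks = [(tokenize.tok_name[t.type], t.string)
        for t in tokenize.generate_tokens(io.StringIO(src).readline)]

# Indentation surfaces as INDENT/DEDENT tokens, and the open parenthesis
# downgrades the line break inside the continuation to NL (not NEWLINE).
assert ("INDENT", "    ") in toks
assert ("NAME", "return") in toks
```

These are the cases (plus triple quotes, string prefixes, and on 3.12+ the f-string token trio) where a hand-rolled splitter quietly drifts from the real lexer.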

Why we built it this way

A few decisions in this release deserve a callout.

Generators got goroutines, not coroutine emulation. CPython implements generators by saving and restoring frame state on the calling thread, which means a generator with a deep call stack pays a frame-walk cost on every __next__. Go gives us goroutines for free, so each generator just runs on its own goroutine and synchronizes through a channel. The cost is one goroutine plus one channel per active generator (cheap in Go: a goroutine starts with roughly 2 KB of stack, growing on demand). The benefit is that suspend and resume are just channel operations, which is faster than walking and restoring a frame.

The catch is that we have to be careful about Python's GIL semantics. A generator on a goroutine could theoretically run concurrently with its caller; CPython's generators cannot. We preserve the sequential ordering by making the channel unbuffered: the producer can't yield until the consumer pulls, and the consumer pulls under the GIL. So the GIL serializes the two goroutines into the same lock-step CPython uses, just with a different underlying mechanism.

HAMT instead of copy-on-write map. The PEP 567 contract allows either. We picked HAMT because the typical Context size across a real application is a couple of dozen keys, and the HAMT's structural sharing means Context.copy() is a single pointer assignment rather than a copy of N entries. The alternative (a copy-on-write Go map) has cheap Get but expensive copy proportional to size. For long-running applications that copy contexts frequently (every async task spawn, every loop.run_in_executor, every concurrent.futures submit), the HAMT wins by a wide margin.

The full tokenizer port rather than another patch. Our hand-rolled splitter was good enough for v0.6 to v0.8 because the source files we needed to read were ones we wrote. Once we started reading the CPython stdlib (which v0.10 and beyond depend on), the corner cases the splitter missed became real bugs. Rather than patch them one by one, we ported the state machine. The line count is similar; the fidelity is much higher.

Async stubs ship as errors that name v0.10. GET_AITER, GET_ANEXT, END_ASYNC_FOR, and the full awaitable surface need a working type system to land. We didn't have it in v0.9, so the bytecode arms are present but return errors that explicitly mention "async / await lands in v0.10". This gives users a clear pointer rather than a generic NotImplementedError, and it gives the test suite a stable failure mode to skip against.

Where it lives

The new packages, with their entry points.

  • vm/. eval_gen.go, eval_match.go, eval_simple.go (updated), eval_gil.go. Generator, pattern-match, set, and GIL timer arms.
  • objects/. generator.go, type.go (TpFlags addition).
  • pytime/. pytime.go, clocks.go, plus per-platform info_darwin.go, info_linux.go. Entry points pytime.FromNanoseconds, pytime.Monotonic, pytime.PerfCounter.
  • hamt/. hamt.go. Entry point hamt.New.
  • contextvar/. contextvar.go, module.go. Entry points contextvar.NewContextVar, contextvar.NewContext, contextvar.CopyContext.
  • getopt/. getopt.go. Entry point getopt.PyOSGetOpt.
  • hashtable/. hashtable.go. Entry point hashtable.New.
  • tokenize/. tokenize.go rewritten. Entry point tokenize.NewIter.

Compatibility

A few user-visible changes are worth flagging if you were tracking gopy through v0.8.

  • Generator functions actually return generators. Through v0.8, a function body containing yield raised NotImplementedError at call time. Now it returns a generator. Any code paths in your tests that asserted on the error are now wrong.
  • Pattern match falls through correctly. match x: case Point(x, y): works for the class match path. The __match_args__ walk handles one level of Bases; if you have a deep class hierarchy that relies on inherited __match_args__, the v0.9 behavior may match only against the immediate base. v0.10's type-system port fixes this.
  • from x import * works. Code that does star imports rather than named imports no longer errors. The walked names honor __all__ if present, fall back to globals otherwise, and skip names starting with _ in the fallback case.
  • set and frozenset literals work. {1, 2, 3} builds a real Set. Empty set() constructs through the builtin (the {} literal is still an empty dict).
  • contextvars.copy_context() and ContextVar are importable. Async code that depends on PEP 567 now has the primitives it needs.
  • The tokenizer is stricter. Source files that compiled through the hand-rolled splitter despite having subtle syntax errors (a stray BOM, an inconsistent indent, an unterminated triple-quoted string in the middle of a longer file) now raise SyntaxError exactly where CPython does. This is surface-breaking for a small number of pathological files and surface-correcting for everything else.

What's next

The v0.10 release is the type-system drop. Highlights:

  • Sub-interpreters and Py_NewInterpreter. The GIL-drop wiring is in place; the actual contention path activates with sub-interpreters. v0.10 ships the full Py_NewInterpreterFromConfig surface.
  • The full async iterator / awaitable surface. GET_AITER, GET_ANEXT, END_ASYNC_FOR, real GET_AWAITABLE conformance. v0.9 ships error-returning stubs.
  • __match_args__ MRO walk. v0.9 handles type identity plus a one-level Bases walk; full MRO arrives with the type-system port.
  • WITH_EXCEPT_START traceback argument. Becomes real when the traceback object lands.
  • SEND_GEN specialization. The generic SEND arm works in v0.9; v0.10 replaces the SEND_GEN stub with the specialized form.
  • Real frozen importlib code objects embedded in imp/frozen_bootstrap.go. v0.9 still ships the placeholder table from v0.8.

The Tier-1 dispatch panel is complete with this release. v0.10 moves to the type system: real user-defined classes, real method resolution order, real inheritance, real super(), real descriptor protocol, real metaclasses. After that the runtime can host actual Python libraries rather than carefully curated single-file tests.

Acknowledgments

This release lines up against Python/bytecodes.c, Python/ceval_gil.c, Python/intrinsics.c, Python/pytime.c, Python/hamt.c, Python/context.c, Python/getopt.c, Python/hashtable.c, Modules/_contextvarsmodule.c, Objects/genobject.c, and Parser/tokenizer/tokenizer.c in the CPython 3.14 tree. Each ported file in gopy carries a citation to the originating function so a future reader can compare line by line.