v0.9.0 - VM tail
Released Pre-release.
By the time you reach v0.9 in a Python port, you've shipped the
easy bytecodes. LOAD_CONST, STORE_NAME, BINARY_OP,
COMPARE_OP, CALL, RETURN_VALUE, JUMP, POP_JUMP_IF_FALSE.
Those carry a runner that can execute every straight-line and
branching program a tutorial throws at it. They are also less
than half of the bytecodes CPython actually emits.
The bytecodes you haven't shipped yet are the ones that make
real Python feel like Python. Generators, for example. The
moment a function body contains yield, Python rewrites its
shape. The function returns a generator object on call. The
generator suspends and resumes at every yield. The bytecode for
all of that (RETURN_GENERATOR, YIELD_VALUE, SEND,
GET_YIELD_FROM_ITER, CLEANUP_THROW) is the runtime
representation of a coroutine. You can implement large parts of
Python without those bytecodes; you cannot implement generators
without them.
Pattern matching is the same. match / case looks like a
switch statement, but its semantics are recursive structural
decomposition with type checks, attribute extraction, and
star-binding. The dispatch arms (MATCH_MAPPING,
MATCH_SEQUENCE, MATCH_KEYS, MATCH_CLASS) are how the
pattern shape lowers to executable code. Without them,
match x: case Point(x, y): is a NotImplementedError.
from module import * is the third. Most imports lower to
IMPORT_NAME and IMPORT_FROM, both of which v0.8 wired. The
star form takes a different path through CALL_INTRINSIC_1
because the import has to walk the source module's __all__ (or
its globals if __all__ is absent) and write each name into the
calling frame's locals dict.
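The name-selection rule described above can be sketched in a few lines. This is an illustration of the semantics only, not the intrinsic's implementation: `star_names` and `import_star` are hypothetical helpers, and a plain dict stands in for the frame's locals.

```python
import types


def star_names(module):
    """Names a star import would bind: __all__ if present, else public globals."""
    if hasattr(module, "__all__"):
        return list(module.__all__)
    return [name for name in vars(module) if not name.startswith("_")]


def import_star(module, namespace):
    # Write each selected name into the target namespace (a plain dict here).
    for name in star_names(module):
        namespace[name] = getattr(module, name)


mod = types.ModuleType("demo")   # hypothetical module built in memory
mod.public = 1
mod._private = 2

target = {}
import_star(mod, target)
print(target)                    # {'public': 1}

mod.__all__ = ["_private"]       # an explicit __all__ overrides the underscore rule
target = {}
import_star(mod, target)
print(target)                    # {'_private': 2}
```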
v0.9.0 is the VM tail. After this release the Tier-1
dispatch panel is complete: generator yield / send / throw,
pattern matching, set builders, from x import *,
WITH_EXCEPT_START, and the async-iter / awaitable stubs all
run through the eval loop. The contextvars stack lands on top of
an immutable HAMT. pytime becomes the runtime's nanosecond
clock layer and powers the GIL switch interval. tokenize.Iter
graduates from a hand-rolled splitter to the real lexer state
machine. The small-runtime helpers getopt and hashtable
close out the infrastructure shelf.
Highlights
Three pieces of work define this release.
Generators on goroutines
When a Python function body contains yield, CPython doesn't
actually suspend the calling thread. It allocates a generator
object that stores the frame state, and __next__ resumes the
frame inline on the calling thread. The thread runs the
generator body up to the next yield, captures the yielded
value, and returns it to the caller. Single-threaded coroutine
scheduling.
Go gives us a richer primitive. We don't have to multiplex generator suspension and resumption on the calling thread; we can put each generator on its own goroutine and use an unbuffered channel as the synchronization primitive.
def squares(n):
    for i in range(n):
        yield i * i

for x in squares(5):
    print(x)
# 0 1 4 9 16
RETURN_GENERATOR detaches the current frame from the chunk
arena and spawns the goroutine. YIELD_VALUE writes to the
channel and parks until the next __next__ / send. SEND
unparks the goroutine with a value to deliver into the resume
point. CLEANUP_THROW is the unwinder for exceptions injected
through throw.
The channel is unbuffered (capacity 0), so a generator that yields without a consumer parks until the consumer arrives. That matches CPython's "step in lockstep" semantics: the consumer drives the producer, not the other way around.
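The lockstep ordering is observable from plain Python, which makes it a convenient conformance check for any generator implementation. A small sketch, with a hypothetical `events` list recording who ran when:

```python
events = []


def squares(n):
    for i in range(n):
        events.append(f"produce {i * i}")
        yield i * i


gen = squares(3)
events.append("created")         # creating the generator runs none of its body
for value in gen:
    events.append(f"consume {value}")

print(events)
# ['created', 'produce 0', 'consume 0', 'produce 1', 'consume 1', 'produce 4', 'consume 4']
```

Each "produce" entry appears only after the consumer has asked for the next value, and never more than one step ahead of the matching "consume".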
Contextvars on a HAMT
contextvars are how Python 3.7+ propagates per-task state
through async and threaded code. A ContextVar is a key. A
Context is a mapping from keys to values. The PEP 567 contract
is that Context.run(callable, *args) runs the callable with a
captured copy-on-write context, and any ContextVar.set inside
that callable mutates the copy rather than the caller's context.
The natural data structure for that contract is a hash array
mapped trie (HAMT). The HAMT is immutable: every Set returns a
new root that shares the unchanged subtrees with the old root.
Context.run's "copy" is the old root pointer plus the
guarantee that anyone holding it sees a consistent snapshot.
import contextvars

user = contextvars.ContextVar('user')
token = user.set('alice')
try:
    print(user.get())  # 'alice'
finally:
    user.reset(token)
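The Context.run half of the contract can be checked the same way. In this sketch `handler` and `snapshot` are hypothetical names; the point is that a set inside the run is visible in the copied context and invisible to the caller.

```python
import contextvars

user = contextvars.ContextVar("user", default="anonymous")


def handler():
    user.set("alice")            # mutates only the context this call runs in
    return user.get()


snapshot = contextvars.copy_context()
inside = snapshot.run(handler)

print(inside)                    # alice
print(user.get())                # anonymous: the caller's context is untouched
print(snapshot[user])            # alice: the change lives in the copied context
```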
We ported Python/hamt.c to hamt/hamt.go byte for byte. The
trie supports Set, Get, Delete, Len, and iteration. Each
mutation produces a fresh root. The benchmarks against a
copy-on-write Go map say the HAMT is faster past about 32 keys
and slower below 8 keys. Most contexts are smaller than 8, but
the bigger ones (HTTP request scopes with a couple dozen
correlation IDs) are where the speed matters.
Real tokenizer
Through v0.8 our tokenizer was a hand-rolled splitter that was
good enough for the cases we'd seen. It got string prefixes
right enough to read string literals. It got indentation right
enough to compile def bodies. It did not get f-string
expressions right. It did not get type alias statements right. It
did not get continuation lines exactly right.
We ported Parser/tokenizer/tokenizer.c end to end. Iter now
drives the real lexer state machine. Token positions, indentation
tracking, and string-prefix recognition match
_PyTokenizer_Get byte for byte. Source files that the
hand-rolled splitter accepted quietly, with subtle position
errors, now tokenize exactly as CPython does.
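For a sense of what "exactly" covers, the sketch below uses CPython's own stdlib `tokenize` module (not gopy's `tokenize.Iter`) to print the reference token stream for a parenthesized line continuation. The line break inside the parentheses is an NL token, not a statement-ending NEWLINE, and every token carries a (row, column) start position.

```python
import io
import tokenize

source = "x = (1 +\n     2)\ny = 3\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start)

kinds = [tokenize.tok_name[tok.type] for tok in tokens]
print(kinds.count("NL"), kinds.count("NEWLINE"))   # 1 2
```

A port that splits on line endings would report three logical lines here; the reference stream has two.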
What's new
The full feature breakdown, grouped by where it landed.
vm/
The eval loop completes its Tier-1 panel.
- eval_gen.go. Generator opcodes (RETURN_GENERATOR, YIELD_VALUE, SEND, GET_YIELD_FROM_ITER, CLEANUP_THROW) plus WITH_EXCEPT_START plus the async-iter / awaitable stubs. Each generator runs on its own goroutine and synchronizes with the caller through unbuffered channels; RETURN_GENERATOR detaches the current frame from the chunk arena and spawns the goroutine. Ports Python/bytecodes.c RETURN_GENERATOR / YIELD_VALUE / SEND / GET_YIELD_FROM_ITER / CLEANUP_THROW / WITH_EXCEPT_START. The async-iter and awaitable arms are stubs that error out with a pointer to the v0.10 milestone; the full conformance lands alongside the type-system port.
- eval_match.go. Pattern-match opcodes (MATCH_MAPPING, MATCH_SEQUENCE, MATCH_KEYS, MATCH_CLASS). Class match runs the isinstance gate, extracts positional attrs via __match_args__, and extracts keyword attrs via the names tuple. Ports Python/bytecodes.c MATCH_MAPPING / MATCH_SEQUENCE / MATCH_KEYS / MATCH_CLASS. The __match_args__ walk handles one level of Bases lookup; the full MRO walk waits for the type-system port in a later release.
- eval_simple.go. BUILD_SET, SET_ADD, SET_UPDATE arms now build real objects.Set values; IMPORT_STAR is special-cased inside CALL_INTRINSIC_1 so the helper can see the current frame's locals. Ports Python/bytecodes.c BUILD_SET / SET_ADD / SET_UPDATE plus Python/intrinsics.c:124 import_star. The IMPORT_STAR placement (inside CALL_INTRINSIC_1 rather than as its own opcode) matches CPython 3.12+; older Python emitted IMPORT_STAR directly.
- eval_gil.go. Per-thread gilSwitchTimer reads pytime.Monotonic and arms BreakerGILDropRequest once a drop has been requested and the configured switch interval has elapsed. Wired through the eval loop's per-iteration poll via SetGIL. Ports Python/ceval_gil.c take_gil interval-wait branch. The actual contention path (where two interpreter threads fight for the GIL) doesn't fire until sub-interpreters land, but the timer wiring is ready.
objects/
Generators get a real object. Mapping and sequence flags get real bits.
- generator.go. Generator type, GenMsg, Send, Close, genIterNext, genRepr. GeneratorType registered. Ports Objects/genobject.c. The send / throw / close surface matches PEP 342. __next__ calls Send(None) under the hood; send(value) delivers value as the resume value at the yield point; throw(exc) injects an exception at the yield point; close() injects GeneratorExit and treats clean return or GeneratorExit propagation as success.
- type.go. TpFlags uint64 plus TpFlagMapping and TpFlagSequence constants (mirror Py_TPFLAGS_MAPPING and Py_TPFLAGS_SEQUENCE). dict.go, list.go, tuple.go set the flag in their type init. The flags drive MATCH_MAPPING / MATCH_SEQUENCE: a class is a mapping if the flag bit is set, and we deliberately don't walk attributes to make that determination because pattern matching has to be fast.
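The PEP 342 surface described above, exercised end to end from Python. `echo` and `log` are hypothetical names for this sketch; each call is annotated with the value it produces.

```python
log = []


def echo():
    try:
        received = yield "ready"
        while True:
            try:
                received = yield f"got {received}"
            except ValueError as exc:
                received = f"error {exc}"   # recover and keep yielding
    finally:
        log.append("closed")                # runs when close() unwinds the frame


gen = echo()
print(next(gen))                            # ready  (equivalent to gen.send(None))
print(gen.send("a"))                        # got a
print(gen.throw(ValueError("boom")))        # got error boom
gen.close()                                 # injects GeneratorExit at the paused yield
print(log)                                  # ['closed']
```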
pytime/
The nanosecond clock layer.
- pytime.go. Time typed int64 (mirrors PyTime_t), FromSeconds, FromNanoseconds, AsSecondsDouble, the four rounding modes (Floor, Ceiling, HalfEven, Up) byte for byte with CPython, Deadline, DeadlineFromObject, ErrOverflow. Ports Python/pytime.c. The typed int64 representation matters: at nanosecond resolution, an int64 covers about 292 years, which is more than enough for any wall-clock deadline, and the typed wrapper prevents accidental mixing with seconds or microseconds.
- clocks.go. Time_ / Monotonic / PerfCounter plus the WithInfo variants. Per-platform info_*.go files name the syscall backing each clock. Ports Python/pytime.c py_get_system_clock / py_get_monotonic_clock. Time_ is wall-clock (Go's time.Now()). Monotonic is what you measure intervals with (Go's monotonic field on time.Now()). PerfCounter is the highest-resolution clock the platform offers (runtime.nanotime under the hood).
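To make the four rounding modes and the int64 range concrete, here is a small sketch using exact rational arithmetic. The string mode names and `to_nanoseconds` are hypothetical, not the pytime API, and it assumes "Up" means rounding away from zero, as CPython's round-up mode does.

```python
from fractions import Fraction
import math

NS_PER_SECOND = 10**9
INT64_MAX = 2**63 - 1


def to_nanoseconds(seconds, mode):
    """Round an exact number of seconds to integer nanoseconds."""
    exact = Fraction(seconds) * NS_PER_SECOND
    if mode == "floor":
        ns = math.floor(exact)
    elif mode == "ceiling":
        ns = math.ceil(exact)
    elif mode == "half_even":
        ns = round(exact)                    # Fraction rounds ties to the even integer
    elif mode == "up":
        ns = math.ceil(exact) if exact >= 0 else math.floor(exact)  # away from zero
    else:
        raise ValueError(mode)
    if not -INT64_MAX - 1 <= ns <= INT64_MAX:
        raise OverflowError("timestamp out of int64 range")
    return ns


half_tick = Fraction(-5, 2 * NS_PER_SECOND)  # exactly -2.5 ns
for mode in ("floor", "ceiling", "half_even", "up"):
    print(mode, to_nanoseconds(half_tick, mode))
# floor -3, ceiling -2, half_even -2, up -3

years = INT64_MAX / NS_PER_SECOND / (365.25 * 86400)
print(int(years))                            # 292
```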
hamt/
The persistent map for contextvars.
- hamt.go. Immutable hash-array mapped trie. New, Set, Get, Delete, Len, and iteration. Each mutation returns a fresh root sharing structure with the old one; the trie is the storage layer for contextvars.Context. Ports Python/hamt.c. The implementation walks 5-bit slices of the key hash through a 32-way trie with bitmap-compressed nodes. Branching factor 32 was picked to balance pointer-chase depth against per-node overhead; CPython picked the same number for the same reason.
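The core idea (5-bit hash slices, a bitmap that compresses each 32-way node, and path copying on write) fits in a short sketch. This is a simplified illustration, not a port of Python/hamt.c: `Node`, `hamt_get`, and `hamt_set` are hypothetical names, leaves are small buckets keyed by full hash, and Delete and iteration are left out.

```python
BITS = 5
MASK = (1 << BITS) - 1                      # 32-way branching


class Node:
    """`bitmap` marks which of 32 slots exist; `slots` stores only those."""
    __slots__ = ("bitmap", "slots")

    def __init__(self, bitmap=0, slots=()):
        self.bitmap = bitmap
        self.slots = slots                  # child Nodes or (hash, pairs) leaves


def _index(bitmap, bit):
    # Occupied slots below `bit` give the position in the compressed tuple.
    return bin(bitmap & (bit - 1)).count("1")


def hamt_get(node, key, default=None):
    h, shift = hash(key), 0
    while True:
        bit = 1 << ((h >> shift) & MASK)
        if not node.bitmap & bit:
            return default
        entry = node.slots[_index(node.bitmap, bit)]
        if isinstance(entry, Node):
            node, shift = entry, shift + BITS
            continue
        if entry[0] == h:
            for k, v in entry[1]:
                if k == key:
                    return v
        return default


def _set(node, h, shift, key, value):
    bit = 1 << ((h >> shift) & MASK)
    i = _index(node.bitmap, bit)
    if not node.bitmap & bit:               # empty slot: add a leaf
        leaf = (h, ((key, value),))
        return Node(node.bitmap | bit, node.slots[:i] + (leaf,) + node.slots[i:])
    entry = node.slots[i]
    if isinstance(entry, Node):             # descend, copying only this path
        new = _set(entry, h, shift + BITS, key, value)
    elif entry[0] == h:                     # same full hash: update the bucket
        pairs = tuple(p for p in entry[1] if p[0] != key) + ((key, value),)
        new = (h, pairs)
    else:                                   # slices collide: push old leaf down
        old_bit = 1 << ((entry[0] >> (shift + BITS)) & MASK)
        new = _set(Node(old_bit, (entry,)), h, shift + BITS, key, value)
    return Node(node.bitmap, node.slots[:i] + (new,) + node.slots[i + 1:])


def hamt_set(root, key, value):
    """Return a new root; `root` itself is never modified."""
    return _set(root, hash(key), 0, key, value)


v1 = hamt_set(Node(), "user", "alice")
v2 = hamt_set(v1, "user", "bob")
print(hamt_get(v1, "user"), hamt_get(v2, "user"))   # alice bob
```

Both roots stay valid after the second write, which is the property Context.run relies on.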
contextvar/
The PEP 567 surface.
- contextvar.go. ContextVar, Context, Token. PEP 567 set and reset semantics with the HAMT carrying the var to value mapping. Run-context lookup and copy-on-write. Ports Python/context.c. set returns a Token that captures the previous value (or absence); reset(token) restores the captured state. The token is opaque so the caller can't forge state without going through set.
- module.go. The _contextvars built-in module registration with copy_context and the ContextVar constructor. Ports Modules/_contextvarsmodule.c.
getopt/
The POSIX argv walker.
- getopt.go. _PyOS_GetOpt POSIX-style argv walker plus the long-option panel. The CLI in cmd/gopy parses through this instead of an ad-hoc switch. Ports Python/getopt.c. We pulled this in because the long-option support landed incomplete in v0.7's cmdline.go, and rather than patch the partial port we replaced it with the real one.
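For readers unfamiliar with the POSIX conventions involved (clustered flags, attached or separate option arguments, parsing that stops at the first non-option), Python's stdlib `getopt` module follows the same rules and makes a handy reference. The option string below is illustrative only and is not gopy's actual option table.

```python
import getopt

argv = ["-bq", "-W", "error", "-Xdev", "--version", "script.py", "--not-ours"]

# "b" and "q" are flags; "W:" and "X:" take an argument.
options, rest = getopt.getopt(argv, "bqW:X:", ["version", "help"])

print(options)
# [('-b', ''), ('-q', ''), ('-W', 'error'), ('-X', 'dev'), ('--version', '')]
print(rest)
# ['script.py', '--not-ours']
```

Everything after the script name is left alone, so the script's own options pass through untouched.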
hashtable/
The generic hash table CPython uses outside of dict.
- hashtable.go. Generic _Py_hashtable_t with caller-supplied hash and compare callbacks plus a default _Py_hashtable_hash_ptr. Backbone for runtime infrastructure that needs a typed hash table outside the dict object. Ports Python/hashtable.c. The interner, the freelist registries, and a handful of internal caches use this rather than dict because they're keyed by pointer identity rather than __hash__ / __eq__.
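The difference between identity keying and `__hash__` / `__eq__` keying is easy to see from Python. `IdentityTable` is a hypothetical stand-in that uses `id()` the way a pointer-hashed table uses the address; it is not the hashtable package's API.

```python
class IdentityTable:
    """Table keyed by object identity, ignoring __hash__ and __eq__."""

    def __init__(self):
        self._entries = {}                  # id(obj) -> (obj, value); obj is kept alive

    def set(self, obj, value):
        self._entries[id(obj)] = (obj, value)

    def get(self, obj, default=None):
        entry = self._entries.get(id(obj))
        return entry[1] if entry is not None else default

    def __len__(self):
        return len(self._entries)


a = tuple([1, 2])
b = tuple([1, 2])                           # equal to `a`, but a distinct object

by_value = {a: "first"}
by_value[b] = "second"                      # same slot: a == b and hash(a) == hash(b)

by_identity = IdentityTable()
by_identity.set(a, "first")
by_identity.set(b, "second")                # separate slot: a is not b

print(len(by_value), len(by_identity))      # 1 2
```

Identity keying also accepts unhashable objects such as lists, which a dict would reject.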
tokenize/
The real lexer.
- tokenize.go. Iter now drives the real lexer state machine instead of a hand-rolled splitter. Token positions, indentation tracking, and string-prefix recognition match CPython's _PyTokenizer_Get. Ports Parser/tokenizer/tokenizer.c. The state machine handles the corner cases the hand-rolled splitter missed: nested parenthesis tracking for implicit line continuation, string-prefix recognition for the full (r, R, b, B, f, F, u, U) cross product, triple-quoted strings, and the FSTRING_START / FSTRING_MIDDLE / FSTRING_END trio for f-string interpolation.
Why we built it this way
A few decisions in this release deserve a callout.
Generators got goroutines, not coroutine emulation. CPython
implements generators by saving and restoring frame state on
the calling thread, which means a generator with a deep call
stack pays a frame-walk cost on every __next__. Go gives us
goroutines for free, so each generator just runs on its own
goroutine and synchronizes through a channel. The cost is one
goroutine plus one channel per active generator (cheap in Go:
a few kilobytes of initial stack plus the channel itself, which
is unbuffered). The benefit is that
suspend and resume are just channel operations, which is
faster than walking and restoring a frame.
The catch is that we have to be careful about Python's GIL semantics. A generator on a goroutine could theoretically run concurrently with its caller; CPython's generators cannot. We preserve the sequential ordering by making the channel unbuffered: the producer can't yield until the consumer pulls, and the consumer pulls under the GIL. So the GIL serializes the two goroutines into the same lock-step CPython uses, just with a different underlying mechanism.
HAMT instead of copy-on-write map. The PEP 567 contract
allows either. We picked HAMT because the contexts where the cost
shows up hold a couple of dozen keys, and the
HAMT's structural sharing means Context.copy() is a single
pointer assignment rather than a copy of N entries. The
alternative (a copy-on-write Go map) has cheap Get but
expensive copy proportional to size. For long-running
applications that copy contexts frequently (every async task
spawn, every loop.run_in_executor, every concurrent.futures
submit), the HAMT wins by a wide margin.
The full tokenizer port rather than another patch. Our hand-rolled splitter was good enough for v0.6 to v0.8 because the source files we needed to read were ones we wrote. Once we started reading the CPython stdlib (which v0.10 and beyond depend on), the corner cases the splitter missed became real bugs. Rather than patch them one by one, we ported the state machine. The line count is similar; the fidelity is much higher.
Async stubs ship as errors that name v0.10. GET_AITER,
GET_ANEXT, END_ASYNC_FOR, and the full awaitable surface
need a working type system to land. We didn't have it in v0.9,
so the bytecode arms are present but return errors that
explicitly mention "async / await lands in v0.10". This gives
users a clear pointer rather than a generic NotImplementedError,
and it gives the test suite a stable failure mode to skip
against.
Where it lives
The new packages, with their entry points.
- vm/. eval_gen.go, eval_match.go, eval_simple.go (updated), eval_gil.go. Generator, pattern-match, set, and GIL timer arms.
- objects/. generator.go, type.go (TpFlags addition).
- pytime/. pytime.go, clocks.go, plus per-platform info_darwin.go, info_linux.go. Entry points pytime.FromNanoseconds, pytime.Monotonic, pytime.PerfCounter.
- hamt/. hamt.go. Entry point hamt.New.
- contextvar/. contextvar.go, module.go. Entry points contextvar.NewContextVar, contextvar.NewContext, contextvar.CopyContext.
- getopt/. getopt.go. Entry point getopt.PyOSGetOpt.
- hashtable/. hashtable.go. Entry point hashtable.New.
- tokenize/. tokenize.go rewritten. Entry point tokenize.NewIter.
Compatibility
A few user-visible changes are worth flagging if you were tracking gopy through v0.8.
- Generator functions actually return generators. Through v0.8, a function body containing yield raised NotImplementedError at call time. Now it returns a generator. Any code paths in your tests that asserted on the error are now wrong.
- Pattern match falls through correctly. match x: case Point(x, y): works for the class match path. The __match_args__ walk handles one level of Bases; if you have a deep class hierarchy that relies on inherited __match_args__, the v0.9 behavior may match only against the immediate base. v0.10's type-system port fixes this.
- from x import * works. Code that does star imports rather than named imports no longer errors. The walked names honor __all__ if present, fall back to globals otherwise, and skip names starting with _ in the fallback case.
- set and frozenset literals work. {1, 2, 3} builds a real Set. Empty set() constructs through the builtin (the {} literal is still an empty dict).
- contextvars.copy_context() and ContextVar are importable. Async code that depends on PEP 567 now has the primitives it needs.
- The tokenizer is stricter. Source files that compiled through the hand-rolled splitter despite having subtle syntax errors (a stray BOM, an inconsistent indent, an unterminated triple-quoted string in the middle of a longer file) now raise SyntaxError exactly where CPython does. This is surface-breaking for a small number of pathological files and surface-correcting for everything else.
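To find out in advance whether a file falls into the pathological group, one option is to ask CPython itself. The helper below is a hypothetical pre-flight check, not part of gopy; it reports the exception type and line CPython raises, which is the behavior the stricter tokenizer is meant to reproduce.

```python
def syntax_error_line(source):
    """Return (error type name, line number), or None if the source compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:              # IndentationError is a subclass
        return type(exc).__name__, exc.lineno
    return None


bad_indent = "def f():\n        x = 1\n    y = 2\n"   # dedent matches no outer level
unterminated = 'x = 1\ns = """never closed\n'         # triple-quoted string left open
fine = "x = 1\n"

for label, src in [("bad_indent", bad_indent), ("unterminated", unterminated), ("fine", fine)]:
    print(label, syntax_error_line(src))
```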
What's next
The v0.10 release is the type-system drop. Highlights:
- Sub-interpreters and Py_NewInterpreter. The GIL-drop wiring is in place; the actual contention path activates with sub-interpreters. v0.10 ships the full Py_NewInterpreterFromConfig surface.
- The full async iterator / awaitable surface. GET_AITER, GET_ANEXT, END_ASYNC_FOR, real GET_AWAITABLE conformance. v0.9 ships error-returning stubs.
- __match_args__ MRO walk. v0.9 handles type identity plus a one-level Bases walk; full MRO arrives with the type-system port.
- WITH_EXCEPT_START traceback argument. Becomes real when the traceback object lands.
- SEND_GEN specialization. The generic SEND arm works; the specialized form picks up the v0.9 stub.
- Real frozen importlib code objects embedded in imp/frozen_bootstrap.go. v0.9 still ships the placeholder table from v0.8.
The Tier-1 dispatch panel is complete with this release. v0.10
moves to the type system: real user-defined classes, real
method resolution order, real inheritance, real super(),
real descriptor protocol, real metaclasses. After that the
runtime can host actual Python libraries rather than carefully
curated single-file tests.
Acknowledgments
This release lines up against Python/bytecodes.c,
Python/ceval_gil.c, Python/intrinsics.c, Python/pytime.c,
Python/hamt.c, Python/context.c, Python/getopt.c,
Python/hashtable.c, Modules/_contextvarsmodule.c,
Objects/genobject.c, and Parser/tokenizer/tokenizer.c in the
CPython 3.14 tree. Each ported file in gopy carries a citation
to the originating function so a future reader can compare line
by line.