
v0.1.0 - Foundations

Released earlier in development (pre-release).

When you read a CPython source file like Modules/_json.c or Objects/dictobject.c, you keep tripping over a small set of primitives that show up everywhere. _PyArena_Malloc allocates memory the compiler will free as a unit. PyMutex guards a critical section. PyThread_start_joinable_thread launches a worker. _Py_HashSecret is a process-global that the hashing code reads every time it computes the hash of a str or bytes object. None of these primitives are interesting on their own. They are the ground floor every other subsystem stands on. If they are not in place, the floor above them sags.

v0.1.0 ports that ground floor. Four small packages land: arena/, pythread/, pysync/, hash/. Together they cover the compiler-side allocator, the threading shim, the lock and parking-lot machinery, and the per-process hash secret. None of this code runs Python yet. What it does is unblock everything that does.

This is also the first release where the diff against CPython matters. Every file in this cut has a // CPython: citation at the top pointing to the upstream source. The rule we adopted in v0.0.0 (one Go file per CPython file, named the same way, ported function by function with the upstream function names preserved) starts paying off here. You can read pysync/lock.go side by side with Python/lock.c and the structure of the two files is obviously the same.

Highlights

Four pieces of work define this release.

A compiler arena

CPython has a one-file allocator at Python/pyarena.c. The parser, AST builder, symbol table, and code generator all run under a single arena. When the compile finishes, the arena is freed in one call. No per-node refcounts, no individual frees, no cleanup wandering across files.

We ported it faithfully.

```go
// arena/arena.go
type Arena struct {
	blocks  []*block // 8 KiB blocks, in allocation order
	objects []Object // Python objects to decref on Free
}

func New() *Arena
func (a *Arena) Malloc(n uintptr) unsafe.Pointer
func (a *Arena) AddObject(obj Object) error
func (a *Arena) Free()
```

The API maps one-to-one onto the upstream _PyArena_* surface: New, Malloc, AddObject, and Free correspond to _PyArena_New, _PyArena_Malloc, _PyArena_AddPyObject, and _PyArena_Free. Each block is 8 KiB, matching CPython's DEFAULT_BLOCK_SIZE. Malloc bumps a pointer; if the current block is full, a new one chains in. AddObject registers a Python object whose refcount we decrement when the arena is freed. Free walks the block list and the object list, releasing both.
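To make the bump-and-chain behavior concrete, here is a minimal, self-contained sketch. It is not gopy's arena package: the names (bumpArena, a Malloc that returns []byte) are hypothetical, it hands out byte slices instead of unsafe.Pointer so the Go garbage collector stays in charge of the memory, the 8-byte rounding is an assumption of the sketch, and AddObject is left out.

```go
package main

import "fmt"

// blockSize mirrors CPython's DEFAULT_BLOCK_SIZE of 8 KiB.
const blockSize = 8 << 10

// alignment: this sketch assumes every request is rounded up to 8 bytes.
const alignment = 8

// block is one chunk of arena memory plus its bump offset.
type block struct {
	mem    []byte // backing storage
	offset int    // first unused byte in mem
}

// bumpArena is a hypothetical, simplified stand-in for an arena allocator.
type bumpArena struct {
	blocks []*block // chained blocks, newest last
}

// Malloc returns n bytes (rounded up to alignment) from the newest block,
// chaining a fresh block when the newest one cannot fit the request.
func (a *bumpArena) Malloc(n int) []byte {
	n = (n + alignment - 1) &^ (alignment - 1)
	var cur *block
	if len(a.blocks) > 0 {
		cur = a.blocks[len(a.blocks)-1]
	}
	if cur == nil || cur.offset+n > len(cur.mem) {
		size := blockSize
		if n > size {
			size = n // an oversized request gets a block of its own
		}
		cur = &block{mem: make([]byte, size)}
		a.blocks = append(a.blocks, cur)
	}
	p := cur.mem[cur.offset : cur.offset+n : cur.offset+n]
	cur.offset += n
	return p
}

// Free drops every block in one step; nothing is released individually.
func (a *bumpArena) Free() {
	a.blocks = nil
}

func main() {
	var a bumpArena
	for i := 0; i < 1000; i++ {
		a.Malloc(10) // each 10-byte request rounds up to 16 bytes
	}
	fmt.Println(len(a.blocks)) // 2: 1000*16 bytes span two 8 KiB blocks
	a.Free()
	fmt.Println(len(a.blocks)) // 0
}
```

The single Free call is the whole point of the design: callers never track individual allocations, so there is nothing to leak node by node.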

We deliberately did not replace the bump allocator with Go's arena package (experimental since Go 1.20 and available only behind GOEXPERIMENT=arenas). The CPython arena is small enough to port verbatim, and keeping it verbatim means future CPython arena changes land as one-line ports.

A threading shim

CPython's threading layer has two halves. Python/thread.c is the cross-platform half. The OS-specific half is split across Python/thread_pthread.h and Python/thread_nt.h, each with ~800 lines of pthread or Win32 plumbing; thread.c includes whichever one matches the platform and calls into it.

We ported only the cross-platform half. The OS-specific halves have no Go counterpart; we replace them with Go's goroutine machinery.

```go
// pythread/thread.go
type Handle struct {
	done chan struct{}
	id   uintptr
}

func Start(fn func()) (*Handle, error)
func (h *Handle) Join()
func (h *Handle) Ident() uintptr

func Init()
func GetStacksize() uintptr
func SetStacksize(n uintptr) error

const TimeoutMax = (1<<63 - 1) / 1000 // matches PY_TIMEOUT_MAX
```

Start launches a goroutine and hands back a Handle. Join blocks until the goroutine returns. Ident returns a stable identifier (the goroutine's address, hashed to a uintptr), which we use as the key for thread-local storage when that lands.

Init, GetStacksize, and SetStacksize are stubs that match the upstream signatures and return sensible defaults. The Go runtime grows goroutine stacks automatically, so SetStacksize is advisory rather than authoritative. We preserve the API because later subsystems (sys.thread_info, threading.stack_size) call into it.

Three pieces of Python/thread.c are deferred until their dependencies land:

  • PyThread_acquire_lock_timed_with_retries. Needs the GIL state machine, which arrives in v0.9.
  • PyThread_tss_*. Thread-local storage. Needs the per-thread state object, also v0.9.
  • PyThread_GetInfo. Builds a sys.thread_info named tuple; needs sys, which lands in v0.11.

Deferring them blocks none of this release's gates. We will return to thread.c once the upstream callers exist.

The parking lot, locks, and critical sections

This is the meat of v0.1.0.

CPython 3.13 replaced its locking primitives with a parking-lot design borrowed from WebKit and Rust. The idea: every lock is one byte. When a lock is uncontended, the byte alone is enough. When contention happens, the waiter parks in a global hash table keyed by the lock's address, and the unlocker finds it through the same table. No per-lock OS condition variable, no fat lock structure, no per-lock allocation.

We ported the design from three CPython files:

  • Python/lock.c. The mutex, event, once-flag, recursive mutex, read-write lock, and seqlock primitives.
  • Python/parking_lot.c. The address-keyed parking table.
  • Python/critical_section.c. The PEP 703 critical-section stack that the free-threading build uses to coarsen lock scope around object operations.

```go
// pysync/parking_lot.go
const numBuckets = 257 // matches CPython exactly

// Bucket layout matches Python/parking_lot.c bucket.
type bucket struct {
	mu   sync.Mutex
	head *waiter
	tail *waiter
}

var buckets [numBuckets]bucket

func Park(addr unsafe.Pointer, validate func() bool, prepark, postpark func(),
	timeout time.Duration, parkArg unsafe.Pointer, detach bool) ParkResult
func Unpark(addr unsafe.Pointer, unparkFn func(arg unsafe.Pointer, hasMore bool) unsafe.Pointer) (int, unsafe.Pointer)
func UnparkAll(addr unsafe.Pointer) int
func AfterFork()
```

The bucket count is 257, matching upstream exactly. We could have used runtime.NumCPU() or a power of two; we stayed with 257 because it is prime, it is what CPython picked, and it has been benchmarked there. There is no reason to second-guess it.

```go
// pysync/lock.go - the Mutex type
type Mutex struct {
	v atomic.Uint8 // locked | has-parked, same bits as PyMutex
}

func (m *Mutex) Lock()
func (m *Mutex) Unlock()
func (m *Mutex) TryLock() bool
func (m *Mutex) LockTimed(timeout time.Duration, detach bool) Result
```

The byte layout is bit-for-bit the same as PyMutex._bits: bit 0 is the locked flag, bit 1 is the parked-waiters flag. This matters because future code that does atomic operations on the mutex byte (the GIL state machine, for instance) needs the layout to be portable.

The fair-handoff timer is 1 ms, matching CPython's TIME_TO_BE_FAIR_NS. When the unlocker finds that the waiter at the head of the parking queue has been parked for more than 1 ms, it hands the lock directly to that waiter rather than letting a fresh contender steal it. This prevents the starvation pattern where a hot thread keeps reacquiring a contended lock indefinitely.

```go
// pysync/critical_section.go
type CriticalSection struct {
	prev       *CriticalSection
	obj1, obj2 *objects.Object // protected objects
}

func BeginMutex(cs *CriticalSection, m *Mutex)
func BeginMutex2(cs *CriticalSection, m1, m2 *Mutex)
func End(cs *CriticalSection)
```

The PEP 703 critical-section stack is bookkeeping only in the GIL build: a critical section is pushed and popped but acquires no locks, because the GIL already serializes the operations it would guard. Under the free-threaded build (v0.14, gated by the pygil_disabled build tag), the same surface acquires real per-object locks. We landed the bookkeeping now so call sites can be written once and switched over without source changes when v0.14 lands.

The _PySemaphore abstraction CPython uses for parking-lot sleeps is replaced by a buffered chan struct{} inside each waiter. Go channels give us the same wakeup semantics with less ceremony.

The hash secret

CPython has a one-file hash-secret bootstrapper at Python/bootstrap_hash.c. It runs once at interpreter startup and populates _Py_HashSecret, either from PYTHONHASHSEED or from OS entropy. The secret keys SipHash-1-3 and CPython's FNV variant, which randomizes str and bytes hashes. Without it, dictionaries are vulnerable to algorithmic-complexity attacks: an attacker precomputes keys whose hashes collide, and every lookup then walks a long probe sequence.

We ported the seed-init half. The hash functions themselves ship later (in v0.4 with pyhash.c).

```go
// hash/hash.go
func Init()  // called once at startup
func Reset() // for tests
func Secret() [SecretSize]byte
const SecretSize = 24
```

PYTHONHASHSEED parsing matches CPython exactly:

  • random or unset. Seed from OS entropy.
  • 0. Disable randomization. The secret is all zeros, so every process produces the same hashes.
  • An integer in [1, 4294967295]. Use the LCG to derive the secret from that seed, byte-identical to what CPython produces.
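A parser for those three cases fits in one function. The sketch below is hypothetical (parseHashSeed and hashSeed are not part of gopy's hash API) and deliberately strict: it relies on strconv.ParseUint, so it rejects signs, whitespace, and anything above 4294967295, and it may be stricter than CPython about leading whitespace or signs.

```go
package main

import (
	"fmt"
	"strconv"
)

// hashSeed describes how the secret should be filled.
type hashSeed struct {
	random bool   // true: read OS entropy
	value  uint32 // used only when random is false; 0 means an all-zero secret
}

// parseHashSeed interprets a PYTHONHASHSEED value (empty means unset).
func parseHashSeed(s string) (hashSeed, error) {
	if s == "" || s == "random" {
		return hashSeed{random: true}, nil
	}
	n, err := strconv.ParseUint(s, 10, 32) // rejects signs and values above 2^32-1
	if err != nil {
		return hashSeed{}, fmt.Errorf("invalid PYTHONHASHSEED %q: %w", s, err)
	}
	return hashSeed{value: uint32(n)}, nil
}

func main() {
	for _, s := range []string{"", "random", "0", "42", "4294967296", "-1"} {
		seed, err := parseHashSeed(s)
		fmt.Printf("%q -> %+v, rejected=%v\n", s, seed, err != nil)
	}
}
```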

The LCG constants are upstream's: 214013, 2531011, output bits 16..23. We ran the implementation against the CPython output for every seed in [0, 1024) and confirmed byte identity. This matters because Python tests that pin PYTHONHASHSEED=0 and check hash values (there are several in the test suite) must produce the same bytes against gopy.
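The LCG is short enough to show in full. This sketch applies the constants and output bits described above; lcgFill and secretFromSeed are hypothetical names, and secretFromSeed folds the seed-0 case in only so that both pinned-seed behaviors are visible in one place.

```go
package main

import "fmt"

// secretSize matches the 24-byte _Py_HashSecret.
const secretSize = 24

// lcgFill fills buf from seed x0 with the LCG x = x*214013 + 2531011
// (mod 2^32), emitting bits 16..23 of each new state as one byte.
func lcgFill(x0 uint32, buf []byte) {
	x := x0
	for i := range buf {
		x = x*214013 + 2531011 // uint32 arithmetic wraps modulo 2^32
		buf[i] = byte(x >> 16) // byte() keeps only bits 16..23
	}
}

// secretFromSeed derives the secret for a pinned PYTHONHASHSEED value:
// 0 leaves it all zeros, any other seed runs the LCG.
func secretFromSeed(seed uint32) [secretSize]byte {
	var s [secretSize]byte
	if seed != 0 {
		lcgFill(seed, s[:])
	}
	return s
}

func main() {
	s := secretFromSeed(1)
	fmt.Println(s[0], s[1])                              // 41 35
	fmt.Println(secretFromSeed(0) == [secretSize]byte{}) // true
}
```

Relying on Go's defined wraparound for unsigned integers reproduces the C code's implicit modulo 2^32 without any masking.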

OS entropy comes from Go's crypto/rand. This means the _PyOS_URandom* family of functions and the /dev/urandom fallback are redundant; Go already handles platform differences underneath us. We dropped them rather than port them.

What's new

The full feature breakdown, grouped by package.

arena/

The compiler-side bump allocator. Port of Python/pyarena.c.

  • arena.New() creates an empty arena.
  • (*Arena).Malloc(n) bump-allocates n bytes inside the current 8 KiB block; chains a new block if the current one is full.
  • (*Arena).AddObject(obj) registers a Python object the arena owns. The arena will decref it on Free.
  • (*Arena).Free() walks the block list freeing each block and walks the object list dropping each reference.

Block size is 8 KiB, matching DEFAULT_BLOCK_SIZE. The block header layout matches _block upstream.

pythread/

Cross-platform half of Python/thread.c. Layered over Go goroutines.

  • pythread.Start(fn) launches a goroutine and returns a *Handle.
  • (*Handle).Join() blocks until the goroutine returns.
  • (*Handle).Ident() returns a stable identifier we use for thread-local keys.
  • pythread.Init() is a no-op that exists for API parity with upstream.
  • pythread.GetStacksize() returns the current goroutine stack hint.
  • pythread.SetStacksize(n) stores a hint; goroutines grow stacks automatically so the value is advisory.
  • pythread.TimeoutMax matches PY_TIMEOUT_MAX exactly so call sites that clamp timeouts produce the same values.

The thread_pthread.h and thread_nt.h files have no counterpart in Go. We drop them.

pysync/

The locking primitives. Port of Python/lock.c, Python/parking_lot.c, and Python/critical_section.c. We named the package pysync (not sync) to keep it visually distinct from Go's standard sync package; in any one file both might be imported and the distinction matters.

  • pysync.Park, Unpark, UnparkAll, AfterFork. The 257-bucket address-keyed parking lot. Bucket count matches upstream.
  • pysync.Mutex. Byte-flag mutex. Two bits: locked, has-parked-waiters. 1 ms fair-handoff. Detach hook plumbed through (we will wire it to GIL release when the GIL lands in v0.9; the parameter exists today so call sites can be written once).
  • pysync.Event. One-shot signal.
  • pysync.OnceFlag. pthread_once-style init guard.
  • pysync.RecursiveMutex. Reentrant mutex.
  • pysync.RWMutex. Reader-writer lock; multiple readers or one writer.
  • pysync.SeqLock. Optimistic-read seqlock used by the type cache.
  • pysync.RawMutex. The non-Python-object mutex; same byte layout but does not interact with critical sections.
  • pysync.CriticalSection, BeginMutex, BeginMutex2, End. PEP 703 critical-section stack. Bookkeeping only under the GIL build.

The internal _PySemaphore abstraction is replaced by a buffered chan struct{} inside each waiter. The Go channel gives us the same blocking semantics with less code.
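Of the primitives in this list, the seqlock is the least familiar, so here is a minimal sketch of the optimistic-read protocol. It is hypothetical (seqLock, cacheEntry, and the method names are illustrative, not gopy's pysync.SeqLock API), and it stores the protected fields in atomics so that Go's memory model and race detector accept the concurrent reads the protocol depends on. The writer makes the sequence odd while it writes; a reader retries whenever it observes an odd sequence or a sequence that changed underneath it.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// seqLock: an even sequence means no write is in progress.
type seqLock struct{ seq atomic.Uint32 }

func (s *seqLock) lockWrite() {
	for {
		v := s.seq.Load()
		if v&1 == 0 && s.seq.CompareAndSwap(v, v+1) {
			return // sequence is now odd: readers will retry
		}
		runtime.Gosched()
	}
}

func (s *seqLock) unlockWrite() { s.seq.Add(1) } // back to even

// beginRead waits out any in-progress write and returns the sequence seen.
func (s *seqLock) beginRead() uint32 {
	for {
		if v := s.seq.Load(); v&1 == 0 {
			return v
		}
		runtime.Gosched()
	}
}

// endRead reports whether no write happened since beginRead.
func (s *seqLock) endRead(start uint32) bool { return s.seq.Load() == start }

// cacheEntry is a hypothetical two-field entry that must be read consistently.
type cacheEntry struct {
	lock    seqLock
	version atomic.Uint64
	value   atomic.Uint64
}

func (e *cacheEntry) store(version, value uint64) {
	e.lock.lockWrite()
	e.version.Store(version)
	e.value.Store(value)
	e.lock.unlockWrite()
}

// load retries until it observes both fields from the same write.
func (e *cacheEntry) load() (version, value uint64) {
	for {
		start := e.lock.beginRead()
		version, value = e.version.Load(), e.value.Load()
		if e.lock.endRead(start) {
			return version, value
		}
	}
}

func main() {
	var e cacheEntry
	done := make(chan struct{})
	go func() {
		for i := uint64(1); i <= 10000; i++ {
			e.store(i, i*2) // invariant: value == 2*version
		}
		close(done)
	}()
	torn := 0
	for i := 0; i < 10000; i++ {
		if v, val := e.load(); val != 2*v {
			torn++
		}
	}
	<-done
	fmt.Println("torn reads:", torn) // torn reads: 0
}
```

Readers never write shared state, so a read-heavy cache pays only two extra atomic loads per lookup; the cost of contention falls on the rare writer.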

hash/

Seed-init half of Python/bootstrap_hash.c. The hash functions themselves ship in v0.4.

  • hash.Init(). Called once at startup. Reads PYTHONHASHSEED, fills _Py_HashSecret.
  • hash.Reset(). For tests.
  • hash.Secret(). Returns the secret as a fixed-size array.
  • hash.SecretSize. 24, matching _Py_HashSecret.
  • PYTHONHASHSEED parsing. random, 0, or an integer in [1, 4294967295]. Identical to upstream.
  • LCG. Constants 214013 and 2531011, output bits 16..23. Byte-identical to lcg_urandom for any seed.
  • OS entropy through Go's crypto/rand. We drop _PyOS_URandom* and the /dev/urandom fallback because Go already handles platform differences.

Why we built it this way

A few choices in this release deserve their own callouts.

Why port parking_lot.c faithfully

A natural question: Go already has sync.Mutex. Why not use it directly and skip the parking-lot port?

Two reasons.

The byte layout matters. Future code (the GIL state machine, the type cache's seqlock, the free-threading critical-section machinery) does atomic operations on the lock's byte representation. If pysync.Mutex is a wrapper around sync.Mutex, those operations have to go through Go's lock internals, which are not stable and not documented for that use. Keeping the byte layout under our control gives us a stable representation we can reason about.

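The fair-handoff timer matters. Go's sync.Mutex does have a similar hybrid policy: since Go 1.9 it switches into a starvation mode, handing ownership directly to the waiter at the front of its queue, once a waiter has failed to acquire the lock for more than 1 ms. But that policy lives inside the Go runtime, where we can neither observe it nor tie it to our own parking-lot queue, and nothing guarantees it keeps tracking CPython's behavior in later Go releases. CPython's mutex is unfair until a parked waiter has waited 1 ms; then the unlocker hands the lock directly to the waiter at the head of the queue. That hybrid policy is what gives Python its "fairness under stress, throughput under calm" behavior. Replicating it exactly requires owning the lock's internals.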

Why pysync and not sync

Naming. In any file that does any concurrent work, both this package and the standard library's sync are candidates. If both are named sync, every file needs an import alias. If one is named pysync and the other is the standard sync, the file reads cleanly with no alias. We paid the two-character cost in exchange for clarity everywhere downstream.

Why we defer the lock-acquire-retry path

PyThread_acquire_lock_timed_with_retries is the function the GIL release machinery calls when a thread wants to drop the GIL, sleep on another lock, and then reacquire the GIL. The retry loop interacts with the GIL state machine in non-trivial ways: the retry has to release the GIL on each attempt, otherwise other threads cannot make progress.

We do not have a GIL yet (it lands in v0.9 with the rest of the thread-state machinery). Porting the retry loop now would mean porting it against a placeholder GIL and re-porting it when the real one arrives. Deferring it to v0.9, where the upstream caller exists, is one port instead of two.

Why we keep the LCG bit-identical

Hash randomization is a security feature, but PYTHONHASHSEED exists precisely to make it reproducible. Code that pins PYTHONHASHSEED=N, including parts of CPython's own test suite and tooling that depends on reproducible set iteration order, expects the same secret, and therefore the same hashes, that CPython derives from N. We do not get to deviate. Bit-for-bit identity with CPython is the contract.

We verified identity by running the seed range [0, 1024) against a CPython 3.14 build, dumping _Py_HashSecret from a debugger after init, and comparing byte by byte. All matched. The test fixture for this lives in hash/hash_test.go and runs on every CI pass.

Where it lives

The four packages each live in their own directory at the module root:

  • arena/. arena.go ports Python/pyarena.c.
  • pythread/. thread.go ports the cross-platform half of Python/thread.c. handle.go carries the Go-side goroutine wrapping.
  • pysync/. lock.go ports Python/lock.c. parking_lot.go ports Python/parking_lot.c. critical_section.go ports Python/critical_section.c.
  • hash/. hash.go ports the seed-init half of Python/bootstrap_hash.c.

Test files live alongside each. The arena tests exercise the allocator under stress. The pythread tests confirm goroutine join semantics. The pysync tests cover the parking-lot fairness window and the recursive-mutex reentrance count. The hash tests verify byte identity against CPython's LCG output.

Compatibility

Nothing user-visible runs at this stage, so there is nothing to break at the language surface. But three policy choices set expectations for later releases.

  • Go 1.26 or newer remains the floor.
  • CPython 3.14.0+ behavior remains the behavioral target.
  • Build tag pygil_disabled is reserved for the future free-threaded build (v0.14). v0.1.0 builds the GIL configuration only. The tag exists so packages can start gating code on it now; it is unset by default, which selects the GIL build, until v0.14 changes that default.

Known limitations

  • No parser, AST, compiler, VM, or standard library yet. The roadmap reaches the object model in v0.2.0 and the first executable Python in v0.6.0.
  • PyThread_acquire_lock_timed_with_retries, PyThread_tss_*, and PyThread_GetInfo are not yet ported; they depend on subsystems that arrive in later phases (GIL state in v0.9, TSS in v0.9, sys.thread_info in v0.11).
  • Hash functions themselves (SipHash-1-3, FNV) are not here. They ship in v0.4 with the Python/pyhash.c port. v0.1.0 supplies only the secret; the consumer arrives later.

What's next

v0.2.0 builds the object model on top of these primitives. The plan:

  • objects/. The Object interface, Header, VarHeader, the refcount machinery, the type-slot surface, and the v0.2 concrete types (int, float, bool, None, tuple, list, dict, slice, range).
  • typeobj/. The C3 linearization and the slot lookup walk.
  • abstract/. The subset of Objects/abstract.c needed to build a dict, hash a tuple, iterate a list.
  • objtest/. The first gate. Build a dict, hash a tuple, iterate a list, all from Go test code.

The interpreter still will not run Python code at the end of v0.2.0. The first runnable Python lands in v0.6.0 once the parser, AST, compiler, and VM are all in place.

Acknowledgments

Python/lock.c, Python/parking_lot.c, and Python/critical_section.c are the work of Sam Gross, the author of PEP 703, together with the free-threading contributors and the broader CPython core team. Reading those files end to end was the most fun part of this release. The design makes sense at every layer, and the port was straightforward because of it.