v0.1.0 - Foundations
Released earlier in development (pre-release).
When you read a CPython source file like Modules/_json.c or
Objects/dictobject.c, you keep tripping over a small set of
primitives that show up everywhere. PyArena_Malloc allocates
memory the compiler will free as a unit. PyMutex guards a
critical section. PyThread_start_joinable_thread launches a
worker. _Py_HashSecret lives in a process-global that the
hash-randomization code reads on every randomized hash. None of
these primitives are interesting on their own. They are the
ground floor every other subsystem stands on. If they are not
in place, the floor above them sags.
v0.1.0 ports that ground floor. Four small packages land:
arena/, pythread/, pysync/, hash/. Together they cover
the compiler-side allocator, the threading shim, the lock and
parking-lot machinery, and the per-process hash secret. None of
this code runs Python yet. What it does is unblock everything
that does.
This is also the first release where the diff against CPython
matters. Every file in this cut has a // CPython: citation at
the top pointing to the upstream source. The rule we adopted
in v0.0.0 (one Go file per CPython file, named the same way,
ported function by function with the upstream function names
preserved) starts paying off here. You can read pysync/lock.go
side by side with Python/lock.c and the structure of the two
files is obviously the same.
Highlights
Four pieces of work define this release.
A compiler arena
CPython has a one-file allocator at Python/pyarena.c. The
parser, AST builder, symbol table, and code generator all run
under a single arena. When the compile finishes, the arena is
freed in one call. No per-node refcounts, no individual frees,
no cleanup wandering across files.
We ported it faithfully.
// arena/arena.go
type Arena struct {
    blocks  []*block // linked list of 8 KiB blocks
    objects []Object // Python objects to decref on Free
}
func New() *Arena
func (a *Arena) Malloc(n uintptr) unsafe.Pointer
func (a *Arena) AddObject(obj Object) error
func (a *Arena) Free()
The API matches the upstream _PyArena_* surface name for
name. Each block is 8 KiB the same way CPython's
DEFAULT_BLOCK_SIZE is 8 KiB. Malloc bumps a pointer; if the
current block is full, a new one chains in. AddObject
registers a Python object whose refcount we will decrement when
the arena drops. Free walks the block list and the object
list, releasing both.
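The bump-and-chain behavior described above can be sketched in a few lines. This is a simplified illustration, not the code in arena/arena.go: sketchArena and its fields are hypothetical names, it hands out byte slices instead of unsafe.Pointer, and it leaves out alignment and the object list.

```go
package main

import "fmt"

// blockSize mirrors the 8 KiB block size described in the text.
const blockSize = 8 * 1024

// sketchArena is a hypothetical, simplified stand-in for arena.Arena.
type sketchArena struct {
	blocks [][]byte // every block allocated so far; the last one is current
	used   int      // bytes consumed in the current block
}

// Malloc bump-allocates n bytes. If the current block cannot fit the
// request, a new block is chained in. Oversized requests get their own block.
func (a *sketchArena) Malloc(n int) []byte {
	if len(a.blocks) == 0 || a.used+n > len(a.blocks[len(a.blocks)-1]) {
		size := blockSize
		if n > size {
			size = n
		}
		a.blocks = append(a.blocks, make([]byte, size))
		a.used = 0
	}
	cur := a.blocks[len(a.blocks)-1]
	p := cur[a.used : a.used+n : a.used+n] // cap the slice so callers cannot overrun
	a.used += n
	return p
}

// Free drops every block in one step; the Go garbage collector reclaims them.
func (a *sketchArena) Free() {
	a.blocks = nil
	a.used = 0
}

func main() {
	var a sketchArena
	a.Malloc(100)
	a.Malloc(blockSize) // does not fit in the remainder, so a second block chains in
	fmt.Println("blocks after two allocations:", len(a.blocks))
	a.Free()
	fmt.Println("blocks after Free:", len(a.blocks))
}
```

The point of the shape is that no allocation is ever released on its own: callers only allocate, and a single Free ends the lifetime of everything at once.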
We deliberately did not replace the bump allocator with Go's
experimental arena package (available behind GOEXPERIMENT=arenas
since Go 1.20). The CPython arena is small enough to port
verbatim, and keeping it verbatim means future CPython arena
changes land as small, mechanical ports.
A threading shim
CPython's threading layer is split in two. Python/thread.c is
the cross-platform half. Python/thread_pthread.h and
Python/thread_nt.h are the OS-specific halves; each has
~800 lines of pthread or Win32 plumbing. The cross-platform
half is what calls into them.
We ported only the cross-platform half. The OS-specific halves have no Go counterpart; we replace them with Go's goroutine machinery.
// pythread/thread.go
type Handle struct {
    done chan struct{}
    id   uintptr
}
func Start(fn func()) (*Handle, error)
func (h *Handle) Join()
func (h *Handle) Ident() uintptr
func Init()
func GetStacksize() uintptr
func SetStacksize(n uintptr) error
const TimeoutMax = (1<<63 - 1) / 1000 // matches PY_TIMEOUT_MAX
Start launches a goroutine and hands back a Handle. Join
blocks until the goroutine returns. Ident returns a stable
identifier (the goroutine's address, hashed to a uintptr),
which we use as the key for thread-local storage when that
lands.
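A minimal sketch of the Start and Join pairing, assuming the handle signals completion by closing its done channel. The names sketchHandle and start are hypothetical, and the real Start also returns an error and assigns an identifier, which this sketch omits.

```go
package main

import "fmt"

// sketchHandle is a hypothetical stand-in for pythread.Handle.
type sketchHandle struct {
	done chan struct{} // closed when the goroutine returns
}

// start runs fn on a new goroutine and returns a handle to join on.
func start(fn func()) *sketchHandle {
	h := &sketchHandle{done: make(chan struct{})}
	go func() {
		defer close(h.done) // closing wakes every current and future Join
		fn()
	}()
	return h
}

// Join blocks until the goroutine has returned. Because a closed channel
// never blocks a receiver, calling Join more than once is safe.
func (h *sketchHandle) Join() {
	<-h.done
}

func main() {
	result := 0
	h := start(func() { result = 42 })
	h.Join() // the channel close orders the write before this read
	fmt.Println(result)
}
```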
Init, GetStacksize, and SetStacksize are stubs that
match the upstream signatures and return sensible defaults.
The Go runtime grows goroutine stacks automatically, so
SetStacksize is advisory rather than authoritative. We
preserve the API because later subsystems (sys.thread_info,
threading.stack_size) call into it.
Three pieces of Python/thread.c are deferred until their
dependencies land:
- PyThread_acquire_lock_timed_with_retries. Needs the GIL state machine, which arrives in v0.9.
- PyThread_tss_*. Thread-local storage. Needs the per-thread state object, also v0.9.
- PyThread_GetInfo. Builds a sys.thread_info named tuple; needs sys, which lands in v0.11.
No gate in this release is blocked by the deferral. We will
return to thread.c when the upstream callers exist.
The parking lot, locks, and critical sections
This is the meat of v0.1.0.
CPython 3.13+ replaced its locking primitives with a parking-lot design borrowed from WebKit and Rust. The idea: every lock is one byte. When a lock is uncontended, the byte alone is enough. Under contention, the waiter parks in a global hash table keyed by the lock's address, and the wakeup walks the same table. No per-lock OS condition variable, no fat lock structure, no per-lock allocation.
We ported the design from three CPython files:
- Python/lock.c. The mutex, event, once-flag, recursive mutex, read-write lock, and seqlock primitives.
- Python/parking_lot.c. The address-keyed parking table.
- Python/critical_section.c. The PEP 703 critical-section stack that the free-threading build uses to coarsen lock scope around object operations.
// pysync/parking_lot.go
const numBuckets = 257 // matches CPython exactly
// Bucket layout matches Python/parking_lot.c bucket.
type bucket struct {
    mu   sync.Mutex
    head *waiter
    tail *waiter
}
var buckets [numBuckets]bucket
func Park(addr unsafe.Pointer, validate func() bool, prepark, postpark func(),
    timeout time.Duration, parkArg unsafe.Pointer, detach bool) ParkResult
func Unpark(addr unsafe.Pointer, unparkFn func(arg unsafe.Pointer, hasMore bool) unsafe.Pointer) (int, unsafe.Pointer)
func UnparkAll(addr unsafe.Pointer) int
func AfterFork()
The bucket count is 257, matching upstream exactly. We could
have used runtime.NumCPU() or a power of two; we stayed with
257 because it is prime, it is what CPython picked, and it has
been benchmarked there. There is no reason to second-guess it.
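One way to see why a prime count is a comfortable choice: lock addresses are usually aligned, so their low bits are all zero. The sketch below assumes the bucket index is a plain modulo of the address, which is an assumption made for illustration and not a statement about the port's hashing. bucketIndex and distinctBuckets are hypothetical helpers.

```go
package main

import "fmt"

const numBuckets = 257

// bucketIndex maps an address to a bucket.
// Assumption: a plain modulo of the address value.
func bucketIndex(addr uintptr) int {
	return int(addr % numBuckets)
}

// distinctBuckets counts how many buckets are hit by 1024 consecutive
// 8-byte-aligned addresses when the table has mod buckets.
func distinctBuckets(mod uintptr) int {
	seen := make(map[uintptr]bool)
	for i := uintptr(0); i < 1024; i++ {
		seen[(i*8)%mod] = true
	}
	return len(seen)
}

func main() {
	fmt.Println("buckets used with 256:", distinctBuckets(256))
	fmt.Println("buckets used with 257:", distinctBuckets(257))
}
```

With 256 buckets, 8-byte-aligned addresses can only land on multiples of 8, so 32 buckets carry all the load. With 257, every bucket is reachable because 8 and 257 share no common factor.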
// pysync/lock.go - the Mutex type
type Mutex struct {
    v atomic.Uint8 // locked | has-parked, same bits as PyMutex
}
func (m *Mutex) Lock()
func (m *Mutex) Unlock()
func (m *Mutex) TryLock() bool
func (m *Mutex) LockTimed(timeout time.Duration, detach bool) Result
The byte layout is bit-for-bit the same as PyMutex._bits:
bit 0 is the locked flag, bit 1 is the parked-waiters flag.
This matters because future code that does atomic operations
on the mutex byte (the GIL state machine, for instance) needs
the layout to be portable.
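The flag layout and the uncontended path can be sketched as follows. This is an illustration under stated assumptions, not the code in pysync/lock.go: sketchMutex is a hypothetical type, it stores the flags in a standard-library atomic.Uint32 instead of a single byte, and its contended path yields to the scheduler where the real mutex would park the waiter and apply the fair-handoff rule.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

const (
	lockedBit    uint32 = 1 << 0 // bit 0: the mutex is held
	hasParkedBit uint32 = 1 << 1 // bit 1: a waiter is parked (unused in this sketch)
)

// sketchMutex is a hypothetical stand-in for pysync.Mutex.
type sketchMutex struct {
	v atomic.Uint32
}

// TryLock sets the locked bit if it is clear and reports whether it did.
func (m *sketchMutex) TryLock() bool {
	for {
		old := m.v.Load()
		if old&lockedBit != 0 {
			return false
		}
		if m.v.CompareAndSwap(old, old|lockedBit) {
			return true
		}
	}
}

// Lock retries until TryLock succeeds. Yielding is a placeholder for parking.
func (m *sketchMutex) Lock() {
	for !m.TryLock() {
		runtime.Gosched()
	}
}

// Unlock clears the locked bit and leaves the other bits untouched.
func (m *sketchMutex) Unlock() {
	for {
		old := m.v.Load()
		if m.v.CompareAndSwap(old, old&^lockedBit) {
			return
		}
	}
}

// countWithMutex increments a shared counter from several goroutines.
func countWithMutex(workers, iters int) int {
	var (
		m     sketchMutex
		wg    sync.WaitGroup
		total int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				m.Lock()
				total++
				m.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countWithMutex(4, 1000))
}
```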
The fair-handoff timer is 1 ms, matching CPython's
TIME_TO_BE_FAIR_NS. When a lock is contended for longer than
1 ms, the unlocker hands the lock directly to the head of the
parking queue rather than letting a fresh contender steal it.
This prevents the starvation pattern where a hot thread can
keep reacquiring a contended lock indefinitely.
// pysync/critical_section.go
type CriticalSection struct {
    prev       *CriticalSection
    obj1, obj2 *objects.Object // protected objects
}
func BeginMutex(cs *CriticalSection, m *Mutex)
func BeginMutex2(cs *CriticalSection, m1, m2 *Mutex)
func End(cs *CriticalSection)
The PEP 703 critical-section stack is bookkeeping only in the
GIL build. Under the GIL, a critical section is a no-op; the
GIL already serializes the operations a critical section would
guard. Under the free-threaded build (v0.14, gated by the
pygil_disabled build tag), the same surface acquires real
per-object locks. We landed the bookkeeping now so the call
sites can be written once and switched over without source
changes when v0.14 lands.
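What "bookkeeping only" might look like is sketched below: each section records its predecessor on entry and restores it on exit, with no lock taken. The threadState type, the explicit ts parameter, and the name field are hypothetical; the release's End takes only the section.

```go
package main

import "fmt"

// section is a hypothetical stand-in for pysync.CriticalSection.
type section struct {
	prev *section
	name string // label used only for this demonstration
}

// threadState is a hypothetical per-thread record holding the stack top.
type threadState struct {
	top *section
}

// begin pushes cs onto the thread's critical-section stack.
func begin(ts *threadState, cs *section) {
	cs.prev = ts.top
	ts.top = cs
}

// end pops cs, restoring whatever was active before it.
func end(ts *threadState, cs *section) {
	ts.top = cs.prev
	cs.prev = nil
}

// depth counts the sections currently on the stack.
func depth(ts *threadState) int {
	n := 0
	for cs := ts.top; cs != nil; cs = cs.prev {
		n++
	}
	return n
}

func main() {
	var ts threadState
	outer := &section{name: "dict"}
	inner := &section{name: "list"}
	begin(&ts, outer)
	begin(&ts, inner)
	fmt.Println("depth inside both:", depth(&ts))
	end(&ts, inner)
	fmt.Println("active after inner ends:", ts.top.name)
	end(&ts, outer)
	fmt.Println("depth after both end:", depth(&ts))
}
```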
The _PySemaphore abstraction CPython uses for parking-lot
sleeps is replaced by a buffered chan struct{} inside each
waiter. Go channels give us the same wakeup semantics with
less ceremony.
The hash secret
CPython has a one-file hash-secret bootstrapper at
Python/bootstrap_hash.c. It runs once at interpreter startup
and populates _Py_HashSecret from PYTHONHASHSEED. The
secret is what siphash13 and fnv use to randomize
hashes. Without this, dictionaries are vulnerable to
algorithmic-complexity attacks (the attacker computes inputs
that all hash to the same bucket and forces every lookup to
walk a chain).
We ported the seed-init half. The hash functions themselves
ship later (in v0.4 with pyhash.c).
// hash/hash.go
func Init() // called once at startup
func Reset() // for tests
func Secret() [SecretSize]byte
const SecretSize = 24
PYTHONHASHSEED parsing matches CPython exactly:
- random or unset. Seed from OS entropy.
- 0. Disable randomization. Every interpreter sees the same secret.
- A positive 32-bit integer. Use the LCG to derive the secret from that seed, byte-identical to what CPython produces.
The LCG constants are upstream's: 214013, 2531011, output
bits 16..23. We ran the implementation against the CPython
output for every seed in [0, 1024) and confirmed byte
identity. This matters because Python tests that pin
PYTHONHASHSEED=0 and check hash values (there are several in
the test suite) must produce the same bytes against gopy.
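The generator is small enough to show in full. The sketch below uses the constants and output bits quoted above with 32-bit wraparound arithmetic; lcgSecret is a hypothetical name, and seed 0 is expected to be handled separately as the "disable randomization" case.

```go
package main

import "fmt"

const secretSize = 24

// lcgSecret derives secretSize bytes from a fixed seed with the linear
// congruential generator described in the text.
func lcgSecret(seed uint32) [secretSize]byte {
	var out [secretSize]byte
	x := seed
	for i := range out {
		x = x*214013 + 2531011 // uint32 arithmetic wraps modulo 2^32
		out[i] = byte(x >> 16) // keep bits 16..23 of the state
	}
	return out
}

func main() {
	s := lcgSecret(1)
	fmt.Printf("%x\n", s)
}
```

For seed 1 the first state is 1*214013 + 2531011 = 2745024, and 2745024 >> 16 is 41, so the first byte of the secret is 41.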
OS entropy comes from Go's crypto/rand. This means the
_PyOS_URandom* family of functions and the /dev/urandom
fallback are redundant; Go already handles platform
differences underneath us. We dropped them rather than port
them.
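The three seed modes and the crypto/rand path might be wired together as sketched here. parseHashSeed, randomSecret, and the seedMode constants are hypothetical names, and strconv.ParseUint is stricter about signs and whitespace than a C strtoul-style parser, so this is not a claim of byte-for-byte parsing parity.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"strconv"
)

type seedMode int

const (
	seedRandom   seedMode = iota // "random" or unset: use OS entropy
	seedDisabled                 // "0": no randomization
	seedFixed                    // a positive 32-bit integer: derive via the LCG
)

var errBadSeed = errors.New(`PYTHONHASHSEED must be "random" or an integer in range [0; 4294967295]`)

// parseHashSeed classifies the value of PYTHONHASHSEED.
func parseHashSeed(env string) (seedMode, uint32, error) {
	if env == "" || env == "random" {
		return seedRandom, 0, nil
	}
	n, err := strconv.ParseUint(env, 10, 32) // rejects values above 2^32-1
	if err != nil {
		return seedRandom, 0, errBadSeed
	}
	if n == 0 {
		return seedDisabled, 0, nil
	}
	return seedFixed, uint32(n), nil
}

// randomSecret fills a 24-byte secret from the operating system's entropy source.
func randomSecret() ([24]byte, error) {
	var s [24]byte
	_, err := rand.Read(s[:])
	return s, err
}

func main() {
	for _, v := range []string{"", "random", "0", "42", "4294967296"} {
		mode, seed, err := parseHashSeed(v)
		fmt.Printf("%q -> mode=%d seed=%d err=%v\n", v, mode, seed, err)
	}
	if _, err := randomSecret(); err != nil {
		fmt.Println("entropy unavailable:", err)
	}
}
```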
What's new
The full feature breakdown, grouped by package.
arena/
The compiler-side bump allocator. Port of
Python/pyarena.c.
- arena.New() creates an empty arena.
- (*Arena).Malloc(n) bump-allocates n bytes inside the current 8 KiB block; chains a new block if the current one is full.
- (*Arena).AddObject(obj) registers a Python object the arena owns. The arena will decref it on Free.
- (*Arena).Free() walks the block list freeing each block and walks the object list dropping each reference.
Block size is 8 KiB matching DEFAULT_BLOCK_SIZE. The block
header layout matches _block upstream.
pythread/
Cross-platform half of Python/thread.c. Layered over Go
goroutines.
- pythread.Start(fn) launches a goroutine and returns a *Handle.
- (*Handle).Join() blocks until the goroutine returns.
- (*Handle).Ident() returns a stable identifier we use for thread-local keys.
- pythread.Init() is a no-op that exists for API parity with upstream.
- pythread.GetStacksize() returns the current goroutine stack hint.
- pythread.SetStacksize(n) stores a hint; goroutines grow stacks automatically so the value is advisory.
- pythread.TimeoutMax matches PY_TIMEOUT_MAX exactly so call sites that clamp timeouts produce the same values.
The thread_pthread.h and thread_nt.h files have no
counterpart in Go. We drop them.
pysync/
The locking primitives. Port of Python/lock.c,
Python/parking_lot.c, and Python/critical_section.c. We
named the package pysync (not sync) to keep it visually
distinct from Go's standard sync package; in any one file
both might be imported and the distinction matters.
- pysync.Park, Unpark, UnparkAll, AfterFork. The 257-bucket address-keyed parking lot. Bucket count matches upstream.
- pysync.Mutex. Byte-flag mutex. Two bits: locked, has-parked-waiters. 1 ms fair-handoff. Detach hook plumbed through (we will wire it to GIL release when the GIL lands in v0.9; the parameter exists today so call sites can be written once).
- pysync.Event. One-shot signal.
- pysync.OnceFlag. pthread_once-style init guard.
- pysync.RecursiveMutex. Reentrant mutex.
- pysync.RWMutex. Reader-writer lock; multiple readers or one writer.
- pysync.SeqLock. Optimistic-read seqlock used by the type cache.
- pysync.RawMutex. The non-Python-object mutex; same byte layout but does not interact with critical sections.
- pysync.CriticalSection, BeginMutex, BeginMutex2, End. PEP 703 critical-section stack. Bookkeeping only under the GIL build.
The internal _PySemaphore abstraction is replaced by a
buffered chan struct{} inside each waiter. The Go channel
gives us the same blocking semantics with less code.
hash/
Seed-init half of Python/bootstrap_hash.c. The hash
functions themselves ship in v0.4.
- hash.Init(). Called once at startup. Reads PYTHONHASHSEED, fills _Py_HashSecret.
- hash.Reset(). For tests.
- hash.Secret(). Returns the secret as a fixed-size array.
- hash.SecretSize. 24, matching _Py_HashSecret.
- PYTHONHASHSEED parsing. random, 0, or a positive 32-bit integer. Identical to upstream.
- LCG. Constants 214013 and 2531011, output bits 16..23. Byte-identical to lcg_urandom for any seed.
- OS entropy through Go's crypto/rand. We drop _PyOS_URandom* and the /dev/urandom fallback because Go already handles platform differences.
Why we built it this way
A few choices in this release deserve their own callouts.
Why port parking_lot.c faithfully
A natural question: Go already has sync.Mutex. Why not use
it directly and skip the parking-lot port?
Two reasons.
The byte layout matters. Future code (the GIL state machine,
the type cache's seqlock, the free-threading critical-section
machinery) does atomic operations on the lock's byte
representation. If pysync.Mutex is a wrapper around
sync.Mutex, those operations have to go through Go's lock
internals, which are not stable and not documented for that
use. Keeping the byte layout under our control gives us a
stable representation we can reason about.
The fair-handoff timer matters. Go's sync.Mutex is unfair
(by design; fairness costs throughput on most workloads).
CPython's mutex is also unfair until 1 ms of contention,
then it hands the lock directly to the waiter at the head of
the queue. That hybrid policy is what gives Python its
"fairness under stress, throughput under calm" behavior.
Replicating it requires owning the lock's internals.
Why pysync and not sync
Naming. In any file that does any concurrent work, both
this package and the standard library's sync are
candidates. If both are named sync, every file needs an
import alias. If one is named pysync and the other is the
standard sync, the file reads cleanly with no alias. We
paid the two-character cost in exchange for clarity
everywhere downstream.
Why we defer the lock-acquire-retry path
PyThread_acquire_lock_timed_with_retries is the function
the GIL release machinery calls when a thread wants to drop
the GIL, sleep on another lock, and then reacquire the GIL.
The retry loop interacts with the GIL state machine in
non-trivial ways: the retry has to release the GIL on each
attempt, otherwise other threads cannot make progress.
We do not have a GIL yet (it lands in v0.9 with the rest of the thread-state machinery). Porting the retry loop now would mean porting it against a placeholder GIL and re-porting it when the real one arrives. Deferring it to v0.9, where the upstream caller exists, is one port instead of two.
Why we keep the LCG bit-identical
Hash randomization is a security feature. The expectation
across Python tooling (CPython itself, PyPy, MicroPython,
tools like Hypothesis that produce reproducible test orders)
is that PYTHONHASHSEED=N produces the same hashes for the
same inputs across implementations. We do not get to
deviate. Bit-for-bit identity is the contract.
We verified identity by running the seed range
[0, 1024) against a CPython 3.14 build with gdb set to
dump _Py_HashSecret after init, and comparing byte by
byte. All matched. The test fixture for this lives in
hash/hash_test.go and runs on every CI pass.
Where it lives
The four packages each live in their own directory at the module root:
- arena/. arena.go ports Python/pyarena.c.
- pythread/. thread.go ports the cross-platform half of Python/thread.c. handle.go carries the Go-side goroutine wrapping.
- pysync/. lock.go ports Python/lock.c. parking_lot.go ports Python/parking_lot.c. critical_section.go ports Python/critical_section.c.
- hash/. hash.go ports the seed-init half of Python/bootstrap_hash.c.
Test files live alongside each. The arena tests exercise the allocator under stress. The pythread tests confirm goroutine join semantics. The pysync tests cover the parking-lot fairness window and the recursive-mutex reentrance count. The hash tests verify byte identity against CPython's LCG output.
Compatibility
Nothing user-visible runs at this stage, so there is nothing to break at the language surface. But three policy choices set expectations for later releases.
- Go 1.26 or newer remains the floor.
- CPython 3.14.0+ behavior remains the behavioral target.
- Build tag pygil_disabled is reserved for the future free-threaded build (v0.14). v0.1.0 builds the GIL configuration only. The tag exists so packages can start gating code on it now; leaving it unset selects the GIL build, which stays the default until v0.14 flips it.
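Gating on the tag uses an ordinary Go build constraint. The sketch below is the default-build side of a hypothetical pair of files: the constant gilEnabled and the idea of a sibling file are assumptions for illustration, not names taken from the repository.

```go
//go:build !pygil_disabled

package main

import "fmt"

// gilEnabled is true in the default build. A sibling file constrained by
// "//go:build pygil_disabled" (hypothetical) would define it as false.
const gilEnabled = true

func main() {
	fmt.Println("GIL build:", gilEnabled)
}
```

Building with `go build -tags pygil_disabled` excludes this file, so the tagged sibling would supply the constant instead.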
Known limitations
- No parser, AST, compiler, VM, or standard library yet. The roadmap reaches the object model in v0.2.0 and the first executable Python in v0.6.0.
- PyThread_acquire_lock_timed_with_retries, PyThread_tss_*, and PyThread_GetInfo are not yet ported; they depend on subsystems that arrive in later phases (GIL state in v0.9, TSS in v0.9, sys.thread_info in v0.11).
- Hash functions themselves (SipHash-1-3, FNV) are not here. They ship in v0.4 with the Python/pyhash.c port. v0.1.0 supplies only the secret; the consumer arrives later.
What's next
v0.2.0 builds the object model on top of these primitives. The plan:
- objects/. The Object interface, Header, VarHeader, the refcount machinery, the type-slot surface, and the v0.2 concrete types (int, float, bool, None, tuple, list, dict, slice, range).
- typeobj/. The C3 linearization and the slot lookup walk.
- abstract/. The subset of Objects/abstract.c needed to build a dict, hash a tuple, iterate a list.
- objtest/. The first gate. Build a dict, hash a tuple, iterate a list, all from Go test code.
The interpreter still will not run Python code at the end of v0.2.0. The first runnable Python lands in v0.6.0 once the parser, AST, compiler, and VM are all in place.
Acknowledgments
Python/lock.c, Python/parking_lot.c, and
Python/critical_section.c are the work of the PEP 703
authors (Sam Gross and the free-threading working group)
plus the broader CPython core. Reading those files end to
end was the most fun part of this release. The design
makes sense at every layer and the port was straightforward
because of it.