Skip to main content

Python/parking_lot.c

cpython 3.14 @ ab2d84fe1023/Python/parking_lot.c

A parking lot is a wait queue for threads that are blocked on a Python lock in the free-threaded (no-GIL) build. The name and the design come from WebKit's WTF::ParkingLot and the paper "Futexes are Tricky" by Ulrich Drepper.

The key insight is that lock word and wait queue must be managed atomically. A naive implementation would check the lock, decide to sleep, then be preempted before actually sleeping, missing the wakeup. The parking lot solves this by holding a per-bucket spinlock while the sleeping thread checks the lock word: if the word has changed since the thread last read it, _PyParkingLot_Park returns immediately rather than sleeping.

The wait queue is partitioned into NUM_BUCKETS (128) hash buckets keyed by the lock address. Each bucket is a _Bucket struct holding a spinlock and a doubly-linked list of _ParkingLotNode waiters. The spinlock is acquired with a brief spin then a platform sleep (_PyMutex_LockTimed).

At the OS level, _PyParkingLot_Park sleeps on a _PyEventRc (an event object backed by a futex on Linux, a semaphore on macOS, and a CONDITION_VARIABLE on Windows). This keeps the file portable while the hot path stays branchless for the non-contended case.

Map

LinesSymbolRolegopy
1-80_ParkingLotNode, _Bucket, _parking_lot_bucketsData structures: waiter node (event, address, expected, next/prev), bucket array.pythonrun/parking_lot.go:ParkingLotNode / Bucket
81-160bucket_for, lock_bucket, unlock_bucketHash address to bucket; acquire/release the per-bucket spinlock.pythonrun/parking_lot.go:bucketFor
161-270_PyParkingLot_ParkAtomic check-then-sleep: lock bucket, compare *address to expected, enqueue node, release bucket lock, sleep on event.pythonrun/parking_lot.go:Park
271-370_PyParkingLot_UnparkLock bucket, dequeue one waiter, call unpark_fn while holding the lock, signal the waiter's event.pythonrun/parking_lot.go:Unpark
371-420_PyParkingLot_UnparkAllLock bucket, drain the entire waiter list for address, signal each event.pythonrun/parking_lot.go:UnparkAll
421-442_PyParkingLot_BeginUnpark / _PyParkingLot_FinishUnparkTwo-phase unpark variant; BeginUnpark dequeues the waiter while still holding the lock for atomic CAS.pythonrun/parking_lot.go:BeginUnpark

Reading

Bucket-based wait queue (lines 1 to 160)

cpython 3.14 @ ab2d84fe1023/Python/parking_lot.c#L1-160

typedef struct _ParkingLotNode {
struct _ParkingLotNode *next;
struct _ParkingLotNode *prev;
_PyEventRc *event;
const void *address;
/* Expected value of *address at park time. */
uintptr_t expected;
/* Set to 1 by Unpark to indicate the wakeup was intentional. */
int handed_off;
} _ParkingLotNode;

#define NUM_BUCKETS 128

typedef struct {
_PyMutex mutex;
_ParkingLotNode *head;
_ParkingLotNode *tail;
} _Bucket;

static _Bucket _parking_lot_buckets[NUM_BUCKETS];

address is the address of the lock word being waited on, typically a uint8_t * inside a _PyMutex. bucket_for(address) computes (uintptr_t)address * MULTIPLIER >> (64 - LOG2_NUM_BUCKETS) — a multiplicative hash that distributes common small-stack addresses across all 128 buckets. Each bucket is a FIFO list; new waiters are appended to tail so Unpark can wake them in arrival order (important for mutex fairness).

The _PyMutex inside each _Bucket is itself a tiny spinlock backed by _PyMutex_LockTimed with a spin-count threshold. It must not be a full parking-lot mutex to avoid recursion.

_PyParkingLot_Park atomic check-then-sleep (lines 161 to 270)

cpython 3.14 @ ab2d84fe1023/Python/parking_lot.c#L161-270

int
_PyParkingLot_Park(const void *address, const void *expected, size_t size,
_PyTime_t timeout_ns, void *park_arg, int call_before_sleep)
{
_ParkingLotNode node;
_Bucket *bucket = bucket_for(address);

_PyMutex_Lock(&bucket->mutex);

/* Atomic check: has the value already changed? */
if (!_Py_atomic_compare_equal(address, expected, size)) {
_PyMutex_Unlock(&bucket->mutex);
return PY_PARK_AGAIN;
}

node.address = address;
node.event = _PyEventRc_New();
node.handed_off = 0;
enqueue(bucket, &node);
_PyMutex_Unlock(&bucket->mutex);

/* Optional callback while not holding bucket lock. */
if (call_before_sleep) { ... }

/* Sleep until Unpark signals the event. */
int timed_out = !_PyEvent_WaitTimed(node.event->event, timeout_ns);

if (timed_out) {
/* Remove ourselves from the queue if still there. */
_PyMutex_Lock(&bucket->mutex);
int in_queue = remove_waiter(bucket, &node);
_PyMutex_Unlock(&bucket->mutex);
if (in_queue) {
_PyEventRc_Decref(node.event);
return PY_PARK_TIMEOUT;
}
}
_PyEventRc_Decref(node.event);
return node.handed_off ? PY_PARK_OK : PY_PARK_INTR;
}

The bucket lock is held for the minimum window: just long enough to compare *address against expected and enqueue the node. Releasing the lock before sleeping avoids holding it for the duration of the sleep. If *address already differs from expected (a concurrent unlocker already cleared the lock word), the function returns PY_PARK_AGAIN without sleeping. The caller (_PyMutex_LockTimed) then retries its CAS directly.

_Py_atomic_compare_equal uses __atomic_load_n with __ATOMIC_SEQ_CST to guarantee the read is not moved outside the critical section by the compiler.

_PyParkingLot_Unpark with unpark callback (lines 271 to 370)

cpython 3.14 @ ab2d84fe1023/Python/parking_lot.c#L271-370

void
_PyParkingLot_Unpark(const void *address, _Py_unpark_fn_t *fn, void *arg)
{
_Bucket *bucket = bucket_for(address);
_PyMutex_Lock(&bucket->mutex);

_ParkingLotNode *waiter = find_first_waiter(bucket, address);
int has_more = (waiter != NULL && waiter->next != NULL &&
waiter->next->address == address);

if (waiter) {
dequeue(bucket, waiter);
}

/* Call fn while still holding the bucket lock. */
if (fn) {
fn(arg, waiter != NULL, has_more);
}

_PyMutex_Unlock(&bucket->mutex);

if (waiter) {
waiter->handed_off = 1;
_PyEvent_Notify(waiter->event->event);
}
}

The fn callback runs with the bucket lock held. This is the critical invariant: fn can atomically update the lock word (e.g. clear the "has waiters" bit) at the same time as the waiter is removed from the queue. Without this window, a thread woken by _PyEvent_Notify could re-enter _PyParkingLot_Park and observe a stale "has waiters" bit in the lock word before fn has had a chance to clear it.

has_more tells fn whether additional waiters remain; the _PyMutex implementation uses this to keep the "contended" bit set when it should be, avoiding a thundering-herd where every unparked thread tries to acquire without noticing the queue is non-empty.

_PyParkingLot_UnparkAll (lines 371 to 420) follows the same pattern but drains the entire waiter list atomically before releasing the bucket lock, then signals each event outside the lock.