Python/parking_lot.c
cpython 3.14 @ ab2d84fe1023/Python/parking_lot.c
A parking lot is a wait queue for threads that are blocked on a Python lock
in the free-threaded (no-GIL) build. The name and the design come from
WebKit's WTF::ParkingLot; the futex-style check-then-sleep protocol it
generalizes is analyzed in Ulrich Drepper's paper "Futexes Are Tricky".
The key insight is that the lock word and the wait queue must be managed
atomically.
A naive implementation would check the lock, decide to sleep, then be
preempted before actually sleeping, missing the wakeup. The parking lot
solves this by holding a per-bucket spinlock while the sleeping thread
checks the lock word: if the word has changed since the thread last read it,
_PyParkingLot_Park returns immediately rather than sleeping.
The wait queue is partitioned into NUM_BUCKETS (128) hash buckets keyed
by the lock address. Each bucket is a _Bucket struct holding a spinlock
and a doubly-linked list of _ParkingLotNode waiters. The spinlock is
acquired with a brief spin then a platform sleep (_PyMutex_LockTimed).
At the OS level, _PyParkingLot_Park sleeps on a _PyEventRc (a
reference-counted event backed by a futex on Linux, a semaphore on macOS,
and a CONDITION_VARIABLE on Windows). This keeps the file portable, while
the uncontended fast path never enters the parking lot at all.
Map
| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-80 | _ParkingLotNode, _Bucket, _parking_lot_buckets | Data structures: waiter node (event, address, expected, next/prev), bucket array. | pythonrun/parking_lot.go:ParkingLotNode / Bucket |
| 81-160 | bucket_for, lock_bucket, unlock_bucket | Hash address to bucket; acquire/release the per-bucket spinlock. | pythonrun/parking_lot.go:bucketFor |
| 161-270 | _PyParkingLot_Park | Atomic check-then-sleep: lock bucket, compare *address to expected, enqueue node, release bucket lock, sleep on event. | pythonrun/parking_lot.go:Park |
| 271-370 | _PyParkingLot_Unpark | Lock bucket, dequeue one waiter, call unpark_fn while holding the lock, signal the waiter's event. | pythonrun/parking_lot.go:Unpark |
| 371-420 | _PyParkingLot_UnparkAll | Lock bucket, drain the entire waiter list for address, signal each event. | pythonrun/parking_lot.go:UnparkAll |
| 421-442 | _PyParkingLot_BeginUnpark / _PyParkingLot_FinishUnpark | Two-phase unpark: BeginUnpark dequeues a waiter with the bucket lock still held so the caller can update the lock word atomically; FinishUnpark releases the lock and signals the event. | pythonrun/parking_lot.go:BeginUnpark |
Reading
Bucket-based wait queue (lines 1 to 160)
cpython 3.14 @ ab2d84fe1023/Python/parking_lot.c#L1-160
typedef struct _ParkingLotNode {
    struct _ParkingLotNode *next;
    struct _ParkingLotNode *prev;
    _PyEventRc *event;
    const void *address;
    /* Expected value of *address at park time. */
    uintptr_t expected;
    /* Set to 1 by Unpark to indicate the wakeup was intentional. */
    int handed_off;
} _ParkingLotNode;

#define NUM_BUCKETS 128

typedef struct {
    _PyMutex mutex;
    _ParkingLotNode *head;
    _ParkingLotNode *tail;
} _Bucket;

static _Bucket _parking_lot_buckets[NUM_BUCKETS];
address is the address of the lock word being waited on, typically a
uint8_t * inside a _PyMutex. bucket_for(address) computes
(uintptr_t)address * MULTIPLIER >> (64 - LOG2_NUM_BUCKETS) — a
multiplicative hash whose high bits spread even closely spaced stack and
heap addresses across all 128 buckets. Each bucket holds a FIFO list; new
waiters are appended at the tail so Unpark wakes them in arrival order
(important for mutex fairness).
The _PyMutex inside each _Bucket is itself a tiny spinlock acquired via
_PyMutex_LockTimed with a spin-count threshold. It must not be a
parking-lot-backed mutex: parking on it would recurse into the very bucket
being locked.
_PyParkingLot_Park atomic check-then-sleep (lines 161 to 270)
cpython 3.14 @ ab2d84fe1023/Python/parking_lot.c#L161-270
int
_PyParkingLot_Park(const void *address, const void *expected, size_t size,
                   _PyTime_t timeout_ns, void *park_arg, int call_before_sleep)
{
    _ParkingLotNode node;
    _Bucket *bucket = bucket_for(address);

    _PyMutex_Lock(&bucket->mutex);
    /* Atomic check: has the value already changed? */
    if (!_Py_atomic_compare_equal(address, expected, size)) {
        _PyMutex_Unlock(&bucket->mutex);
        return PY_PARK_AGAIN;
    }
    node.address = address;
    node.event = _PyEventRc_New();
    node.handed_off = 0;
    enqueue(bucket, &node);
    _PyMutex_Unlock(&bucket->mutex);

    /* Optional callback while not holding bucket lock. */
    if (call_before_sleep) { ... }

    /* Sleep until Unpark signals the event. */
    int timed_out = !_PyEvent_WaitTimed(node.event->event, timeout_ns);
    if (timed_out) {
        /* Remove ourselves from the queue if still there. */
        _PyMutex_Lock(&bucket->mutex);
        int in_queue = remove_waiter(bucket, &node);
        _PyMutex_Unlock(&bucket->mutex);
        if (in_queue) {
            _PyEventRc_Decref(node.event);
            return PY_PARK_TIMEOUT;
        }
        /* Not in the queue: an unparker already dequeued us and will
           signal the event shortly. Wait for that signal, so handed_off
           is written before we read it below. */
        _PyEvent_Wait(node.event->event);
    }
    _PyEventRc_Decref(node.event);
    return node.handed_off ? PY_PARK_OK : PY_PARK_INTR;
}
The bucket lock is held for the minimum window: just long enough to compare
*address against expected and enqueue the node. Releasing the lock before
sleeping means other threads that hash to the same bucket are blocked only
for the compare-and-enqueue, never for the duration of a sleep. If
*address already differs from expected (a concurrent unlocker has cleared
the lock word), the function returns PY_PARK_AGAIN without sleeping, and
the caller (_PyMutex_LockTimed) retries its CAS directly.
_Py_atomic_compare_equal performs a sized __atomic_load_n with
__ATOMIC_SEQ_CST, which guarantees an untorn read that neither the compiler
nor the hardware can move outside the critical section.
_PyParkingLot_Unpark with unpark callback (lines 271 to 370)
cpython 3.14 @ ab2d84fe1023/Python/parking_lot.c#L271-370
void
_PyParkingLot_Unpark(const void *address, _Py_unpark_fn_t *fn, void *arg)
{
    _Bucket *bucket = bucket_for(address);

    _PyMutex_Lock(&bucket->mutex);
    _ParkingLotNode *waiter = find_first_waiter(bucket, address);
    int has_more = (waiter != NULL && waiter->next != NULL &&
                    waiter->next->address == address);
    if (waiter) {
        dequeue(bucket, waiter);
    }
    /* Call fn while still holding the bucket lock. */
    if (fn) {
        fn(arg, waiter != NULL, has_more);
    }
    _PyMutex_Unlock(&bucket->mutex);

    if (waiter) {
        waiter->handed_off = 1;
        _PyEvent_Notify(waiter->event->event);
    }
}
The fn callback runs with the bucket lock held. This is the critical
invariant: fn can atomically update the lock word (e.g. clear the
"has waiters" bit) at the same time as the waiter is removed from the
queue. Without this window, a thread woken by _PyEvent_Notify could
re-enter _PyParkingLot_Park and observe a stale "has waiters" bit in the
lock word before fn has had a chance to clear it.
has_more tells fn whether additional waiters remain for this address; the
_PyMutex implementation uses it to keep the "contended" bit set while the
queue is still non-empty, so the next unlocker knows it must go through
Unpark rather than a plain store, instead of a thundering herd where woken
threads race for the lock without noticing the queue behind them.
_PyParkingLot_UnparkAll (lines 371 to 420) follows the same pattern but
drains the entire waiter list atomically before releasing the bucket lock,
then signals each event outside the lock.