Skip to main content

Modules/_collectionsmodule.c

Source:

cpython 3.14 @ ab2d84fe1023/Modules/_collectionsmodule.c

This file implements two Python built-ins in C: collections.deque and collections.defaultdict. Both are exposed through the _collections extension module and re-exported by the pure-Python collections package.

Map

SymbolLines (approx)Purpose
BLOCKLEN macro1-30Block size constant (64 items per node)
block struct31-60Doubly-linked array node holding 64 object pointers
dequeobject struct61-120Deque head: leftblock, rightblock, leftindex, rightindex, len, maxlen, state
deque_new121-170Allocate deque, seed with one centre-aligned block
deque_append171-230Append to right end, O(1) amortized
deque_appendleft231-290Append to left end, O(1) amortized
deque_pop291-350Remove from right end
deque_popleft351-410Remove from left end
deque_rotate411-520Rotate n steps, block-pointer fast path for large n
deque_count521-570Linear scan with rich compare
deque_contains571-620Linear scan, returns bool
dequeiterobject struct621-660Iterator state: deque ref, block pointer, index, mutation counter
dequeiter_new661-710Allocate forward iterator
dequeiter_next711-790Advance iterator, detect mutation via state field
dequereviter_new791-840Allocate reverse iterator
defaultdict struct1800-1910Subclass of dict, adds default_factory slot
defdict_missing1911-1970__missing__: call factory, insert, return value
defdict_copy1971-2020Copy preserving default_factory
module init2021-2400_collections_state, PyInit__collections

Reading

dequeobject layout and block chaining

A deque is not a ring buffer. It is a doubly-linked list of fixed-size array blocks. Each block holds BLOCKLEN (64) PyObject* pointers. The deque tracks only the two outermost blocks and the logical start/end indexes within them.

// CPython: Modules/_collectionsmodule.c:61 dequeobject
typedef struct {
PyObject_VAR_HEAD
block *leftblock;
block *rightblock;
Py_ssize_t leftindex; /* in range(BLOCKLEN) */
Py_ssize_t rightindex; /* in range(BLOCKLEN) */
Py_ssize_t maxlen; /* -1 means unbounded */
uint64_t state; /* mutation counter for iterators */
PyObject *weakreflist;
} dequeobject;

When leftindex reaches 0 on an appendleft, a new block is prepended and leftindex resets to BLOCKLEN - 1. When rightindex reaches BLOCKLEN - 1 on an append, a new block is appended and rightindex resets to 0. Block allocation is O(1), and block deallocation on pop is O(1) as well, making both ends amortized O(1) across any sequence of push/pop operations.

maxlen is stored directly on the struct. When set, deque_append drops the opposite end rather than growing, keeping the deque at exactly maxlen items without any extra allocation.

deque_append, deque_appendleft, and deque_rotate

deque_append and deque_appendleft are mirror images. The right-side path:

// CPython: Modules/_collectionsmodule.c:171 deque_append
static PyObject *
deque_append(dequeobject *deque, PyObject *item)
{
if (deque->rightindex == BLOCKLEN - 1) {
block *b = newblock();
if (b == NULL)
return NULL;
b->leftlink = deque->rightblock;
deque->rightblock->rightlink = b;
deque->rightblock = b;
deque->rightindex = -1;
}
Py_INCREF(item);
deque->len++;
deque->rightindex++;
deque->rightblock->data[deque->rightindex] = item;
if (deque->maxlen >= 0 && Py_SIZE(deque) > deque->maxlen)
deque_popleft(deque, NULL);
Py_RETURN_NONE;
}

deque_rotate(n) moves abs(n) items from one end to the other. For rotations where abs(n) is large enough to span whole blocks, CPython relinks block pointers rather than moving items one by one. The per-item loop is the slow fallback for partial blocks.

// CPython: Modules/_collectionsmodule.c:411 deque_rotate
static PyObject *
deque_rotate(dequeobject *deque, PyObject *const *args, Py_ssize_t nargs)

The state field is incremented on every structural mutation. Iterators snapshot state at creation and raise RuntimeError if it has changed.

dequeiterobject and mutation detection

The iterator holds a direct C pointer into the block chain plus the mutation counter snapshot from creation time.

// CPython: Modules/_collectionsmodule.c:711 dequeiter_next
static PyObject *
dequeiter_next(dequeiterobject *it)
{
if (it->deque->state != it->state) {
it->len = -1;
PyErr_SetString(PyExc_RuntimeError,
"deque mutated during iteration");
return NULL;
}
/* advance index; cross block boundary by following rightlink */
}

Crossing a block boundary during iteration moves it->b to the next block and resets the slot index to 0. This is a single pointer dereference, keeping iteration O(1) per element.

defaultdict and the missing protocol

defaultdict is a thin C subclass of dict. It adds one extra pointer slot, default_factory, and overrides tp_as_mapping->mp_subscript to delegate to __missing__ on key errors.

// CPython: Modules/_collectionsmodule.c:1911 defdict_missing
static PyObject *
defdict_missing(defdictobject *dd, PyObject *key)
{
PyObject *factory = dd->default_factory;
PyObject *value;
if (factory == NULL || factory == Py_None) {
PyErr_SetObject(PyExc_KeyError, key);
return NULL;
}
value = PyObject_CallNoArgs(factory);
if (value == NULL)
return NULL;
if (PyObject_SetItem((PyObject *)dd, key, value) < 0) {
Py_DECREF(value);
return NULL;
}
return value;
}

Passing None as default_factory explicitly disables the missing behaviour and restores normal KeyError semantics, matching the documented contract.

gopy notes

Status: not yet ported.

Planned package path: module/collections/.

The deque block chain maps naturally to a Go slice of [64]py.Object fixed arrays. The state mutation counter becomes a uint64 on the struct, compared by iterators on each step.

defaultdict can be a Go struct embedding *objects.Dict with an additional defaultFactory py.Object field. The __missing__ method is registered on the type object during module init, mirroring the C defdict_missing function.

maxlen enforcement and block recycling are the two pieces most likely to need close testing against CPython edge cases: boundary conditions at exactly maxlen, rotate with maxlen set, and pickle round-trip behaviour.