Skip to main content

Objects/dictobject.c

cpython 3.14 @ ab2d84fe1023/Objects/dictobject.c

The dict implementation. CPython dicts use a compact hash table introduced in 3.6 that preserves insertion order. The layout has two arrays: a sparse dk_indices array (the hash table proper, storing indices into a dense dk_entries array), and the dense dk_entries array (hash, key, value triples in insertion order). This achieves both O(1) lookup and ordered iteration without copying. Split tables (shared key structure between instances of the same class) optimize memory for many instances with the same attribute layout.

Map

LinesSymbolRolegopy
1-200_Py_dict_lookup, find_empty_slot, insertion_resizeHash table lookup and resize.objects/dict.go:lookup
200-600dict_new_presized, PyDict_New, PyDict_NewPresizedAllocation and pre-sizing.objects/dict.go:NewDict
600-1200PyDict_GetItem, PyDict_GetItemWithError, _PyDict_GetItemWithErrorLookup by key.objects/dict.go:(*Dict).GetItem
1200-1800PyDict_SetItem, insertdict, _PyDict_SetItem_KnownHashInsert and update.objects/dict.go:(*Dict).SetItem
1800-2200PyDict_DelItem, dict_del_commonDeletion.objects/dict.go:(*Dict).DelItem
2200-2800PyDict_Update, PyDict_Merge, dict_mergeMerge dicts.objects/dict_mutate.go:Merge
2800-3400PyDict_Keys, PyDict_Values, PyDict_Items, PyDict_SizeViews and size.objects/dict.go
3400-4000dict_repr, dict_iter, dictiter_new, dictiter_iternextkey, dictiter_iternextvalue, dictiter_iternextitemRepr and iterators.objects/dict_iter.go
4000-4800dict_equal, dict_richcompare, dict_copy, dict_clear, dict_pop, dict_popitemComparison and mutation.objects/dict_mutate.go
4800-5600_PyDict_FromKeys, dict_fromkeys, dict_setdefault, dict_getClass methods.objects/dict.go:FromKeys
5600-6400_PyObject_MakeDictFromInstanceAttributes, _PyDict_LoadGlobal, _PyDictKeys_DecRefSplit table and class dict support.objects/dict.go:splitTable
6400-7824PyDict_AddWatcher, _PyDict_SendEvent, version tag bookkeeping, PyDict_TypeWatcher API and type definition.objects/dict.go:AddWatcher

Reading

Compact layout (lines 1 to 200)

cpython 3.14 @ ab2d84fe1023/Objects/dictobject.c#L1-200

The two-array structure. dk_indices is a typed array (int8, int16, int32, or int64 depending on table size) that maps hash slots to entry indices. dk_entries is a dense array of PyDictKeyEntry {hash, key, value} structs in insertion order. Lookup hashes the key, indexes into dk_indices to get an entry index, then checks the entry for a key match.

Collision resolution uses a perturbation-based probe sequence:

// Initial probe slot:
Py_ssize_t i = hash & mask;

// Subsequent probes:
perturb >>= PERTURB_SHIFT;
i = (i * 5 + 1 + perturb) & mask;

PERTURB_SHIFT is 5. The perturbation term mixes in high bits of the hash so that clusters formed by the lower-bit probe sequence eventually disperse. The dk_indices element at position i holds either a non-negative index into dk_entries, DKIX_EMPTY (-1), or DKIX_DUMMY (-2, a deleted slot sentinel).

insertdict (lines 1200 to 1500)

cpython 3.14 @ ab2d84fe1023/Objects/dictobject.c#L1200-1500

The insert path. Computes the hash, calls _Py_dict_lookup to find the slot, then branches on the result:

static int
insertdict(PyInterpreterState *interp, PyDictObject *mp,
PyObject *key, Py_hash_t hash, PyObject *value)
{
PyObject *old_value;
PyDictKeyEntry *ep;
Py_ssize_t ix = _Py_dict_lookup(mp, key, hash, &old_value);
if (ix == DKIX_ERROR)
return -1;

if (ix == DKIX_EMPTY) {
/* Key not present: insert into next free dk_entries slot. */
if (mp->ma_keys->dk_usable <= 0) {
if (insertion_resize(interp, mp, 1) < 0)
return -1;
ix = find_empty_slot(mp->ma_keys, hash);
}
...
ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
ep->me_key = Py_NewRef(key);
ep->me_hash = hash;
ep->me_value = Py_NewRef(value);
...
mp->ma_used++;
mp->ma_version_tag = DICT_NEXT_VERSION(interp);
} else {
/* Key already present: replace value. */
Py_SETREF(ep->me_value, Py_NewRef(value));
...
}
return 0;
}

If the load factor exceeds 2/3 of dk_size, insertion_resize doubles the table. The new dk_indices array is re-allocated and all existing entries are re-inserted by walking dk_entries in order; the insertion-order property is preserved because dk_entries is not moved.

Version tag (lines 6400 to 6600)

cpython 3.14 @ ab2d84fe1023/Objects/dictobject.c#L6400-6600

Every dict mutation bumps ma_version_tag, a 64-bit counter global to the interpreter. The eval loop's LOAD_GLOBAL inline cache stores the dict's version tag at specialization time. A mismatch at runtime forces a real dict lookup and re-specialization:

static inline uint64_t
DICT_NEXT_VERSION(PyInterpreterState *interp)
{
return ++interp->dict_state.global_version;
}

_PyDict_SendEvent is called on every mutation. Registered watchers receive a PyDict_EVENT_MODIFIED, PyDict_EVENT_ADDED, or PyDict_EVENT_DELETED notification. PyDict_AddWatcher registers a callback and returns a small integer watcher ID stored in ma_version_tag's high bits.

Split tables (lines 5600 to 6000)

cpython 3.14 @ ab2d84fe1023/Objects/dictobject.c#L5600-6000

When many instances of the same class have the same attribute names (common for __dict__), CPython shares a single PyDictKeysObject between all instances. Each instance has only a dk_values array with the actual values; the keys and hash table are shared. The split layout reduces per-instance memory from O(n*3) entries to O(n) values:

/* Materialize a full combined dict from a split one.
Called when instance attributes diverge from the shared key set. */
PyObject *
_PyObject_MakeDictFromInstanceAttributes(PyObject *obj, PyDictValues *values)
{
PyInterpreterState *interp = _PyInterpreterState_GET();
PyDictKeysObject *keys = ...;
PyDictObject *mp = (PyDictObject *)new_dict(interp, keys, values, 0, 0);
...
}

Once an instance dict diverges (a new key is added that does not exist in the shared key structure), _PyObject_MakeDictFromInstanceAttributes materializes a full combined dict and the instance switches to the combined layout.

gopy mirror

objects/dict.go, objects/dict_iter.go, objects/dict_mutate.go. The compact layout is preserved with dk_indices as a Go byte-slice typed by slot size. Version tag is a uint64 field. Split tables are supported for class __dict__ optimization. Watchers map to Go callbacks registered via (*Dict).AddWatcher.

CPython 3.14 changes

Dict watchers (PyDict_AddWatcher) added in 3.12. Compact ordered layout stable since 3.7. Inline cache version tag check stable since 3.11 specializing interpreter. The ma_version_tag watcher-ID encoding in the high bits was added in 3.12 alongside the watcher API.