Skip to main content

Include/internal/pycore_hashtable.h

cpython 3.14 @ ab2d84fe1023/Include/internal/pycore_hashtable.h

This header declares _Py_hashtable_t, a simple open-addressing hash table used internally by CPython for bookkeeping that must not touch the Python object layer (and therefore cannot use dict). It supports both string-keyed and pointer-keyed tables through a configurable hash/compare function pair. The main consumers are tracemalloc (allocation records keyed by pointer), pystate (weak-reference list lookup), and importlib (path hook caches). The API is not part of the stable ABI and changes without deprecation notice.

Map

SymbolRole
_Py_hashtable_tThe table struct; holds entry array, load factor, and function pointers
_Py_hashtable_entry_tOne slot: key pointer plus opaque value storage
_Py_hashtable_newAllocate a table with custom hash and compare functions
_Py_hashtable_new_fullLike _new but also accepts custom allocator and destructor
_Py_hashtable_setInsert or overwrite a key/value pair
_Py_hashtable_getLook up a key; returns a pointer to the value storage or NULL
_Py_hashtable_stealRemove and return a key's value
_Py_hashtable_foreachIterate over all entries; callback may not mutate the table
_Py_hashtable_destroyFree all memory; calls the entry destructor if one was provided
_Py_hashtable_sizeReturn the number of live entries

Predefined hash/compare helpers: _Py_hashtable_hash_ptr, _Py_hashtable_compare_direct, _Py_hashtable_hash_str, _Py_hashtable_compare_str.

Reading

Table structure and entry layout

The table is a simple struct with an inline array of _Py_hashtable_entry_t slots, a collision chain encoded as a singly-linked list per bucket, and a pair of function pointers for hashing and comparison:

// Include/internal/pycore_hashtable.h
struct _Py_hashtable_t {
size_t nentries; /* number of live entries */
size_t nbuckets; /* current bucket count (always a power of two) */
_Py_hashtable_entry_t *buckets;
_Py_hashtable_hash_func hash_func;
_Py_hashtable_compare_func compare_func;
_Py_hashtable_destroy_func destroy_key;
_Py_hashtable_destroy_func destroy_value;
PyMemAllocatorEx alloc;
};

Values are stored as void * alongside the key pointer. Callers that need to store larger values typically heap-allocate a struct and store a pointer, then supply a destroy_value callback so _Py_hashtable_destroy frees it.

tracemalloc usage pattern

Modules/_tracemalloc.c keeps one _Py_hashtable_t per memory domain, keyed by the raw allocation pointer. Each value is a pointer to a traceback_t struct recording the Python stack at allocation time:

// Modules/_tracemalloc.c (simplified)
static _Py_hashtable_t *tracemalloc_traces;

static void
tracemalloc_init(void)
{
tracemalloc_traces = _Py_hashtable_new(
_Py_hashtable_hash_ptr,
_Py_hashtable_compare_direct);
}

static void
tracemalloc_add_trace(void *ptr, traceback_t *tb)
{
_Py_hashtable_set(tracemalloc_traces, ptr, tb);
}

When free() is intercepted, _Py_hashtable_steal removes and returns the traceback_t * so the frame can be released:

traceback_t *tb = _Py_hashtable_steal(tracemalloc_traces, ptr);
if (tb != NULL) {
traceback_decref(tb);
}

Resizing and load factor

The table doubles in size when the load factor (nentries / nbuckets) exceeds 2/3. Resizing allocates a fresh bucket array, rehashes all live entries into it, and frees the old array. The hash function is called again for each entry during rehash because _Py_hashtable_t does not cache hashes. For pointer-keyed tables this is cheap (one shift-and-XOR). For string-keyed tables, callers that are performance-sensitive typically intern the string first so the key pointer is stable.

// Python/hashtable.c (simplified)
static int
_Py_hashtable_rehash(_Py_hashtable_t *ht)
{
size_t new_size = ht->nbuckets * 2;
_Py_hashtable_entry_t *new_buckets = ht->alloc.malloc(
ht->alloc.ctx,
new_size * sizeof(_Py_hashtable_entry_t));
/* re-insert all entries ... */
ht->alloc.free(ht->alloc.ctx, ht->buckets);
ht->buckets = new_buckets;
ht->nbuckets = new_size;
return 0;
}

gopy mirror

_Py_hashtable_t has no direct counterpart in gopy. Go's built-in map type covers every use case this struct serves: pointer-keyed lookup, string-keyed lookup, and iteration are all native. gopy uses plain Go maps wherever CPython uses _Py_hashtable_t.

If gopy eventually supports tracemalloc-style allocation tracking (for embedding diagnostics), the equivalent would be a sync.Map or a map[uintptr]*TracebackEntry protected by a sync.Mutex, depending on whether concurrency is required.

CPython 3.14 changes

CPython 3.14 added _Py_hashtable_new_full, which accepts a custom PyMemAllocatorEx so that tables used during interpreter startup (before the default allocator is ready) can use the raw system allocator. Earlier versions always used PyMem_RawMalloc, which requires the allocator to be initialised first. The destroy_key and destroy_value callbacks were also promoted from _Py_hashtable_new_full into the main struct, making _Py_hashtable_destroy responsible for cleaning up entries unconditionally rather than requiring callers to iterate and free before destroying.