Skip to main content

1662. gopy hamt

What we are porting

cpython/Python/hamt.c (~2885 lines) plus the public API in cpython/Include/internal/pycore_hamt.h. Output:

C sourceLinesGo target
Python/hamt.c2885hamt/hamt.go
Include/internal/pycore_hamt.h (public API)~120hamt/api.go

The HAMT (Hash Array Mapped Trie) is an immutable persistent mapping used as the backing store for Context in PEP 567. Its API is private to the runtime: stdlib code reaches it only through contextvars. gopy ships it as a small, self-contained package with no callers other than contextvar/.

Why immutable

Three properties matter:

  1. O(1) copy. ctx.run(...) copies the entire mapping; HAMT shares structure so the copy is a single pointer assignment.
  2. O(log32 N) get / set / delete. Branching factor 32 keeps the tree shallow; for any practical context size the depth is ≤ 7.
  3. Structural sharing. set and delete rebuild the spine but keep all untouched subtrees, so memory footprint per snapshot is O(log N) rather than O(N).

Tree shape

A 32-bit Python hash drives indexing. Each level consumes 5 bits, so the maximum tree depth is ceil(32/5) = 7. A collision node at level 7 holds keys whose 32-bit hashes agree; that gives a total maximum depth of 8 (the constant _Py_HAMT_MAX_TREE_DEPTH).

Three node types match CPython exactly:

NodeWhen used
BitmapNodeThe default. Holds up to 16 entries with a popcount index.
ArrayNodePromoted from BitmapNode when entries exceed the bitmap.
CollisionNodeLevel-7 leaf for keys with identical 32-bit hashes.

The bitmap layout is two 32-bit fields: one for keys (shift_array) and one for sub-nodes. Slot index is popcount(bitmap & (bit-1)). The Go port pins this byte-for-byte; iteration order matches CPython.

Go shape

package hamt

type Hamt struct {
objects.Header
root node
count int
}

type node interface {
assoc(level, hash int, key, val objects.Object) (node, bool, error)
without(level, hash int, key objects.Object) (node, bool, error)
find(level, hash int, key objects.Object) (objects.Object, bool, error)
}

type bitmapNode struct {
bitmap uint32
array []objects.Object // alternating keys, values, or sub-nodes
}

type arrayNode struct {
count int
children [32]node
}

type collisionNode struct {
hash int32
array []objects.Object // alternating keys, values
}

// Public API.
func New() *Hamt
func (h *Hamt) Assoc(key, val objects.Object) (*Hamt, error)
func (h *Hamt) Without(key objects.Object) (*Hamt, bool, error)
func (h *Hamt) Find(key objects.Object) (objects.Object, bool, error)
func (h *Hamt) Len() int
func (h *Hamt) Eq(other *Hamt) (bool, error)

// Iterator state matches PyHamtIteratorState: zero-allocation depth-first.
type Iter struct {
nodes [maxDepth]node
pos [maxDepth]int
level int8
}

func (h *Hamt) Iter() *Iter
func (it *Iter) Next() (key, val objects.Object, ok bool, err error)

Hamt is registered as HamtType for repr and equality. Three "view" types (HamtKeysType, HamtValuesType, HamtItemsType) expose the corresponding iterators; CPython exposes them as _PyHamtKeys_Type etc. but they are private to runtime callers.

CPython parity points

  • Bitmap-to-array promotion happens at exactly 16 entries (CPython _PyHamt_Array_From_Bitmap threshold).
  • Collision node insertion preserves insertion order on hash collisions, matching hamt_node_collision_assoc.
  • Iteration emits keys in tree order. The exact ordering is observable through Context.copy() and through stdlib tests, so the port pins the bitmap walk direction.
  • Eq short-circuits on identity, then walks both trees in parallel and is O(N).

Errors

HAMT operations raise KeyError only via Without on a missing key (returns ok=false rather than the C NULL-with-PyErr pattern). Hashing errors propagate through Find / Assoc / Without from the underlying objects.Hash.

Gate

hamt/hamt_test.go:

  • Round-trip: New().Assoc(k1,v1).Assoc(k2,v2).Find(k1) returns (v1, true, nil).
  • Structural sharing: h2 := h1.Assoc(k,v); check that nodes outside the path-from-root are pointer-equal between h1 and h2.
  • Promotion: insert 17 keys whose hashes share the top 5 bits and assert the level-1 node is *arrayNode.
  • Collision: insert two keys whose 32-bit hashes are equal; assert level-7 holds a *collisionNode with both entries.
  • Iter: ordering matches a CPython oracle for a fixed key set. (The oracle is a one-off Python-side script that prints list(_HamtClass(...).items()); the result is checked in.)
  • Eq: equal trees built by different insertion orders compare equal.

A property-based check generates random sequences of Assoc / Without and asserts equivalence with a Go map[Object]Object oracle.

Out of scope

  • The _testcapi HAMT exports. CPython only uses these in Lib/test/test_context.py; we cover the same behaviour through the contextvars tests.
  • Mutating helpers used inside _PyHamt_Assoc for transient nodes. Go has no refcount, so the "transient" optimisation collapses into the immutable path with no measurable cost.

v0.9 checklist

Files

  • hamt/hamt.go: Hamt, node, bitmapNode, arrayNode, collisionNode. Shipped in commit 9c1d96f.
  • hamt/api.go: public Assoc/Without/Find/Len/Eq.
  • hamt/iter.go: depth-first Iter.
  • hamt/types.go: HamtType registered.
  • hamt/hamt_test.go: 12-test gate panel.

Surface guarantees

  • Tree depth ≤ 8 for any 32-bit hash distribution.
  • BitmapNode promoted to ArrayNode at the CPython threshold.
  • Iteration order pinned.
  • Hamt is hashable through tp_hash returning frozenset-style XOR over (hash(k), hash(v)) pairs (matches _PyHamt_Hash).