1662. gopy hamt
What we are porting
cpython/Python/hamt.c (~2885 lines) plus the public API in
cpython/Include/internal/pycore_hamt.h. Output:
| C source | Lines | Go target |
|---|---|---|
Python/hamt.c | 2885 | hamt/hamt.go |
Include/internal/pycore_hamt.h (public API) | ~120 | hamt/api.go |
The HAMT (Hash Array Mapped Trie) is an immutable persistent mapping
used as the backing store for Context in PEP 567. Its API is private
to the runtime: stdlib code reaches it only through contextvars. gopy
ships it as a small, self-contained package with no callers other
than contextvar/.
Why immutable
Three properties matter:
- O(1) copy.
ctx.run(...)copies the entire mapping; HAMT shares structure so the copy is a single pointer assignment. - O(log32 N) get / set / delete. Branching factor 32 keeps the tree shallow; for any practical context size the depth is ≤ 7.
- Structural sharing.
setanddeleterebuild the spine but keep all untouched subtrees, so memory footprint per snapshot is O(log N) rather than O(N).
Tree shape
A 32-bit Python hash drives indexing. Each level consumes 5 bits, so
the maximum tree depth is ceil(32/5) = 7. A collision node at level
7 holds keys whose 32-bit hashes agree; that gives a total maximum
depth of 8 (the constant _Py_HAMT_MAX_TREE_DEPTH).
Three node types match CPython exactly:
| Node | When used |
|---|---|
BitmapNode | The default. Holds up to 16 entries with a popcount index. |
ArrayNode | Promoted from BitmapNode when entries exceed the bitmap. |
CollisionNode | Level-7 leaf for keys with identical 32-bit hashes. |
The bitmap layout is two 32-bit fields: one for keys (shift_array)
and one for sub-nodes. Slot index is popcount(bitmap & (bit-1)).
The Go port pins this byte-for-byte; iteration order matches CPython.
Go shape
package hamt
type Hamt struct {
objects.Header
root node
count int
}
type node interface {
assoc(level, hash int, key, val objects.Object) (node, bool, error)
without(level, hash int, key objects.Object) (node, bool, error)
find(level, hash int, key objects.Object) (objects.Object, bool, error)
}
type bitmapNode struct {
bitmap uint32
array []objects.Object // alternating keys, values, or sub-nodes
}
type arrayNode struct {
count int
children [32]node
}
type collisionNode struct {
hash int32
array []objects.Object // alternating keys, values
}
// Public API.
func New() *Hamt
func (h *Hamt) Assoc(key, val objects.Object) (*Hamt, error)
func (h *Hamt) Without(key objects.Object) (*Hamt, bool, error)
func (h *Hamt) Find(key objects.Object) (objects.Object, bool, error)
func (h *Hamt) Len() int
func (h *Hamt) Eq(other *Hamt) (bool, error)
// Iterator state matches PyHamtIteratorState: zero-allocation depth-first.
type Iter struct {
nodes [maxDepth]node
pos [maxDepth]int
level int8
}
func (h *Hamt) Iter() *Iter
func (it *Iter) Next() (key, val objects.Object, ok bool, err error)
Hamt is registered as HamtType for repr and equality. Three
"view" types (HamtKeysType, HamtValuesType, HamtItemsType)
expose the corresponding iterators; CPython exposes them as
_PyHamtKeys_Type etc. but they are private to runtime callers.
CPython parity points
- Bitmap-to-array promotion happens at exactly 16 entries (CPython
_PyHamt_Array_From_Bitmapthreshold). - Collision node insertion preserves insertion order on hash
collisions, matching
hamt_node_collision_assoc. - Iteration emits keys in tree order. The exact ordering is
observable through
Context.copy()and through stdlib tests, so the port pins the bitmap walk direction. Eqshort-circuits on identity, then walks both trees in parallel and is O(N).
Errors
HAMT operations raise KeyError only via Without on a missing key
(returns ok=false rather than the C NULL-with-PyErr pattern).
Hashing errors propagate through Find / Assoc / Without from
the underlying objects.Hash.
Gate
hamt/hamt_test.go:
- Round-trip:
New().Assoc(k1,v1).Assoc(k2,v2).Find(k1)returns(v1, true, nil). - Structural sharing:
h2 := h1.Assoc(k,v); check that nodes outside the path-from-root are pointer-equal betweenh1andh2. - Promotion: insert 17 keys whose hashes share the top 5 bits and
assert the level-1 node is
*arrayNode. - Collision: insert two keys whose 32-bit hashes are equal; assert
level-7 holds a
*collisionNodewith both entries. - Iter: ordering matches a CPython oracle for a fixed key set. (The
oracle is a one-off Python-side script that prints
list(_HamtClass(...).items()); the result is checked in.) - Eq: equal trees built by different insertion orders compare equal.
A property-based check generates random sequences of Assoc /
Without and asserts equivalence with a Go map[Object]Object
oracle.
Out of scope
- The
_testcapiHAMT exports. CPython only uses these inLib/test/test_context.py; we cover the same behaviour through the contextvars tests. - Mutating helpers used inside
_PyHamt_Assocfor transient nodes. Go has no refcount, so the "transient" optimisation collapses into the immutable path with no measurable cost.
v0.9 checklist
Files
-
hamt/hamt.go:Hamt,node,bitmapNode,arrayNode,collisionNode. Shipped in commit9c1d96f. -
hamt/api.go: public Assoc/Without/Find/Len/Eq. -
hamt/iter.go: depth-firstIter. -
hamt/types.go:HamtTyperegistered. -
hamt/hamt_test.go: 12-test gate panel.
Surface guarantees
- Tree depth ≤ 8 for any 32-bit hash distribution.
- BitmapNode promoted to ArrayNode at the CPython threshold.
- Iteration order pinned.
-
Hamtis hashable throughtp_hashreturningfrozenset-style XOR over(hash(k), hash(v))pairs (matches_PyHamt_Hash).