1671. gopy object protocol
What we are porting
cpython/Include/object.h and cpython/Include/cpython/object.h define
PyObject and PyVarObject. Every Python value carries a header with a
refcount and a type pointer; variable-size types (tuple, str, bytes, int)
add an ob_size field.
In CPython:
struct _object {
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
};

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;
} PyVarObject;
The free-threaded build adds ob_tid, ob_mutex, ob_gc_bits, ob_ref_local,
ob_ref_shared. v0.1 to v0.13 ship the GIL build, so we only need the GIL-build
fields. The free-threaded fields land in v0.14.
Go translation
// Header is the per-object header. Mirrors struct _object in
// Include/object.h. The zero value is invalid; objects are built by
// each type's constructor which initializes refcount=1 and type.
type Header struct {
	refcnt int64
	typ    *Type
}

// VarHeader extends Header with ob_size. Used by tuple, str, bytes,
// int (long), and any other variable-length builtin.
type VarHeader struct {
	Header
	size int64
}

// Object is what every Python value satisfies. Concrete types embed
// *Header (or *VarHeader) and add their own data.
type Object interface {
	Type() *Type
	Hdr() *Header
}
Hdr() lets generic code reach the refcount and type without knowing
the concrete shape. The C macro Py_TYPE(o) becomes o.Type(). The
macro Py_REFCNT(o) becomes o.Hdr().refcnt (unexported; only the
runtime fiddles with it).
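As a minimal sketch of how a concrete type might satisfy Object, the program below defines Type() and Hdr() once on *Header and lets embedding promote them. The Float type, NewFloat, TypeName, and Refcnt are hypothetical names used only for illustration, and Type is reduced to a name; none of this is the shipped gopy code.

```go
package main

import "fmt"

// Type is a stand-in for the runtime type object; only the name matters here.
type Type struct{ Name string }

// Header mirrors the per-object header from the spec.
type Header struct {
	refcnt int64
	typ    *Type
}

// Type and Hdr are defined once on *Header and promoted to every
// concrete type that embeds it (an assumption of this sketch).
func (h *Header) Type() *Type  { return h.typ }
func (h *Header) Hdr() *Header { return h }

// Object is the interface from the spec.
type Object interface {
	Type() *Type
	Hdr() *Header
}

// Float is a hypothetical concrete type embedding *Header.
type Float struct {
	*Header
	val float64
}

var floatType = &Type{Name: "float"}

// NewFloat initializes refcount=1 and the type pointer.
func NewFloat(v float64) *Float {
	return &Float{Header: &Header{refcnt: 1, typ: floatType}, val: v}
}

// TypeName is generic code: it sees only the Object interface,
// which is the role Py_TYPE(o) plays in C.
func TypeName(o Object) string { return o.Type().Name }

// Refcnt reads the count through Hdr(), the Py_REFCNT(o) analogue.
func Refcnt(o Object) int64 { return o.Hdr().refcnt }

func main() {
	var o Object = NewFloat(1.5)
	fmt.Println(TypeName(o), Refcnt(o)) // float 1
}
```

Defining the two methods on *Header keeps each concrete type free of boilerplate, at the price of every object sharing the same method bodies.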
Why a method, not field access
Go interfaces force a method, not a field. We pay one indirection per
Type() lookup. CPython pays the same indirection through Py_TYPE in
the free-threaded build. v0.6 adds a vectorized fast path for common
types if the indirection shows up in the VM hot path.
Refcount
The Go GC reclaims memory; the refcount exists to drive __del__, weak
references, and the cycle collector exactly as CPython does. v0.2 ships
the refcount field and the Incref/Decref ops; the actual cycle
collector arrives in v0.10.
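// Incref bumps the refcount. Mirrors Py_INCREF.
// Requires the sync/atomic import.
func Incref(o Object) {
	atomic.AddInt64(&o.Hdr().refcnt, 1)
}

// Decref drops the refcount. If it reaches zero and the type's
// tp_dealloc slot is set, the slot is invoked. Mirrors Py_DECREF.
func Decref(o Object) {
	if atomic.AddInt64(&o.Hdr().refcnt, -1) != 0 {
		return
	}
	// A nil slot makes Decref a no-op instead of a nil-call panic.
	if dealloc := o.Type().Dealloc; dealloc != nil {
		dealloc(o)
	}
}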
In v0.2 there is no Dealloc to call (we are not yet running user code
that creates cycles). The slot stays nil and Decref is a no-op when the
slot is nil. v0.3 wires up __del__ and v0.10 wires up the cycle
collector.
Atomicity
GIL-build CPython uses non-atomic ++/--. gopy uses atomic.AddInt64
because Go has no GIL of its own; even GIL-build gopy needs an atomic
refcount because goroutines can race. The cost is one LOCK XADD per
incref/decref on x86-64. Free-threaded CPython pays a comparable cost
whenever a thread other than the owner touches the count (the
ob_ref_shared path).
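To illustrate why the atomic add matters, the sketch below has several goroutines bump the same counter concurrently; with atomic.AddInt64 the final count is deterministic. The hammer helper and the one-field Header are illustrative only.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Header is reduced to the refcount for this sketch.
type Header struct{ refcnt int64 }

// hammer starts `workers` goroutines that each add n references,
// waits for them, and returns the resulting count.
func hammer(h *Header, workers, n int) int64 {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < n; i++ {
				atomic.AddInt64(&h.refcnt, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&h.refcnt)
}

func main() {
	h := &Header{refcnt: 1}
	fmt.Println(hammer(h, 8, 10000)) // 80001
}
```

Replacing the atomic add with a plain h.refcnt++ would be a data race, and the result would no longer be guaranteed.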
Identity (is)
Two Go values are is-equal if they are the same Object pointer.
Concrete types embed pointer headers, so identity is pointer identity.
func Is(a, b Object) bool {
	return a == b
}
Singletons (None, True, False, the small-int cache) preserve identity across constructions because the constructor returns the cached pointer.
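A minimal sketch of how a constructor can preserve identity through a cache, assuming the small-int range -5..256 listed under the surface guarantees. Int, NewInt, and the marker-method Object interface are simplified stand-ins, not the real gopy types.

```go
package main

import "fmt"

// Object is reduced to a marker interface for this sketch.
type Object interface{ isObject() }

// Int is a hypothetical int object; always handled by pointer.
type Int struct{ val int64 }

func (*Int) isObject() {}

const (
	minSmall = -5
	maxSmall = 256
)

// smallInts holds one preallocated *Int per cached value.
var smallInts [maxSmall - minSmall + 1]*Int

func init() {
	for i := range smallInts {
		smallInts[i] = &Int{val: int64(i) + minSmall}
	}
}

// NewInt returns the cached pointer for small values and a fresh
// allocation otherwise.
func NewInt(v int64) *Int {
	if v >= minSmall && v <= maxSmall {
		return smallInts[v-minSmall]
	}
	return &Int{val: v}
}

// Is is pointer identity on the interface value.
func Is(a, b Object) bool { return a == b }

func main() {
	fmt.Println(Is(NewInt(7), NewInt(7)))       // true: same cached pointer
	fmt.Println(Is(NewInt(1000), NewInt(1000))) // false: two allocations
}
```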
Equality (==)
== calls the type's tp_richcompare. v0.2 lands a richcompare slot
on Type and the four builtins that need it for the gate:
- int: numeric equality with int/bool/float.
- float: numeric equality.
- bool: numeric equality (True == 1, False == 0).
- tuple: elementwise equality, fall through to identity.
- Other types in v0.2 (list, dict, slice, range): identity only. Full richcompare lands in v0.4 alongside string equality.
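The v0.2 equality rules can be sketched as a single dispatch function. This is an illustration, not the richcompare slot itself: the concrete types are hypothetical payload-only structs, Object is left empty, and mixed int/float comparison goes through float64, which is only exact while the integer fits in 53 bits (CPython compares the two exactly).

```go
package main

import "fmt"

// Object is deliberately empty here; the real interface carries Type() and Hdr().
type Object interface{}

// Hypothetical concrete types, reduced to their payloads.
type (
	Int   struct{ val int64 }
	Bool  struct{ val bool }
	Float struct{ val float64 }
	Tuple struct{ items []Object }
	List  struct{ items []Object }
)

// asInt treats bool as the integers 0 and 1, as Python does.
func asInt(o Object) (int64, bool) {
	switch v := o.(type) {
	case *Int:
		return v.val, true
	case *Bool:
		if v.val {
			return 1, true
		}
		return 0, true
	}
	return 0, false
}

// asFloat widens int, bool and float to float64 (lossy above 2^53).
func asFloat(o Object) (float64, bool) {
	if f, ok := o.(*Float); ok {
		return f.val, true
	}
	if i, ok := asInt(o); ok {
		return float64(i), true
	}
	return 0, false
}

// Eq applies numeric equality, then tuple equality, then identity.
func Eq(a, b Object) bool {
	if ai, ok := asInt(a); ok {
		if bi, ok := asInt(b); ok {
			return ai == bi // exact integer comparison
		}
	}
	if af, ok := asFloat(a); ok {
		bf, ok := asFloat(b)
		return ok && af == bf
	}
	if ta, ok := a.(*Tuple); ok {
		tb, ok := b.(*Tuple)
		if !ok || len(ta.items) != len(tb.items) {
			return false
		}
		for i := range ta.items {
			// Identity short-circuits; otherwise the elements are compared.
			if ta.items[i] != tb.items[i] && !Eq(ta.items[i], tb.items[i]) {
				return false
			}
		}
		return true
	}
	return a == b // list, dict, slice, range: identity only in v0.2
}

func main() {
	one := &Int{1}
	fmt.Println(Eq(&Bool{true}, one), Eq(one, &Float{1.0})) // true true
	a := &Tuple{[]Object{one, &Float{2}}}
	b := &Tuple{[]Object{&Int{1}, &Int{2}}}
	fmt.Println(Eq(a, b))             // true
	fmt.Println(Eq(&List{}, &List{})) // false
}
```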
Hash (__hash__)
Hashable in v0.2: int, float, bool, None, tuple. Unhashable:
list, dict. Frozenset arrives in v0.4 with the real SipHash.
The hash protocol is tp_hash. Returning -1 signals an error in CPython;
in gopy we return (int64, error).
type HashFunc func(o Object) (int64, error)
v0.2 uses placeholder hash functions that match CPython's algorithm exactly except the SipHash key is all-zero. The v0.1 hash package already produces the per-process key; v0.4 wires it through.
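A sketch of the (int64, error) convention, with the hash slot modelled as a nilable field on Type. The int hash reduces the value modulo 2^61 - 1 and remaps -1 to -2, as 64-bit CPython does, but only for values that fit in int64; NewInt, NewList, and the Hash dispatcher are hypothetical names.

```go
package main

import "fmt"

// HashFunc is the tp_hash analogue from the spec.
type HashFunc func(o Object) (int64, error)

// Type carries a nilable hash slot (assumed shape); nil means unhashable.
type Type struct {
	Name string
	Hash HashFunc
}

type Object interface{ Type() *Type }

type Int struct{ val int64 }
type List struct{}

var (
	intType  = &Type{Name: "int", Hash: hashInt}
	listType = &Type{Name: "list"} // no hash slot
)

func (*Int) Type() *Type  { return intType }
func (*List) Type() *Type { return listType }

func NewInt(v int64) *Int { return &Int{val: v} }
func NewList() *List      { return &List{} }

// hashModulus is the Mersenne prime 2^61 - 1 used on 64-bit CPython.
const hashModulus = 1<<61 - 1

// hashInt reduces |x| modulo 2^61-1, restores the sign, and avoids -1,
// which CPython reserves as the C-level error marker.
func hashInt(o Object) (int64, error) {
	x := o.(*Int).val
	var mag uint64
	if x < 0 {
		mag = uint64(-(x+1)) + 1 // safe even for the minimum int64
	} else {
		mag = uint64(x)
	}
	h := int64(mag % hashModulus)
	if x < 0 {
		h = -h
	}
	if h == -1 {
		h = -2
	}
	return h, nil
}

// Hash dispatches through the type's slot and reports unhashable types
// as an error instead of a sentinel value.
func Hash(o Object) (int64, error) {
	f := o.Type().Hash
	if f == nil {
		return 0, fmt.Errorf("unhashable type: '%s'", o.Type().Name)
	}
	return f(o)
}

func main() {
	fmt.Println(Hash(NewInt(-1)))
	fmt.Println(Hash(NewInt(1 << 61)))
	fmt.Println(Hash(NewList()))
}
```

Returning an error keeps -1 available as an ordinary intermediate value inside gopy, while the -1 to -2 remap preserves the hash values a Python program can observe.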
Header alignment
CPython uses pointer-sized alignment. Go does the same automatically.
unsafe.Sizeof(Header{}) is 16 bytes on 64-bit (8 for refcount, 8 for
type pointer). VarHeader adds 8 for size. Same as CPython.
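The size claims can be checked with unsafe.Sizeof. In this sketch Type is an empty placeholder struct, and headerSize, varHeaderSize, and wordSize are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Type is an empty placeholder; only the pointer to it is stored.
type Type struct{}

type Header struct {
	refcnt int64
	typ    *Type
}

type VarHeader struct {
	Header
	size int64
}

func headerSize() uintptr    { return unsafe.Sizeof(Header{}) }
func varHeaderSize() uintptr { return unsafe.Sizeof(VarHeader{}) }
func wordSize() uintptr      { return unsafe.Sizeof(uintptr(0)) }

func main() {
	// On a 64-bit target this prints: 16 24 8
	fmt.Println(headerSize(), varHeaderSize(), unsafe.Alignof(Header{}))
}
```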
Borrowed vs owned references
CPython distinguishes borrowed from owned references at the API level.
gopy does not. Go GC handles lifetime; the refcount is bookkeeping for
finalizers and the cycle collector. We do not write Py_INCREF /
Py_DECREF at every borrow. We bump refs only at observable transfer
points (assignment to a container, return from a constructor, finalize
hooks).
This is a meaningful divergence from CPython's API surface but not
from its observable behaviour. As long as __del__ runs at the right
time and weakrefs trigger at the right time, the user cannot tell that
the underlying counting is sparser.
The exact set of "observable transfer points" lives in 1671 §"Refcount emit points" once v0.10 lands. v0.2 ships a placeholder in which Incref and Decref may be called freely at any site.
File mapping
| C source | Go target |
|---|---|
| Include/object.h (struct) | objects/header.go |
| Include/object.h (Py_INCREF) | objects/refcount.go |
| Objects/object.c (Py_NewRef) | objects/object.go |
| Objects/object.c (PyObject_Hash) | objects/hash.go |
| Objects/object.c (PyObject_Repr) | objects/repr.go |
Checklist
Status legend: [x] shipped, [ ] pending, [~] partial / scaffold,
[n] deferred / not in scope this phase.
Files
- [~] objects/header.go: Header, VarHeader, Object interface. Scaffold landed in v0.2; field naming and method set still match this spec, but the embedded-pointer convention for concrete types needs an audit pass once the long port lands.
- [~] objects/refcount.go: Incref, Decref over atomic.AddInt64. Decref currently no-ops when tp_dealloc is nil; that branch goes away once v0.10 wires the cycle collector.
- objects/object.go: NewRef, XNewRef, Clear, the protocol helpers from Objects/object.c that are not hash or repr.
- objects/hash.go: Hash(o) dispatcher that calls tp_hash and threads the per-process SipHash key from hash/. Placeholder zero-key path retired.
- objects/repr.go: Repr(o), Str(o), plus the recursion guard from Objects/object.c:PyObject_Repr (the Py_ReprEnter/Py_ReprLeave pair).
- objects/identity.go: Is(a, b), singleton registry hooks.
Surface guarantees
- Hdr() returns the same *Header for the lifetime of the object. Pinned by the v0.2 gate (dict insert + lookup round-trip).
- Refcount writes are atomic on every architecture Go supports.
- Repr and Str round-trip every concrete builtin's value to a string that eval would accept for the literal types. Pinned by compat/repr_test.go (lands with compat/).
- Hash matches CPython under PYTHONHASHSEED=0 for int, float, bool, None, tuple, bytes, str, frozenset. Pinned by compat/hash_test.go.
- Is returns true for the documented singletons (None, True, False, NotImplemented, Ellipsis, (), small ints -5..256) across every constructor path.
Refcount emit points (v0.10)
Tracked here so the v0.10 cycle GC has a written contract to honour. Placeholders only until then.
- Container insert: list append, tuple build, dict insert, set add. Each bumps the refcount of the inserted value.
- Container remove: list pop / del, dict del, set discard. Each drops one ref.
- Frame locals: STORE_FAST and friends. The VM owns one ref per local slot.
- Return value: every C-implemented callable returns an owned ref; the caller must drop it.
- Weakref callbacks fire before the object hits the freelist.
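The container insert and remove points above can be sketched with a toy list. Everything here is illustrative: List, Append, DelLast, and Refcnt are hypothetical names, Decref omits the dealloc call, and DelLast assumes a non-empty list.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Header is reduced to the refcount for this sketch.
type Header struct{ refcnt int64 }

func (h *Header) Hdr() *Header { return h }

type Object interface{ Hdr() *Header }

func Incref(o Object)       { atomic.AddInt64(&o.Hdr().refcnt, 1) }
func Decref(o Object)       { atomic.AddInt64(&o.Hdr().refcnt, -1) } // dealloc omitted
func Refcnt(o Object) int64 { return atomic.LoadInt64(&o.Hdr().refcnt) }

// List is a hypothetical container that owns one ref per element.
type List struct {
	Header
	items []Object
}

// Append is a container-insert emit point: the list takes its own ref.
func (l *List) Append(v Object) {
	Incref(v)
	l.items = append(l.items, v)
}

// DelLast is a container-remove emit point: the list drops its ref.
// It assumes the list is non-empty.
func (l *List) DelLast() {
	n := len(l.items) - 1
	v := l.items[n]
	l.items[n] = nil // clear the slot so the Go GC does not retain v
	l.items = l.items[:n]
	Decref(v)
}

func main() {
	v := &Header{refcnt: 1} // the constructor's owned ref
	l := &List{Header: Header{refcnt: 1}}
	fmt.Println(Refcnt(v)) // 1
	l.Append(v)
	fmt.Println(Refcnt(v)) // 2
	l.DelLast()
	fmt.Println(Refcnt(v)) // 1
}
```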