1613. gopy gc, weakrefs, finalizers
Goal
Port CPython's cycle collector and weakref machinery to Go so that
the user-visible behaviour of the gc and weakref modules matches
upstream byte-for-byte. v0.3 shipped a refcount/finalizer skeleton at
gc/gc.go; gc.Collect() is currently a no-op
(gc/gc.go:113). v0.10 makes it real.
Why a Python collector on top of Go's GC
Go's runtime already reclaims unreachable cycles. We do not need a second collector to free memory. We do need a CPython-shaped collector for three reasons that Go's GC cannot satisfy on its own:
- Finalizer ordering. CPython runs
tp_finalize/__del__on a tracked object as part of the cycle pass and obeys the legacy finalizer rules (no finalizer on objects reachable from a finalizer, resurrection check, exactly-once invocation).runtime.SetFinalizerin Go runs in arbitrary order at arbitrary times. - Weakref callback timing. PEP 442 specifies that weakrefs to objects in a collected cycle are cleared before finalizers run, and callbacks fire in a defined order. Go has no equivalent.
- The
gcmodule API surface.gc.collect(),gc.get_objects(),gc.get_count(),gc.set_threshold(),gc.is_tracked(),gc.freeze()are all CPython-shaped and the stdlib uses them.
The collector therefore runs as a bookkeeping pass over the tracked set: it walks references, classifies reachability, fires weakref callbacks and finalizers in the right order, and clears references on collected objects so Go's GC can reclaim the memory on the next cycle. We do not maintain refcounts; "is this object dead in a cycle" is determined by walking references inside the tracked set, the same way CPython does.
Sources of truth
| CPython file | Lines | Target |
|---|---|---|
Python/gc.c | 2057 | gc/collector.go plus splits |
Python/gc_gil.c | 17 | gc/gil.go |
Python/object_stack.c | 66 | gc/objstack.go |
Objects/weakrefobject.c | 1134 | objects/weakref.go |
Modules/gcmodule.c | ~600 | gc/module.go (built-in registration) |
Include/internal/pycore_gc.h | ~300 | type/state declarations spread across the targets |
gc/gc.go is the existing v0.3 skeleton (Track / Untrack /
RegisterFinalizer / Finalize / Collect-stub). v0.10 keeps the public
API of that file and grows the package.
Package layout
gc/
gc.go Track / Untrack / RegisterFinalizer / Finalize public API (kept)
state.go GCState struct, generation lists, thresholds, enabled flag
list.go gc_list_* doubly-linked list helpers (gc.c:205-313)
refs.go update_refs / subtract_refs / visit_decref (gc.c:392-495)
reachable.go move_unreachable / visit_reachable / clear_unreachable_mask (gc.c:497-728)
finalize.go has_legacy_finalizer / move_legacy_finalizer_reachable / finalize_garbage (gc.c:672-1067)
weakref.go handle_weakrefs callback queue + clearing pass (gc.c:772-948)
collector.go gc_collect_main, gc_select_generation, _PyGC_Collect entry points (gc.c:1258-1696)
module.go gc built-in module: collect, enable, disable, get_threshold, set_threshold, get_count, get_objects, is_tracked, freeze, unfreeze
gil.go collector-vs-mutator interlock (gc_gil.c)
objstack.go _PyObjectStack helper (object_stack.c)
objects/
weakref.go PyWeakref / PyWeakrefRef / PyWeakrefProxy types, callback list, _PyWeakref_ClearWeakRefsExceptCallbacks
Tracked-object protocol
Every container type that can participate in a cycle calls
gc.Track(o) from its constructor and gc.Untrack(o) from its clear
hook. v0.3 already wires this in for tuples, lists, dicts, sets,
generators. v0.10 extends to:
- frames (
objects.Frame), - coroutines / async generators,
- code objects (don't track; they have no cycle path),
- bound methods,
- partial / weakref proxy objects.
The collector visits an object's references through a per-type
tp_traverse slot. v0.10 adds the slot to objects.Type and
populates it for every tracked type. CPython: Include/object.h:339 tp_traverse.
Generation lists
CPython uses three generations (young, old, permanent) plus a
freeze list. Each generation has a count; the collector runs
generation n when count[n] > threshold[n]. Defaults are 700 / 10 /
10. We mirror these and the same per-generation linked list layout.
CPython: Include/internal/pycore_gc.h:122 GCState,
Modules/gcmodule.c gc_collect_generations.
Cycle algorithm (port of gc_collect_main)
update_refs: for each object in the candidate generation, copy its refcount-equivalent intogc_refs.subtract_refs: for each object, walktp_traverseand decrementgc_refsfor every reference into the candidate set.- After the walk, any object whose
gc_refsis zero is unreachable unless it is referenced from a reachable object.move_unreachabledoes a second pass to find the closure of reachable objects and moves the rest to anunreachablelist. move_legacy_finalizersseparates objects that have legacy__del__finalizers; their dependencies are pulled back into reachable to preserve resurrection invariants.handle_weakrefsclears every weakref pointing into the unreachable set and queues their callbacks.finalize_garbagerunstp_finalizeon each unreachable object.- After finalizers, anything still unreachable is cleared via
tp_clearso Go's GC can reclaim the memory.
Refcounts: gopy doesn't keep one. gc_refs is computed fresh by
counting incoming edges from inside the candidate set. A node is
unreachable iff after subtract_refs its gc_refs is zero and no
visit_reachable walk pulls it back. This matches CPython's algorithm
without needing the runtime refcount field.
Weakrefs (objects/weakref.go)
CPython: Objects/weakrefobject.c ports verbatim to objects/weakref.go.
Three observable types: weakref.ref, weakref.proxy,
weakref.ReferenceType. Each tracked object has an optional pointer
to a singly-linked list of weakrefs in its header; the GC walks this
list during handle_weakrefs.
The list head goes on objects.Header as a new optional field. Types
that opt into weakref support set a flag in tp_flags. CPython:
Include/cpython/object.h:235 tp_weaklistoffset.
Callbacks: a weakref with a callback survives object death long enough
to fire the callback once on a clean Python frame. CPython queues
these on tstate.async_exc; we use a per-thread channel.
Module registration (gc/module.go)
Built-in gc module exposes:
| Python | Go | CPython entry |
|---|---|---|
gc.collect([gen]) | Collect(gen int) int | Modules/gcmodule.c gc_collect |
gc.enable() | Enable() | gc.c PyGC_Enable |
gc.disable() | Disable() | gc.c PyGC_Disable |
gc.isenabled() | IsEnabled() bool | gc.c PyGC_IsEnabled |
gc.get_threshold() | GetThreshold() (int,int,int) | gcmodule.c gc_get_threshold |
gc.set_threshold(...) | SetThreshold(int,int,int) | gcmodule.c gc_set_threshold |
gc.get_count() | GetCount() (int,int,int) | gcmodule.c gc_get_count |
gc.get_objects([gen]) | GetObjects(gen int) []Object | gc.c _PyGC_GetObjects |
gc.is_tracked(o) | IsTracked(o Object) bool | (already in gc.go) |
gc.freeze() / unfreeze() / get_freeze_count() | Freeze() / Unfreeze() / GetFreezeCount() int | gc.c _PyGC_Freeze |
gc.get_referrers(*objs) | GetReferrers(...Object) []Object | gc.c _PyGC_GetReferrers |
gc.get_referents(*objs) | GetReferents(...Object) []Object | gcmodule.c gc_get_referents |
Callbacks (gc.callbacks list) and stats (gc.set_debug, gc.get_stats)
land in v0.10 if they fit; otherwise v0.10.x.
v0.10 checklist (1613-A through Q)
- 1613-A
gc/state.go: GCState struct, three generation lists, thresholds, enabled flag, mu lock. - 1613-B
gc/list.go: gc_list_init / append / remove / move / merge / size / clear_collecting. - 1613-C
gc/objstack.go:_PyObjectStackhelper. - [~] 1613-D
gc/gil.go: collector entry/exit guards. CPython gc_gil.c reduces to_PyGC_ClearAllFreeListsagainst the per-type freelists; gopy has no freelists, so the port is a documented no-op (1613-O). - [~] 1613-E
Type.TpTraverseslot + per-type callbacks. Slot shipped onobjects/type.go; impls live for tuple, list, dict, odict, set, frozenset. Frame, generator, coroutine, async_gen, bound method, cell tracked separately as 1613-N. - 1613-F
gc/refs.go: update_refs, subtract_refs. - 1613-G
gc/reachable.go: move_unreachable, visit_reachable, clear_unreachable_mask, untrack_tuples. - 1613-H
objects/weakref.go: PyWeakref types, header slot, _PyWeakref_ClearWeakRefsExceptCallbacks. - 1613-I
gc/weakref.go: handle_weakrefs + callback drain. - [~] 1613-J
gc/finalize.go: finalize_garbage shipped via PEP 442Type.Finalize. Pre-PEP-442__del__andgc.garbagepopulation for resurrected cycles tracked separately as 1613-P. - 1613-K
gc/collector.go: gc_collect_main, gc_select_generation. - 1613-L
gc/module.go: 18 callables (collect, enable, disable, isenabled, get_threshold, set_threshold, get_count, is_tracked, get_objects, get_referrers, get_referents, freeze, unfreeze, get_freeze_count, set_debug, get_debug, get_stats, is_finalized) registered via inittab. - 1613-M
gc/cycle_test.goplus the per-component test files.
Beyond the original A through M, these landed during v0.10.x:
- set_debug / get_debug with DEBUG_* flag constants.
- gc.callbacks invoked before and after collection.
- gc.get_stats per-generation counters.
- gc.is_finalized lookup.
- weakref.proxy + CallableProxyType.
- Type.Finalize wired through tp_finalize and the GC table.
- gc.get_referrers / gc.get_referents.
- gc.freeze / unfreeze / get_freeze_count.
Remaining work after v0.10.x:
- 1613-N
objects/Type.TpTraverseimpls for frame, generator, coroutine, async_generator (+ ASend / AThrow), bound method, cell. Mirrors Objects/frameobject.c frame_traverse, Objects/genobject.c gen_traverse, Objects/cellobject.c cell_traverse, Objects/classobject.c method_traverse. Note that Generator / Coroutine / AsyncGenerator carry no Object fields directly (the suspended frame lives on the body goroutine), so only the wrapper types (coro_wrapper, async_gen_asend, async_gen_athrow) get traverse impls. - 1613-O
gc/gil.go: stubClearAllFreeListsdocumenting the no-op parity with gc_gil.c so future readers know why the collector skips the call. - 1613-P populate
gc.garbagewhen a finalizer resurrects an object or when a cycle stays alive after finalize_garbage; honor DEBUG_SAVEALL by appending collectable objects too. Resurrection detection ingc/finalize.go:handleResurrectedre-runs deduce on the post-finalize list and pulls survivors back to the destination generation; DEBUG_SAVEALL appends the still-dead set to the shared gc.garbage list (gc/collector.go:appendGarbage). Mirrors gc.c:1261 handle_resurrected_objects + gc.c:1142 delete_garbage SAVEALL branch. - 1613-Q
weakref/package: ports the Lib/_weakrefset.py and Lib/weakref.py collection classes (WeakSet, WeakValueDictionary, WeakKeyDictionary) as Go types layered on objects.Weakref. Each container's_removecallback is an objects.BuiltinFunction whose closure deletes the dead weakref's entry; gc.RegisterWeakref wires the callback to the cycle collector's handle_weakrefs path. Outstanding follow-up (1613-S): the closure captures the container strongly, leaking empty containers until items die; mirroring CPython'sselfref = ref(self)trick needs a TpTraverse-aware closure shape.
v0.10 release gate
gc.collect() reclaims a deliberately constructed reference cycle.
weakref.ref(target) returns None after the target is collected.
A class with __del__ is finalized exactly once when its instance
becomes unreachable through a cycle. The CPython test_gc smoke
fixtures (subset; full suite needs the type system port) pass.
Out of scope for v0.10 (deferred to v0.10.x or v0.11)
- Free-threaded build's separate collector path (
gc_free_threading.c). gc.callbacksuser hooks.gc.set_debugdebug flags.tracemallocintegration (own spec at 1666).- Biased reference counting (own spec at 1614, post v0.10).
Open design questions
- gc_refs without refcounts: confirmed approach is to count
in-edges from the candidate set rather than copy
ob_refcnt. The per-objectgc_refsfield lives onobjects.Headeronly during a collection pass; cleared on entry, scratch on exit. Memory cost: 8 bytes per tracked object during collection only (zero between). - Weakref backref: do we put the weaklist head on every
objects.Header, or only on types that opt in viatp_flags? Decision: opt-in via flag, matching CPython'stp_weaklistoffsetmodel. Saves one word per non-weakreffable object. - Finalizer queue under goroutine yield:
tp_finalizeruns Python code that may yield to other goroutines; we must hold the collection lock across the queue drain to avoid a second collector pass observing partially-finalized objects. Detail in 1613-J.