Skip to main content

1668. gopy runtime helpers

What we are porting

Two unrelated but small files bundled into one spec:

C sourceLinesGo target
Python/getopt.c163getopt/getopt.go
Python/hashtable.c424hashtable/hashtable.go

Both are pre-existing infrastructure that other v0.9 callers depend on but neither warranted their own multi-page spec. They share this file because the alternative is two near-empty specs.

getopt

Python/getopt.c is not GNU getopt; CPython ships its own to avoid locale and extern int optind issues. The interpreter init code (Modules/main.cpylifecycle.c) calls _PyOS_GetOpt on the command-line argv to parse the -c, -m, -O, -W, etc. flags. Three module-globals back the iteration:

  • _PyOS_opterr (error reporting on/off).
  • _PyOS_optind (current index).
  • _PyOS_optarg (arg for option taking a value).

What gopy already does

gopy currently has its own ad-hoc CLI parsing in cmd/gopy/main.go. v0.9 replaces that with the port so flags and error messages match CPython byte-for-byte. The user-facing gopy --help text is whatever Python/initconfig.c prints today.

Go shape

package getopt

// State is per-call (no globals). _PyOS_GetOpt's three module globals
// collapse into one struct that the caller threads through.
type State struct {
OptInd int
OptArg string
OptErr bool // print errors? defaults to true
}

func New() *State

// LongOption mirrors the static `longopts` table in initconfig.c.
type LongOption struct {
Name string
HasArg bool
Flag *int // nil means return Val
Val int
}

// GetOpt parses one option from argv starting at s.OptInd. Returns
// the option char or 0 on long-option-with-flag, or -1 at end of
// args. The integer matches `option->val`.
func (s *State) GetOpt(argv []string, shortOpts string, longOpts []LongOption) (int, error)

The Go signature drops the wide-char (wchar_t) argv because gopy already converts os.Args to UTF-8 strings during preconfig.

Error parity

Two messages are pinned verbatim:

  • gopy: option requires an argument -- '%s'
  • gopy: unrecognized option '%s'

CPython prints python rather than gopy; the substitution happens through Py_GetProgramName() which already returns "gopy" (1622).

hashtable

_Py_hashtable_t is a generic open-addressing hash table used inside the runtime: tracemalloc, type-cache invalidation, the freelist diagnostic counters, and the .pyc rdep tracker. It is not the Python dict; it has no PyObject parts.

Surface (verbatim translation):

package hashtable

type HashFunc func(key any) uint64
type CompareFunc func(a, b any) bool
type DestroyFunc func(value any)

type Entry struct {
Key any
Value any
Hash uint64
}

type Table struct {
nentries int
nbuckets int
buckets []*Entry
hash HashFunc
cmp CompareFunc
keyDtor DestroyFunc
valDtor DestroyFunc
alloc Allocator // matches PyMem_RawMalloc family
}

func New(hash HashFunc, cmp CompareFunc) *Table
func NewFull(hash HashFunc, cmp CompareFunc, keyDtor, valDtor DestroyFunc, alloc Allocator) *Table

func (t *Table) Get(key any) (any, bool)
func (t *Table) Set(key, val any) error
func (t *Table) Steal(key any) (any, bool) // matches _Py_hashtable_steal
func (t *Table) Foreach(fn func(*Entry) error) error
func (t *Table) Clear()
func (t *Table) Len() int
func (t *Table) Size() int // memory footprint, for tracemalloc
func (t *Table) Destroy()

The implementation follows hashtable.c byte-for-byte:

  • Initial bucket count: 16 (matches HASHTABLE_INITIAL_SIZE).
  • Resize on load factor > 75% (matches HASHTABLE_HIGH).
  • Linear probing on collisions.
  • _Py_hashtable_hash_ptr (pointer-as-key) maps to a Go helper that uses unsafe.Pointer arithmetic; we keep this because callers in the GC and tracemalloc code rely on the exact distribution.

Why not a plain Go map?

Two reasons:

  1. CPython callers want stable iteration order (Foreach) and a Steal operation that returns the value while removing the entry without invoking the destructor. Go's builtin map has neither.
  2. Some callers pre-size the table from a known entry count to avoid rehash storms during startup. The runtime startup path expects this knob.

For most user-facing code a Go map[K]V is the right tool; this package exists specifically for the runtime infra that mirrors CPython.

Gate

getopt/getopt_test.go:

  • Parse -c "print(1)" foo. Assert OptInd and OptArg match CPython's documented behaviour for that argv.
  • Parse --help. Assert long-option dispatch.
  • Parse -W ignore::DeprecationWarning. Assert combined short+arg.

hashtable/hashtable_test.go:

  • Round-trip insert / get / steal.
  • Resize: insert 100 entries, assert no rehash error and final bucket count matches _Py_hashtable_size for that size.
  • Foreach visits every entry exactly once.
  • Destroy calls the value destructor for each entry.

Out of scope

  • _PyOS_GetOpt long-option flag pointer mode. CPython sets a separate variable when flag is non-NULL; we keep the field but every gopy caller passes nil (matches CPython's actual usage).
  • _Py_hashtable_get_entry_generic vs _Py_hashtable_get_entry_ptr specialisation. Go's interface dispatch makes the distinction invisible at the call site; we keep one entry path.

v0.9 checklist

Files

  • getopt/getopt.go: State, LongOption, GetOpt. Shipped in commit 921e618.
  • getopt/getopt_test.go: gate panel. 13 tests covering -c, --help, --help-all, -W, clustered, glued, --, lone dash, --version, --check-hash-based-pycs.
  • cmd/gopy/main.go: switched from flag to getopt.GetOpt, walking PythonShortOpts / PythonLongOpts. --copyright dropped (CPython has no such CLI flag; the copyright() builtin remains the user-facing surface). Test panel updated; build clean.
  • hashtable/hashtable.go: Table type and methods. Shipped in commit 8aba0f6.
  • hashtable/hashtable_test.go: gate panel. 9 tests.

Surface guarantees

  • CPython's two error strings reproduced verbatim. Pinned by TestUnknownShortOption, TestMissingArgument, TestLongOptionUnknown. (Note: actual CPython strings are Unknown option: -%c and Argument expected for the -%c option, not the gopy: option requires... shape this spec drafted; the port follows CPython verbatim per the port-not-patch rule.)
  • Hashtable load factor 50% (matches CPython HASHTABLE_HIGH, not the 75% this spec drafted) and initial size 16, pinned by TestGrowthCurve (100 entries → 256 buckets, load ≤ 0.50).