1607. gopy hash secret spec

bootstrap_hash.c does two things:

  1. Provides the _PyOS_URandom* family that fills a buffer with OS entropy (Windows BCryptGenRandom, Linux getrandom, /dev/urandom, etc.).
  2. Initializes _Py_HashSecret, the per-process secret used by SipHash to randomize hash output. The initialization respects the PYTHONHASHSEED environment variable: a positive integer gives a deterministic LCG-derived secret; 0 zeroes the secret (so hashes are not randomized, useful for debugging); the literal string "random" uses OS entropy; absence of the variable also uses OS entropy.

In v0.1 we port only the secret-init portion. The hashing functions themselves (SipHash-1-3, FNV, x86_aes acceleration) land in v0.4 together with pyhash.c; until then, no code reads the secret.

What we drop and why

  • _PyOS_URandom, _PyOS_URandomNonblock. Go's crypto/rand.Reader performs the same job and works on every supported OS (Windows, macOS, Linux, all the BSDs). We use it directly. The non-blocking variant is unnecessary because Go's reader already uses getrandom(GRND_NONBLOCK) where the kernel allows it.
  • The Linux/macOS fallback paths to /dev/urandom. crypto/rand takes care of that fallback.
  • dev_urandom_close (called from _Py_HashRandomization_Fini). No file descriptor to close.
  • _Py_HashSecret_Initialized global flag. We use a sync.Once.
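As a rough illustration of why the _PyOS_URandom family is unnecessary, the entropy fill reduces to one crypto/rand call. This is a minimal sketch, not the gopy implementation; the helper name osEntropy is hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// osEntropy is a hypothetical stand-in for _PyOS_URandom: it returns n
// bytes of OS entropy, or an error if the platform source fails.
func osEntropy(n int) ([]byte, error) {
	buf := make([]byte, n)
	// rand.Read fills the whole slice or reports an error; there is no
	// file descriptor to manage and no per-OS branch to write.
	if _, err := rand.Read(buf); err != nil {
		return nil, fmt.Errorf("reading OS entropy: %w", err)
	}
	return buf, nil
}

func main() {
	key, err := osEntropy(24) // 24 = size of the hash secret
	if err != nil {
		fmt.Println("entropy unavailable:", err)
		return
	}
	fmt.Printf("secret candidate: %x\n", key)
}
```

Nothing has to be closed at shutdown, which is why dev_urandom_close has no counterpart.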

What we port

The Go port of the lcg_urandom LCG must be a numerical match for the C implementation: for a given seed it has to produce byte-identical output, because CPython tests that run with a fixed PYTHONHASHSEED rely on it. We port the following loop verbatim.

x = x0;
for (i = 0; i < size; i++) {
    x = x * 214013 + 2531011;
    out[i] = (x >> 16) & 0xff;
}
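A minimal Go sketch of the same loop follows. The function name lcgURandom is hypothetical, and the sketch assumes that x is a 32-bit unsigned int in the C source, so that uint32 wraparound in Go reproduces the C arithmetic.

```go
package main

import "fmt"

// lcgURandom fills out with the LCG byte stream for seed x0.
// uint32 arithmetic wraps modulo 2^32, which is assumed to match the
// C unsigned int used by the original loop.
func lcgURandom(x0 uint32, out []byte) {
	x := x0
	for i := range out {
		x = x*214013 + 2531011
		// byte() keeps the low 8 bits, the same as (x >> 16) & 0xff.
		out[i] = byte(x >> 16)
	}
}

func main() {
	buf := make([]byte, 8)
	lcgURandom(1, buf)
	fmt.Printf("seed 1: % x\n", buf)
}
```

Working the first step by hand for seed 1 gives x = 214013 + 2531011 = 2745024, and 2745024 >> 16 = 41, so the first output byte is 41.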

Go API

package hash

// SecretSize is the size of the hash secret. Matches
// sizeof(_Py_HashSecret_t) in CPython 3.14: a 24-byte union whose
// first 16 bytes hold the SipHash key (shared with the FNV
// prefix/suffix) and whose last 8 bytes hold the djbx33a/expat salt.
const SecretSize = 24

// Secret is the byte vector consumed by SipHash and FNV in v0.4.
// Until then it is filled but not read.
var Secret [SecretSize]byte

// SecretMode classifies how Init was resolved.
type SecretMode int

const (
	SecretRandom SecretMode = iota // OS entropy
	SecretZeroed                   // PYTHONHASHSEED=0
	SecretSeeded                   // PYTHONHASHSEED=<positive int>
)

// Init seeds Secret. cfg is the resolved configuration; if nil, the
// PYTHONHASHSEED environment variable is consulted. Init is safe to
// call from multiple goroutines; only the first call performs work.
//
// Returns the resolved mode, or an error if PYTHONHASHSEED is set to
// a non-empty value that is neither "random", "0", nor a decimal
// integer in [1, 4294967295].
func Init(cfg *Config) (SecretMode, error)

// Config mirrors the relevant fields of PyConfig consumed by
// _Py_HashRandomization_Init. The full PyConfig lives in
// initconfig (v0.7).
type Config struct {
	UseHashSeed bool   // true if hash_seed is set explicitly
	HashSeed    uint32 // value when UseHashSeed is true
}

// Reset is exposed for tests. It clears the once-flag so a follow-up
// Init runs again. Production code should not call it.
func Reset()
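One possible shape for the bodies behind these declarations is sketched below, assuming a package-level sync.Once and cached results. The names once, initMode, initErr, and configFromEnv are hypothetical, and the environment parsing here is deliberately simplified.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"os"
	"strconv"
	"sync"
)

const SecretSize = 24

var Secret [SecretSize]byte

type SecretMode int

const (
	SecretRandom SecretMode = iota // OS entropy
	SecretZeroed                   // PYTHONHASHSEED=0
	SecretSeeded                   // PYTHONHASHSEED=<positive int>
)

type Config struct {
	UseHashSeed bool
	HashSeed    uint32
}

// Hypothetical package state: the once-flag plus the cached outcome
// of the first Init call.
var (
	once     sync.Once
	initMode SecretMode
	initErr  error
)

// configFromEnv is a hypothetical helper for the cfg == nil case.
func configFromEnv() (*Config, error) {
	s := os.Getenv("PYTHONHASHSEED")
	if s == "" || s == "random" {
		return &Config{}, nil
	}
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return nil, fmt.Errorf("invalid PYTHONHASHSEED %q", s)
	}
	return &Config{UseHashSeed: true, HashSeed: uint32(n)}, nil
}

func Init(cfg *Config) (SecretMode, error) {
	once.Do(func() {
		if cfg == nil {
			if cfg, initErr = configFromEnv(); initErr != nil {
				return
			}
		}
		switch {
		case !cfg.UseHashSeed:
			initMode = SecretRandom
			_, initErr = rand.Read(Secret[:])
		case cfg.HashSeed == 0:
			initMode = SecretZeroed
			Secret = [SecretSize]byte{}
		default:
			initMode = SecretSeeded
			x := cfg.HashSeed // LCG from "What we port"
			for i := range Secret {
				x = x*214013 + 2531011
				Secret[i] = byte(x >> 16)
			}
		}
	})
	// Later calls skip the closure and see the cached outcome.
	return initMode, initErr
}

// Reset re-arms the once-flag; intended for tests only.
func Reset() {
	once = sync.Once{}
	initMode, initErr = SecretRandom, nil
}

func main() {
	mode, err := Init(&Config{UseHashSeed: true, HashSeed: 1})
	fmt.Println(mode, err, Secret[:4])
	// A second call with a different config leaves Secret untouched.
	mode, _ = Init(&Config{UseHashSeed: true, HashSeed: 0})
	fmt.Println(mode, Secret[:4])
}
```

In this sketch a failed first call is also cached, so every later Init returns the same error until Reset is called; whether gopy wants that behavior is a design choice the spec leaves open.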

PYTHONHASHSEED parsing

Parsing matches CPython's config_init_hash_seed (in initconfig.c, ported in v0.7). Until that file lands, we handle parsing here so that cmd/gopy can seed the secret at process start. The parser accepts:

Input                    Resolved
unset / ""               random (OS entropy)
"random"                 random
"0"                      zeroed secret
decimal 1..4294967295    seeded with that value via lcg_urandom
anything else            error
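The table can be sketched as a single function. This is an illustrative sketch only: parseHashSeed and the seedMode constants are hypothetical names, and the handling of edge forms the table does not mention (leading zeros, surrounding whitespace) is an assumption. strconv.ParseUint with base 10 rejects signs and whitespace but accepts leading zeros.

```go
package main

import (
	"fmt"
	"strconv"
)

type seedMode int

const (
	modeRandom seedMode = iota // OS entropy
	modeZeroed                 // all-zero secret
	modeSeeded                 // LCG-derived secret
)

// parseHashSeed maps a PYTHONHASHSEED value to a mode and, for
// modeSeeded, the 32-bit seed. An unset variable is passed as "".
func parseHashSeed(s string) (seedMode, uint32, error) {
	if s == "" || s == "random" {
		return modeRandom, 0, nil
	}
	// bitSize 32 makes ParseUint reject anything above 4294967295.
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return modeRandom, 0, fmt.Errorf(
			"PYTHONHASHSEED must be \"random\" or an integer in [0, 4294967295], got %q", s)
	}
	if n == 0 {
		return modeZeroed, 0, nil
	}
	return modeSeeded, uint32(n), nil
}

func main() {
	for _, in := range []string{"", "random", "0", "42", "4294967296", "abc"} {
		mode, seed, err := parseHashSeed(in)
		fmt.Printf("%q -> mode=%d seed=%d err=%v\n", in, mode, seed, err)
	}
}
```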

Tests

hash/secret_test.go:

  • Init with cfg.UseHashSeed=false produces a non-zero Secret (probabilistic; the chance of 24 zero bytes from crypto/rand is 2^-192).
  • Init with UseHashSeed=true and HashSeed=0 produces a zeroed Secret.
  • Init with UseHashSeed=true and a fixed non-zero HashSeed produces deterministic output that matches a hand-computed LCG run.
  • The LCG matches a precomputed reference for seed 0xdeadbeef, generated by running the C implementation once and pasting the bytes into the test. (We compute the expected sequence in Go and document the C origin.)
  • Init is idempotent: a second call with a different config does not change Secret.

Cross-runtime parity

Once the hash port (spec 1640, in the next spec batch) lands in v0.4, we add a compat/hash test that hashes a small corpus of strings under PYTHONHASHSEED=0 and asserts byte-equality with CPython's output. Until then, parity is not tested end to end; it is assumed on the strength of the deterministic LCG tests above.