1607. gopy hash secret spec
bootstrap_hash.c does two things:
- Provides the `_PyOS_URandom*` family that fills a buffer with OS entropy (Windows BCryptGenRandom, Linux getrandom, /dev/urandom, etc.).
- Initializes `_Py_HashSecret`, the per-process secret used by SipHash to randomize hash output. The initialization respects the `PYTHONHASHSEED` environment variable: a positive integer gives a deterministic LCG-derived secret; `0` zeroes the secret (so hashes are not randomized, useful for debugging); the literal string `"random"` uses OS entropy; absence of the variable also uses OS entropy.
In v0.1 we port only the secret-init portion. The hashing functions
themselves (SipHash-1-3, FNV, x86_aes acceleration) land in v0.4
together with pyhash.c; until then, no code reads the secret.
What we drop and why
- `_PyOS_URandom`, `_PyOS_URandomNonblock`. Go's `crypto/rand.Reader` performs the same job and works on every supported OS (Windows, macOS, Linux, all the BSDs). We use it directly. The non-blocking variant is unnecessary because Go's reader already uses `getrandom(GRND_NONBLOCK)` where the kernel allows it.
- The Linux/macOS fallback paths to `/dev/urandom`. `crypto/rand` takes care of that fallback.
- `dev_urandom_close` (called from `_Py_HashRandomization_Fini`). No file descriptor to close.
- `_Py_HashSecret_Initialized` global flag. We use a `sync.Once`.
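As a rough illustration of the first two items, the sketch below fills a 24-byte buffer through `crypto/rand` with no file descriptor or platform branch in sight. The helper name `fillFromOS` is hypothetical and not part of the spec.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// fillFromOS is a hypothetical helper: it fills buf with OS entropy.
// crypto/rand picks the platform source, so no /dev/urandom handling
// or descriptor cleanup is needed on our side.
func fillFromOS(buf []byte) error {
	_, err := rand.Read(buf)
	return err
}

func main() {
	var secret [24]byte
	if err := fillFromOS(secret[:]); err != nil {
		fmt.Println("entropy source failed:", err)
		return
	}
	fmt.Printf("filled %d bytes\n", len(secret))
}
```

Returning the error rather than panicking leaves the policy decision (abort startup or not) to the caller.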
What we port
The lcg_urandom LCG is a numerical match for the C implementation
and must produce byte-identical output for a given seed; CPython tests
that run with a fixed PYTHONHASHSEED rely on it. We port it
verbatim.
```c
x = x0;
for (i = 0; i < size; i++) {
    x = x * 214013 + 2531011;
    out[i] = (x >> 16) & 0xff;
}
```
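A minimal Go rendering of the loop above might look as follows. It assumes the C `unsigned int` is 32 bits wide, so `uint32` arithmetic wraps the same way; the function name `lcgFill` is illustrative only.

```go
package main

import "fmt"

// lcgFill is a sketch of the LCG above. uint32 multiplication and
// addition wrap modulo 2^32, mirroring a 32-bit C unsigned int.
func lcgFill(seed uint32, out []byte) {
	x := seed
	for i := range out {
		x = x*214013 + 2531011
		out[i] = byte(x >> 16) // same as (x >> 16) & 0xff
	}
}

func main() {
	out := make([]byte, 3)
	lcgFill(1, out)
	fmt.Println(out) // prints [41 35 190]
}
```

The `byte` conversion truncates to the low 8 bits, so no explicit mask is needed.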
Go API
```go
package hash

// SecretSize is the size of the hash secret. Matches
// sizeof(_Py_HashSecret_t) in CPython 3.14: 24 bytes. The first 16
// are shared by the SipHash key and the FNV prefix/suffix (the C type
// is a union); the trailing 8 serve as the expat hash salt.
const SecretSize = 24

// Secret is the byte vector consumed by SipHash and FNV in v0.4.
// Until then it is filled but not read.
var Secret [SecretSize]byte

// SecretMode classifies how Init was resolved.
type SecretMode int

const (
	SecretRandom SecretMode = iota // OS entropy
	SecretZeroed                   // PYTHONHASHSEED=0
	SecretSeeded                   // PYTHONHASHSEED=<positive int>
)

// Init seeds Secret. cfg is the resolved configuration; if nil, the
// PYTHONHASHSEED environment variable is consulted. Init is safe to
// call from multiple goroutines; only the first call performs work.
//
// Returns the resolved mode, and an error if PYTHONHASHSEED is set
// to a non-empty value that is neither "random", "0", nor a decimal
// integer in [1, 4294967295].
func Init(cfg *Config) (SecretMode, error)

// Config mirrors the relevant fields of PyConfig consumed by
// _Py_HashRandomization_Init. The full PyConfig lives in
// initconfig (v0.7).
type Config struct {
	UseHashSeed bool   // true if hash_seed is set explicitly
	HashSeed    uint32 // value when UseHashSeed is true
}

// Reset is exposed for tests. It clears the once-flag so a follow-up
// Init runs again. Production code should not call it.
func Reset()
```
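One way the declarations above could fit together is sketched below. It is not the planned implementation: it lives in `package main` so it runs standalone, it treats a nil `cfg` as "random" instead of consulting `PYTHONHASHSEED`, and the package-level variables `once`, `mode` and `initErr` are assumptions made for illustration.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sync"
)

const SecretSize = 24

type SecretMode int

const (
	SecretRandom SecretMode = iota
	SecretZeroed
	SecretSeeded
)

type Config struct {
	UseHashSeed bool
	HashSeed    uint32
}

var (
	Secret  [SecretSize]byte
	once    sync.Once // assumed replacement for _Py_HashSecret_Initialized
	mode    SecretMode
	initErr error
)

// Init resolves the mode once; later calls return the cached result.
// Simplification: nil cfg means "random" here (no env lookup).
func Init(cfg *Config) (SecretMode, error) {
	once.Do(func() {
		switch {
		case cfg == nil || !cfg.UseHashSeed:
			mode = SecretRandom
			_, initErr = rand.Read(Secret[:])
		case cfg.HashSeed == 0:
			mode = SecretZeroed
			Secret = [SecretSize]byte{}
		default:
			mode = SecretSeeded
			x := cfg.HashSeed
			for i := range Secret {
				x = x*214013 + 2531011 // wraps modulo 2^32
				Secret[i] = byte(x >> 16)
			}
		}
	})
	return mode, initErr
}

// Reset re-arms the once-flag. Test-only; not safe to race with Init.
func Reset() { once = sync.Once{} }

func main() {
	m, err := Init(&Config{UseHashSeed: true, HashSeed: 1})
	fmt.Println(m == SecretSeeded, err)
	// A second call with a different config is a no-op.
	m, _ = Init(&Config{UseHashSeed: true, HashSeed: 0})
	fmt.Println(m == SecretSeeded)
}
```

Caching `mode` and `initErr` alongside the once-flag lets every caller observe the outcome of the first call, which keeps `Init` idempotent from the caller's point of view.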
PYTHONHASHSEED parsing
The parser matches CPython's config_init_hash_seed (in initconfig.c,
ported in v0.7). Until that file lands, parsing lives here so that
cmd/gopy can seed the secret at process start. The parser accepts:
| Input | Resolved |
|---|---|
| unset / "" | random (OS entropy) |
| "random" | random |
| "0" | zeroed secret |
| decimal 1..4294967295 | seeded with that value via lcg_urandom |
| anything else | error |
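The table could be implemented along the following lines. The function name `parseHashSeed`, the error text, and the three-value return shape are hypothetical. Note that `strconv.ParseUint` is stricter than C's `strtoul`: it rejects leading whitespace and sign characters, so this sketch treats those inputs as errors.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

type SecretMode int

const (
	SecretRandom SecretMode = iota
	SecretZeroed
	SecretSeeded
)

// errBadHashSeed uses illustrative wording, not CPython's message.
var errBadHashSeed = errors.New("invalid PYTHONHASHSEED value")

// parseHashSeed maps the raw environment value to a mode and seed.
func parseHashSeed(s string) (SecretMode, uint32, error) {
	if s == "" || s == "random" {
		return SecretRandom, 0, nil
	}
	// bitSize 32 makes ParseUint reject values above 4294967295.
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return SecretRandom, 0, errBadHashSeed
	}
	if n == 0 {
		return SecretZeroed, 0, nil
	}
	return SecretSeeded, uint32(n), nil
}

func main() {
	for _, in := range []string{"", "random", "0", "42", "4294967296", "abc"} {
		mode, seed, err := parseHashSeed(in)
		fmt.Printf("%q -> mode=%d seed=%d err=%v\n", in, mode, seed, err)
	}
}
```

Passing the raw string in, rather than reading the environment inside the parser, keeps the function trivially testable.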
Tests
hash/secret_test.go:
- `Init` with `cfg.UseHashSeed=false` produces a non-zero `Secret` (probabilistic; the chance of a 24-byte zero from crypto/rand is 2^-192).
- `Init` with `UseHashSeed=true` and `HashSeed=0` produces a zeroed `Secret`.
- `Init` with a fixed `HashSeed` produces deterministic output that matches a hand-computed LCG run.
- The LCG matches a precomputed reference for seed `0xdeadbeef`, generated by running the C implementation once and pasting the bytes into the test. (We compute the expected sequence in Go and document the C origin.)
- `Init` is idempotent: a second call with a different config does not change `Secret`.
Cross-runtime parity
Once the hash port (1640 in the next spec batch) lands in v0.4, we
add a compat/hash test that hashes a small corpus of strings under
PYTHONHASHSEED=0 and asserts byte-equality with CPython's output.
Until then, parity is assumed via the deterministic LCG.