v0.4.0 - Numbers, strings, hashing, and format
Released May 4, 2026.
The fun parts of a Python runtime are the VM and the compiler. The
parts that decide whether your runtime is actually usable are the
small ones nobody thinks about: does float('1.0e-308') parse to
the same uint64 bits CPython produces? Does format(1234567, ',d')
emit 1,234,567 with the comma in the right place? Does
hash(b"hello") produce the same int CPython does under the same
PYTHONHASHSEED?
None of these questions are interesting until your runtime gets one of them wrong, at which point they are the only thing your users want to talk about. A pandas DataFrame round-trip through your runtime that produces slightly different floats from CPython's is a bug report you'll spend a week tracking down. A hash that doesn't match is half your dict-based code path quietly broken.
v0.4.0 ships the answers to these questions. Number parsing
goes through pystrconv and is bit-for-bit identical to CPython.
Hashing goes through SipHash-1-3 keyed off the runtime secret, and
the test panel pins the output against captured CPython values
under PYTHONHASHSEED=0. The format-spec mini-language has its
own parser and its own renderers for int, float, and string, all
of them ported one-to-one from formatter_unicode.c.
The release is small in line count and broad in coverage. After v0.4, every leaf operation the v0.5 compile pipeline and the v0.6 VM need from the bottom of the value stack is in place.
Highlights
Three themes pull through this release.
Bit-perfect float parity
Python's floats are IEEE-754 doubles. So are Go's float64. But
"both are IEEE-754" does not mean "parsing the same string gives
the same bits". CPython's float parser does its own preprocessing
before handing off to David Gay's dtoa.c. PEP 515 underscores
get stripped. nan(payload) literals get rejected. inf and
infinity are accepted case-insensitively, so Inf parses too. The
list of edge cases is longer than it has any right to be.
v, err := pystrconv.ParseFloat("1_000.5e-3")
// v == 1.0005, err == nil
bits := math.Float64bits(v)
// matches CPython's struct.unpack('<Q', struct.pack('<d', float('1_000.5e-3')))[0]
_, err = pystrconv.ParseFloat("nan(deadbeef)")
// err: cannot convert string to float
The implementation wraps Go's strconv.ParseFloat, but only after
the CPython preprocessor runs. We did not write a faithful
dtoa.c port; instead we verified bit-for-bit parity through a
checked-in panel of inputs (subnormals, edge round-half-to-even
cases, the gradual-underflow boundary). A faithful dtoa.c port
is tracked as a follow-up, but the practical behavior is already
nailed.
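To make the preprocessing step concrete, here is a minimal sketch of the idea using only the standard library. It is not the pystrconv source: parseFloatSketch, stripUnderscores, and errNotFloat are hypothetical names, and the exact set of inputs CPython rejects is wider than what this covers.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// errNotFloat is a hypothetical stand-in for the runtime's ValueError.
var errNotFloat = errors.New("could not convert string to float")

func isDigit(c byte) bool { return '0' <= c && c <= '9' }

// stripUnderscores drops PEP 515 underscores and reports false when one
// is not sandwiched between two decimal digits.
func stripUnderscores(s string) (string, bool) {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '_' {
			b.WriteByte(s[i])
			continue
		}
		if i == 0 || i == len(s)-1 || !isDigit(s[i-1]) || !isDigit(s[i+1]) {
			return "", false
		}
	}
	return b.String(), true
}

// parseFloatSketch preprocesses s, then hands plain decimal syntax to
// strconv.ParseFloat.
func parseFloatSketch(s string) (float64, error) {
	s, ok := stripUnderscores(strings.TrimSpace(s))
	if !ok || s == "" {
		return 0, errNotFloat
	}
	body, neg := s, false
	if body[0] == '+' || body[0] == '-' {
		neg = body[0] == '-'
		body = body[1:]
	}
	switch strings.ToLower(body) {
	case "inf", "infinity":
		if neg {
			return math.Inf(-1), nil
		}
		return math.Inf(1), nil
	case "nan":
		return math.NaN(), nil
	}
	// Reject syntax strconv would accept but float() would not, such as
	// hex floats, and anything like nan(payload).
	for i := 0; i < len(body); i++ {
		c := body[i]
		if !isDigit(c) && c != '.' && c != 'e' && c != 'E' && c != '+' && c != '-' {
			return 0, errNotFloat
		}
	}
	v, err := strconv.ParseFloat(s, 64)
	if err != nil && !errors.Is(err, strconv.ErrRange) {
		return 0, errNotFloat
	}
	return v, nil // on ErrRange v is already an infinity, like float('1e400')
}

func main() {
	for _, in := range []string{"1_000.5e-3", "Infinity", "nan(deadbeef)", "1__0"} {
		v, err := parseFloatSketch(in)
		fmt.Printf("%q -> %v, %v\n", in, v, err)
	}
}
```

The point of the split is that everything Python-specific happens before strconv sees the string, so the digit conversion itself stays untouched.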
Formatting walks the other direction: FormatFloat reproduces
CPython's output for every format code ('r', 's', 'g', 'G',
'e', 'E', 'f', 'F', '%') and every flag (alternate
form, always-sign, space-sign, no-negative-zero, add-dot-zero).
The output mirrors what format_float_short produces in
Python/pystrtod.c.
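As an illustration of the 'r' code, the sketch below lays out shortest round-trip digits the way repr does, switching to exponent form when the decimal point position is at or below -4 or above 16. It leans on strconv.FormatFloat for the digits. reprSketch is a hypothetical helper, not the pystrconv implementation, and it ignores every flag.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// reprSketch formats v with the fewest digits that round-trip, using
// exponent form when decpt <= -4 or decpt > 16.
func reprSketch(v float64) string {
	switch {
	case math.IsNaN(v):
		return "nan"
	case math.IsInf(v, 1):
		return "inf"
	case math.IsInf(v, -1):
		return "-inf"
	}
	// 'e' with precision -1 yields d[.ddd]e±XX with shortest digits.
	s := strconv.FormatFloat(v, 'e', -1, 64)
	sign := ""
	if strings.HasPrefix(s, "-") {
		sign, s = "-", s[1:]
	}
	mant, expStr, _ := strings.Cut(s, "e")
	exp10, _ := strconv.Atoi(expStr)
	digits := strings.Replace(mant, ".", "", 1)
	decpt := exp10 + 1 // decimal point position relative to digits

	var out string
	switch {
	case decpt <= -4 || decpt > 16:
		out = digits[:1]
		if len(digits) > 1 {
			out += "." + digits[1:]
		}
		out += fmt.Sprintf("e%+03d", exp10)
	case decpt <= 0:
		out = "0." + strings.Repeat("0", -decpt) + digits
	case decpt >= len(digits):
		out = digits + strings.Repeat("0", decpt-len(digits)) + ".0"
	default:
		out = digits[:decpt] + "." + digits[decpt:]
	}
	return sign + out
}

func main() {
	for _, v := range []float64{0.1, 1234.5, 1e15, 1e16, 0.00001} {
		fmt.Println(reprSketch(v))
	}
}
```

The digit generation and the layout are separate concerns here, which is the same split the release makes between producing digits and arranging them.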
SipHash-1-3, keyed off the runtime secret
CPython 3.14 hashes bytes through SipHash-1-3 under a process-wide
secret. The secret is initialized from PYTHONHASHSEED (or
randomly if unset), and the same secret is used for hash() on
str, bytes, and the various other hashable types that ultimately
delegate to byte hashing.
import "gopy/hash"
h := hash.Buffer([]byte("hello"))
// Under PYTHONHASHSEED=0, this matches CPython's hash(b"hello").
h2 := hash.KeyedHash(key, []byte("payload"))
// _Py_KeyedHash for short keyed payloads.
The full panel:
- Buffer is SipHash-1-3. This is the production path.
- BufferFNV is the FNV variant the C source includes for embedded builds that want a smaller code footprint. We ship both; Buffer is the default.
- KeyedHash ports _Py_KeyedHash for short keyed buffers the dict perturbation logic uses.
- Pointer ports Py_HashPointer, used by hash of bound method objects.
- Double ports _Py_HashDouble, used by float.__hash__ and by hash of int when the value fits in a double.
- GetFuncDef ports PyHash_GetFuncDef, the introspection entry point that returns the hash algorithm name and key size.
Reference vectors against CPython under PYTHONHASHSEED=0 are
pinned in hash/hash_test.go. The vectors cover empty bytes, the
short-string boundary (below and above Py_HASH_CUTOFF, even
though 3.14 sets that cutoff to 0), and a long string that
exercises the rolling-state inner loop. Any deviation from
CPython's output fails the test.
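For readers who have not looked at SipHash before, here is a compact, generic SipHash-c-d in Go. It is a sketch, not the hash package source: sipHash is a hypothetical name, the key is passed as two 64-bit words, and it returns the raw 64-bit value without any post-processing CPython applies on top. Calling it with c=1, d=3 gives the 1-3 variant; c=2, d=4 reproduces the published SipHash-2-4 reference vectors, which is a convenient way to check the round function.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// sipHash computes SipHash-c-d of data under the 128-bit key (k0, k1).
func sipHash(c, d int, k0, k1 uint64, data []byte) uint64 {
	v0 := k0 ^ 0x736f6d6570736575
	v1 := k1 ^ 0x646f72616e646f6d
	v2 := k0 ^ 0x6c7967656e657261
	v3 := k1 ^ 0x7465646279746573

	round := func() {
		v0 += v1
		v1 = bits.RotateLeft64(v1, 13)
		v1 ^= v0
		v0 = bits.RotateLeft64(v0, 32)
		v2 += v3
		v3 = bits.RotateLeft64(v3, 16)
		v3 ^= v2
		v0 += v3
		v3 = bits.RotateLeft64(v3, 21)
		v3 ^= v0
		v2 += v1
		v1 = bits.RotateLeft64(v1, 17)
		v1 ^= v2
		v2 = bits.RotateLeft64(v2, 32)
	}
	compress := func(m uint64) {
		v3 ^= m
		for i := 0; i < c; i++ {
			round()
		}
		v0 ^= m
	}

	n := len(data)
	for ; len(data) >= 8; data = data[8:] {
		compress(binary.LittleEndian.Uint64(data))
	}
	// Final word: leftover bytes little-endian, length in the top byte.
	last := uint64(n) << 56
	for i, b := range data {
		last |= uint64(b) << (8 * uint(i))
	}
	compress(last)

	v2 ^= 0xff
	for i := 0; i < d; i++ {
		round()
	}
	return v0 ^ v1 ^ v2 ^ v3
}

func main() {
	// An all-zero key stands in for the seed-0 secret here (an assumption
	// of this sketch).
	fmt.Printf("%#x\n", sipHash(1, 3, 0, 0, []byte("hello")))
}
```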
Format-spec mini-language
format(value, spec) is its own programming language. The grammar
is:
[[fill]align][sign][z][#][0][width][,_][.precision][type]
Every bracketed piece is optional, the order is fixed, and the
interpretation of each piece depends on the type of value.
# means alternate form for ints and floats and is rejected for
strings. , means thousands separator for ints and floats, illegal
for strings. 0 means sign-aware zero padding for numbers; for
strings it only sets the fill character. The list goes on.
import "gopy/format"
spec, _ := format.ParseSpec(",d")
out, _ := format.FormatInt(big.NewInt(1234567), spec)
// "1,234,567"
spec, _ = format.ParseSpec(".3f")
out, _ = format.FormatFloat(3.14159, spec)
// "3.142"
spec, _ = format.ParseSpec(">10")
out, _ = format.FormatString("hi", spec)
// "        hi"
The implementation is a one-to-one port of formatter_unicode.c.
ParseSpec is the spec parser (the format spec itself is a tiny
context-free grammar; we hand-rolled a recursive descent matcher
that walks the runes). FormatString, FormatInt, and
FormatFloat are the three renderers, each consuming a parsed
Spec and producing the output bytes.
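A rough sketch of what a single left-to-right pass over the grammar can look like is given here. It is a simplification for illustration: specSketch and parseSpecSketch are hypothetical names, the field set only approximates the parsed spec, and the type-dependent validation the real formatter does is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// specSketch is a hypothetical stand-in for the parsed spec.
type specSketch struct {
	Fill      rune
	Align     rune // 0 when unspecified
	Sign      rune // 0 when unspecified
	NoNegZero bool // the z flag
	Alt       bool // the # flag
	ZeroPad   bool
	Width     int  // -1 when unspecified
	Group     rune // ',' or '_', 0 when unspecified
	Precision int  // -1 when unspecified
	Type      rune // 0 when unspecified
}

// parseSpecSketch consumes each optional piece in grammar order and
// fails if anything is left over.
func parseSpecSketch(s string) (specSketch, error) {
	sp := specSketch{Fill: ' ', Width: -1, Precision: -1}
	r := []rune(s)
	i := 0
	at := func(set string) bool { return i < len(r) && strings.ContainsRune(set, r[i]) }
	number := func() (int, bool) {
		start, n := i, 0
		for at("0123456789") {
			n = n*10 + int(r[i]-'0')
			i++
		}
		return n, i > start
	}

	// A fill character is only present when the second rune is an align.
	fillSet := false
	if len(r) >= 2 && strings.ContainsRune("<>=^", r[1]) {
		sp.Fill, sp.Align, fillSet, i = r[0], r[1], true, 2
	} else if at("<>=^") {
		sp.Align, i = r[0], 1
	}
	if at("+- ") {
		sp.Sign = r[i]
		i++
	}
	if at("z") {
		sp.NoNegZero = true
		i++
	}
	if at("#") {
		sp.Alt = true
		i++
	}
	if !fillSet && at("0") {
		sp.ZeroPad = true
		i++
	}
	if n, ok := number(); ok {
		sp.Width = n
	}
	if at(",_") {
		sp.Group = r[i]
		i++
	}
	if at(".") {
		i++
		n, ok := number()
		if !ok {
			return sp, errors.New("format specifier missing precision")
		}
		sp.Precision = n
	}
	if i < len(r) {
		sp.Type = r[i]
		i++
	}
	if i != len(r) {
		return sp, errors.New("invalid format specifier")
	}
	return sp, nil
}

func main() {
	sp, err := parseSpecSketch("*^+#012_.3f")
	fmt.Printf("%+v %v\n", sp, err)
}
```

Because the order of the pieces is fixed, one cursor and one rune of lookahead are enough everywhere except the fill character, which needs a peek at the second rune.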
For the digit-generation arm of FormatFloat, we delegate to
pystrconv.FormatFloat, which is where the bit-perfect float
work lives. This means a format like format(0.1, '.20f')
produces the same output as CPython without any per-precision
special casing.
What's new
The full package breakdown.
pystrconv/
Locale-independent string and number conversion. Ports a stack of small CPython files that together cover the bottom of the value stack:
- Python/pyctype.c for character classification. Flags, IsLower, IsUpper, IsAlpha, IsDigit, IsXDigit, IsAlnum, IsSpace, ToLower, ToUpper. Computed, not table-driven. The 0..255 range is round-trip verified against CPython's _Py_ctype_table.
- Python/pystrcmp.c for ASCII case-insensitive compare. CompareInsensitive, CompareInsensitiveN.
- Python/mystrtoul.c for integer parsing. ParseUint, ParseInt with base 0 autodetect (0x, 0o, 0b), bases 2 through 36, leading whitespace, sign handling, and overflow detection. The overflow rule is the same one CPython uses: if any partial product exceeds the type max before the final digit, raise.
- Python/pystrhex.c for hex rendering. Hex, HexBytes, HexWithSep, HexBytesWithSep. These back bytes.hex() and friends. The separator variant takes a byte and an offset (every Nth byte gets a separator), which is what bytes.hex(':', 2) calls into.
- Python/pystrtod.c plus the wrapper for Python/dtoa.c for float parsing and formatting. ParseFloat over Go's strconv with CPython preprocessing (PEP 515 underscores, case-insensitive inf and infinity, rejection of nan(payload)). FormatFloat for every code and every flag.
The CPython files this package ports from share a theme: they're
the bottom of the bottom of the stack, called from everywhere,
and they're locale-independent on purpose. CPython took a hit
years ago when locale-dependent strtod parsed "3,14" as 3.14
in German locales. The fix was to hand-roll a parser that ignores
the locale, and that's what we ship here too.
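The separator variant of hex rendering is the least obvious of these helpers, so here is a small sketch of the bytes.hex(sep, bytes_per_sep) grouping rule: a positive count places separators counting from the right end, a negative count from the left. hexWithSepSketch is a hypothetical name and its signature is an assumption, not the pystrconv API.

```go
package main

import (
	"fmt"
	"strings"
)

// hexWithSepSketch renders data as lowercase hex, inserting sep every
// |bytesPerSep| bytes. Positive counts group from the right end,
// negative counts from the left; zero means no separators.
func hexWithSepSketch(data []byte, sep byte, bytesPerSep int) string {
	const digits = "0123456789abcdef"
	group := bytesPerSep
	if group < 0 {
		group = -group
	}
	n := len(data)
	var b strings.Builder
	for i, c := range data {
		if i > 0 && group > 0 {
			boundary := (n-i)%group == 0
			if bytesPerSep < 0 {
				boundary = i%group == 0
			}
			if boundary {
				b.WriteByte(sep)
			}
		}
		b.WriteByte(digits[c>>4])
		b.WriteByte(digits[c&0x0f])
	}
	return b.String()
}

func main() {
	fmt.Println(hexWithSepSketch([]byte{0xf0, 0xf1, 0xf2}, '_', 2))
	fmt.Println(hexWithSepSketch([]byte("UUDDLRLRAB"), ' ', -4))
}
```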
pymath/
The thin float math file that bridges between CPython's math
helpers and Go's math package.
- NaN, Inf, NegInf as float64 constants.
- CopySign, IsNaN, IsInf, IsFinite as predicates that route to Go's math package.
- Log1p, Hypot as the two transcendental helpers CPython exposes through pymath.c.
- FPECounter, FPEDummy as stable-ABI sentinels from pyfpe.c. These are legacy hooks for embedders that want to install their own floating-point exception handlers. Nobody we know uses them, but the stable ABI promises they exist, so we publish them.
Ports Python/pymath.c and Python/pyfpe.c directly.
hash/
The full hash machinery. Ports Python/pyhash.c.
The internals are simple in structure but unforgiving in detail.
SipHash-1-3 is a stateful round function; our implementation
matches CPython's reference cycle by cycle. The runtime secret
initialization reads PYTHONHASHSEED exactly the way CPython
does: unset means a random secret, "0" means the test secret,
any other value means seed the RNG with that integer.
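The three PYTHONHASHSEED cases can be sketched as follows. This is an illustration rather than the hash package source: secretFromSeed is a hypothetical name, the 24-byte secret size and the linear congruential constants are taken from our reading of CPython's bootstrap code and should be treated as assumptions here.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"os"
	"strconv"
)

const secretSize = 24 // assumed size of the secret buffer

// secretFromSeed derives the hash secret from a PYTHONHASHSEED value.
// set is false when the variable is absent from the environment.
func secretFromSeed(value string, set bool) ([secretSize]byte, error) {
	var secret [secretSize]byte
	if !set || value == "" || value == "random" {
		// No fixed seed: fill the secret from the OS random source.
		_, err := rand.Read(secret[:])
		return secret, err
	}
	seed, err := strconv.ParseUint(value, 10, 32)
	if err != nil {
		return secret, errors.New(`PYTHONHASHSEED must be "random" or an integer in range [0; 4294967295]`)
	}
	if seed == 0 {
		return secret, nil // all-zero secret: hashing is deterministic
	}
	// Expand the seed into the secret with a small linear congruential
	// generator, one byte per step.
	x := uint32(seed)
	for i := range secret {
		x = x*214013 + 2531011
		secret[i] = byte(x >> 16)
	}
	return secret, nil
}

func main() {
	value, set := os.LookupEnv("PYTHONHASHSEED")
	secret, err := secretFromSeed(value, set)
	fmt.Printf("%x %v\n", secret, err)
}
```

Keeping the environment lookup outside secretFromSeed makes the three branches easy to test without touching the process environment.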
The constants:
- HashBits. The number of bits in a hash, 64 on amd64 and arm64.
- HashModulus. The Mersenne prime 2^61 - 1 numeric hash reduces to.
- HashInf, HashImag. Special-case hash values for positive infinity and the imaginary part of a complex.
All of these are visible because Python-level sys.hash_info
exposes them.
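To show what "reduces to" means for HashModulus, here is a sketch of the integer case: take the absolute value modulo 2^61 - 1, restore the sign, and remap -1 to -2 because -1 is reserved as an error marker. hashIntSketch and hashInt64Sketch are hypothetical helpers written against math/big, not functions from the hash package.

```go
package main

import (
	"fmt"
	"math/big"
)

// hashModulus is the Mersenne prime 2^61 - 1.
var hashModulus = new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), 61), big.NewInt(1))

// hashIntSketch reduces |n| modulo 2^61 - 1, reapplies the sign, and
// maps -1 to -2.
func hashIntSketch(n *big.Int) int64 {
	abs := new(big.Int).Abs(n)
	h := abs.Mod(abs, hashModulus).Int64()
	if n.Sign() < 0 {
		h = -h
	}
	if h == -1 {
		h = -2
	}
	return h
}

// hashInt64Sketch is a convenience wrapper for small inputs.
func hashInt64Sketch(n int64) int64 { return hashIntSketch(big.NewInt(n)) }

func main() {
	for _, n := range []int64{12345, -1, 1<<61 - 1, 1 << 61} {
		fmt.Println(n, hashInt64Sketch(n))
	}
}
```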
format/
The format-spec mini-language. Ports Python/formatter_unicode.c.
- Spec. The parsed spec struct. Carries Fill, Align, Sign, AltForm, ZeroPad, Width, GroupChar, Precision, Type.
- ParseSpec(s) (Spec, error). Parses the mini-language. The parser walks the runes and tries to match each optional piece in turn; on a mismatch it backtracks and tries the next piece.
- FormatString(s, spec) string. String formatter. Handles alignment, padding, fill, and precision (precision means "truncate to N characters" for strings).
- FormatInt(n, spec) string. Int formatter. Handles every type code (b, o, d, x, X, c), grouping with , or _, sign handling, and alternate form (the 0b, 0o, 0x prefix for non-decimal bases).
- FormatFloat(f, spec) string. Float formatter. Delegates digit generation to pystrconv.FormatFloat and adds grouping / alignment / sign / fill on top.
The grouping rule is subtle and worth a callout: ,d means
group every three digits with commas, _d means group with
underscores, and for hex/oct/binary (where only _ is allowed) the
grouping is every four digits, not three. We get this right because we ported the logic
from formatter_unicode.c rather than rolling our own.
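The rule boils down to one helper that counts from the right end of the digit string. A minimal sketch, with groupDigitsSketch as a hypothetical name and the digits assumed to be already rendered without sign or prefix:

```go
package main

import (
	"fmt"
	"strings"
)

// groupDigitsSketch inserts sep between groups of `every` digits,
// counting from the right so any short group ends up on the left.
func groupDigitsSketch(digits string, sep byte, every int) string {
	if every <= 0 || len(digits) <= every {
		return digits
	}
	head := len(digits) % every
	if head == 0 {
		head = every
	}
	var b strings.Builder
	b.WriteString(digits[:head])
	for i := head; i < len(digits); i += every {
		b.WriteByte(sep)
		b.WriteString(digits[i : i+every])
	}
	return b.String()
}

func main() {
	fmt.Println(groupDigitsSketch("1234567", ',', 3))  // like format(1234567, ',d')
	fmt.Println(groupDigitsSketch("deadbeef", '_', 4)) // like format(0xdeadbeef, '_x')
}
```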
Why we built it this way
A few notes on the shape of this release.
Why a wrapper over strconv instead of porting dtoa.c
Python/dtoa.c is David Gay's reference dtoa from 1996. It's
2500 lines of dense numeric code that solves the shortest
round-trip decimal representation problem optimally. Go's
strconv.ParseFloat and strconv.FormatFloat solve the same
problem with different algorithms (a Ryu-style shortest formatter
and an Eisel-Lemire parser, mostly), but both sides round
correctly, so they produce the same output for every IEEE-754 double.
Bit-for-bit parity is the contract we want. We get that parity
through strconv plus the CPython preprocessor (the part that
strips underscores and rejects nan(payload)). A faithful
dtoa.c port would not improve the behavior. It would just
expand the code base by 2500 lines for no observable difference.
We pinned that decision through the gate panel.
Why SipHash-1-3 and not SipHash-2-4
Python 3.11 switched from SipHash-2-4 to SipHash-1-3 as the
default hash. The reason is performance: 1-3 is enough rounds to
resist the algorithmic complexity attacks SipHash was designed
for, and the savings on short strings are real. We're a 3.14
port, so we ship 1-3 and not 2-4. If you need 2-4 for an embedded
build that pins to pre-3.11 behavior, the BufferFNV and
Buffer panel is shaped to accept additional algorithms; we just
haven't ported them because nobody is asking.
Why grouping codes live in the format package, not pystrconv
pystrconv.FormatFloat produces digits. It doesn't insert
commas, it doesn't pad, it doesn't align. The grouping logic
lives one layer up in the format package, because grouping is a
property of the format spec, not of the digit-generation
algorithm. CPython makes the same split (formatter_unicode.c
adds the commas; pystrtod.c produces the digits), and following
the C structure made the ports easier to read against the
originals.
Why a stable-ABI sentinel for FPECounter
FPECounter is dead code. Nobody calls it. We ship it because
the stable ABI says it exists, and a stable-ABI consumer that
links against gopy expecting to find the symbol should find it.
Tearing it out would save fifty lines and break (theoretical)
consumers. We kept it.
Where it lives
- pystrconv/ for the character classification, number parsing, hex rendering, and float parsing / formatting.
- pymath/ for the float math helpers.
- hash/ for SipHash-1-3 and the keyed-hash family.
- format/ for the format-spec mini-language.
The CPython sources we ported from:
- Python/pyctype.c for character classification.
- Python/pystrcmp.c for ASCII case-insensitive compare.
- Python/mystrtoul.c for integer parsing.
- Python/pystrhex.c for hex rendering.
- Python/pystrtod.c plus a wrapper for Python/dtoa.c for float parsing and formatting.
- Python/pymath.c for the math helpers.
- Python/pyfpe.c for the stable-ABI floating-point hooks.
- Python/pyhash.c for SipHash-1-3 and the hash machinery.
- Python/formatter_unicode.c for the format-spec mini-language.
Compatibility
- Go: 1.26 or newer.
- CPython behavioral target: 3.14.0+.
The gate test panel pins these cross-cuts:
- hash.Buffer([]byte("hello")) matches CPython's hash(b"hello") under PYTHONHASHSEED=0. The exact bit pattern is captured in the test.
- pystrconv.ParseFloat round-trips a panel of inputs to the same uint64 bit pattern as CPython. The panel covers subnormals, the gradual-underflow boundary, every power of two near the range edges, and a handful of "interesting" doubles (0.1, 1.0/3.0, math.pi).
- pystrconv.FormatFloat reproduces CPython's repr for the thresholds where shortest round-trip switches from f form to exponent form.
- format.FormatInt(1234567, ',d') matches format(1234567, ',d') from CPython.
Anywhere one of these fails, the gate test fails. No silent divergence.
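The shape of such a pinned panel is easy to sketch. The example below is not the checked-in gate test: bitCase, panelSketch, and checkPanel are hypothetical names, strconv.ParseFloat stands in for the parser under test, and the expected values are standard IEEE-754 encodings rather than values copied from the project's test files.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// bitCase pins one input string to the exact float64 bit pattern.
type bitCase struct {
	in   string
	bits uint64
}

var panelSketch = []bitCase{
	{"1.0", 0x3FF0000000000000},
	{"0.1", 0x3FB999999999999A},
	{"5e-324", 0x0000000000000001},                  // smallest subnormal
	{"2.2250738585072014e-308", 0x0010000000000000}, // smallest normal
	{"1.7976931348623157e308", 0x7FEFFFFFFFFFFFFF},  // largest finite
}

// checkPanel runs parse over the panel and returns the inputs whose
// bits deviate from the pinned pattern.
func checkPanel(parse func(string) (float64, error)) []string {
	var failures []string
	for _, c := range panelSketch {
		v, err := parse(c.in)
		if err != nil || math.Float64bits(v) != c.bits {
			failures = append(failures, c.in)
		}
	}
	return failures
}

// stdParse is the stand-in parser for this sketch.
func stdParse(s string) (float64, error) { return strconv.ParseFloat(s, 64) }

func main() {
	fmt.Println("failures:", checkPanel(stdParse))
}
```

Comparing bit patterns instead of float values matters: it distinguishes results that print the same but differ in the last bit, and it sidesteps NaN comparison rules.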
Out of scope
A few things this release intentionally does not ship.
- Full 1:1 port of Python/dtoa.c. The wrapper over Go's strconv is bit-correct (the gate pins it). A faithful source-shape port is tracked as a follow-up; we'll do it when somebody needs it.
- SipHash-2-4. CPython's pre-3.11 default. 3.14 ships SipHash-1-3 and that's what we ship.
- DJBX33A short-string fast path. Py_HASH_CUTOFF is 0 in default 3.14 builds, which means the short-string fast path is dead code in 3.14. We didn't port the dead path.
- Complex formatter. The complex type lands in a later release; the format-spec handling for it lands then.
What's next
v0.5 builds the compile pipeline on top of what landed here. The
AST validator, the symtable resolver, the codegen visitor panel,
the flowgraph optimizer, and the assembler all show up. Numbers
become real LOAD_CONST operands. Strings become real
LOAD_CONST operands. The format-spec mini-language gets exercised
by FormattedValue AST nodes.
v0.5.5 adds the lexer and the parser scaffolding. v0.6 turns on the VM. From v0.6 onward, every number you parse, every string you format, every hash you compute walks through what we built today.