
Include/cpython/bytesobject.h

cpython 3.14 @ ab2d84fe1023/Include/cpython/bytesobject.h

This header exposes the CPython-internal bytes API that extension authors need but that is not part of the stable ABI. The most important piece is _PyBytesWriter, an incremental builder that is normally declared on the stack and spills to the heap only when its initial reservation overflows. It is the standard way CPython builds a bytes object without knowing the final length up front, used in codec encoders, the % formatter, escape decoding, and elsewhere.

Map

| Lines | Symbol | Role | gopy |
|---|---|---|---|
| ~7 | PyBytesObject | Concrete struct layout (ob_shash + ob_sval) | objects/bytes.go Bytes |
| ~18 | PyBytes_AS_STRING | Macro: raw pointer into ob_sval, no type check | objects/bytes.go Bytes.Bytes() |
| ~20 | PyBytes_GET_SIZE | Macro: ob_size read directly from the PyVarObject header | objects/bytes.go Bytes.Len() |
| ~25 | _PyBytes_Resize | In-place realloc of a bytes object the caller uniquely owns | not ported |
| ~35 | _PyBytes_FormatEx | Core of bytes % args; printf-style with full format spec | not ported |
| ~40 | _PyBytes_DecodeEscape | Decode the backslash escape sequences in a bytes literal | parser/string/decode.go |
| ~45 | _PyBytesWriter | Stack-allocated incremental builder that spills to the heap | not ported (uses strings.Builder) |
| ~50 | _PyBytesWriter_Init | Zero-initialise a writer | not ported |
| ~52 | _PyBytesWriter_Alloc | Reserve size bytes in the writer's buffer | not ported |
| ~54 | _PyBytesWriter_Finish | Freeze the buffer into a PyBytesObject | not ported |
| ~56 | _PyBytesWriter_Dealloc | Release the writer without producing an object | not ported |
| ~58 | _PyBytesWriter_Prepare | Grow the buffer if needed so that size more bytes fit | not ported |

Reading

PyBytesObject layout and the AS_STRING / GET_SIZE macros

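PyBytesObject stores its payload inline: the struct ends with char ob_sval[1], and CPython over-allocates so that ob_sval[0..ob_size-1] holds the actual bytes and ob_sval[ob_size] is always a NUL terminator. The two macros skip the type checking that the function forms PyBytes_AsString and PyBytes_Size perform: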

```c
// Include/cpython/bytesobject.h:18-20 (simplified)
#define PyBytes_AS_STRING(op) (((PyBytesObject *)(op))->ob_sval)
#define PyBytes_GET_SIZE(op) (((PyBytesObject *)(op))->ob_base.ob_size)
```

gopy stores the payload as a Go []byte slice rather than an inline array, so the same access is just a field read:

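```go
// objects/bytes.go
type Bytes struct {
	VarHeader
	v []byte // payload; len(v) stands in for ob_size
}

// Bytes returns the payload without copying (PyBytes_AS_STRING).
func (b *Bytes) Bytes() []byte { return b.v }

// Len returns the payload length (PyBytes_GET_SIZE).
func (b *Bytes) Len() int { return len(b.v) }
```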

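The trade-off is that Bytes() hands out a slice that shares the object's backing array rather than a raw pointer. Go does not stop callers from writing through that slice, so by convention they must not, mirroring the CPython rule that the buffer behind PyBytes_AS_STRING may only be written while a freshly created object is still private to its creator.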
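To make the aliasing concrete, here is a small sketch. It uses a hypothetical stand-in type, bytesObj, that keeps only the payload field (the real gopy Bytes also embeds VarHeader, omitted here so the example compiles on its own). A write through the returned slice is visible in the object, while a copied slice is independent.

```go
package main

import "fmt"

// bytesObj is a hypothetical stand-in for gopy's Bytes; VarHeader is omitted.
type bytesObj struct {
	v []byte
}

// Bytes returns the payload without copying, like the accessor in objects/bytes.go.
func (b *bytesObj) Bytes() []byte { return b.v }

func main() {
	obj := &bytesObj{v: []byte("abc")}

	// Writing through the shared slice silently mutates the "immutable" object.
	shared := obj.Bytes()
	shared[0] = 'X'
	fmt.Println(string(obj.Bytes()))

	// A caller that needs scratch space copies first; the object is untouched.
	scratch := append([]byte(nil), obj.Bytes()...)
	scratch[1] = 'Y'
	fmt.Println(string(obj.Bytes()), string(scratch))
}
```

Copying costs an allocation, which is why the accessor returns the shared slice and leaves the read-only discipline to callers.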

_PyBytesWriter: incremental construction

CPython's codec encoders and the escape decoder avoid repeated reallocation by building into a _PyBytesWriter. The writer is normally a local variable that carries an inline small buffer, so output starts on the C stack:

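```c
// Objects/bytesobject.c (simplified)
_PyBytesWriter writer;
_PyBytesWriter_Init(&writer);
char *p = _PyBytesWriter_Alloc(&writer, estimated_size);
if (p == NULL) {
    return NULL;  // allocation failed; exception already set
}
// ... fill p ...
// (on an error after this point, release with _PyBytesWriter_Dealloc(&writer))
return _PyBytesWriter_Finish(&writer, p);
```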

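When the initial estimate fits in the inline buffer, no intermediate heap buffer is allocated; the only allocation is the final bytes object created by _PyBytesWriter_Finish. When the estimate overflows, _PyBytesWriter_Prepare moves the data into a heap-allocated buffer (growing it again on later overflows) and returns an updated p, which the caller must use from then on. gopy achieves the same goal with strings.Builder, whose internal []byte slice grows via append: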

```go
// objects/bytes_methods_wrap.go (example pattern)
var b strings.Builder
b.Grow(estimated)
// ... write to b ...
return NewBytesFromString(b.String()), nil
```
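As a concrete, hedged instance of that pattern, the sketch below builds the output of an ASCII encoder with "replace" error handling using strings.Builder. The function name encodeASCIIReplace is hypothetical, and it returns a plain []byte instead of calling gopy's NewBytesFromString so that it runs on its own. Each rune produces exactly one output byte, and len(s) is never smaller than the rune count, so a single Grow(len(s)) is a safe estimate.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeASCIIReplace is a hypothetical encoder: ASCII runes pass through and
// every other rune becomes '?', like Python's str.encode("ascii", "replace").
func encodeASCIIReplace(s string) []byte {
	var b strings.Builder
	// One output byte per rune, and len(s) >= rune count, so no regrowth occurs.
	b.Grow(len(s))
	for _, r := range s {
		if r < 0x80 {
			b.WriteByte(byte(r))
		} else {
			b.WriteByte('?')
		}
	}
	// Converting the finished string back to []byte copies it once more.
	return []byte(b.String())
}

func main() {
	fmt.Printf("%q\n", encodeASCIIReplace("héllo, wörld")) // "h?llo, w?rld"
}
```

Appending to a []byte directly and returning that slice would avoid the final copy; strings.Builder is used here only because it is what the gopy pattern above relies on.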

The difference is that the stack optimisation is lost: in this pattern the builder's backing array escapes into the returned object, so Go's escape analysis places it on the heap. This does not affect correctness but may matter in hot paths that produce many small bytes objects.
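If a hot path does need the small-buffer behaviour back, one option is a Go type that mimics _PyBytesWriter: write into a fixed inline array first and move to a heap slice only on overflow. The sketch below is hypothetical; bytesWriter, its method names and the 64-byte inline size are assumptions, not gopy or CPython code. Go's escape analysis still decides where the inline array lives, so it stays on the stack only if the writer itself does not escape.

```go
package main

import "fmt"

// smallSize is the inline capacity, chosen arbitrarily for this sketch.
const smallSize = 64

// bytesWriter is a hypothetical Go analogue of _PyBytesWriter.
type bytesWriter struct {
	small [smallSize]byte // inline storage, used until it overflows
	heap  []byte          // nil until the first overflow
	n     int             // number of bytes written so far
}

// buf returns whichever storage is currently active.
func (w *bytesWriter) buf() []byte {
	if w.heap != nil {
		return w.heap
	}
	return w.small[:]
}

// Prepare makes room for extra more bytes, moving to (or growing) a heap
// slice when the active storage is too small, in the spirit of _PyBytesWriter_Prepare.
func (w *bytesWriter) Prepare(extra int) {
	need := w.n + extra
	cur := w.buf()
	if need <= len(cur) {
		return
	}
	newCap := 2 * len(cur) // over-allocate to amortise repeated growth
	if newCap < need {
		newCap = need
	}
	grown := make([]byte, newCap)
	copy(grown, cur[:w.n])
	w.heap = grown
}

// Append writes p, growing the storage first if necessary.
func (w *bytesWriter) Append(p []byte) {
	w.Prepare(len(p))
	copy(w.buf()[w.n:], p)
	w.n += len(p)
}

// Spilled reports whether the writer has left its inline array.
func (w *bytesWriter) Spilled() bool { return w.heap != nil }

// Finish returns an exactly sized copy of the written bytes, standing in for
// _PyBytesWriter_Finish producing the final bytes object.
func (w *bytesWriter) Finish() []byte {
	out := make([]byte, w.n)
	copy(out, w.buf()[:w.n])
	return out
}

func main() {
	var w bytesWriter
	w.Append([]byte("short"))
	fmt.Println(w.Spilled()) // false: 5 bytes fit in the inline array
	w.Append(make([]byte, 100))
	fmt.Println(w.Spilled(), len(w.Finish())) // true 105
}
```

Whether this beats strings.Builder in practice depends on the workload and should be measured with a benchmark before adopting it.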

_PyBytes_DecodeEscape

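_PyBytes_DecodeEscape processes the body of a non-raw bytes literal, resolving the standard escapes such as \n, \t, \\, \xhh and one-to-three-digit octal \ooo. Sequences that bytes literals do not recognise, including \N{...}, \uXXXX and \UXXXXXXXX, are kept verbatim (backslash included), and the function reports the first one to its caller, which issues the warning (SyntaxWarning for source literals since 3.12, DeprecationWarning before that). gopy ports this in parser/string/decode.go: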

```go
// parser/string/decode.go:141
// CPython: Objects/bytesobject.c _PyBytes_DecodeEscape
func decodeBytesEscape(src []byte) ([]byte, error) { ... }
```

gopy mirror

| CPython symbol | Go identifier | File |
|---|---|---|
| PyBytesObject | Bytes | objects/bytes.go |
| PyBytes_AS_STRING | Bytes.Bytes() | objects/bytes.go |
| PyBytes_GET_SIZE | Bytes.Len() | objects/bytes.go |
| PyBytes_FromStringAndSize | NewBytes | objects/bytes.go |
| PyBytes_FromString | NewBytesFromString | objects/bytes.go |
| bytes_repr | bytesRepr (unexported) | objects/bytes.go |
| bytes_hash | bytesHash (unexported) | objects/bytes.go |
| bytes_richcompare | bytesRichCmp (unexported) | objects/bytes.go |
| _PyBytes_DecodeEscape | decodeBytesEscape | parser/string/decode.go |
| _PyBytesWriter | (not ported; uses strings.Builder) | |
| _PyBytes_Resize | (not ported) | |
| _PyBytes_FormatEx | (not ported) | |

CPython 3.14 changes

3.14 moves _PyBytesWriter internals further into Include/internal/, away from the semi-public cpython/ tier; extension authors who relied on it are expected to switch to PyBytes_FromStringAndSize plus PyBytes_Concat. The _PyBytes_FormatEx signature gained an extra Py_ssize_t * out-parameter for the written length. PyBytes_AS_STRING and PyBytes_GET_SIZE stay in the cpython/ header as fast accessors outside the limited API: they depend on the PyBytesObject layout, which the stable ABI deliberately hides, so limited-API extensions must call PyBytes_AsString and PyBytes_Size instead.