Objects/bytesobject.c

cpython 3.14 @ ab2d84fe1023/Objects/bytesobject.c

The bytes type. A PyBytesObject is an immutable sequence of bytes allocated as a single contiguous block: the PyBytesObject header followed immediately by ob_val[], a flexible array that always carries a NUL terminator so the buffer is also a valid C string. The ob_size field (from PyVarObject) is the logical length excluding that NUL. The hash is stored in ob_shash and initialized to -1; bytes_hash computes and caches it on first access.
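
The single-block layout can be sketched in standalone C. This is a toy model under stated assumptions, not the CPython definition: ToyBytes, toy_bytes_new, and the field names are hypothetical stand-ins for PyBytesObject, its allocator, and ob_size / ob_shash / ob_val.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyBytesObject: header fields, then the payload. */
typedef struct {
    size_t refcnt;     /* stands in for the PyObject header */
    ptrdiff_t size;    /* logical length, like ob_size; excludes the NUL */
    long hash;         /* cached hash, like ob_shash; -1 = not computed */
    char val[];        /* flexible array member, like ob_val[] */
} ToyBytes;

/* One allocation covers the header, the payload, and the trailing NUL. */
static ToyBytes *toy_bytes_new(const char *src, ptrdiff_t size)
{
    assert(size >= 0);
    ToyBytes *op = malloc(offsetof(ToyBytes, val) + (size_t)size + 1);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->size = size;
    op->hash = -1;
    if (src != NULL)
        memcpy(op->val, src, (size_t)size);  /* payload may contain NULs */
    op->val[size] = '\0';                    /* terminator is always present */
    return op;
}
```

Because the payload may contain embedded NUL bytes, strlen on the buffer can report less than the logical length; the size field is authoritative.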

Two caches eliminate allocations for the most common cases. bytes_empty is the singleton for the empty bytes value b"". A characters[256] array holds one object for each single-byte value 0-255, mirroring the small-integer cache for int. PyBytes_FromStringAndSize(p, 1) returns the cached entry for p[0] without touching the allocator.

The file also contains bytes_concat (the + operator, with a fast path for an unshared left operand), bytes_richcompare (memcmp-based), bytes_repr (printable ASCII passed through, named escapes for \n, \r, and \t, and \xNN hex escapes for everything else), bytes_decode (delegates to PyUnicode_Decode), and the full method table, ending with the PyBytes_Type definition.

Map

| Lines | Symbol | Role | gopy |
| --- | --- | --- | --- |
| 1-100 | PyBytesObject struct, bytes_empty, characters[256] | Object layout and the two allocation caches. | objects/bytes.go |
| 100-500 | PyBytes_FromStringAndSize, PyBytes_FromString, _PyBytes_FromSize | Allocation entry points; single-byte cache lookup. | objects/bytes.go:NewBytes |
| 500-900 | bytes_hash | SipHash-1-3; cache in ob_shash; immortal-object stable hash. | objects/bytes.go:(*Bytes).Hash |
| 900-1300 | bytes_concat, bytes_repeat | + and * operators; in-place fast path when refcount == 1. | objects/bytes.go:(*Bytes).Concat |
| 1300-1700 | bytes_richcompare | memcmp for equal length; length comparison otherwise. | objects/bytes.go:(*Bytes).RichCompare |
| 1700-2200 | bytes_repr, bytes_str | Escaping: printable ASCII, \\, \', \n, \r, \t, \xNN. | objects/bytes.go:(*Bytes).Repr |
| 2200-2800 | bytes_decode, bytes_join, bytes_split, bytes_rsplit | Decode to str and sequence methods. | objects/bytes.go |
| 2800-3400 | bytes_find, bytes_count, bytes_replace, bytes_startswith, bytes_endswith | Search and mutation methods. | objects/bytes.go |
| 3400-3736 | bytes_new, method table, PyBytes_Type | tp_new, method table, type object definition. | objects/bytes.go:BytesType |

Reading

Single-byte cache (lines 1 to 500)

cpython 3.14 @ ab2d84fe1023/Objects/bytesobject.c#L1-500

The characters array is a module-level table:

static PyBytesObject *characters[256];
static PyBytesObject *bytes_empty;

PyBytes_FromStringAndSize checks for the two fast paths before calling the allocator:

PyObject *
PyBytes_FromStringAndSize(const char *str, Py_ssize_t size)
{
    if (size == 0 && bytes_empty != NULL) {
        return Py_NewRef(bytes_empty);
    }
    if (size == 1 && str != NULL) {
        PyBytesObject *op = characters[(unsigned char)str[0]];
        if (op != NULL)
            return Py_NewRef(op);
    }
    /* General path: allocate and copy. */
    PyBytesObject *op = (PyBytesObject *)
        PyObject_Malloc(PyBytesObject_SIZE + size);
    ...
    op->ob_shash = -1;
    Py_SET_SIZE(op, size);
    if (str != NULL)
        memcpy(op->ob_val, str, size);
    op->ob_val[size] = '\0';
    ...
}

bytes_empty is populated by _PyBytes_Init at interpreter startup. The characters cache is filled lazily on first allocation of each byte value and is never evicted. Both caches hold immortal references so the objects survive for the life of the interpreter.
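
The lookup-then-allocate pattern can be modeled in standalone C. This is a minimal sketch with hypothetical names (CacheBytes, cache_from_string_and_size); unlike the description above, it fills the empty-value slot lazily as well, purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t refcnt;
    size_t size;
    char val[];
} CacheBytes;

static CacheBytes *cache_empty;        /* plays the role of bytes_empty */
static CacheBytes *cache_chars[256];   /* plays the role of characters[256] */

static CacheBytes *cache_alloc(const char *src, size_t size)
{
    CacheBytes *op = malloc(offsetof(CacheBytes, val) + size + 1);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->size = size;
    if (size > 0)
        memcpy(op->val, src, size);
    op->val[size] = '\0';
    return op;
}

/* Returns a new reference; a cache slot keeps its own reference forever. */
static CacheBytes *cache_from_string_and_size(const char *src, size_t size)
{
    assert(src != NULL);
    CacheBytes **slot = NULL;
    if (size == 0)
        slot = &cache_empty;
    else if (size == 1)
        slot = &cache_chars[(unsigned char)src[0]];  /* index 0..255 */

    if (slot != NULL && *slot != NULL) {
        (*slot)->refcnt++;              /* cache hit: no allocation */
        return *slot;
    }
    CacheBytes *op = cache_alloc(src, size);
    if (op != NULL && slot != NULL) {
        *slot = op;                     /* first use: remember the object */
        op->refcnt++;                   /* one reference for the cache */
    }
    return op;
}
```

The cast to unsigned char matters: on platforms where plain char is signed, a byte such as 0xff would otherwise produce a negative array index.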

Hash (lines 500 to 900)

cpython 3.14 @ ab2d84fe1023/Objects/bytesobject.c#L500-900

bytes_hash follows the same pattern as unicode_hash:

static Py_hash_t
bytes_hash(PyBytesObject *a)
{
    if (a->ob_shash != -1)
        return a->ob_shash;
    Py_uhash_t x = _Py_HashBytes(a->ob_val, Py_SIZE(a));
    if (x == (Py_uhash_t)-1)
        x = 1520022418;
    a->ob_shash = x;
    return x;
}

_Py_HashBytes dispatches to SipHash-1-3, keyed with the per-process secret initialized by Py_Initialize. Because -1 in ob_shash means "not yet computed" (and -1 is also the error return of a tp_hash slot), a computed hash that happens to equal -1 (all bits set, 0xFFFF...) must be replaced before it is cached; the replacement value 1520022418 is arbitrary but stable.
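
The compute-once pattern with a reserved sentinel can be sketched in standalone C. Everything here is an assumption made for illustration: HashedBytes and hashed_bytes_hash are hypothetical names, FNV-1a is swapped in for SipHash-1-3 only because it fits in a few lines, and the replacement value -2 is this sketch's own choice. The computations counter exists only so the caching is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int64_t cached_hash;        /* like ob_shash; -1 = not yet computed */
    size_t size;
    const unsigned char *data;
    int computations;           /* demo only: counts real hash runs */
} HashedBytes;

/* FNV-1a (64-bit): a simple stand-in, NOT the keyed hash CPython uses. */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;     /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;              /* FNV prime */
    }
    return h;
}

static int64_t hashed_bytes_hash(HashedBytes *b)
{
    if (b->cached_hash != -1)
        return b->cached_hash;              /* fast path: already cached */

    /* Reinterpret the unsigned result as signed (two's complement assumed). */
    int64_t h = (int64_t)fnv1a64(b->data, b->size);
    b->computations++;
    if (h == -1)
        h = -2;     /* keep -1 free for the "not computed" state */
    b->cached_hash = h;
    return h;
}
```

Without the remapping, a value whose real hash is -1 would be rehashed on every call, since the cache could never record it.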

Concat (lines 900 to 1300)

cpython 3.14 @ ab2d84fe1023/Objects/bytesobject.c#L900-1300

bytes_concat implements the + operator. The fast path avoids a second allocation when the left operand is not shared:

static PyObject *
bytes_concat(PyObject *a, PyObject *b)
{
    Py_buffer va, vb;
    ...
    Py_ssize_t size = va.len + vb.len;
    ...
    /* In-place resize when caller holds the only reference. */
    if (Py_REFCNT(a) == 1 && PyBytes_CheckExact(a) &&
        _PyBytes_Resize(&a, size) == 0)
    {
        memcpy(PyBytes_AS_STRING(a) + va.len, vb.buf, vb.len);
        ...
        return a;
    }
    PyObject *result = PyBytes_FromStringAndSize(NULL, size);
    if (result != NULL) {
        memcpy(PyBytes_AS_STRING(result), va.buf, va.len);
        memcpy(PyBytes_AS_STRING(result) + va.len, vb.buf, vb.len);
    }
    ...
    return result;
}

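_PyBytes_Resize calls PyObject_Realloc, which may extend the block in place or move it; this is why it takes a PyObject ** and updates the caller's pointer. Moving is safe because bytes objects are allocated with the object allocator, contain no interior pointers, and, with a reference count of 1, are not referenced from anywhere else. _PyBytes_Resize also writes the new trailing NUL.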
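
The resize-then-append step can be sketched with plain realloc. This is a toy model: GrowBytes, grow_new, grow_resize, and grow_append are hypothetical names, and the failure handling (free the object, set the pointer to NULL, return -1) follows the documented contract of _PyBytes_Resize rather than its actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t refcnt;
    size_t size;
    char val[];
} GrowBytes;

static GrowBytes *grow_new(const char *src, size_t size)
{
    GrowBytes *op = malloc(offsetof(GrowBytes, val) + size + 1);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->size = size;
    if (size > 0)
        memcpy(op->val, src, size);
    op->val[size] = '\0';
    return op;
}

/* On success *pv may point to a NEW address; on failure the object is
   freed and *pv is set to NULL. Only legal for an unshared object. */
static int grow_resize(GrowBytes **pv, size_t newsize)
{
    GrowBytes *v = *pv;
    assert(v != NULL && v->refcnt == 1);
    GrowBytes *nv = realloc(v, offsetof(GrowBytes, val) + newsize + 1);
    if (nv == NULL) {
        free(v);
        *pv = NULL;
        return -1;
    }
    nv->size = newsize;
    nv->val[newsize] = '\0';    /* the resize writes the new terminator */
    *pv = nv;
    return 0;
}

/* Append blen bytes to *pa in place, reusing the left operand's block. */
static int grow_append(GrowBytes **pa, const char *b, size_t blen)
{
    size_t alen = (*pa)->size;  /* read before the block can move */
    if (grow_resize(pa, alen + blen) < 0)
        return -1;
    memcpy((*pa)->val + alen, b, blen);
    return 0;
}
```

The old length is read before the resize and the copy goes through the updated pointer, because any pointer into the old block may be stale once realloc returns.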

Repr (lines 1700 to 2200)

cpython 3.14 @ ab2d84fe1023/Objects/bytesobject.c#L1700-2200

bytes_repr walks the buffer and classifies each byte. Printable ASCII (0x20-0x7e) is copied verbatim, except that the backslash and the active quote character are preceded by a backslash. Newline, carriage return, and tab get the named escapes \n, \r, and \t. All other bytes are written as \xNN using two lowercase hex digits. The quote character is chosen by scanning the buffer first: the default is ', and the repr switches to " only when the buffer contains a ' and no ". So b"it's" is rendered as b"it's" rather than b'it\'s', while a value containing both quote characters keeps ' and escapes it.

/* Quote selection. */
quote = '\'';
if (memchr(op->ob_val, '\'', Py_SIZE(op)) &&
    !memchr(op->ob_val, '"', Py_SIZE(op)))
    quote = '"';

The output length is computed in a first pass so the result str object can be allocated at exactly the right size before the second (writing) pass begins.
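
The two-pass structure can be sketched in standalone C that writes into a malloc'ed C string instead of a Python str. sketch_bytes_repr is a hypothetical name, and the escaping rules are those observable from repr() on bytes values: named escapes for \t, \n, \r, a backslash before the backslash and the active quote, and lowercase \xNN for other non-printable bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed, NUL-terminated repr; the caller frees it. */
static char *sketch_bytes_repr(const unsigned char *s, size_t n)
{
    static const char hexdigits[] = "0123456789abcdef";

    /* Quote selection, as in the fragment above. */
    char quote = '\'';
    if (n > 0 && memchr(s, '\'', n) && !memchr(s, '"', n))
        quote = '"';

    /* Pass 1: compute the exact output length. */
    size_t newsize = 3;                         /* b, open and close quote */
    for (size_t i = 0; i < n; i++) {
        unsigned char c = s[i];
        if (c == (unsigned char)quote || c == '\\' ||
            c == '\t' || c == '\n' || c == '\r')
            newsize += 2;                       /* backslash + one char */
        else if (c < 0x20 || c >= 0x7f)
            newsize += 4;                       /* \xNN */
        else
            newsize += 1;                       /* printable, verbatim */
    }

    char *out = malloc(newsize + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: write into the pre-sized buffer. */
    char *p = out;
    *p++ = 'b';
    *p++ = quote;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = s[i];
        if (c == (unsigned char)quote || c == '\\') {
            *p++ = '\\'; *p++ = (char)c;
        } else if (c == '\t') {
            *p++ = '\\'; *p++ = 't';
        } else if (c == '\n') {
            *p++ = '\\'; *p++ = 'n';
        } else if (c == '\r') {
            *p++ = '\\'; *p++ = 'r';
        } else if (c < 0x20 || c >= 0x7f) {
            *p++ = '\\'; *p++ = 'x';
            *p++ = hexdigits[c >> 4];
            *p++ = hexdigits[c & 0x0f];
        } else {
            *p++ = (char)c;
        }
    }
    *p++ = quote;
    *p = '\0';
    assert((size_t)(p - out) == newsize);       /* both passes must agree */
    return out;
}
```

The final assertion checks the invariant the two-pass design depends on: the counting pass and the writing pass must classify every byte identically.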