Objects/bytesobject.c
cpython 3.14 @ ab2d84fe1023/Objects/bytesobject.c
The bytes type. A PyBytesObject is an immutable sequence of bytes allocated
as a single contiguous block: the PyBytesObject header followed immediately
by ob_sval[], a trailing array (declared with one element and over-allocated,
so it behaves like a flexible array) that always carries a NUL terminator so
the buffer is also a valid C string. The ob_size field (from PyVarObject) is
the logical length, excluding that NUL. The hash is cached in ob_shash, which
starts at -1; bytes_hash computes and stores it on first access.
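The single-block layout can be modelled in a few lines of plain C. This is a minimal sketch, not CPython code: toy_bytes and toy_bytes_new are hypothetical names, the header is reduced to a length and a hash slot, and size overflow is not checked.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyBytesObject: header fields, then the payload. */
typedef struct {
    size_t size;   /* logical length, excluding the trailing NUL */
    long hash;     /* -1 means "not computed yet" */
    char data[];   /* C99 flexible array member holding size + 1 bytes */
} toy_bytes;

/* Allocate header and payload as one block; returns NULL on failure.
   src may be NULL, in which case the payload is left uninitialized. */
static toy_bytes *toy_bytes_new(const char *src, size_t size)
{
    toy_bytes *b = malloc(sizeof(toy_bytes) + size + 1);
    if (b == NULL)
        return NULL;
    b->size = size;
    b->hash = -1;
    if (src != NULL && size > 0)
        memcpy(b->data, src, size);
    b->data[size] = '\0';   /* keeps the buffer usable as a C string */
    return b;
}
```

Because the length is stored separately, embedded NUL bytes are legal: a three-byte payload of 'a', NUL, 'b' has size 3 even though strlen on the same buffer reports 1.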
Two caches eliminate allocations for the most common cases. bytes_empty is
the singleton for the empty bytes value b"". A characters[256] array holds
one object for each single-byte value 0-255, mirroring the small-integer cache
for int. PyBytes_FromStringAndSize(p, 1) returns the cached entry for
p[0] without touching the allocator.
The file also contains bytes_concat (the + operator with a fast path for
non-shared left operands), bytes_richcompare (memcmp-based), bytes_repr
(printable ASCII passed through, named escapes for a few control characters,
\xNN hex escapes for everything else), bytes_decode (delegates to
PyUnicode_Decode), and the full method table, ending with the PyBytes_Type
definition.
Map
| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-100 | PyBytesObject struct, bytes_empty, characters[256] | Object layout and the two allocation caches. | objects/bytes.go |
| 100-500 | PyBytes_FromStringAndSize, PyBytes_FromString, _PyBytes_FromSize | Allocation entry points; single-byte cache lookup. | objects/bytes.go:NewBytes |
| 500-900 | bytes_hash | SipHash-1-3; cache in ob_shash; immortal-object stable hash. | objects/bytes.go:(*Bytes).Hash |
| 900-1300 | bytes_concat, bytes_repeat | + and * operators; in-place fast path when refcount == 1. | objects/bytes.go:(*Bytes).Concat |
| 1300-1700 | bytes_richcompare | memcmp over the common prefix, with length as the tiebreak; == and != short-circuit when lengths differ. | objects/bytes.go:(*Bytes).RichCompare |
| 1700-2200 | bytes_repr, bytes_str | Escaping: printable ASCII, \\, \', \n, \r, \t, \xNN. | objects/bytes.go:(*Bytes).Repr |
| 2200-2800 | bytes_decode, bytes_join, bytes_split, bytes_rsplit | Decode to str and sequence methods. | objects/bytes.go |
| 2800-3400 | bytes_find, bytes_count, bytes_replace, bytes_startswith, bytes_endswith | Search and mutation methods. | objects/bytes.go |
| 3400-3736 | bytes_new, method table, PyBytes_Type | tp_new, method table, type object definition. | objects/bytes.go:BytesType |
Reading
Single-byte cache (lines 1 to 500)
cpython 3.14 @ ab2d84fe1023/Objects/bytesobject.c#L1-500
The characters array is a module-level table:
static PyBytesObject *characters[256];
static PyBytesObject *bytes_empty;
PyBytes_FromStringAndSize checks for the two fast paths before calling the
allocator:
PyObject *
PyBytes_FromStringAndSize(const char *str, Py_ssize_t size)
{
    if (size == 0 && bytes_empty != NULL) {
        return Py_NewRef(bytes_empty);
    }
    if (size == 1 && str != NULL) {
        PyBytesObject *op = characters[(unsigned char)str[0]];
        if (op != NULL)
            return Py_NewRef(op);
    }
    /* General path: allocate and copy. */
    PyBytesObject *op = (PyBytesObject *)
        PyObject_Malloc(PyBytesObject_SIZE + size);
    ...
    op->ob_shash = -1;
    Py_SET_SIZE(op, size);
    if (str != NULL)
        memcpy(op->ob_sval, str, size);
    op->ob_sval[size] = '\0';
    ...
}
bytes_empty is populated by _PyBytes_Init at interpreter startup. The
characters cache is filled lazily on first allocation of each byte value and
is never evicted. Both caches hold immortal references so the objects survive
for the life of the interpreter.
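The two fast paths can be sketched outside the interpreter as follows. Everything named toy_* is hypothetical. The sketch leaves out reference counting, and for brevity it fills the empty singleton lazily rather than at startup as described above.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t size;
    char data[];
} toy_bytes;

static toy_bytes *toy_empty;            /* singleton for the empty value */
static toy_bytes *toy_characters[256];  /* one slot per byte value */

/* General path: a fresh allocation with a trailing NUL. */
static toy_bytes *toy_alloc(const char *src, size_t size)
{
    toy_bytes *b = malloc(sizeof(toy_bytes) + size + 1);
    if (b == NULL)
        return NULL;
    b->size = size;
    if (size > 0)
        memcpy(b->data, src, size);
    b->data[size] = '\0';
    return b;
}

/* Cached objects are shared and never freed, so callers must not free
   a result whose size is 0 or 1. src must be non-NULL when size > 0. */
static toy_bytes *toy_from_string_and_size(const char *src, size_t size)
{
    if (size == 0) {
        if (toy_empty == NULL)
            toy_empty = toy_alloc("", 0);
        return toy_empty;
    }
    if (size == 1) {
        /* The cast matters: a negative char would index out of bounds. */
        unsigned char c = (unsigned char)src[0];
        if (toy_characters[c] == NULL)
            toy_characters[c] = toy_alloc(src, 1);   /* fill lazily */
        return toy_characters[c];
    }
    return toy_alloc(src, size);
}
```

Two requests for the same single byte return the same pointer, while two requests for the same two-byte payload return distinct blocks.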
Hash (lines 500 to 900)
cpython 3.14 @ ab2d84fe1023/Objects/bytesobject.c#L500-900
bytes_hash follows the same pattern as unicode_hash:
static Py_hash_t
bytes_hash(PyBytesObject *a)
{
    if (a->ob_shash != -1)
        return a->ob_shash;
    /* _Py_HashBytes never returns -1; it maps that result to -2. */
    Py_hash_t x = _Py_HashBytes(a->ob_sval, Py_SIZE(a));
    a->ob_shash = x;
    return x;
}
_Py_HashBytes dispatches to the configured hash algorithm (SipHash-1-3 by
default), keyed with the per-process secret set up during interpreter
initialization. Because -1 is both the "not yet computed" marker in ob_shash
and the error return of a hash slot, a computed value of -1 is replaced with
-2. This is the same convention that makes hash(-1) == -2 for int.
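The compute-once pattern with a reserved sentinel is easy to isolate. The sketch below is illustrative only: toy_bytes and both functions are hypothetical, FNV-1a stands in for CPython's keyed hash, and the remap target (-2 here) is whatever value the real hash function substitutes for the sentinel.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int64_t hash;                /* -1 means "not computed yet" */
    size_t size;
    const unsigned char *data;
} toy_bytes;

/* FNV-1a, used only as a stand-in for the real keyed hash. */
static int64_t toy_hash_bytes(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;               /* FNV prime */
    }
    /* Out-of-range conversion is implementation-defined; common
       compilers wrap modulo 2^64. */
    int64_t x = (int64_t)h;
    return x == -1 ? -2 : x;                 /* never return the sentinel */
}

/* Hash the payload on first use, then serve the cached value. */
static int64_t toy_hash(toy_bytes *b)
{
    if (b->hash == -1)
        b->hash = toy_hash_bytes(b->data, b->size);
    return b->hash;
}
```

Caching is sound only because the payload is immutable; any code that changes the bytes would have to reset the slot to -1.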
Concat (lines 900 to 1300)
cpython 3.14 @ ab2d84fe1023/Objects/bytesobject.c#L900-1300
bytes_concat implements the + operator. The fast path avoids a second
allocation when the left operand is not shared:
static PyObject *
bytes_concat(PyObject *a, PyObject *b)
{
    Py_buffer va, vb;
    ...
    Py_ssize_t size = va.len + vb.len;
    ...
    /* In-place resize when caller holds the only reference. */
    if (Py_REFCNT(a) == 1 && PyBytes_CheckExact(a) &&
        _PyBytes_Resize(&a, size) == 0)
    {
        memcpy(PyBytes_AS_STRING(a) + va.len, vb.buf, vb.len);
        ...
        return a;
    }
    PyObject *result = PyBytes_FromStringAndSize(NULL, size);
    if (result != NULL) {
        memcpy(PyBytes_AS_STRING(result), va.buf, va.len);
        memcpy(PyBytes_AS_STRING(result) + va.len, vb.buf, vb.len);
    }
    ...
    return result;
}
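_PyBytes_Resize calls PyObject_Realloc, which may extend the block in place
or move it; that is why it takes a PyObject ** and rewrites the caller's
pointer. Moving is safe because a bytes object is a single allocation from
the object allocator with no interior pointers, and the refcount check
guarantees that nobody else holds the old address. The new trailing NUL is
written by _PyBytes_Resize.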
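A rough model of the resize step shows why the pointer-to-pointer signature is needed. The names toy_bytes, toy_resize, and toy_append are hypothetical. The failure behaviour (free the old block, set the pointer to NULL, return -1) follows the documented contract of _PyBytes_Resize, minus the exception.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t size;
    char data[];
} toy_bytes;

/* Resize *pb to new_size payload bytes. Returns 0 on success.
   On failure the old block is freed, *pb becomes NULL, and -1 is returned. */
static int toy_resize(toy_bytes **pb, size_t new_size)
{
    toy_bytes *grown = realloc(*pb, sizeof(toy_bytes) + new_size + 1);
    if (grown == NULL) {
        free(*pb);
        *pb = NULL;
        return -1;
    }
    grown->size = new_size;
    grown->data[new_size] = '\0';   /* re-establish the trailing NUL */
    *pb = grown;                    /* the block may have moved */
    return 0;
}

/* Append n bytes to *pb in place. The caller must hold the only pointer
   to *pb, and src must not point into the block being resized. */
static int toy_append(toy_bytes **pb, const char *src, size_t n)
{
    size_t old = (*pb)->size;
    if (toy_resize(pb, old + n) < 0)
        return -1;
    memcpy((*pb)->data + old, src, n);
    return 0;
}
```

Any pointer into the old payload is invalid after a successful resize, so the sketch records the old length before resizing and computes the write position from the updated pointer.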
Repr (lines 1700 to 2200)
cpython 3.14 @ ab2d84fe1023/Objects/bytesobject.c#L1700-2200
bytes_repr walks the buffer and classifies each byte. Printable ASCII
(0x20-0x7e, excluding the backslash and the active quote character) is copied
verbatim. The backslash and the active quote are backslash-escaped. Newline,
carriage return, and tab get named escapes: \n, \r, \t. All other bytes are
written as \xNN using two lowercase hex digits. The quote character is chosen
by scanning the buffer first: the repr uses ' unless the buffer contains a '
and no ", in which case it switches to ". So b"it's" is rendered as b"it's"
rather than b'it\'s'.
/* Quote selection. */
quote = '\'';
if (memchr(op->ob_sval, '\'', Py_SIZE(op)) &&
    !memchr(op->ob_sval, '"', Py_SIZE(op)))
    quote = '"';
The output length is computed in a first pass so the result str object can
be allocated at its final size before the second (writing) pass begins.
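The measure-then-write structure can be tried as a standalone function. This is a sketch under assumptions, not the CPython implementation: toy_bytes_repr is a hypothetical name, it returns a malloc'd C string instead of a Python object, and it reuses the quote rule from the snippet above.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd, NUL-terminated repr of buf[0..n), or NULL on failure. */
static char *toy_bytes_repr(const unsigned char *buf, size_t n)
{
    static const char hexdigits[] = "0123456789abcdef";   /* lowercase */

    /* Prefer ', switching to " only if the data has a ' but no ". */
    char quote = '\'';
    if (n > 0 && memchr(buf, '\'', n) && !memchr(buf, '"', n))
        quote = '"';

    /* Pass 1: measure. Start with 'b', two quotes, and the final NUL. */
    size_t len = 4;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = buf[i];
        if (c == quote || c == '\\' || c == '\n' || c == '\r' || c == '\t')
            len += 2;                       /* backslash plus one character */
        else if (c < 0x20 || c >= 0x7f)
            len += 4;                       /* \xNN */
        else
            len += 1;
    }

    char *out = malloc(len);
    if (out == NULL)
        return NULL;

    /* Pass 2: write into the exactly sized buffer. */
    char *p = out;
    *p++ = 'b';
    *p++ = quote;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = buf[i];
        if (c == quote || c == '\\') { *p++ = '\\'; *p++ = (char)c; }
        else if (c == '\n') { *p++ = '\\'; *p++ = 'n'; }
        else if (c == '\r') { *p++ = '\\'; *p++ = 'r'; }
        else if (c == '\t') { *p++ = '\\'; *p++ = 't'; }
        else if (c < 0x20 || c >= 0x7f) {
            *p++ = '\\';
            *p++ = 'x';
            *p++ = hexdigits[c >> 4];
            *p++ = hexdigits[c & 0x0f];
        }
        else *p++ = (char)c;
    }
    *p++ = quote;
    *p = '\0';
    return out;
}
```

The two passes must classify bytes identically; if they disagree, the second pass overruns or underfills the buffer.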