
Modules/_struct.c

cpython 3.14 @ ab2d84fe1023/Modules/_struct.c

_struct.c is the C accelerator for Python's struct module. The public Lib/struct.py is a thin shim that imports everything from _struct and re-exports it, so all real work happens here. The module converts between Python values and C binary representations described by a compact format string, which is essential for network protocols, binary file formats, and interfacing with C libraries through ctypes or mmap.

Format strings combine an optional endian/alignment prefix (< little-endian, > big-endian, ! network, = native, @ native with alignment) with one or more format characters (H unsigned short, I unsigned int, L unsigned long, Q unsigned long long, s char array, f float, d double, and others). The compiled Struct object caches the per-format-character pack and unpack function pointers so repeated use of the same format avoids re-parsing.
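As a minimal sketch of how the prefix and format characters combine, the following packs the same value under different byte orders. The variable names are illustrative only.

```python
import struct

value = 0x0102

little = struct.pack("<H", value)    # low byte first
big = struct.pack(">H", value)       # high byte first
network = struct.pack("!H", value)   # network order is big-endian

# Several format characters form one record:
# H (2 bytes) + I (4 bytes) + 4s (4 bytes) with no padding under "<".
record = struct.pack("<HI4s", 1, 2, b"abcd")
```

With the `<` prefix the record is exactly the sum of the field sizes, which is why the explicit prefixes are the usual choice for wire and file formats.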

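Struct(fmt) is the primary interface. Its pack, unpack, and iter_unpack methods operate on the pre-compiled format. The one-shot module-level functions (struct.pack, struct.unpack, struct.calcsize) are convenience wrappers around the same machinery. They do not recompile on every call: each looks the format string up in a small internal cache of compiled Struct objects and compiles only on a miss. Holding your own Struct still saves that lookup in a hot loop.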
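A short sketch of the two entry points side by side. The format "<HHI" and the name `header` are chosen only for illustration.

```python
import struct

# Compile the format once and reuse the object.
header = struct.Struct("<HHI")

packed = header.pack(1, 2, 3)
fields = header.unpack(packed)

# The module-level function produces the same bytes for the same format.
same = struct.pack("<HHI", 1, 2, 3)
```

Both paths end up in the same packing code, so the choice is about convenience versus keeping a reference to the compiled format.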

Map

| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-120 | formatdef table | Maps each format character to a size, pack function, and unpack function | |
| 121-400 | s_object struct + s_init | Compiled Struct object: format string, size, list of formatcode entries | |
| 401-700 | pack_into / s_pack | Writes Python values into a writable buffer according to the format | |
| 701-950 | unpack_from / s_unpack | Reads a buffer and returns a tuple of Python values | |
| 951-1150 | iter_unpack / unpackiter | Iterator that yields successive fixed-size chunks from a buffer | |
| 1151-1350 | calcsize + s_get_size | Returns total byte count for a compiled or one-shot format | |
| 1351-1600 | Module init + method tables | PyModuleDef, one-shot wrappers, Struct type registration | |

Reading

Format compilation

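s_init, through the helper prepare_s, walks the format string character by character. For each format character it looks up the corresponding formatdef entry, inserts any padding that entry's alignment requires, and appends a formatcode record to the compiled list. The total size is accumulated during the walk, so reading the size of a compiled Struct is O(1). Repeat counts (e.g. 4H) are not expanded into separate entries: each formatcode record carries a repeat field, so 4H is one entry applied four times. For s and p the count is the field length in bytes rather than a repeat.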
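The walk can be sketched in Python. This is a simplified model, not the C code: `compile_format` and `STANDARD_SIZES` are hypothetical names, only the `<` prefix and a handful of format characters are handled, and alignment is ignored because `<` never pads.

```python
import struct

# Standard sizes for a few format characters (the sizes used with "<").
STANDARD_SIZES = {"b": 1, "B": 1, "h": 2, "H": 2, "i": 4, "I": 4,
                  "q": 8, "Q": 8, "f": 4, "d": 8}


def compile_format(fmt):
    """Return (codes, total_size) for a '<'-prefixed format string.

    Each code is a tuple (char, offset, size, repeat).
    """
    if not fmt.startswith("<"):
        raise ValueError("this sketch only handles the '<' prefix")
    codes, offset, digits = [], 0, ""
    for ch in fmt[1:]:
        if ch.isdigit():
            digits += ch
            continue
        count = int(digits) if digits else 1
        digits = ""
        if ch == "x":
            # Pad bytes take up space but carry no value.
            offset += count
            continue
        if ch == "s":
            # For 's' the count is the field length, not a repeat.
            size, repeat = count, 1
        else:
            size, repeat = STANDARD_SIZES[ch], count
        codes.append((ch, offset, size, repeat))
        offset += size * repeat
    if digits:
        raise ValueError("repeat count given without format specifier")
    return codes, offset


codes, total = compile_format("<H4s2I")
```

The total computed during the walk matches `struct.calcsize("<H4s2I")`, which is the point of accumulating it at compile time.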

pack and pack_into

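s_pack allocates a bytes object of Struct.size bytes, then calls s_pack_internal. The internal function iterates the formatcode list and calls each entry's pack function pointer, passing the next Python argument and a pointer into the output buffer. Native formats copy the C value into the buffer with memcpy. The standard-size little- and big-endian formats write the value out byte by byte in the requested order, so no separate byte-swap pass is needed. pack_into runs the same internal function against a caller-supplied writable buffer at a given offset instead of a fresh bytes object.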
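A brief sketch of both entry points from the Python side. `pack_be_unsigned` is a hypothetical helper, included only to show one way big-endian bytes can be produced with shifts; it is not a function in the module.

```python
import struct

point = struct.Struct(">hI")   # 2 + 4 = 6 bytes, no padding under ">"

# pack returns a new bytes object of point.size bytes.
fresh = point.pack(-2, 258)

# pack_into writes the same bytes into an existing writable buffer.
buf = bytearray(10)
point.pack_into(buf, 4, -2, 258)


def pack_be_unsigned(value, size):
    """Emit `size` bytes of `value`, most significant byte first."""
    return bytes((value >> (8 * (size - 1 - i))) & 0xFF for i in range(size))
```

pack_into is the better fit when a larger buffer is being assembled in place, since it avoids an intermediate bytes object per record.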

unpack and iter_unpack

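s_unpack obtains a read-only buffer via PyBUF_SIMPLE, verifies its length equals Struct.size, and calls each entry's unpack function pointer in order, building a tuple. iter_unpack wraps the same logic in an iterator object. It requires the buffer length to be an exact multiple of Struct.size and raises struct.error otherwise, so a trailing partial record is rejected rather than silently dropped. The iterator advances its offset by Struct.size on each __next__ call and stops when the buffer is exhausted.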
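A small sketch of the length checks from the caller's point of view. The names are illustrative, and the `try` blocks only record whether struct.error was raised.

```python
import struct

pair = struct.Struct("<HH")
data = pair.pack(1, 2) + pair.pack(3, 4)

# unpack needs exactly pair.size bytes.
first = pair.unpack(data[:pair.size])

# iter_unpack walks the buffer one record at a time.
rows = list(pair.iter_unpack(data))

# A buffer of the wrong length is rejected by unpack.
try:
    pair.unpack(data)
    unpack_failed = False
except struct.error:
    unpack_failed = True

# A trailing partial record is rejected by iter_unpack.
try:
    list(pair.iter_unpack(data + b"\x00"))
    iter_failed = False
except struct.error:
    iter_failed = True
```

unpack_from is the variant to reach for when reading one record out of a larger buffer at an offset.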

Endian and alignment prefixes

The prefix character is consumed before the format loop. It does not set global state; it selects which formatdef table the rest of the format is looked up in. The native table holds the platform's C sizes and alignments, while the little-endian and big-endian tables hold standard sizes with no alignment. With the @ prefix, or no prefix at all, each format character's natural C alignment is respected by inserting padding bytes. The =, <, >, and ! prefixes use standard sizes and suppress alignment, which is the common case for network and file formats.
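The effect of alignment is easiest to see through sizes. In this sketch the native results depend on the platform's C ABI, so only the standard-size results are stated as fixed numbers.

```python
import struct

# Standard sizes, no padding: 1 byte for b plus 4 bytes for I.
standard = struct.calcsize("<bI")
native_order_standard = struct.calcsize("=bI")

# Native mode may insert padding so that I starts on its natural boundary.
native = struct.calcsize("@bI")
default = struct.calcsize("bI")   # no prefix behaves like "@"

packed_standard = struct.pack("<bI", 1, 2)
```

On many common platforms the native size here is 8 rather than 5, but that is a property of the host ABI and not something the format string guarantees.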

calcsize

The size of a compiled Struct is the cached s_object->s_size field, exposed as Struct.size and read in O(1). The one-shot module-level struct.calcsize(fmt) fetches the compiled Struct for fmt from the module's format cache, compiling it only on a miss, and returns its size. That makes repeated calls with the same format cheap, but callers that need the size in a loop can skip the lookup entirely by keeping a Struct object and reading its size attribute.
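A minimal sketch of using the cached size to split a buffer into records. The format and the names `record` and `payload` are illustrative.

```python
import struct

record = struct.Struct("<IHH")
payload = bytes(record.size * 3)

# record.size is an attribute read; no format parsing happens here.
count = len(payload) // record.size

# The one-shot function reports the same size for the same format.
one_shot = struct.calcsize("<IHH")
```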

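/* _struct.c: unpack one big-endian unsigned short (abbreviated sketch) */
static PyObject *
bu_ushort(const char *p, const formatdef *f)
{
    /* Assemble the value most significant byte first. The result is the
       same on little- and big-endian hosts, so no byte swap is needed. */
    const unsigned char *bytes = (const unsigned char *)p;
    unsigned long x = ((unsigned long)bytes[0] << 8) | bytes[1];
    return PyLong_FromUnsignedLong(x);
}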
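For comparison, the same decoding step can be sketched in Python. `unpack_be_ushort` is a hypothetical helper written for this illustration; it is checked here against the struct module and int.from_bytes.

```python
import struct


def unpack_be_ushort(data):
    """Decode two bytes as a big-endian unsigned 16-bit integer."""
    # Most significant byte first, independent of the host byte order.
    return (data[0] << 8) | data[1]


raw = b"\x12\x34"
decoded = unpack_be_ushort(raw)
```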

gopy mirror

Not yet ported.