Lib/struct.py

cpython 3.14 @ ab2d84fe1023/Lib/struct.py

Lib/struct.py is a two-line shim:

from _struct import *
from _struct import _clearcache, pack, pack_into, unpack, unpack_from, \
iter_unpack, calcsize, error, Struct

All real logic lives in Modules/_struct.c. The Python file's only job is to make the C module's symbols available under the struct namespace and to ensure error and the private _clearcache function are exported even though they are not in _struct.__all__.

This annotation therefore covers Modules/_struct.c directly, treating the C source as the authoritative implementation.

Struct is a compiled format object. Calling Struct(fmt) parses the format string once and stores the parsed representation on the object; subsequent pack/unpack calls reuse it. The module-level functions pack, unpack, etc. keep an internal cache of Struct objects keyed by format string (a bounded dictionary that is emptied when it fills up, not a true LRU), so repeated calls with the same format skip the parse and cost only a cache lookup more than an explicit Struct.
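As a quick illustration of the two entry points described above, the following sketch packs the same values through a module-level function and through an explicit Struct. The variable names are illustrative only.

```python
import struct

fmt = '>IH'
compiled = struct.Struct(fmt)          # parsed once, reusable

# Module-level call: the format is looked up in (or added to) the internal cache.
via_module = struct.pack(fmt, 1, 2)

# Explicit Struct: no format lookup at call time.
via_struct = compiled.pack(1, 2)

# Both paths produce the same bytes and agree on the size.
same_bytes = via_module == via_struct
same_size = compiled.size == struct.calcsize(fmt)
```

Either form is fine for occasional calls; holding on to the Struct mainly pays off in tight loops.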

Map

| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-50 | from _struct import *, __all__ | Full re-export; the Python layer adds no logic. All symbols originate in Modules/_struct.c. | (stdlib pending) |
| _struct.c format parser | byte-order prefixes, format codes | @, =, <, >, ! set byte order and alignment; x c b B h H i I l L q Q e f d s p P n N ? are the format codes with defined size and alignment. | (stdlib pending) |
| _struct.c Struct type | Struct.__init__, Struct.pack, Struct.unpack, Struct.pack_into, Struct.unpack_from, Struct.iter_unpack, Struct.size, Struct.format | Compiled format object; size is computed once at compile time; iter_unpack returns an iterator that advances through a buffer in size-byte steps. | (stdlib pending) |
| _struct.c module-level | pack, unpack, pack_into, unpack_from, iter_unpack, calcsize | Convenience wrappers that compile the format string through an internal bounded cache and delegate to the matching Struct method. | (stdlib pending) |

Reading

Byte-order prefix semantics

cpython 3.14 @ ab2d84fe1023/Lib/struct.py#L1-50

The first character of a format string selects byte order and alignment:

| Prefix | Byte order | Size | Alignment |
|---|---|---|---|
| @ (default) | native | native | native |
| = | native | standard | none |
| < | little-endian | standard | none |
| > | big-endian | standard | none |
| ! | network (big-endian) | standard | none |

"Native" size means the C compiler's sizeof for the corresponding type. "Standard" size is the fixed width mandated by the format code table. "Native alignment" inserts padding bytes before each field so it falls on its natural boundary. No prefix other than @ inserts padding.

import struct

# native byte order, native sizes, native alignment
struct.pack('@ii', 1, 2) # 2 * sizeof(int) bytes, platform-dependent; two ints need no padding between them

# little-endian, standard sizes (4 bytes each), no padding
struct.pack('<ii', 1, 2) # always 8 bytes

# big-endian (network order), same as >
struct.pack('!H', 0x0102) # b'\x01\x02'
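The padding rule is easier to see with fields of different sizes. This sketch compares calcsize for a char followed by an int under standard and native modes; the native results depend on the platform, so no exact native value is asserted here.

```python
import struct

# Standard sizes, no alignment: a 1-byte char followed directly by a 4-byte int.
packed_size = struct.calcsize('=ci')      # always 5

# Native mode: padding is inserted so the int starts on its natural boundary.
native_size = struct.calcsize('@ci')      # platform-dependent, commonly 8

# Native mode adds no trailing padding on its own. Ending the format with a
# zero-count code such as '0i' aligns the end as an int would require.
unpadded_end = struct.calcsize('@ic')
padded_end = struct.calcsize('@ic0i')

print(packed_size, native_size, unpadded_end, padded_end)
```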

Format code table

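| Code | C type | Python type | Standard size (bytes) |
|---|---|---|---|
| x | pad byte | no value | 1 |
| c | char | bytes of length 1 | 1 |
| b | signed char | int | 1 |
| B | unsigned char | int | 1 |
| ? | _Bool | bool | 1 |
| h | short | int | 2 |
| H | unsigned short | int | 2 |
| i | int | int | 4 |
| I | unsigned int | int | 4 |
| l | long | int | 4 |
| L | unsigned long | int | 4 |
| q | long long | int | 8 |
| Q | unsigned long long | int | 8 |
| n | ssize_t | int | (native only) |
| N | size_t | int | (native only) |
| e | half-float (IEEE 754 binary16) | float | 2 |
| f | float | float | 4 |
| d | double | float | 8 |
| s | char[] | bytes | count (1 per byte) |
| p | char[] (Pascal string) | bytes | count (1 per byte, including the length byte) |
| P | void * | int | (native only) |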

A repeat count may precede any code: 4H is four unsigned shorts. For s and p the count is the byte length of the field, not a repeat: 10s is one 10-byte field that packs from and unpacks to a single bytes object.
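The difference between a repeat count and a field length can be checked directly. The values below are arbitrary sample data.

```python
import struct

# '4H' is four separate fields: four arguments in, four values out.
shorts = struct.pack('<4H', 1, 2, 3, 4)
shorts_back = struct.unpack('<4H', shorts)

# '10s' is one 10-byte field; shorter input is padded with null bytes.
fixed = struct.pack('<10s', b'abc')
fixed_back = struct.unpack('<10s', fixed)

# '5p' is one 5-byte Pascal string: a length byte, the data, then padding.
pascal = struct.pack('<5p', b'abc')
pascal_back = struct.unpack('<5p', pascal)
```

Unpacking s returns the full padded field, while p uses the stored length byte to return only the original data.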

pack and unpack usage

import struct

# module-level convenience (implemented in _struct.c):
struct.pack('>IH', 0xDEAD, 0xBEEF) # => b'\x00\x00\xde\xad\xbe\xef'
struct.unpack('>IH', b'\x00\x00\xde\xad\xbe\xef') # => (0xDEAD, 0xBEEF)
struct.calcsize('>IH') # => 6

# pack_into writes into an existing writable buffer at offset
buf = bytearray(8)
struct.pack_into('<I', buf, 2, 0xDEADBEEF)
# buf[2:6] = b'\xef\xbe\xad\xde'

# unpack_from reads from offset without slicing
(val,) = struct.unpack_from('<I', buf, 2)
# val = 0xDEADBEEF

pack returns a bytes object whose length equals calcsize(fmt). unpack always returns a tuple, even for a single-field format. pack_into writes into any writable object that supports the buffer protocol (for example a bytearray or a writable memoryview). unpack_from accepts an optional offset parameter (default 0) so callers avoid an intermediate slice.
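One practical difference worth spelling out: unpack requires the buffer to be exactly calcsize(fmt) bytes long, whereas unpack_from only needs that many bytes to be available from the offset. A small sketch, with illustrative variable names:

```python
import struct

payload = bytes(2) + struct.pack('<I', 0xDEADBEEF)   # 6 bytes, value at offset 2

# unpack_from reads 4 bytes starting at offset 2; the leading bytes are ignored.
(value,) = struct.unpack_from('<I', payload, 2)

# unpack insists on an exact-length buffer, so the 6-byte payload is rejected.
try:
    struct.unpack('<I', payload)
    exact_length_required = False
except struct.error:
    exact_length_required = True

# pack_into also accepts a memoryview over a writable object.
buf = bytearray(8)
struct.pack_into('<I', memoryview(buf), 2, 0xDEADBEEF)
```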

Struct compiled format cache

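import struct

s = struct.Struct('>IH')
data = s.pack(0xDEAD, 0xBEEF)
values = s.unpack(data) # => (0xDEAD, 0xBEEF)

buf = bytearray(16) # any writable buffer with at least s.size bytes after offset
offset = 4
s.pack_into(buf, offset, 0xDEAD, 0xBEEF)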

Struct.__init__ parses the format string in C and stores a per-field table (in the C source, an array of formatcode entries recording the format definition, offset, size, and repeat count) plus the total size. Struct.size equals calcsize(fmt). pack, unpack, pack_into, and unpack_from on a Struct instance skip both the parse step and the module-level cache lookup, which makes them the better choice for hot paths.
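To make the "parse once, reuse" idea concrete, here is a simplified Python model of the compile step. It is a sketch, not the C implementation: it handles only the standard-size prefixes (no native alignment), covers a subset of codes, and records one (code, count, offset) triple per field. The names compile_format and STANDARD_SIZES are hypothetical.

```python
import struct

# Standard sizes for a subset of codes; native-only codes are left out.
STANDARD_SIZES = {
    'x': 1, 'c': 1, 'b': 1, 'B': 1, '?': 1,
    'h': 2, 'H': 2, 'e': 2,
    'i': 4, 'I': 4, 'l': 4, 'L': 4, 'f': 4,
    'q': 8, 'Q': 8, 'd': 8,
    's': 1, 'p': 1,   # one byte per unit of count
}

def compile_format(fmt):
    """Return (entries, total_size) for a '<', '>', '=' or '!' format."""
    if not fmt or fmt[0] not in '<>=!':
        raise ValueError('this sketch only handles standard-size prefixes')
    entries = []
    offset = 0
    digits = ''
    for ch in fmt[1:]:
        if ch.isspace():
            if digits:
                raise ValueError('whitespace between count and code')
            continue
        if ch in '0123456789':
            digits += ch
            continue
        if ch not in STANDARD_SIZES:
            raise ValueError(f'unsupported format code {ch!r}')
        count = int(digits) if digits else 1
        digits = ''
        entries.append((ch, count, offset))
        offset += STANDARD_SIZES[ch] * count
    if digits:
        raise ValueError('repeat count given without a format code')
    return entries, offset

entries, total = compile_format('<4H10sB')
print(entries, total, struct.calcsize('<4H10sB'))
```

With the table built once, packing or unpacking becomes a loop over the entries with no further string parsing, which is the saving a compiled Struct offers.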

iter_unpack streaming

import struct

fmt = struct.Struct('<HH') # compile once
data = b'\x01\x00\x02\x00\x03\x00\x04\x00'

for a, b in fmt.iter_unpack(data):
    print(a, b)
# 1 2
# 3 4

iter_unpack(fmt, buffer) returns a lazy iterator that yields successive unpack results, stepping by calcsize(fmt) bytes. The buffer length must be an exact multiple of the step size; otherwise struct.error is raised when iter_unpack is called, before any item is produced. A format of size zero is rejected the same way. While it is live, the iterator holds a buffer export on the original object, so in-place writes to bytes not yet consumed show up in later results, and a bytearray cannot be resized until the export is released. The module-level struct.iter_unpack looks the format string up in the internal cache before delegating to the same mechanism.
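The stepping behaviour can be cross-checked against a manual unpack_from loop. This is a sketch with made-up sample data; the last part feeds a buffer that is one byte too long and records that struct.error is raised.

```python
import struct

record = struct.Struct('<HH')
data = struct.pack('<4H', 1, 2, 3, 4)     # two complete 4-byte records

# Lazy iteration, materialised here only for comparison.
streamed = list(record.iter_unpack(data))

# Equivalent manual loop: advance by record.size and unpack at each offset.
manual = [record.unpack_from(data, off)
          for off in range(0, len(data), record.size)]

# A buffer whose length is not a multiple of record.size is rejected.
try:
    list(record.iter_unpack(data + b'\x00'))
    rejected = False
except struct.error:
    rejected = True
```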

gopy mirror

encoding/binary in Go handles fixed-size integer encoding and decoding with explicit byte-order values (binary.LittleEndian, binary.BigEndian). For a full struct port, gopy needs to implement the format-string parser, the format-code table above, padding/alignment logic for @-prefixed formats, and the Struct compiled cache. iter_unpack maps naturally to a Go iterator over a byte slice. The e (half-float) code requires IEEE 754 binary16 conversion since Go's encoding/binary does not handle float16 natively; a small bit-manipulation helper is needed.
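The bit manipulation needed for the e code can be prototyped in Python before porting. The following sketch decodes a 16-bit pattern as IEEE 754 binary16 and can be checked against struct itself. The helper name half_bits_to_float is hypothetical, and only decoding is shown; encoding additionally needs a rounding step.

```python
import struct

def half_bits_to_float(bits):
    """Decode a 16-bit integer as an IEEE 754 binary16 value."""
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F     # 5 exponent bits, bias 15
    fraction = bits & 0x3FF            # 10 fraction bits
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed scale of 2**-24.
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent: infinity if the fraction is zero, else NaN.
        return sign * float('inf') if fraction == 0 else float('nan')
    # Normal number: implicit leading 1 plus the fraction.
    return sign * (1 + fraction / 1024.0) * 2.0 ** (exponent - 15)

# Cross-check one value against struct's own 'e' decoding.
reference = struct.unpack('<e', struct.pack('<H', 0x3C00))[0]
decoded = half_bits_to_float(0x3C00)
```

The same three-branch structure (subnormal, special, normal) should carry over to a Go helper operating on a uint16.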