marshal.c: bytecode serialization

Python/marshal.c is CPython's binary serialization layer for .pyc files and the marshal module. It is intentionally narrow: it handles only the object types that can appear in compiled code objects, not the full Python object graph. The reference table, introduced in format version 3 and retained in version 4 and later, lets a shared sub-object be written once and referenced by index on subsequent encounters.

Map

| Lines | Symbol | Role |
| 60–130 | WFILE / RFILE structs | Writer and reader state; carry buffer, depth counter, and the reference table |
| 131–200 | w_byte / w_short / w_long | Primitive little-endian write helpers |
| 201–280 | w_PyLong | Writes a Python int as sign + array of 15-bit digits |
| 281–400 | w_complex_object | Handles TYPE_REF registration for shared objects |
| 401–620 | w_object | Main dispatch: writes a type byte (TYPE_*) then the payload for each supported type |
| 621–720 | r_byte / r_short / r_long | Primitive little-endian read helpers |
| 721–820 | r_PyLong | Reads sign + digit array back into a PyLongObject |
| 821–1100 | r_object | Recursive reader; switches on the type byte and reconstructs each object |
| 1101–1200 | r_ref_reserve / r_ref_insert | Manages the reference table for the FLAG_REF bit |
| 1201–1350 | r_code | Deserializes a PyCodeObject field by field (name, filename, consts, linetable, etc.) |
| 1351–1450 | PyMarshal_WriteObjectToFile | Public C API: opens a WFILE and calls w_object |
| 1451–1550 | PyMarshal_ReadObjectFromFile | Public C API: opens an RFILE and calls r_object |
| 1551–1600 | marshal_module | Python-level marshal.dumps / marshal.loads / marshal.dump / marshal.load |
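
The 15-bit digit layout used by w_PyLong and r_PyLong can be illustrated without any CPython headers. The following is a minimal sketch, not the code in marshal.c: it works on int64_t instead of PyLongObject, the names split_digits, join_digits, MARSHAL_SHIFT and MARSHAL_MASK are hypothetical, and it assumes the sign is carried by the sign of the digit count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MARSHAL_SHIFT 15                         /* bits per digit (hypothetical name) */
#define MARSHAL_MASK  ((1u << MARSHAL_SHIFT) - 1)

/* Split value into 15-bit digits, least significant first.
   Returns the signed digit count: negative for negative values, 0 for zero. */
static int
split_digits(int64_t value, uint16_t *digits, size_t cap)
{
    /* Take the magnitude in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t mag = value < 0 ? 0 - (uint64_t)value : (uint64_t)value;
    int n = 0;
    while (mag != 0) {
        assert((size_t)n < cap);                 /* caller must supply enough room */
        digits[n++] = (uint16_t)(mag & MARSHAL_MASK);
        mag >>= MARSHAL_SHIFT;
    }
    return value < 0 ? -n : n;
}

/* Rebuild the value from a signed digit count and its digits. */
static int64_t
join_digits(int count, const uint16_t *digits)
{
    int n = abs(count);
    uint64_t mag = 0;
    for (int i = n - 1; i >= 0; i--)
        mag = (mag << MARSHAL_SHIFT) | digits[i];
    return count < 0 ? (int64_t)(0 - mag) : (int64_t)mag;
}
```

Five digits are enough for any 64-bit value, since 5 × 15 = 75 bits. A writer following this scheme would emit the signed count first and then each digit as a 16-bit little-endian value.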

Reading

w_object type-byte dispatch

Every value is prefixed with a single type byte. When the reference table is in use (format version 3 and later), the high bit (FLAG_REF, value 0x80) is ORed into the type byte whenever the object is being added to the reference table. The type constants are defined as ASCII characters for readability (TYPE_INT = 'i', TYPE_STRING = 's', TYPE_CODE = 'c', etc.):

// Python/marshal.c:401 w_object
static void
w_object(PyObject *v, WFILE *p)
{
    ...
    if (v == Py_None) { w_byte(TYPE_NONE, p); }
    else if (v == Py_False) { w_byte(TYPE_FALSE, p); }
    else if (v == Py_True) { w_byte(TYPE_TRUE, p); }
    else if (PyLong_CheckExact(v)) {
        ...
        w_byte(flag | TYPE_INT, p);
        w_long((long)PyLong_AsLong(v), p);
    }
    ...
    else if (PyCode_Check(v)) {
        w_byte(flag | TYPE_CODE, p);
        w_complex_object(v, flag, p);
    }
    ...
}

The flag variable is either 0 or FLAG_REF, depending on whether the object was freshly added to the write-side reference table by w_ref.
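
To make the byte layout concrete, here is a small standalone sketch of how a TYPE_INT record could be laid out: one type byte, optionally carrying FLAG_REF, followed by a 32-bit little-endian payload. OutBuf, put_byte, put_long and put_int are hypothetical stand-ins for WFILE and the w_* helpers, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_REF 0x80   /* high bit of the type byte */
#define TYPE_INT 'i'

/* Hypothetical fixed-size output buffer standing in for WFILE. */
typedef struct {
    unsigned char data[64];
    size_t len;
} OutBuf;

static void
put_byte(OutBuf *b, int c)
{
    assert(b->len < sizeof b->data);   /* the sketch does not grow the buffer */
    b->data[b->len++] = (unsigned char)c;
}

/* Write a 32-bit value least significant byte first. */
static void
put_long(OutBuf *b, int32_t x)
{
    uint32_t u = (uint32_t)x;
    for (int i = 0; i < 4; i++)
        put_byte(b, (int)((u >> (8 * i)) & 0xff));
}

/* Type byte first, then the payload; add_ref models "freshly added to the table". */
static void
put_int(OutBuf *b, int32_t x, int add_ref)
{
    put_byte(b, (add_ref ? FLAG_REF : 0) | TYPE_INT);
    put_long(b, x);
}
```

Under these assumptions, put_int(&buf, 258, 1) produces the five bytes E9 02 01 00 00: 'i' (0x69) with the high bit set, then 258 in little-endian order.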

r_object and the FLAG_REF reference table

r_object reads the type byte and checks the FLAG_REF bit before dispatching. If the bit is set, r_ref_reserve pre-allocates a slot in the read-side reference list, and r_ref_insert fills that slot once the object is fully constructed. This lets shared sub-objects (such as an interned string that appears many times in a constants tuple) be stored once and referred to by index afterwards:

// Python/marshal.c:821 r_object
static PyObject *
r_object(RFILE *p)
{
    int type = r_byte(p);
    int flag = type & FLAG_REF;
    type &= ~FLAG_REF;

    Py_ssize_t idx = -1;
    if (flag)
        idx = r_ref_reserve(flag, p);

    PyObject *retval = NULL;
    switch (type) {
    case TYPE_NONE: retval = Py_None; Py_INCREF(retval); break;
    case TYPE_INT: retval = PyLong_FromLong(r_long(p)); break;
    case TYPE_STRING: retval = r_string(p); break;
    case TYPE_CODE: retval = r_code(p); break;
    ...
    }
    if (flag && retval != NULL)
        r_ref_insert(idx, retval, p);
    return retval;
}
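
The reserve-then-insert pattern, together with the back-reference it enables, can be sketched in isolation. This is an illustrative toy, not the CPython reader: values are plain int32_t rather than PyObject pointers, Reader, get_byte, get_long, read_value and MAX_REFS are hypothetical names, and it assumes a back-reference is encoded as TYPE_REF ('r') followed by a 32-bit table index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_REF 0x80
#define TYPE_INT 'i'
#define TYPE_REF 'r'    /* assumed encoding of a back-reference */
#define MAX_REFS 16

/* Hypothetical reader state standing in for RFILE. */
typedef struct {
    const unsigned char *buf;
    size_t pos, len;
    int32_t refs[MAX_REFS];   /* stand-in for the list of referenced objects */
    size_t nrefs;
} Reader;

static int
get_byte(Reader *r)
{
    assert(r->pos < r->len);  /* the sketch treats truncated input as a bug */
    return r->buf[r->pos++];
}

static int32_t
get_long(Reader *r)
{
    uint32_t u = 0;
    for (int i = 0; i < 4; i++)
        u |= (uint32_t)get_byte(r) << (8 * i);
    return (int32_t)u;
}

/* Decode one value into *out. Returns 0 on success, -1 on malformed input. */
static int
read_value(Reader *r, int32_t *out)
{
    int type = get_byte(r);
    int flag = type & FLAG_REF;
    type &= ~FLAG_REF;

    size_t idx = 0;
    if (flag) {                        /* reserve the slot before decoding */
        if (r->nrefs == MAX_REFS)
            return -1;
        idx = r->nrefs++;
    }
    switch (type) {
    case TYPE_INT:
        *out = get_long(r);
        break;
    case TYPE_REF: {                   /* look up an earlier value by index */
        int32_t n = get_long(r);
        if (n < 0 || (size_t)n >= r->nrefs)
            return -1;
        *out = r->refs[n];
        return 0;
    }
    default:
        return -1;
    }
    if (flag)
        r->refs[idx] = *out;           /* fill the reserved slot */
    return 0;
}
```

Reserving the slot before decoding keeps indices in the order the writer assigned them, even when a container's children claim later slots while the container itself is still being built.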

r_code and code object fields

r_code reads fields in a fixed order that must match the order in which the writer emits them for TYPE_CODE. In CPython 3.14 the field order (simplified) is: argcount, posonlyargcount, kwonlyargcount, stacksize, flags, code (bytes), consts (tuple), names, localsplusnames, localspluskinds, filename, name, qualname, firstlineno, linetable (formerly lnotab), exceptiontable. Free and cell variables are not written as separate fields; they are recorded in localsplusnames and localspluskinds.

The old lnotab field was replaced by linetable (a more compact encoding) in 3.10, and exceptiontable was added in 3.11 to encode the exception handler ranges that were previously embedded in the bytecode stream. Any reader that hard-codes the 3.10 field order will misparse 3.14 .pyc files.

// Python/marshal.c:1201 r_code (abbreviated)
argcount = (int)r_long(p);
posonlyargcount = (int)r_long(p);
kwonlyargcount = (int)r_long(p);
stacksize = (int)r_long(p);
flags = (int)r_long(p);
code = r_object(p); /* bytes */
consts = r_object(p); /* tuple */
...
linetable = r_object(p); /* bytes, replaces lnotab */
exceptiontable = r_object(p); /* bytes, new in 3.11 */
...
return (PyObject *)PyCode_NewWithPosOnlyArgs(...);
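
Because the fields carry no tags, a reader has nothing but position to go on. The sketch below reads only the five leading integer fields from a raw byte buffer, in the order listed above. CodeHeader, load_le32 and read_code_header are hypothetical names, and the buffer is assumed to start immediately after the TYPE_CODE type byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading integer fields of a marshalled code object, in stream order. */
typedef struct {
    int32_t argcount;
    int32_t posonlyargcount;
    int32_t kwonlyargcount;
    int32_t stacksize;
    int32_t flags;
} CodeHeader;

/* Decode one 32-bit little-endian value. */
static int32_t
load_le32(const unsigned char *p)
{
    uint32_t u = (uint32_t)p[0]
               | (uint32_t)p[1] << 8
               | (uint32_t)p[2] << 16
               | (uint32_t)p[3] << 24;
    return (int32_t)u;
}

/* Fill *h from buf. Returns the number of bytes consumed, or 0 if buf is too short. */
static size_t
read_code_header(const unsigned char *buf, size_t len, CodeHeader *h)
{
    assert(buf != NULL && h != NULL);
    /* The array order is the wire order; reordering it changes what gets parsed. */
    int32_t *fields[] = {
        &h->argcount, &h->posonlyargcount, &h->kwonlyargcount,
        &h->stacksize, &h->flags,
    };
    size_t n = sizeof fields / sizeof fields[0];
    if (len < 4 * n)
        return 0;
    for (size_t i = 0; i < n; i++)
        *fields[i] = load_le32(buf + 4 * i);
    return 4 * n;
}
```

Swapping two entries in the fields array would still succeed on any input, which is the kind of misparse a stale field order produces.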

gopy notes

gopy does not port marshal.c directly. Instead of reading or writing binary .pyc files, the compiler pipeline in compile/compiler.go emits bytecode into gopy's own in-memory flowgraph representation. The reference-table mechanism (FLAG_REF) has no counterpart in gopy because the current implementation never serializes code objects to disk. If .pyc read/write support is added in a future version, the r_code field ordering for 3.14 (especially linetable and exceptiontable) will need to be matched exactly.