Python/marshal.c

cpython 3.14 @ ab2d84fe1023/Python/marshal.c

Python/marshal.c implements the marshal module: a compact binary serialization format for Python objects. It is not a general-purpose serializer; its purpose is to write compiled bytecode (PyCodeObject) and the constants within it to .pyc files. The format is versioned but not stable across Python releases. importlib._bootstrap_external calls marshal.loads to deserialize .pyc file contents at import time.
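As a rough illustration of the import-time path described above, the sketch below compiles a throwaway module with py_compile and reads the code object back with marshal.loads. It assumes the 16-byte .pyc header used since Python 3.7; the helper name load_code_from_pyc and the demo file names are made up for this example and are not part of CPython or gopy.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile


def load_code_from_pyc(pyc_path):
    """Return the code object stored in a .pyc file (hypothetical helper)."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    # The first 4 bytes are the magic number of the interpreter that wrote the file.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was written by a different Python version")
    # Assumption: the header is 16 bytes (Python 3.7+); the rest is marshal data.
    return marshal.loads(data[16:])


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")
    with open(src, "w") as f:
        f.write("ANSWER = 6 * 7\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "demo.pyc"), doraise=True)
    code = load_code_from_pyc(pyc)

# Run the deserialized module-level code in a fresh namespace.
namespace = {}
exec(code, namespace)
```

The magic-number check matters because the marshal payload is only meaningful to the interpreter version that produced it.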

Map

| Lines | Symbol | Role |
|---|---|---|
| 1-80 | type codes, WFILE | Single-byte type tags, write-state struct |
| 81-300 | w_object | Write dispatch: int, float, complex, bytes, str, tuple, list, dict, set, code |
| 301-500 | w_complex_object | Handles reference tracking for shared objects |
| 501-750 | r_object | Read dispatch mirroring w_object |
| 751-1000 | r_code_object | PyCodeObject deserialization |
| 1001-1200 | PyMarshal_WriteObjectToFile, PyMarshal_WriteObjectToString | Public write API |
| 1201-1400 | PyMarshal_ReadObjectFromFile, PyMarshal_ReadObjectFromString | Public read API |
| 1401-1600 | marshal_module_exec, PyInit_marshal | Module registration |

Reading

Type codes and write dispatch

Each Python type is assigned a single-byte code (e.g., TYPE_INT = 'i', TYPE_STRING = 's', TYPE_CODE = 'c'). w_object checks the type and calls the appropriate writer.

// CPython: Python/marshal.c:100 w_object
static void
w_object(PyObject *v, WFILE *p)
{
    ...
    if (v == Py_None) { w_byte(TYPE_NONE, p); }
    else if (v == Py_Ellipsis) { w_byte(TYPE_ELLIPSIS, p); }
    else if (PyLong_Check(v)) { w_PyLong(v, p); }
    else if (PyFloat_Check(v)) { w_byte(TYPE_FLOAT, p); ... }
    else if (PyBytes_Check(v)) { w_byte(TYPE_STRING, p); w_pstring(...); }
    else if (PyCode_Check(v)) { w_complex_object(v, p); }
    ...
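The type codes can be observed from Python by looking at the first byte that marshal.dumps emits. The sketch below is only an observation aid, not part of marshal.c; type_tag is a hypothetical helper, and it masks off the high bit because the writer may OR a reference flag into the type byte.

```python
import marshal

FLAG_REF = 0x80  # reference flag that may be OR-ed into the type byte


def type_tag(obj):
    """Return the one-character type code written for obj (hypothetical helper)."""
    return chr(marshal.dumps(obj)[0] & ~FLAG_REF & 0xFF)


code = compile("pass", "<demo>", "exec")

tags = {
    "None": type_tag(None),
    "Ellipsis": type_tag(Ellipsis),
    "small int": type_tag(1),
    "bytes": type_tag(b"ab"),
    "code": type_tag(code),
}

# A bytes object is its tag, a 4-byte little-endian length, then the raw payload.
payload = marshal.dumps(b"ab")
length = int.from_bytes(payload[1:5], "little")
body = payload[5:]

for name, tag in tags.items():
    print(f"{name:10s} -> {tag!r}")
```

Masking with FLAG_REF keeps the comparison stable regardless of whether the writer decided to register the object in the reference table.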

Reference tracking for shared constants

.pyc files can contain the same constant object at multiple locations. The FLAG_REF bit (0x80, the high bit of the type byte) marks an object that is also stored in a reference table; later occurrences are written as TYPE_REF ('r') followed by a 4-byte integer index into that table.

// CPython: Python/marshal.c:340 w_complex_object
static void
w_complex_object(PyObject *v, WFILE *p)
{
    Py_ssize_t i, n;
    if (p->depth > MAX_MARSHAL_STACK_DEPTH) {
        p->error = WFERR_NESTEDTOODEEP;
        return;
    }
    ...
    if (p->ref_dict != NULL) {
        idx = PyDict_GET_SIZE(p->ref_dict);
        ...
        w_byte(type | FLAG_REF, p);
    }
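The effect of reference tracking is visible in the output of marshal.dumps. The following sketch marshals a list that holds the same bytes object twice and checks that the payload is stored once, with the second slot written as a back-reference. It assumes the default format version (3 or later, where references are enabled) and uses version 2 as a contrast; the variable names are illustrative only.

```python
import marshal

FLAG_REF = 0x80          # high bit of the type byte
TYPE_REF = ord("r")      # back-reference to an earlier object
TYPE_STRING = ord("s")   # bytes object

shared = b"X" * 20                       # one bytes object, referenced twice
data = marshal.dumps([shared, shared])   # default version tracks references

first = data.index(shared)               # offset of the only stored payload
tag = data[first - 5]                    # tag byte sits before the 4-byte length
second_tag = data[-5]                    # last item: TYPE_REF plus a 4-byte index

# Format version 2 predates the reference table, so the payload is written twice.
legacy = marshal.dumps([shared, shared], 2)

# Reading resolves the back-reference to the very same object.
restored = marshal.loads(data)
```

Because the reader resolves TYPE_REF to the object already in its table, identity (not just equality) is preserved across the round trip.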

r_code_object: code object deserialization

r_code_object reads each field of PyCodeObject in a fixed order that must match the write order in w_complex_object. Since CPython 3.11 the deserialized fields are collected into a struct _PyCodeConstructor and passed to _PyCode_New; earlier releases called PyCode_NewWithPosOnlyArgs instead and also serialized an nlocals field, which 3.11 replaced with co_localsplusnames and co_localspluskinds.

// CPython: Python/marshal.c:830 r_code_object
static PyObject *
r_code_object(RFILE *p)
{
    int argcount, posonlyargcount, kwonlyargcount;
    int stacksize, flags;
    ...
    argcount = (int)r_long(p);
    posonlyargcount = (int)r_long(p);
    kwonlyargcount = (int)r_long(p);
    ...
    struct _PyCodeConstructor con = { ... };
    ...
    code = (PyObject *)_PyCode_New(&con);
    return code;
}
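The fixed field order can be checked from Python by decoding the first few integers of a marshalled code object by hand. This is a minimal sketch that assumes Python 3.8 or later, where the serialized code object starts with argcount, posonlyargcount and kwonlyargcount as 4-byte little-endian integers right after the type byte; it does not attempt to parse the remaining fields.

```python
import marshal
import struct
import types

source = "def f(a, b, /, c, *, d):\n    return a + b + c + d\n"
module_code = compile(source, "<demo>", "exec")

# The function body is stored as a nested code object among the module constants.
func_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

data = marshal.dumps(func_code)
tag = chr(data[0] & 0x7F)  # strip the reference flag from the type byte

# Mirror the three leading r_long() calls: each reads a 4-byte little-endian int.
argcount, posonlyargcount, kwonlyargcount = struct.unpack_from("<iii", data, 1)

# A full round trip rebuilds an equivalent, callable code object.
restored = marshal.loads(data)
func = types.FunctionType(restored, {})
result = func(1, 2, 3, d=4)
```

Any reader written outside CPython has to consume these fields in exactly this order before it reaches the bytecode and constants.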

gopy notes

The compile package in gopy writes a Go-equivalent of the marshal format for .pyc files via compile/compiler.go. The PyCodeObject field order in r_code_object must be matched exactly when implementing .pyc read compatibility. Python/marshal.c is the authoritative field ordering reference.

CPython 3.14 changes

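3.14 bumped the marshal version from 4 to 5 to add serialization of slice objects. Serialization of PyCodeObject.co_qualname and co_exceptiontable is older: both fields arrived in 3.11 together with the new exception table format. TYPE_SHORT_ASCII also predates 3.14 (it came with marshal version 4); it stores ASCII strings of up to 255 bytes with a one-byte length prefix instead of the usual four-byte one.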
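A quick way to see which format features a given interpreter supports is to inspect marshal.version and the bytes it produces. The sketch below looks at how short and long ASCII strings are encoded and, only when the running interpreter reports format version 5 or later, round-trips a slice. The helper tag_of is hypothetical, and the accepted tag letters ('z'/'Z' for short ASCII, 'a'/'A' for longer ASCII, the upper-case forms being the interned variants) are stated here as an assumption about the default format version.

```python
import marshal


def tag_of(data):
    """Return the type code of a marshalled blob, ignoring the reference flag."""
    return chr(data[0] & 0x7F)


short = marshal.dumps("abc")        # ASCII, under 256 bytes
long_ = marshal.dumps("x" * 300)    # ASCII, too long for the short form

short_len = short[1]                             # short form: 1-byte length
long_len = int.from_bytes(long_[1:5], "little")  # long form: 4-byte length

# Slices are only marshallable where the format version is 5 or later.
if marshal.version >= 5:
    roundtrip = marshal.loads(marshal.dumps(slice(1, 10, 2)))
else:
    roundtrip = None

print("marshal.version =", marshal.version)
```

Guarding on marshal.version keeps the example usable on interpreters older than 3.14, where marshalling a slice raises ValueError.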