Python/marshal.c
cpython 3.14 @ ab2d84fe1023/Python/marshal.c
Python/marshal.c implements the marshal module: a compact binary serialization format
for Python objects. It is not a general-purpose serializer; its main job is to write and
read compiled bytecode (PyCodeObject) and the constants inside it, as stored in .pyc files.
The format is versioned but not guaranteed to be stable across Python releases. At import
time, importlib._bootstrap_external calls marshal.loads on the payload that follows the
16-byte .pyc header.
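The import-time path described above can be sketched in a few lines of Python. This is a minimal illustration, not the importlib code itself: the helper names build_pyc_bytes and load_pyc_bytes are hypothetical, and the zeroed mtime and source-size fields are placeholders for what a real timestamp-based .pyc would contain.

```python
import importlib.util
import marshal

HEADER_SIZE = 16  # magic (4) + flags (4) + source mtime (4) + source size (4)


def build_pyc_bytes(source: str, filename: str = "<example>") -> bytes:
    """Hypothetical helper: compile source and wrap it in a .pyc-style header."""
    code = compile(source, filename, "exec")
    header = (
        importlib.util.MAGIC_NUMBER      # identifies the bytecode version
        + (0).to_bytes(4, "little")      # flags: 0 means timestamp-based pyc
        + (0).to_bytes(4, "little")      # placeholder source mtime
        + (0).to_bytes(4, "little")      # placeholder source size
    )
    return header + marshal.dumps(code)


def load_pyc_bytes(data: bytes):
    """Hypothetical helper: check the magic, skip the header, unmarshal the code."""
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was written by a different Python version")
    return marshal.loads(data[HEADER_SIZE:])


pyc = build_pyc_bytes("x = 1 + 2\n")
namespace = {}
exec(load_pyc_bytes(pyc), namespace)
print(namespace["x"])
```

The magic-number check is what makes it safe to hand the payload to marshal.loads: because the format is not stable across releases, bytes written by another Python version should be rejected before unmarshalling.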
Map
| Lines | Symbol | Role |
|---|---|---|
| 1-80 | type codes, WFILE | Single-byte type tags, write-state struct |
| 81-300 | w_object | Write entry point: depth check, singletons, dispatch to the per-type writers |
| 301-500 | w_complex_object | Per-type writers (int, float, complex, bytes, str, tuple, list, dict, set, code); tags shared objects with FLAG_REF |
| 501-750 | r_object | Read dispatch mirroring w_object |
| 751-1000 | r_code_object | PyCodeObject deserialization |
| 1001-1200 | PyMarshal_WriteObjectToFile, PyMarshal_WriteObjectToString | Public write API |
| 1201-1400 | PyMarshal_ReadObjectFromFile, PyMarshal_ReadObjectFromString | Public read API |
| 1401-1600 | marshal_module_exec, PyInit_marshal | Module registration |
Reading
Type codes and write dispatch
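Each Python type is assigned a single-byte code (e.g., TYPE_INT = 'i', TYPE_STRING = 's', TYPE_CODE = 'c'). w_object inspects the object, writes singletons such as None and Ellipsis as a bare type byte, and passes everything else to a type-specific writer. The excerpt below is condensed.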
// CPython: Python/marshal.c:100 w_object
static void
w_object(PyObject *v, WFILE *p)
{
    ...
    if (v == Py_None) { w_byte(TYPE_NONE, p); }
    else if (v == Py_Ellipsis) { w_byte(TYPE_ELLIPSIS, p); }
    else if (PyLong_Check(v)) { w_PyLong(v, p); }
    else if (PyFloat_Check(v)) { w_byte(TYPE_FLOAT, p); ... }
    else if (PyBytes_Check(v)) { w_byte(TYPE_STRING, p); w_pstring(...); }
    else if (PyCode_Check(v)) { w_complex_object(v, p); }
    ...
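The type codes can be observed from Python without reading the C source. The sketch below assumes only the public marshal.dumps function; type_code is a hypothetical helper that masks off the high bit (FLAG_REF, covered in the next section) and returns the tag character written first for an object.

```python
import marshal

FLAG_REF = 0x80  # high bit of the tag byte; masked off below


def type_code(obj) -> str:
    """Hypothetical helper: return the one-character tag marshal writes for obj."""
    first_byte = marshal.dumps(obj)[0]
    return chr(first_byte & 0x7F)


samples = {
    "None": None,
    "Ellipsis": ...,
    "True": True,
    "int": 1,
    "bytes": b"abc",
    "code": compile("pass", "<example>", "exec"),
}
for label, value in samples.items():
    print(label, repr(type_code(value)))
```

For the int, bytes, and code samples this should report 'i', 's', and 'c', matching TYPE_INT, TYPE_STRING, and TYPE_CODE above.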
Reference tracking for shared constants
A .pyc file can reference the same constant object from several places. Setting the
FLAG_REF bit (0x80) in an object's type byte records that object in a reference table;
each later occurrence is written as TYPE_REF followed by the object's 32-bit index in that table.
// CPython: Python/marshal.c:340 w_complex_object
static void
w_complex_object(PyObject *v, WFILE *p)
{
    Py_ssize_t i, n;
    if (p->depth > MAX_MARSHAL_STACK_DEPTH) {
        p->error = WFERR_NESTEDTOODEEP;
        return;
    }
    ...
    if (p->ref_dict != NULL) {
        idx = PyDict_GET_SIZE(p->ref_dict);
        ...
        w_byte(type | FLAG_REF, p);
    }
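The effect of reference tracking is visible in the bytes marshal.dumps produces. The sketch below is an illustration under two assumptions: the interpreter's default marshal version is 3 or later (so references are enabled), and passing version 2 explicitly disables them. It marshals a tuple that holds the same bytes object twice and compares the two encodings.

```python
import marshal

TYPE_REF = ord("r")

# Built at run time so the object is an ordinary, non-cached bytes instance.
shared = bytes(range(32, 64))
pair = (shared, shared)

with_refs = marshal.dumps(pair)        # default version: reference tracking on
without_refs = marshal.dumps(pair, 2)  # version 2 predates FLAG_REF / TYPE_REF

# With references the payload appears once; the second slot is TYPE_REF + index.
print(with_refs.count(shared), without_refs.count(shared))
print(with_refs[-5] == TYPE_REF, int.from_bytes(with_refs[-4:], "little"))

# Loading restores the sharing: both tuple slots are the same object again.
first, second = marshal.loads(with_refs)
print(first is second)
```

The index printed on the second line depends on whether the enclosing tuple was itself entered in the reference table, so it is shown rather than assumed.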
r_code_object: code object deserialization
r_code_object reads each field of PyCodeObject in a fixed order that must match the
write order in w_complex_object. Since 3.11 the deserialized fields are collected in a
struct _PyCodeConstructor and passed to _PyCode_New; earlier releases called
PyCode_NewWithPosOnlyArgs.
// CPython: Python/marshal.c:830 r_code_object
static PyObject *
r_code_object(RFILE *p)
{
    int argcount, posonlyargcount, kwonlyargcount;
    int stacksize, flags;
    ...
    argcount = (int)r_long(p);
    posonlyargcount = (int)r_long(p);
    kwonlyargcount = (int)r_long(p);
    ...
    /* con is a struct _PyCodeConstructor filled from the fields read above */
    code = (PyObject *)_PyCode_New(&con);
    return code;
}
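The fixed field order can be checked from Python by decoding the start of a marshalled code object by hand. This is a sketch that assumes only what the excerpt above shows: the tag byte is followed by argcount, posonlyargcount, and kwonlyargcount as little-endian 32-bit integers. Fields after those three differ between releases and are not decoded here.

```python
import marshal
import struct
import types

TYPE_CODE = ord("c")

module_code = compile("def f(a, b, /, c, *, d): pass\n", "<example>", "exec")
# The function's code object is stored among the module's constants.
func_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

data = marshal.dumps(func_code)
tag = data[0] & 0x7F  # mask off FLAG_REF

# The first three fields follow the tag byte in the order r_long reads them.
argcount, posonlyargcount, kwonlyargcount = struct.unpack_from("<iii", data, 1)
print(chr(tag), argcount, posonlyargcount, kwonlyargcount)

# A full round trip through marshal.loads yields an equivalent code object.
restored = marshal.loads(data)
print(restored.co_name, restored.co_varnames)
```

For this signature the three decoded counts should match func_code.co_argcount, co_posonlyargcount, and co_kwonlyargcount. A reader that consumed the fields in any other order would misinterpret every later field.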
gopy notes
The compile package in gopy writes a Go equivalent of the marshal format for .pyc
files via compile/compiler.go. To read .pyc files compatibly, an implementation must
match the PyCodeObject field order in r_code_object exactly; Python/marshal.c is the
authoritative reference for that ordering.
CPython 3.14 changes
3.14 bumped the marshal version to 5, which adds serialization of slice objects (TYPE_SLICE).
co_qualname and the exception table (co_exceptiontable) are not new in 3.14; both have been
part of the marshalled code object since 3.11. TYPE_SHORT_ASCII is also unchanged: since format
version 4 (Python 3.4) it has stored ASCII strings of up to 255 bytes with a one-byte length prefix.
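One way to see what a given interpreter actually does is to inspect marshal.version and the bytes that marshal.dumps emits. The sketch below makes no assumption about the running version: roundtrip_slice is a hypothetical helper that returns None when slices cannot be marshalled, and the string checks accept both the plain and the interned variants of each tag.

```python
import marshal

print("marshal.version =", marshal.version)

# Joined at run time so the string is not an interned literal.
short = "".join(["a", "b", "c"])
short_data = marshal.dumps(short)
short_tag = chr(short_data[0] & 0x7F)  # 'z' (or 'Z' if the string is interned)
print(short_tag, short_data[1], short_data[2:])

# ASCII strings longer than 255 bytes fall back to a 4-byte length field.
long_data = marshal.dumps("x" * 300)
long_tag = chr(long_data[0] & 0x7F)    # 'a' (or 'A' if the string is interned)
long_len = int.from_bytes(long_data[1:5], "little")
print(long_tag, long_len)


def roundtrip_slice():
    """Hypothetical helper: round-trip a slice, or return None if unsupported."""
    try:
        return marshal.loads(marshal.dumps(slice(1, 10, 2)))
    except ValueError:
        return None


print(roundtrip_slice())
```

The short string is encoded as a tag, a single length byte, and the ASCII payload, while the 300-character string needs the wider length field.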