Modules/_pickle.c

cpython 3.14 @ ab2d84fe1023/Modules/_pickle.c

The C implementation of the pickle module. The pure-Python Lib/pickle.py remains as a reference implementation and fallback; it imports the accelerated Pickler and Unpickler from _pickle (this file) whenever the extension is available, which keeps serialization fast enough for heavy users such as multiprocessing and shelve.

The file is organized around two top-level types:

  • Pickler — reads Python objects and writes a byte stream in one of protocol versions 0-5.
  • Unpickler — reads a pickle byte stream and reconstructs Python objects.

Both types are registered as pickle.Pickler and pickle.Unpickler. pickle.dump / pickle.dumps / pickle.load / pickle.loads are thin wrappers that construct a temporary instance and call dump() / load().
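As a rough illustration of the wrapper relationship described above, the following sketch does the same round trip twice: once with the one-shot helpers and once with explicit Pickler / Unpickler instances. The sample object and variable names are made up for the example.

```python
import io
import pickle

obj = {"name": "example", "values": [1, 2, 3]}

# One-shot helpers.
via_dumps = pickle.dumps(obj, protocol=4)
via_loads = pickle.loads(via_dumps)

# The same work done by hand with explicit instances.
buf = io.BytesIO()
pickle.Pickler(buf, protocol=4).dump(obj)
via_unpickler = pickle.Unpickler(io.BytesIO(buf.getvalue())).load()

print(via_loads == via_unpickler == obj)
```

The explicit form is the one to reach for when the Pickler needs configuring (for example a private dispatch_table) before dump() is called.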

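Memoization is central to correctness: the memo maps each object's identity to a memo index (Lib/pickle.py uses a dict from id(obj) to (memo_index, obj); this file uses a PyMemoTable keyed by object pointer). This prevents infinite recursion on self-referential structures and lets the PUT and GET opcode families preserve shared references.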
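A small sketch of what the memo buys, seen from the Python side: a list that contains itself and references one child twice survives a round trip with its identity relationships intact, and the stream contains the PUT/GET opcodes that make this possible. Protocol 2 is chosen here only so that the opcode names are BINPUT and BINGET.

```python
import pickle
import pickletools

# One object referenced twice, inside a list that also contains itself.
shared = ["shared"]
recursive = [shared, shared]
recursive.append(recursive)

data = pickle.dumps(recursive, protocol=2)
restored = pickle.loads(data)

# Names of every opcode in the stream, in order.
opcode_names = [op.name for op, arg, pos in pickletools.genops(data)]

print(restored[0] is restored[1])   # shared reference preserved
print(restored[2] is restored)      # cycle preserved
```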

Protocol 5 (Python 3.8+) adds BYTEARRAY8 and the out-of-band buffer opcodes NEXT_BUFFER and READONLY_BUFFER for zero-copy serialization of large buffers across process boundaries.

Map

| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-200 | includes, opcode constants, Py_OPCODE table | Protocol opcode byte values for all six protocol versions (0-5). | module/pickle/module.go:opcodeTable |
| 200-600 | Pdata, Pdata_New, Pdata_free, Pdata_push, Pdata_pop, Pdata_poptuple | Growable value stack used by the Unpickler. Heap-allocated array that is resized as it fills. | module/pickle/module.go:Pdata |
| 600-1400 | PicklerObject, Pickler_New, Pickler_clear, Pickler_dealloc | Pickler type definition and lifecycle. Holds write, memo, dispatch_table, proto. | module/pickle/module.go:Pickler |
| 1400-3200 | save, save_bool, save_long, save_float, save_bytes, save_str, save_tuple, save_list, save_dict, save_set, save_frozenset, save_global, save_pers, save_reduce | Pickler save dispatch chain. Checks dispatch_table first, then __reduce_ex__(protocol), then __reduce__. | module/pickle/module.go:save |
| 3200-3800 | Pickler_dump, Pickler_clear_memo, Pickler_getstate, Pickler_setstate | Public Pickler methods. dump drives the top-level save call. | module/pickle/module.go:Dump |
| 3800-4400 | UnpicklerObject, Unpickler_New, Unpickler_clear, Unpickler_dealloc | Unpickler type definition and lifecycle. Holds read, readline, memo, stack (Pdata). | module/pickle/module.go:Unpickler |
| 4400-6800 | load_* handlers | One handler per opcode byte: load_mark, load_stop, load_int, load_binint, load_long, load_float, load_string, load_binstring, load_unicode, load_binunicode, load_list, load_dict, load_tuple, load_build, load_global, load_reduce, load_appends, load_setitems, load_frozenset, load_bytearray8, load_next_buffer, etc. | module/pickle/module.go:loadHandlers |
| 6800-7400 | load, dispatch table for load | Main Unpickler loop: reads one opcode byte at a time and dispatches to the matching load_* handler. | module/pickle/module.go:Load |
| 7400-7800 | Unpickler_load, Unpickler_find_class, Unpickler_memo | Public Unpickler methods. find_class is the override hook for restricting deserialization. | module/pickle/module.go:FindClass |
| 7800-8000 | _picklemodule, PyInit__pickle | Module definition, pickle.dump / loads / dumps / load convenience functions, and module entry point. | module/pickle/module.go:Module |

Reading

save dispatch chain (lines 1400 to 3200)

cpython 3.14 @ ab2d84fe1023/Modules/_pickle.c#L1400-3200

save is the heart of the Pickler. It is called recursively for every Python object that needs to be serialized.

static int
save(PicklerObject *self, PyObject *obj, int pers_save)
{
    /* 1. Persistent ID hook */
    if (!pers_save && self->pers_func) {
        if (save_pers(self, obj) != 0)   /* writes PERSID / BINPERSID */
            goto error;
        ...
    }

    /* 2. Memo check: already serialized? */
    if (PyMemoTable_Get(self->memo, obj)) {
        if (memo_get(self, obj) < 0)     /* writes GET / BINGET / LONG_BINGET */
            goto error;
        return 0;
    }

    /* 3. dispatch_table lookup */
    reduce_func = PyObject_GetItem(self->dispatch_table, (PyObject *)type);
    if (reduce_func != NULL) { ... }
    else if (PyErr_Occurred()) { ... }

    /* 4. __reduce_ex__(protocol) */
    else {
        reduce_func = PyObject_GetAttr(obj, &_Py_ID(__reduce_ex__));
        reduce_value = PyObject_CallOneArg(reduce_func, protocol_int);
    }

    /* 5. Interpret the reduce value (string = global name, tuple = callable+args) */
    if (PyUnicode_Check(reduce_value))
        status = save_global(self, obj, reduce_value);
    else
        status = save_reduce(self, reduce_value, obj);
    ...
}

The lookup order is: persistent-ID hook, memo (already emitted), per-type dispatch_table, __reduce_ex__, and finally __reduce__ when __reduce_ex__ is unavailable. Exact built-in types such as int, str, list, dict, and tuple never reach the reduce path: save checks for them by type and branches straight to save_long, save_str, and so on before the __reduce_ex__ path is ever tried.
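The precedence of dispatch_table over an object's own reduce methods can be observed from Python. The sketch below uses fractions.Fraction, which defines its own __reduce__, and installs a private dispatch_table on one Pickler. The reducer reduce_inverted is a hypothetical helper that deliberately swaps numerator and denominator so that its effect is visible after loading.

```python
import copyreg
import io
import pickle
from fractions import Fraction

def reduce_inverted(value):
    # Hypothetical reducer: rebuilds the fraction upside down on purpose.
    return (Fraction, (value.denominator, value.numerator))

def round_trip(obj, table=None):
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=4)
    if table is not None:
        pickler.dispatch_table = table   # private table for this Pickler only
    pickler.dump(obj)
    return pickle.loads(buf.getvalue())

# No private table: Fraction's own __reduce__ is used.
default_result = round_trip(Fraction(1, 3))

# Private table: the table entry wins over Fraction.__reduce__.
private_table = copyreg.dispatch_table.copy()
private_table[Fraction] = reduce_inverted
overridden_result = round_trip(Fraction(1, 3), private_table)

print(default_result, overridden_result)
```

Setting the table on the instance rather than mutating copyreg.dispatch_table keeps the override local to one Pickler.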

save_reduce interprets the tuple returned by __reduce_ex__, which has between two and six items: (callable, args, state, list_items, dict_items, state_setter). Only the first two are required. It emits GLOBAL / REDUCE / BUILD / APPENDS / SETITEMS opcodes as needed.
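To see which opcodes the optional tuple slots produce, one can inspect the streams for two standard-library containers: collections.deque supplies a list_items iterator and collections.OrderedDict supplies a dict_items iterator. This is only an illustrative sketch; opcodes_for is a helper defined for the example, and protocol 3 is chosen so that the class reference appears as GLOBAL.

```python
import collections
import pickle
import pickletools

def opcodes_for(obj, protocol=3):
    """Return the opcode names emitted when pickling obj."""
    data = pickle.dumps(obj, protocol)
    return [op.name for op, arg, pos in pickletools.genops(data)]

ordered = collections.OrderedDict([("a", 1), ("b", 2)])
queue = collections.deque([1, 2, 3])

# OrderedDict fills the dict_items slot of its reduce tuple.
reduce_tuple = ordered.__reduce__()
ordered_ops = opcodes_for(ordered)

# deque fills the list_items slot instead.
queue_ops = opcodes_for(queue)

print(len(reduce_tuple), ordered_ops, queue_ops)
```

Neither object carries instance state here, so no BUILD opcode is expected in either stream.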

Pdata stack (lines 200 to 600)

cpython 3.14 @ ab2d84fe1023/Modules/_pickle.c#L200-600

The Unpickler maintains a value stack called Pdata. Most opcodes push objects onto it; REDUCE / BUILD / APPENDS / SETITEMS pop and combine them.

typedef struct {
    PyObject_HEAD
    PyObject **data;        /* pointer to the array of stack entries */
    Py_ssize_t allocated;   /* slots allocated */
    Py_ssize_t length;      /* slots in use */
} Pdata;

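Pdata_New allocates an initial array of 8 slots. When the array is full, Pdata_push reallocates it with modest over-allocation (roughly one eighth of the current capacity plus a small constant) rather than doubling. Pdata_poptuple(pdata, start) slices off everything from start to length into a new tuple; this is how TUPLE builds its tuple. A sibling helper, Pdata_poplist, does the same for the list batches consumed by LIST and APPENDS.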

The Pdata stack never uses a Python list object; the raw C array avoids the overhead of list method calls and bounds bookkeeping on the hot path. Ownership is still tracked correctly: each slot holds a strong reference, and Pdata_clear decrefs all live entries.
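The push / poptuple behaviour can be modelled in a few lines of Python. This is a toy model for illustration only: PdataModel and its method names are hypothetical, it uses a Python list where the C code uses a raw array, and it ignores reference counting entirely.

```python
class PdataModel:
    """Toy model of the unpickler value stack (not the C layout)."""

    def __init__(self):
        self.items = []

    def push(self, obj):
        self.items.append(obj)

    def pop(self):
        return self.items.pop()

    def poptuple(self, start):
        # Slice off everything from `start` to the top into a new tuple.
        tail = tuple(self.items[start:])
        del self.items[start:]
        return tail

stack = PdataModel()
stack.push("before mark")
mark = len(stack.items)         # MARK remembers the current stack depth
for value in (1, 2, 3):
    stack.push(value)
built = stack.poptuple(mark)    # TUPLE collects everything above the mark
stack.push(built)

print(stack.items)
```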

load_global name resolution (lines 4800 to 4950)

cpython 3.14 @ ab2d84fe1023/Modules/_pickle.c#L4800-4950

GLOBAL and STACK_GLOBAL reconstitute a reference to a callable by module name and qualified name:

static int
load_global(UnpicklerObject *self)
{
    /* Read "module\nqualname\n" from the stream (abridged) */
    PyObject *global_name, *module_name, *obj;

    module_name = _Unpickler_ReadLine(self);
    if (module_name == NULL)
        return -1;
    global_name = _Unpickler_ReadLine(self);
    if (global_name == NULL) {
        Py_DECREF(module_name);
        return -1;
    }

    /* Call find_class, which by default imports the module and walks getattr */
    obj = find_class(self, module_name, global_name);
    Py_DECREF(module_name);
    Py_DECREF(global_name);
    if (obj == NULL)
        return -1;
    PDATA_PUSH(self->stack, obj, -1);
    return 0;
}

find_class is the override point for restricting what a pickle is allowed to load. The default implementation imports the module named module_name if necessary, then walks the dotted qualname with getattr, so builtins.dict resolves to the built-in dict type. The Unpickler.find_class Python-level method wraps this hook.
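A common way to use this hook is an allow-list subclass. The sketch below follows the pattern shown in the Python documentation; RestrictedUnpickler, SAFE_BUILTINS, and restricted_loads are names chosen for the example, and the allow-list is deliberately tiny.

```python
import builtins
import io
import pickle

# Illustrative allow-list; a real one depends on the application.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only hand back globals that are explicitly allowed.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps(range(5)))

try:
    restricted_loads(pickle.dumps(print))   # builtins.print is not allowed
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(allowed, blocked)
```

An allow-list narrows the attack surface but does not by itself make unpickling untrusted data safe.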

STACK_GLOBAL (protocol 4+) pops the module and name directly from the Pdata stack instead of reading newline-terminated text lines, so binary protocols handle the two names like any other string.
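The switch between the two opcodes is easy to observe with pickletools. In this sketch the same object (a range, which pickles as a reference to builtins.range plus arguments) is serialized with protocol 3 and protocol 4; opcode_names is a helper defined for the example.

```python
import pickle
import pickletools

def opcode_names(data):
    """Return the opcode names found in a pickle byte string."""
    return [op.name for op, arg, pos in pickletools.genops(data)]

# Protocol 3 still writes the module and name as text after GLOBAL.
text_form = opcode_names(pickle.dumps(range(3), protocol=3))

# Protocol 4 pushes both names as strings, then emits STACK_GLOBAL.
stack_form = opcode_names(pickle.dumps(range(3), protocol=4))

print(text_form)
print(stack_form)
```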

Protocol v5 buffer objects (lines 6500 to 6800)

cpython 3.14 @ ab2d84fe1023/Modules/_pickle.c#L6500-6800

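Protocol 5 adds three new opcodes: BYTEARRAY8 for in-band bytearrays, and NEXT_BUFFER and READONLY_BUFFER for out-of-band buffer transfer. The NEXT_BUFFER handler, abridged: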

static int
load_next_buffer(UnpicklerObject *self)
{
    if (self->buffers == NULL) {
        /* st is the module state that owns the pickle exception types */
        PyErr_SetString(st->UnpicklingError,
                        "pickle stream refers to out-of-band data "
                        "but no *buffers* argument was given");
        return -1;
    }
    PyObject *buf = PyIter_Next(self->buffers);
    ...
    PDATA_PUSH(self->stack, buf, -1);
    return 0;
}

BYTEARRAY8 serializes a bytearray inline with an 8-byte little-endian length prefix. NEXT_BUFFER takes the next object from the buffers iterable supplied to the Unpickler at construction time and pushes it onto the stack. READONLY_BUFFER makes the top-of-stack buffer read-only (wrapping it in a read-only memoryview if it is writable), signaling that the receiver must not modify it. Together with the buffer_callback argument on the pickling side, these opcodes let large buffers, such as those backing NumPy arrays, travel between processes without being copied into the pickle stream.
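The following sketch exercises all three paths from Python: an in-band bytearray, an out-of-band PickleBuffer collected through buffer_callback, and the error raised when the out-of-band buffers are not supplied on load. The payload size and variable names are arbitrary choices for the example.

```python
import pickle
import pickletools

def opcode_names(data):
    return [op.name for op, arg, pos in pickletools.genops(data)]

payload = bytearray(b"x" * 1024)

# In-band: the bytes travel inside the pickle stream.
inband = pickle.dumps(payload, protocol=5)

# Out-of-band: the callback receives the buffer, the stream only marks its place.
out_of_band = []
data = pickle.dumps(pickle.PickleBuffer(payload), protocol=5,
                    buffer_callback=out_of_band.append)

# The receiver must hand the same buffers back, in order.
restored = pickle.loads(data, buffers=out_of_band)

try:
    pickle.loads(data)              # no buffers argument
    missing_buffers_error = None
except pickle.UnpicklingError as exc:
    missing_buffers_error = exc

print(len(inband), len(data), missing_buffers_error)
```

In a real transfer the collected buffers would be sent alongside the pickle stream by whatever transport is in use; that part is outside the pickle module.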

CPython 3.14 changes

The dispatch_table lookup was refactored in 3.12 to consult copyreg.dispatch_table as a fallback when the Pickler's own dispatch_table attribute is absent, matching the documented behavior. Protocol 5 (BYTEARRAY8, NEXT_BUFFER, READONLY_BUFFER) was added in 3.8 and is unchanged in 3.14. The Argument Clinic annotations throughout were regenerated for the 3.14 clinic tool but the C logic is stable.