
Modules/_pickle.c

Source:

cpython 3.14 @ ab2d84fe1023/Modules/_pickle.c

_pickle accelerates pickle.Pickler and pickle.Unpickler. Protocol 5 (3.8+) adds out-of-band buffers. The pure-Python fallback is Lib/pickle.py.
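A quick way to check which implementation is active, assuming a standard CPython build: when the `_pickle` extension is available, `pickle.Pickler` is the C type, while the pure-Python classes stay reachable under the underscore-prefixed names `pickle._Pickler` and `pickle._Unpickler`. The helper `dumps_pure_python` below is a hypothetical name used only for this sketch.

```python
import io
import pickle

# On builds that ship the _pickle extension, pickle.Pickler is the C type.
try:
    import _pickle
    using_c = pickle.Pickler is _pickle.Pickler
except ImportError:
    using_c = False  # only the pure-Python fallback is available


def dumps_pure_python(obj, protocol=pickle.DEFAULT_PROTOCOL):
    """Pickle obj with the pure-Python Pickler from Lib/pickle.py."""
    buf = io.BytesIO()
    pickle._Pickler(buf, protocol).dump(obj)
    return buf.getvalue()


data = {"a": [1, 2, 3], "b": ("x", 2.5)}

# Output from either implementation can be read back by pickle.loads.
roundtrip = pickle.loads(dumps_pure_python(data))
print("C accelerator in use:", using_c)
print(roundtrip == data)
```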

Map

| Lines | Symbol | Role |
|---|---|---|
| 1-300 | Opcode constants | MARK, STOP, POP, INT, LONG, STRING, UNICODE, DICT, etc. |
| 301-800 | Pickler type | State: memo, write, dispatch_table, fast mode |
| 801-1500 | save_* functions | Serialize each type: int, float, bytes, str, list, dict, tuple |
| 1501-2000 | save_reduce | General serialization via __reduce__ / __reduce_ex__ |
| 2001-2500 | save_newobj, save_pers | __new__ reconstruction, persistent id |
| 2501-3500 | Unpickler type | State: stack, memo, read, readline |
| 3501-5000 | load_* functions | Deserialize each opcode |
| 5001-6000 | Protocol 5 | BYTEARRAY8, NEXT_BUFFER, READONLY_BUFFER opcodes |
| 6001-8000 | Module init | PickleError, PicklingError, UnpicklingError |

Reading

Memo table

// CPython: Modules/_pickle.c:480 Pickler_memo
/* Lib/pickle.py keeps the memo as a dict: id(obj) → (memo_id, obj). */
/* _pickle.c stores the same mapping in a pointer-keyed PyMemoTable. */
/* Used to detect and handle object cycles and shared references */
static int
memo_put(PicklerObject *self, PyObject *obj)
{
    if (self->fast)
        return 0;   /* fast mode skips the memo entirely */
    Py_ssize_t idx = PyMemoTable_Size(self->memo);
    if (PyMemoTable_Set(self->memo, obj, idx) < 0)
        return -1;
    /* Emit MEMOIZE (protocol 4+) or PUT / BINPUT / LONG_BINPUT so the */
    /* unpickler records obj under idx; later references emit GET */
    ...
}

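If an object is already in the memo, the pickler emits a GET opcode (GET, BINGET, or LONG_BINGET, depending on protocol and index size) as a back-reference instead of serializing the object again.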
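The effect of the memo is visible from Python. The sketch below pickles a container that holds the same list twice, lists the opcodes with `pickletools.genops`, and checks that identity survives the round trip; the variable names are illustrative only.

```python
import pickle
import pickletools

shared = [1, 2, 3]
container = [shared, shared]          # the same list object appears twice

payload = pickle.dumps(container, protocol=2)
opnames = [op.name for op, arg, pos in pickletools.genops(payload)]

# First occurrence: stored in the memo (BINPUT). Second: fetched (BINGET).
print(opnames.count("BINPUT"), opnames.count("BINGET"))

restored = pickle.loads(payload)
print(restored[0] is restored[1])     # shared identity is preserved

# Cycles rely on the same mechanism: a list that contains itself round-trips.
cyclic = []
cyclic.append(cyclic)
loop = pickle.loads(pickle.dumps(cyclic, protocol=2))
print(loop[0] is loop)
```

Without the memo, the second reference would be serialized as an independent copy and the self-referential list would recurse without end.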

save_long

// CPython: Modules/_pickle.c:910 save_long
static int
save_long(PicklerObject *self, PyObject *obj)
{
    /* Values that fit in a signed 32-bit int take a short path: */
    /*   binary protocols (1+): BININT1, BININT2, or BININT */
    /*   protocol 0: INT ('I') + decimal digits + '\n' */
    ...
    if (self->proto >= 2) {
        /* Larger values: LONG1 (1-byte length) or LONG4 (4-byte length), */
        /* followed by the little-endian two's-complement bytes */
        Py_ssize_t nbytes = ...; /* two's complement byte count */
        ...
    }
    else {
        /* Protocols 0 and 1: LONG ('L') + decimal digits + "L\n" */
        ...
    }
    ...
}
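The integer encodings can be inspected from Python. This sketch, with a hypothetical helper `opcode_names`, pickles one small and one large integer at protocols 0 and 2 and compares the large value's payload with `int.to_bytes`.

```python
import pickle
import pickletools


def opcode_names(value, protocol):
    """Return the opcode names pickle emits for value at the given protocol."""
    payload = pickle.dumps(value, protocol=protocol)
    return [op.name for op, arg, pos in pickletools.genops(payload)]


small, big = 255, 2**40

print(pickle.dumps(small, protocol=0))   # text INT opcode
print(pickle.dumps(big, protocol=0))     # text LONG opcode, trailing 'L'
print(opcode_names(small, 2))            # a member of the BININT family
print(opcode_names(big, 2))              # LONG1 plus raw bytes

# The LONG1 payload is little-endian two's complement, as int.to_bytes produces.
body = big.to_bytes(6, "little", signed=True)
print(body in pickle.dumps(big, protocol=2))
```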

save_reduce

// CPython: Modules/_pickle.c:1620 save_reduce
static int
save_reduce(PicklerObject *self, PyObject *args, PyObject *obj)
{
    /* args = (callable, args[, state[, list_items[, dict_items[, state_setter]]]]) */
    /* Emit: GLOBAL callable \n MARK args... TUPLE REDUCE */
    /* list_items are emitted as APPEND(S), dict_items as SETITEM(S) */
    /* If state is not None: save state, then BUILD (no MARK involved) */
    ...
}

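__reduce__ must return either a string (the name of a global) or a tuple of two to six items, of which only (callable, args) is required. __reduce_ex__ takes the protocol number as its single argument and returns the same kind of value; when it is defined, the pickler calls it in preference to __reduce__. The sixth item, state_setter, was added in Python 3.8.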
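A minimal sketch of the three-item form (callable, args, state). The class `Point` is hypothetical, and its reduce value deliberately names a standard-library callable, `types.SimpleNamespace`, so the snippet runs in any namespace; a real class would normally return its own type as the callable.

```python
import pickle
import pickletools
import types


class Point:
    """Hypothetical example class; not part of CPython."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce__(self):
        # (callable, args, state): the unpickler calls SimpleNamespace(),
        # then BUILD merges the state dict into the new object's __dict__.
        return (types.SimpleNamespace, (), {"x": self.x, "y": self.y})


payload = pickle.dumps(Point(1, 2), protocol=2)
opnames = [op.name for op, arg, pos in pickletools.genops(payload)]
print("REDUCE" in opnames, "BUILD" in opnames)

q = pickle.loads(payload)
print(type(q).__name__, q.x, q.y)
```

The opcode listing shows the callable saved as a global, the argument tuple, REDUCE, and then the state followed by BUILD.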

load_short_binstring

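// CPython: Modules/_pickle.c:3700 load_short_binstring
static int
load_short_binstring(UnpicklerObject *self)
{
    char *s;
    /* One length byte, then that many bytes of payload */
    if (_Unpickler_Read(self, &s, 1) < 0)
        return -1;
    Py_ssize_t x = (unsigned char)s[0];
    if (_Unpickler_Read(self, &s, x) < 0)
        return -1;
    /* SHORT_BINSTRING holds a Python 2 str: keep it as bytes only when */
    /* the Unpickler's encoding is "bytes", otherwise decode it to str */
    PyObject *obj;
    if (strcmp(self->encoding, "bytes") == 0)
        obj = PyBytes_FromStringAndSize(s, x);
    else
        obj = PyUnicode_Decode(s, x, self->encoding, self->errors);
    if (obj == NULL)
        return -1;
    PDATA_PUSH(self->stack, obj, -1);
    return 0;
}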
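The opcode can be exercised from Python with a hand-assembled stream. The sketch below builds a three-byte SHORT_BINSTRING pickle from the opcode constants exported by the `pickle` module and loads it with two different `encoding` settings.

```python
import pickle

# Hand-assembled pickle: SHORT_BINSTRING, 1-byte length, payload, STOP.
raw = b"abc"
payload = pickle.SHORT_BINSTRING + bytes([len(raw)]) + raw + pickle.STOP

as_text = pickle.loads(payload)                      # default encoding is ASCII
as_bytes = pickle.loads(payload, encoding="bytes")   # keep the raw bytes

print(type(as_text).__name__, as_text)
print(type(as_bytes).__name__, as_bytes)
```

The `encoding` argument exists for reading pickles written by Python 2, where this opcode carried 8-bit strings.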

Protocol 5 — out-of-band buffers

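// CPython: Modules/_pickle.c:5100 save_bytearray
static int
save_bytearray(PicklerObject *self, PyObject *obj)
{
    if (self->proto < 5) {
        /* No bytearray opcode before protocol 5: reduce to a bytearray(...) call */
        ...
    }
    /* Protocol 5: BYTEARRAY8 + 8-byte length + data, always in-band */
    /* Out-of-band transfer (NEXT_BUFFER, plus READONLY_BUFFER for read-only */
    /* data) applies only to pickle.PickleBuffer objects, and only when */
    /* buffer_callback is set and returns a false value */
    ...
}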

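Protocol 5 lets the receiver reconstruct large buffers (for example, the memory behind NumPy arrays) without copying them through the pickle stream: the sender collects them through buffer_callback, and the receiver hands them back through the buffers argument of the unpickler.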
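A small sketch of the out-of-band round trip using only the public `pickle` API. Wrapping the data in `pickle.PickleBuffer` is what makes it eligible for out-of-band transfer; a plain bytearray given to the same call is written in-band.

```python
import pickle

data = bytearray(b"x" * 1024)
buffers = []

# buffer_callback receives each PickleBuffer; list.append returns None
# (a false value), which tells the pickler to keep the data out of the stream.
payload = pickle.dumps(pickle.PickleBuffer(data), protocol=5,
                       buffer_callback=buffers.append)

print(len(payload) < len(data))      # the stream holds only a placeholder
print(len(buffers))

# The receiver supplies the same buffers, in order, to rebuild the object.
restored = pickle.loads(payload, buffers=buffers)
print(bytes(restored) == bytes(data))

# A plain bytearray is serialized in-band; the callback is never invoked.
unused = []
inband = pickle.dumps(data, protocol=5, buffer_callback=unused.append)
print(len(unused), len(inband) > len(data))
```

How the buffers travel between sender and receiver (shared memory, a separate socket frame, and so on) is left to the application.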

gopy notes

pickle is not yet in gopy's standard module set. This annotation documents the C implementation as a reference for the eventual port. The key data structures are the memo (a dict keyed by id(obj) in Lib/pickle.py, a pointer-keyed hash table in _pickle.c), the stack list (the unpickler's operand stack), and the read/write callbacks, which let the same code serve in-memory and file-backed pickling. Protocol 0 is human-readable ASCII text; protocols 1 and later use binary opcodes.
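The behaviour a port has to reproduce can be pinned down with a few observable checks. The sketch below compares protocol 0 with a binary protocol and runs the Pickler and Unpickler against a file-like object; `io.BytesIO` stands in for a real file.

```python
import io
import pickle

obj = {"k": [1, 2]}

text = pickle.dumps(obj, protocol=0)
binary = pickle.dumps(obj, protocol=2)
# Protocol 0 stays within ASCII; protocol 2+ streams begin with the PROTO opcode.
print(text.isascii(), binary.startswith(pickle.PROTO))

# The Pickler writes to any object that has a write() method ...
sink = io.BytesIO()
pickle.Pickler(sink, protocol=2).dump(obj)

# ... and the Unpickler reads from any object with read() and readline().
sink.seek(0)
print(pickle.Unpickler(sink).load() == obj)
```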