Modules/_pickle.c
Source:
cpython 3.14 @ ab2d84fe1023/Modules/_pickle.c
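_pickle is the C accelerator behind pickle.Pickler and pickle.Unpickler. Protocol 5 (Python 3.8+, PEP 574) adds out-of-band buffers. When the extension module is unavailable, pickle falls back to the pure-Python implementation in Lib/pickle.py.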
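A quick way to see the two implementations side by side is sketched below. It assumes the private names `pickle._dumps` and `pickle._loads`, which Lib/pickle.py keeps as the pure-Python entry points; they are not public API and may change.

```python
import pickle

# Check whether the public Pickler is the C type from _pickle.
try:
    import _pickle
    accelerated = pickle.Pickler is _pickle.Pickler
except ImportError:
    accelerated = False

obj = {"name": "spam", "values": [1, 2, 3]}

# Streams written by one implementation should load in the other.
# pickle._dumps / pickle._loads are the private pure-Python functions.
roundtrip_c_to_py = pickle._loads(pickle.dumps(obj))
roundtrip_py_to_c = pickle.loads(pickle._dumps(obj))
```

Both round trips should return an object equal to `obj`, since the two implementations share one wire format.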
Map
| Lines | Symbol | Role |
|---|---|---|
| 1-300 | Opcode constants | MARK, STOP, POP, INT, LONG, STRING, UNICODE, DICT, etc. |
| 301-800 | Pickler type | State: memo, write, dispatch_table, fast mode |
| 801-1500 | save_* functions | Serialize each type: int, float, bytes, str, list, dict, tuple |
| 1501-2000 | save_reduce | General serialization via __reduce__ / __reduce_ex__ |
| 2001-2500 | save_newobj, save_pers | __new__ reconstruction, persistent id |
| 2501-3500 | Unpickler type | State: stack, memo, read, readline |
| 3501-5000 | load_* functions | Deserialize each opcode |
| 5001-6000 | Protocol 5 | BYTEARRAY8, NEXT_BUFFER, READONLY_BUFFER opcodes |
| 6001-8000 | Module init | PickleError, PicklingError, UnpicklingError |
Reading
Memo table
// CPython: Modules/_pickle.c:480 Pickler_memo
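/* Simplified sketch; error checks omitted. */
/* The memo maps an object's identity to its memo index. */
/* Lib/pickle.py stores id(obj) to (memo_id, obj) in a dict; _pickle.c keeps a private hash table keyed by object pointer. */
/* Used to detect and handle object cycles and shared references */
static int
memo_put(PicklerObject *self, PyObject *obj)
{
PyObject *key = PyLong_FromVoidPtr(obj); /* equivalent of id(obj) */
Py_ssize_t idx = PyDict_GET_SIZE(self->memo);
PyObject *memo_id = PyLong_FromSsize_t(idx);
PyDict_SetItem(self->memo, key, PyTuple_Pack(2, memo_id, obj));
/* Emit a PUT-family opcode (PUT, BINPUT, LONG_BINPUT, or MEMOIZE) to record memo_id */
...
}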
If an object has already been pickled, the pickler emits a GET-family opcode (GET, BINGET, or LONG_BINGET) that refers back to its memo entry instead of serializing the object again.
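The effect of the memo is visible from Python by disassembling a stream with `pickletools.genops`. The sketch below pickles one list twice inside a container and, for contrast, two equal but distinct lists; the variable names are illustrative.

```python
import pickle
import pickletools

def opcode_names(payload):
    """Return the opcode names of a pickle stream, in order."""
    return [op.name for op, _arg, _pos in pickletools.genops(payload)]

shared = [1, 2, 3]

# Same object referenced twice: the second visit becomes a back-reference.
shared_names = opcode_names(pickle.dumps([shared, shared], protocol=4))

# Two equal but distinct objects: both are serialized in full.
distinct_names = opcode_names(pickle.dumps([[1, 2, 3], [1, 2, 3]], protocol=4))

# Identity is preserved after loading because of the back-reference.
restored = pickle.loads(pickle.dumps([shared, shared], protocol=4))
```

At protocol 4 the first visit is recorded with MEMOIZE and the second is written as BINGET, so `restored[0] is restored[1]` holds after loading.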
save_long (Python int)
// CPython: Modules/_pickle.c:910 save_long
static int
save_long(PicklerObject *self, PyObject *obj)
{
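/* Simplified sketch; error checks omitted. */
/* Values that fit in 32 bits: INT ('I' + decimal digits + '\n') in protocol 0, */
/* BININT1, BININT2, or BININT in protocol 1+ */
...
if (self->proto >= 2) {
/* Larger values: LONG1 (1-byte length) or LONG4 (4-byte length) opcode */
/* followed by a little-endian two's-complement byte array */
Py_ssize_t nbytes = ...; /* two's complement byte count */
unsigned char *pdata = ...; /* the nbytes encoded bytes */
...
}
else {
/* Protocol 0/1: LONG ('L' + decimal digits + 'L\n') */
PyObject *repr = PyObject_Repr(obj);
...
}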
}
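The opcode choice described above can be checked from Python. This sketch pickles one small and one large integer at several protocols and records the opcode names; `opcode_names` is a helper defined here for illustration.

```python
import pickle
import pickletools

def opcode_names(value, protocol):
    """Pickle value at the given protocol and list the opcode names."""
    payload = pickle.dumps(value, protocol=protocol)
    return [op.name for op, _arg, _pos in pickletools.genops(payload)]

small = 255        # fits in 32 bits
big = 2 ** 64      # does not fit in 32 bits

small_p0 = opcode_names(small, 0)   # text INT
small_p2 = opcode_names(small, 2)   # binary BININT1
big_p0 = opcode_names(big, 0)       # text LONG
big_p1 = opcode_names(big, 1)       # still text LONG before protocol 2
big_p2 = opcode_names(big, 2)       # binary LONG1

# The protocol 0 streams are plain ASCII.
small_text = pickle.dumps(small, protocol=0)
big_text = pickle.dumps(big, protocol=0)
```

Note that the text LONG form keeps a trailing `L` before the newline, a holdover from Python 2 long literals.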
save_reduce
// CPython: Modules/_pickle.c:1620 save_reduce
static int
save_reduce(PicklerObject *self, PyObject *args, PyObject *obj)
{
/* args = (callable, args[, state[, list_items[, dict_items[, state_setter]]]]) */
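/* Emit: <callable> (typically GLOBAL module name), <args tuple>, REDUCE */
/* list_items and dict_items, if present, are emitted as APPEND(S) and SETITEM(S) */
/* If state is present: <state> BUILD */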
...
}
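When __reduce__ returns a tuple, it must contain at least (callable, args) and may contain up to six items; it may also return a string naming a global. __reduce_ex__ takes the protocol number as an argument and follows the same return convention.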
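A minimal sketch of the three-item form follows. `Settings` is a hypothetical class invented for this example; it reduces to `types.SimpleNamespace` so the example does not depend on the class itself being importable by name at load time.

```python
import pickle
import pickletools
import types

class Settings:
    """Hypothetical class used only for this illustration."""

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def __reduce__(self):
        # (callable, args, state): call SimpleNamespace(), then apply state.
        state = {"host": self.host, "port": self.port}
        return (types.SimpleNamespace, (), state)

payload = pickle.dumps(Settings("localhost", 8080), protocol=2)
opcode_names = [op.name for op, _arg, _pos in pickletools.genops(payload)]

# The stream rebuilds a SimpleNamespace, not a Settings instance.
restored = pickle.loads(payload)
```

The stream contains GLOBAL for the callable, REDUCE to call it with the args tuple, and BUILD to apply the state dict, in that order.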
load_short_binstring
// CPython: Modules/_pickle.c:3700 load_short_binstring
static int
load_short_binstring(UnpicklerObject *self)
{
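/* Simplified sketch; read_func stands in for the unpickler's internal read helper. */
char *s;
if (self->read_func(self, &s, 1) < 0) /* 1-byte length prefix */
return -1;
Py_ssize_t x = (unsigned char)s[0];
if (self->read_func(self, &s, x) < 0) /* payload of x bytes */
return -1;
/* Python 2 str payload: kept as bytes when encoding is "bytes", otherwise decoded to str */
PyObject *obj = PyBytes_FromStringAndSize(s, x);
if (obj == NULL)
return -1;
PDATA_PUSH(self->stack, obj, -1);
return 0;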
}
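The opcode's layout (one length byte, then the payload) can be exercised with a hand-built stream. The bytes below are assembled manually for illustration; Python 3 picklers do not emit SHORT_BINSTRING themselves.

```python
import pickle

# SHORT_BINSTRING is opcode b"U": one length byte, then that many bytes.
# b"." is the STOP opcode.
payload = b"U" + bytes([3]) + b"abc" + b"."

as_text = pickle.loads(payload)                     # default encoding="ASCII"
as_bytes = pickle.loads(payload, encoding="bytes")  # keep the raw bytes

# A length prefix larger than the remaining data must fail, not over-read.
truncated_rejected = False
try:
    pickle.loads(b"U" + bytes([5]) + b"abc" + b".")
except (pickle.UnpicklingError, EOFError):
    truncated_rejected = True
```

The `encoding` argument decides whether these Python 2 era strings come back as `str` or `bytes`, which is why the loader cannot unconditionally build a bytes object.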
Protocol 5 — out-of-band buffers
// CPython: Modules/_pickle.c:5100 save_bytearray
static int
save_bytearray(PicklerObject *self, PyObject *obj)
{
if (self->proto < 5) {
/* No bytearray opcode before protocol 5: reduce via bytearray(...) */
...
}
/* Protocol 5: emit BYTEARRAY8 + 8-byte little-endian length + data, in-band */
/* NEXT_BUFFER is emitted for pickle.PickleBuffer objects (save_picklebuffer), not here */
...
}
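Out-of-band transfer applies to pickle.PickleBuffer objects. When the pickler has a buffer_callback and the callback returns a false value, the pickler emits NEXT_BUFFER (plus READONLY_BUFFER for read-only buffers) instead of the data, and the receiver supplies the buffers through the buffers argument. This lets the receiver reconstruct large buffers (for example, the data behind NumPy arrays) without copying them through the pickle stream.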
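The difference between in-band and out-of-band data can be sketched with the public `buffer_callback` and `buffers` parameters. The buffer size and variable names here are arbitrary choices for the example.

```python
import pickle
import pickletools

def opcode_names(payload):
    """Return the opcode names of a pickle stream, in order."""
    return [op.name for op, _arg, _pos in pickletools.genops(payload)]

data = bytearray(b"x" * 4096)

# A plain bytearray stays in-band, even when a callback is supplied.
unused = []
inband = pickle.dumps(data, protocol=5, buffer_callback=unused.append)

# A PickleBuffer is handed to the callback; list.append returns None
# (a false value), so the data is kept out of the stream.
buffers = []
oob = pickle.dumps(pickle.PickleBuffer(data), protocol=5,
                   buffer_callback=buffers.append)

# The receiver must supply the same buffers, in order.
restored = pickle.loads(oob, buffers=buffers)
restored_bytes = memoryview(restored).tobytes()
```

The out-of-band stream stays a few bytes long regardless of the buffer size, because only the NEXT_BUFFER marker travels in it.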
gopy notes
pickle is not yet in gopy's standard module set. This annotation documents the C implementation as a reference for the eventual port. The key data structures are the memo (keyed by object identity), the operand stack, and the read/write callbacks, which support both in-memory and file-backed pickling. Protocol 0 is human-readable text; protocols 1 and later use binary opcodes.
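To make the stack and memo roles concrete, here is a toy interpreter for a handful of protocol 2 opcodes. It is a sketch for orientation only: it leans on `pickletools.genops` for decoding, handles just lists of small integers, and does not mirror how _pickle.c is organised.

```python
import pickle
import pickletools

_MARK = object()  # sentinel pushed by the MARK opcode

def mini_loads(data):
    """Toy unpickler covering only the opcodes needed for nested int lists."""
    stack = []  # operand stack
    memo = {}   # memo index -> object
    for op, arg, _pos in pickletools.genops(data):
        name = op.name
        if name == "PROTO":
            continue
        elif name == "EMPTY_LIST":
            stack.append([])
        elif name == "BININT1":
            stack.append(arg)
        elif name == "MARK":
            stack.append(_MARK)
        elif name == "BINPUT":
            memo[arg] = stack[-1]        # remember top of stack
        elif name == "BINGET":
            stack.append(memo[arg])      # push the remembered object
        elif name == "APPEND":
            item = stack.pop()
            stack[-1].append(item)
        elif name == "APPENDS":
            # Pop everything above the latest MARK, extend the list below it.
            mark = max(i for i, v in enumerate(stack) if v is _MARK)
            items = stack[mark + 1:]
            del stack[mark:]
            stack[-1].extend(items)
        elif name == "STOP":
            return stack.pop()
        else:
            raise ValueError(f"opcode {name} not handled by this sketch")
    raise ValueError("no STOP opcode found")

shared = [1, 2]
result = mini_loads(pickle.dumps([shared, shared], protocol=2))
```

Because BINGET pushes the memoized object itself, `result[0]` and `result[1]` are the same list, matching what `pickle.loads` produces.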