Lib/pickle.py

Source:

cpython 3.14 @ ab2d84fe1023/Lib/pickle.py

pickle serializes Python objects to a byte stream and deserializes them back. It handles arbitrary object graphs, including shared references and cycles. When the C accelerator module _pickle is available, pickle.py rebinds Pickler, Unpickler, dump, dumps, load, and loads to the C versions at import time; the pure-Python classes remain available as _Pickler and _Unpickler. Protocol 5 (Python 3.8+) is the highest protocol and adds out-of-band buffer support.
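A minimal sketch of the round trip described above. It also exercises the pure-Python path through the private names pickle._dumps and pickle._Pickler; these are implementation details of pickle.py rather than public API, and the sample data is made up for illustration.

```python
import pickle

data = {"name": "spam", "values": [1, 2, 3]}

# Round-trip through the highest protocol this interpreter supports.
payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)

# The pure-Python implementation stays reachable under private names
# (implementation detail), so both code paths can be compared.
pure_payload = pickle._dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
pure_restored = pickle.loads(pure_payload)

# True when the public Pickler was rebound to the C version from _pickle.
uses_c_accelerator = pickle.Pickler is not pickle._Pickler
```

Both implementations are expected to produce streams that either Unpickler can read, which is the property a port would need to preserve.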

Map

| Lines | Symbol | Role |
|---|---|---|
| 1-100 | Constants, opcodes | Protocol numbers, opcode bytes |
| 101-300 | Pickler | Serialization; dispatch table |
| 301-600 | Pickler dispatch methods | save_none, save_bool, save_int, save_float, save_bytes, save_str, save_list, save_dict, save_reduce |
| 601-900 | Reduction protocol | save_reduce, __reduce__, __reduce_ex__ |
| 901-1400 | Unpickler | Deserialization; opcode dispatch |
| 1401-1600 | Unpickler dispatch | load_* methods per opcode |
| 1601-1800 | Helper functions, dump/dumps/load/loads | Top-level convenience API |

Reading

Pickle protocol opcodes

The pickle format is a stack-based bytecode. Each opcode pushes, pops, or manipulates items on a virtual stack. The Unpickler executes opcodes until it hits STOP and returns the top of the stack.

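Key opcodes include GLOBAL (push a class or function looked up by module and name), REDUCE (pop a callable and an argument tuple, push the result of the call), BUILD (apply state to the top object by calling __setstate__ if it is defined, otherwise by updating its __dict__), MARK (push a marker for variable-length collections), DICT (pop items down to the last mark and build a dict), and SETITEMS (pop key/value pairs down to the last mark and add them to the dict beneath).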
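The stack machine can be seen with a hand-written protocol 0 stream. This is a small sketch using only the standard pickle and pickletools modules; the byte string is written by hand for illustration and is not taken from pickle.py.

```python
import pickle
import pickletools

# Hand-written protocol 0 pickle: MARK, STRING 'a', INT 1, DICT, STOP.
raw = b"(S'a'\nI1\nd."
built = pickle.loads(raw)

# List the opcode names the real pickler emits for the same value.
stream = pickle.dumps({"a": 1}, protocol=0)
opcode_names = [op.name for op, arg, pos in pickletools.genops(stream)]
```

pickletools.dis(stream) prints the same information as an annotated listing, which is handy when comparing a port's output against CPython's.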

Pickler.save_reduce

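save_reduce is the central serialization method for objects that have no dedicated save_* handler. Such objects are pickled by calling obj.__reduce_ex__(protocol), which returns a tuple (callable, args[, state[, list_items[, dict_items[, state_setter]]]]); the items of that tuple become the arguments of save_reduce.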

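# CPython: Lib/pickle.py:420 Pickler.save_reduce (abridged)
def save_reduce(self, func, args, state=None, listitems=None,
                dictitems=None, state_setter=None, obj=None):
    ...
    save = self.save
    write = self.write
    ...
    save(func)        # push the callable
    save(args)        # push the argument tuple
    write(REDUCE)     # pop both, push func(*args)
    if obj is not None:
        if id(obj) not in self.memo:
            self.memoize(obj)
    ...
    if state is not None:
        if state_setter is None:
            save(state)    # push the state
            write(BUILD)   # apply it via __setstate__ or __dict__
        ...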
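The save(func), save(args), write(REDUCE) order is visible in the emitted opcodes. The sketch below uses complex, which is pickled through a copyreg reducer and therefore takes the save_reduce path; the choice of complex and of protocol 2 is only for illustration.

```python
import pickle
import pickletools

# complex has no dedicated save_* method; copyreg supplies its reducer.
stream = pickle.dumps(complex(1, 2), protocol=2)
names = [op.name for op, arg, pos in pickletools.genops(stream)]

# save(func) emits GLOBAL, save(args) emits the tuple, then REDUCE follows.
global_at = names.index("GLOBAL")
reduce_at = names.index("REDUCE")
```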

Memoization and cycles

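Pickler.memo is a dict mapping id(obj) to (memo_id, obj). The first time an object is serialized, memoize records it and emits a PUT-family opcode (PUT, BINPUT, LONG_BINPUT, or MEMOIZE from protocol 4 on). If the same object appears again, a GET-family opcode references the memo slot instead of re-serializing it, which preserves shared references and lets cycles terminate.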

# CPython: Lib/pickle.py:247 Pickler.memoize
def memoize(self, obj):
    assert id(obj) not in self.memo
    idx = len(self.memo)
    self.memo[id(obj)] = idx, obj
    self.write(self.put(idx))
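The effect of the memo can be checked from the outside. A short sketch, using made-up values, of a self-referencing list and of two references to one object:

```python
import pickle

# A list that contains itself: without the memo this would recurse forever.
cycle = []
cycle.append(cycle)
cycle_copy = pickle.loads(pickle.dumps(cycle))

# Two references to one object: the second is emitted as a memo lookup,
# so identity is preserved after loading.
shared = [1, 2, 3]
pair_copy = pickle.loads(pickle.dumps([shared, shared]))
```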

__reduce_ex__ protocol

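Objects customize pickling by defining __reduce__ or __reduce_ex__. The default object.__reduce_ex__(protocol) first defers to __reduce__ if the class overrides it. Otherwise, for protocol 2 and higher it returns a reduction built on copyreg.__newobj__ (or copyreg.__newobj_ex__ when __getnewargs_ex__ is defined), with the state taken from __getstate__ and later restored through __setstate__ or a __dict__ update. For protocols 0 and 1 it falls back to copyreg._reduce_ex.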
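The default reduction can be inspected without pickling anything. In this sketch, Point is a hypothetical class that does not override __reduce__, and the last two statements replay by hand what REDUCE and BUILD do at load time.

```python
import copyreg


class Point:
    """Hypothetical example class; it does not override __reduce__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


original = Point(1, 2)

# Protocol 2+ reduction: (copyreg.__newobj__, (cls,), state, ...).
func, args, state, *rest = original.__reduce_ex__(2)

# What REDUCE then BUILD do at load time, done by hand.
clone = func(*args)            # REDUCE: calls cls.__new__(cls), no __init__
clone.__dict__.update(state)   # BUILD: no __setstate__, so update __dict__
```

Note that __init__ is never called on the clone, which is why state must carry everything the object needs.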

Protocol 5: out-of-band buffers

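Protocol 5 adds PickleBuffer for zero-copy serialization of large binary data (e.g., NumPy arrays). When a buffer_callback is passed to the pickler, it is called with each PickleBuffer; if it returns a false value (such as None), the data is left out of the stream and the caller transmits the buffer through a separate channel. Without a callback, or when the callback returns a true value, the data is serialized in-band. The receiver must pass the same buffers, in the same order, through the buffers argument of Unpickler, load, or loads.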
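A minimal sketch of the in-band and out-of-band paths, assuming a plain bytearray as the large payload and a Python list standing in for the separate transmission channel:

```python
import pickle

data = bytearray(b"x" * 4096)

# In-band: no callback, so the bytes are copied into the pickle stream.
in_band = pickle.dumps(pickle.PickleBuffer(data), protocol=5)

# Out-of-band: the callback receives each PickleBuffer. list.append returns
# None (falsy), which tells the pickler to leave the data out of the stream.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(data), protocol=5, buffer_callback=buffers.append
)

# The consumer must supply the same buffers, in order, when loading.
restored = pickle.loads(out_of_band, buffers=buffers)
```

The out-of-band stream stays a few bytes long regardless of the payload size, since it only records that a buffer is expected at that position.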

gopy notes

Status: not yet ported. pickle is a critical dependency for shelve, multiprocessing, and many frameworks. A gopy port requires implementing the full opcode dispatch for both Pickler and Unpickler. Protocol 2 compatibility is the minimum viable target since it supports __reduce_ex__ with __newobj__. Protocol 4 additionally requires FRAME support, and protocol 5 requires PickleBuffer and the out-of-band buffer opcodes.
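One way to scope a port is to compare the opcode sets CPython emits per protocol. The sketch below is a rough probe rather than a complete inventory: it uses one arbitrary sample value, and opcode_names is a hypothetical helper introduced here.

```python
import pickle
import pickletools

value = list(range(100))


def opcode_names(protocol):
    """Return the set of opcode names emitted for the sample value."""
    stream = pickle.dumps(value, protocol=protocol)
    return {op.name for op, arg, pos in pickletools.genops(stream)}


proto2 = opcode_names(2)
proto4 = opcode_names(4)

# Opcodes a protocol 2 reader would not need for this value.
only_in_proto4 = proto4 - proto2
```

Running the same probe over a wider set of sample values would give a more realistic list of opcodes to implement first.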