Modules/_codecsmodule.c

Source:

cpython 3.14 @ ab2d84fe1023/Modules/_codecsmodule.c

_codecs is the low-level C module for the codec registry. codecs (in Lib/codecs.py) provides the Python-facing API. The actual encoding/decoding is in Objects/unicodeobject.c for built-in codecs and in C extension modules for others.
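
As a rough illustration of this layering, the sketch below checks that the public codecs module re-exports the C-level functions instead of wrapping them. It assumes a CPython interpreter where the private _codecs module can be imported directly; that module is an implementation detail, so application code should not rely on it.

```python
import _codecs  # private C module; a CPython implementation detail
import codecs   # public API defined in Lib/codecs.py

# Lib/codecs.py does `from _codecs import *`, so the public names
# are the same builtin function objects as the private ones.
same_lookup = codecs.lookup is _codecs.lookup
same_encode = codecs.encode is _codecs.encode

# Stateless codec functions return (output, number of input items consumed).
encoded, consumed = _codecs.utf_8_encode("é")
decoded, used = _codecs.utf_8_decode(b"\xc3\xa9")

print(same_lookup, same_encode, encoded, consumed, decoded, used)
```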

Map

| Lines | Symbol | Role |
|---|---|---|
| 1-100 | _codecs_lookup_impl | Look up a codec by name |
| 101-250 | _codecs_encode_impl, _codecs_decode_impl | Encode/decode a string |
| 251-450 | _codecs_register_impl | Register a search function |
| 451-700 | charmap_encode, charmap_decode | Charmap codec entry |
| 701-800 | register_error, lookup_error | Error handler registry |
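
To make the charmap row more concrete, here is a minimal sketch of how the charmap entry points are driven from Python, in the same way the table-based codecs in Lib/encodings use them. The code page itself is hypothetical: it is Latin-1 with byte 0x80 remapped to the euro sign, chosen only for illustration.

```python
import codecs

# Hypothetical single-byte code page: Latin-1, except byte 0x80 is U+20AC.
table = [chr(i) for i in range(256)]
table[0x80] = "\u20ac"
decoding_table = "".join(table)

# charmap_build inverts the decoding table into an encoding map.
encoding_table = codecs.charmap_build(decoding_table)

# Both calls return (result, number of input items consumed).
text, bytes_used = codecs.charmap_decode(b"\x80 5", "strict", decoding_table)
data, chars_used = codecs.charmap_encode("\u20ac 5", "strict", encoding_table)

print(repr(text), bytes_used, data, chars_used)
```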

Reading

Codec lookup

// CPython: Modules/_codecsmodule.c:42 _codecs_lookup_impl
static PyObject *
_codecs_lookup_impl(PyObject *module, const char *encoding)
{
    PyObject *codec = _PyCodec_Lookup(encoding);
    return codec;
}

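_PyCodec_Lookup normalizes the encoding name (lowercases it and replaces hyphens and spaces with underscores), then checks the registry's lookup cache. On a cache miss it calls each registered search function in order until one returns a codec, and caches that result. Built-in codecs such as utf-8, latin-1, and ascii are resolved through the search function of the standard encodings package; if no search function recognizes the name, a LookupError is raised.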
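
The lookup path can be exercised from Python with codecs.lookup and codecs.register. The sketch below assumes Python 3.9 or later, where hyphens and spaces in the name become underscores before search functions see it. The demo_search function and the demo_alias name are hypothetical and exist only for this example.

```python
import codecs

# Different spellings resolve to the same codec.
utf8_a = codecs.lookup("UTF-8")
utf8_b = codecs.lookup("utf_8")

seen = []  # names our search function was asked about


def demo_search(name):
    """Hypothetical search function: aliases 'demo_alias' to UTF-8."""
    seen.append(name)
    if name == "demo_alias":
        return codecs.lookup("utf-8")
    return None  # let the registry try the next search function


codecs.register(demo_search)

# Mixed case and a hyphen: the search function receives the normalized name.
alias = codecs.lookup("Demo-Alias")

# A name that no search function recognizes ends in LookupError.
try:
    codecs.lookup("no-such-codec-demo")
    missing = None
except LookupError as exc:
    missing = str(exc)

print(utf8_a.name, utf8_b.name, alias.name, missing)
```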

Error handlers

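// CPython: Modules/_codecsmodule.c:320 _codecs_register_error_impl
static PyObject *
_codecs_register_error_impl(PyObject *module, const char *errors,
                            PyObject *handler)
{
    /* Register a callable for the 'errors' mode. PyCodec_RegisterError
       returns an int (0 on success, -1 with an exception set), so the
       result is translated into NULL or None here. */
    if (PyCodec_RegisterError(errors, handler) < 0)
        return NULL;
    Py_RETURN_NONE;
}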

The standard error handlers are strict (raise), ignore, replace, xmlcharrefreplace, backslashreplace, namereplace, surrogateescape, and surrogatepass. Custom handlers are registered under a new name with register_error and retrieved with lookup_error.
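
A custom handler is a callable that receives the UnicodeError instance and returns a replacement together with the position at which to resume. The following is a minimal sketch of that contract; the handler name demo.codepoint and the function show_codepoint are made up for this example.

```python
import codecs


def show_codepoint(exc):
    """Hypothetical handler: render unencodable characters as <U+XXXX>."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch handles encoding errors only
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(f"<U+{ord(ch):04X}>" for ch in bad)
    # Return the replacement text and the index to resume encoding at.
    return replacement, exc.end


codecs.register_error("demo.codepoint", show_codepoint)

encoded = "naïve".encode("ascii", errors="demo.codepoint")
registered = codecs.lookup_error("demo.codepoint") is show_codepoint

print(encoded, registered)
```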

surrogatepass

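The surrogatepass error handler lets the UTF-8, UTF-16, and UTF-32 codecs encode and decode lone surrogate code points (U+D800–U+DFFF) instead of raising, for example data.decode('utf-8', 'surrogatepass'). Such code points can appear in a Python str but are not valid in well-formed UTF data. Round-tripping arbitrary bytes through a str is a different job, handled by surrogateescape: on decoding it maps each undecodable byte 0x80–0xFF to U+DC80–U+DCFF, and on encoding it turns those code points back into the original bytes.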
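
The behavior of the two surrogate-related handlers is easy to confuse, so a short comparison using only the standard str.encode and bytes.decode methods may help. The sample values are arbitrary.

```python
lone = "\ud800"  # a lone high surrogate held in a str

# The strict handler refuses to encode a lone surrogate as UTF-8.
try:
    lone.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# surrogatepass encodes the code point itself and decodes it back.
passed = lone.encode("utf-8", "surrogatepass")
restored = passed.decode("utf-8", "surrogatepass")

# surrogateescape smuggles undecodable bytes through a str and back.
raw = b"caf\xe9 \xff"  # not valid UTF-8
escaped = raw.decode("utf-8", "surrogateescape")
round_trip = escaped.encode("utf-8", "surrogateescape")

print(strict_failed, passed, ascii(escaped), round_trip == raw)
```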

gopy notes

codecs is used by io.TextIOWrapper for all text encoding/decoding. In gopy the codec registry is in module/codecs/. Built-in codecs (utf-8, ascii, latin-1, utf-16, utf-32) wrap Go's unicode/utf8 and golang.org/x/text/encoding. The surrogatepass and surrogateescape error handlers are needed for correct sys.stdin/sys.stdout behavior.
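
As a reference point for the behavior a port has to reproduce, the sketch below shows CPython's io.TextIOWrapper passing an invalid byte through a text stream with the surrogateescape handler. It is not gopy code, and the in-memory BytesIO objects stand in for real standard streams.

```python
import io

# Reading: the invalid byte 0xFF becomes the escape code point U+DCFF.
source = io.BytesIO(b"ok \xff\n")
reader = io.TextIOWrapper(source, encoding="utf-8", errors="surrogateescape")
text = reader.read()

# Writing: the same handler turns U+DCFF back into the original byte.
sink = io.BytesIO()
writer = io.TextIOWrapper(
    sink, encoding="utf-8", errors="surrogateescape", newline="\n"
)
writer.write(text)
writer.flush()
round_tripped = sink.getvalue()

print(ascii(text), round_tripped)
```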