Modules/_json.c

Source:

cpython 3.14 @ ab2d84fe1023/Modules/_json.c

Modules/_json.c is the C accelerator for the json module. It exposes the string helpers used by json/encoder.py and json/decoder.py: encode_basestring_ascii and encode_basestring for output escaping, and scanstring for input decoding. It also provides the accelerated encoder and scanner types that the Python side imports as c_make_encoder and c_make_scanner. The JSONEncoder and JSONDecoder Python classes delegate to these when the C extension is available and fall back to pure Python otherwise.

Map

| Symbol | Kind | Lines (approx) | Purpose |
|---|---|---|---|
| encoder_encode_string | function | 120-230 | Emit a JSON string with ASCII-safe escaping |
| encoder_encode_float | function | 250-290 | Emit a float, checking for NaN/Inf |
| encoder_listencode_list | function | 310-420 | Recursively encode a list, tracking indent level |
| encoder_listencode_dict | function | 440-580 | Recursively encode a dict, detecting circular refs |
| encoder_listencode_obj | function | 590-700 | Top-level dispatch: list/dict/str/int/float/bool/None |
| scanstring_unicode | function | 750-950 | Decode a JSON string literal, handling all escape forms |
| scan_once_unicode | function | 970-1100 | Advance past one JSON value, return (value, end_index) |
| JSONDecoder_new | function | 1150-1200 | Allocate decoder state, cache scanner callable |
| _markers check | inline | 450, 490 | Circular-reference detection via id-keyed dict |
| escape_table | table | 60-110 | 256-entry lookup: 0 = pass-through, else escape char |

Reading

encode_string C fast path

The encoder walks the input PyUnicode object one code point at a time using a pre-built 256-entry escape_table. Code points below 0x20 and the characters " and \ are replaced with a short escape (\n, \t, \", \\, and so on) or, where no short form exists, with a \uXXXX escape. The forward slash / is not escaped on output, although the decoder accepts \/ on input. For all-ASCII strings with no special characters, every code point is appended unchanged to a _PyUnicodeWriter.

// CPython: Modules/_json.c:120 encoder_encode_string
static PyObject *
encoder_encode_string(PyEncoderObject *s, PyObject *obj)
{
    /* Use PyUnicode_DATA / PyUnicode_KIND to get raw buffer */
    Py_ssize_t i, length;
    int kind;
    const void *data;
    _PyUnicodeWriter writer;

    _PyUnicodeWriter_Init(&writer);
    /* opening quote */
    if (_PyUnicodeWriter_WriteChar(&writer, '"') < 0) goto error;

    length = PyUnicode_GET_LENGTH(obj);
    kind = PyUnicode_KIND(obj);
    data = PyUnicode_DATA(obj);

    for (i = 0; i < length; i++) {
        Py_UCS4 c = PyUnicode_READ(kind, data, i);
        /* escape_table[c] == 0 means emit c directly */
        if (c > 0x7f || escape_table[c] == 0) {
            if (_PyUnicodeWriter_WriteChar(&writer, c) < 0) goto error;
        } else {
            /* emit \n, \t, \uXXXX, etc. */
            ...
        }
    }
    /* closing quote */
    if (_PyUnicodeWriter_WriteChar(&writer, '"') < 0) goto error;
    return _PyUnicodeWriter_Finish(&writer);
error:
    _PyUnicodeWriter_Dealloc(&writer);
    return NULL;
}

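Non-ASCII code points are handled differently by the two variants. With ensure_ascii=True, every code point outside the printable ASCII range is written as a \uXXXX escape: code points above 0xFFFF become a surrogate pair of two \uXXXX escapes, and a lone surrogate already present in the string is escaped as-is (for example \ud800). With ensure_ascii=False, non-ASCII code points, including lone surrogates, are copied to the output unchanged.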
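The ASCII-safe escaping rules can be sketched in standalone C, without the CPython API. This is a minimal illustration, not the code in _json.c: the names put_u16, short_escape and escape_ascii are hypothetical, the input is assumed to be an array of valid code points, and the output goes to a caller-supplied buffer rather than a _PyUnicodeWriter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: append one \uXXXX escape for a 16-bit unit. */
static size_t put_u16(char *out, size_t pos, uint32_t unit)
{
    static const char hex[] = "0123456789abcdef";
    memcpy(out + pos, "\\u", 2);
    out[pos + 2] = hex[(unit >> 12) & 0xf];
    out[pos + 3] = hex[(unit >> 8) & 0xf];
    out[pos + 4] = hex[(unit >> 4) & 0xf];
    out[pos + 5] = hex[unit & 0xf];
    return pos + 6;
}

/* Return the short-escape letter for c, or 0 if c has none. */
static char short_escape(uint32_t c)
{
    switch (c) {
    case '"':  return '"';
    case '\\': return '\\';
    case '\b': return 'b';
    case '\f': return 'f';
    case '\n': return 'n';
    case '\r': return 'r';
    case '\t': return 't';
    default:   return 0;
    }
}

/* Write the ASCII-only JSON literal for cps[0..n) into out, NUL-terminated.
 * Returns the number of bytes written (excluding the NUL), or 0 if out is
 * too small. Assumes every cps[i] is a valid code point (<= 0x10FFFF). */
static size_t escape_ascii(const uint32_t *cps, size_t n, char *out, size_t cap)
{
    size_t pos = 0;
    assert(cps != NULL && out != NULL);
    /* Worst case: 12 bytes per code point, plus two quotes and the NUL. */
    if (cap < 3 || n > (cap - 3) / 12)
        return 0;
    out[pos++] = '"';
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        char esc = short_escape(c);
        if (esc) {
            out[pos++] = '\\';
            out[pos++] = esc;
        } else if (c >= 0x20 && c < 0x7f) {
            out[pos++] = (char)c;            /* printable ASCII: pass through */
        } else if (c >= 0x10000) {
            uint32_t v = c - 0x10000;        /* split into a surrogate pair */
            pos = put_u16(out, pos, 0xd800 | ((v >> 10) & 0x3ff));
            pos = put_u16(out, pos, 0xdc00 | (v & 0x3ff));
        } else {
            pos = put_u16(out, pos, c);      /* control, non-ASCII, lone surrogate */
        }
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return pos;
}
```

Calling escape_ascii on the code points a, ", newline, U+00E9 and U+1F600 yields the literal "a\"\n\u00e9\ud83d\ude00". Note that / falls through the printable-ASCII branch and is emitted unescaped.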

scanstring_unicode C decoder

scanstring_unicode decodes a single JSON string starting just after the opening ". It advances an index through the input and handles the eight short escapes (\", \\, \/, \b, \f, \n, \r, \t) plus \uXXXX escapes, including surrogate pairs, inline.

// CPython: Modules/_json.c:750 scanstring_unicode
static PyObject *
scanstring_unicode(PyObject *pystr, Py_ssize_t end, int strict, Py_ssize_t *next_end_ptr)
{
    _PyUnicodeWriter writer;
    Py_ssize_t len = PyUnicode_GET_LENGTH(pystr);
    int kind = PyUnicode_KIND(pystr);
    const void *data = PyUnicode_DATA(pystr);

    _PyUnicodeWriter_Init(&writer);

    while (end < len) {
        Py_UCS4 c = PyUnicode_READ(kind, data, end++);
        if (c == '"') break; /* end of string */
        if (c != '\\') {
            /* fast path: emit c directly */
            if (strict && c < 0x20) goto invalid_char;
            if (_PyUnicodeWriter_WriteChar(&writer, c) < 0) goto error;
            continue;
        }
        /* handle escape sequence */
        c = PyUnicode_READ(kind, data, end++);
        switch (c) {
        case 'u': {
            /* decode 4 hex digits, check for surrogate pair */
            Py_UCS4 uni = scanstring_parse_hex4(...);
            if (0xD800 <= uni && uni <= 0xDBFF) {
                /* high surrogate: expect \uDC00-\uDFFF */
                ...
            }
            ...
        }
        ...
        }
    }
    *next_end_ptr = end;
    return _PyUnicodeWriter_Finish(&writer);
    ...
}

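When strict=True, an unescaped control character below 0x20 inside a string raises JSONDecodeError. A \uXXXX escape that names a lone surrogate (one that is not part of a valid high/low pair) is decoded to that surrogate code point unchanged, which preserves round-trip fidelity with Python strings that contain surrogates.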
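The \uXXXX handling, including the surrogate-pair case, can be illustrated with a small standalone C sketch. It is an assumption-laden approximation rather than the CPython code: parse_hex4 and decode_u_escape are hypothetical names, and the input is a NUL-terminated char string instead of a PyUnicode buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parse exactly four hex digits at s; return -1 on a non-hex digit. */
static long parse_hex4(const char *s)
{
    long v = 0;
    for (int i = 0; i < 4; i++) {
        char d = s[i];
        v <<= 4;
        if (d >= '0' && d <= '9')      v |= d - '0';
        else if (d >= 'a' && d <= 'f') v |= d - 'a' + 10;
        else if (d >= 'A' && d <= 'F') v |= d - 'A' + 10;
        else return -1;
    }
    return v;
}

/* Decode the escape whose hex digits start at s[*pos] (just past "\u").
 * On success, store the code point in *out, advance *pos past everything
 * consumed, and return 0. Return -1 on truncated or non-hex input. */
static int decode_u_escape(const char *s, size_t *pos, uint32_t *out)
{
    size_t len, p;
    long hi, lo;

    assert(s != NULL && pos != NULL && out != NULL);
    len = strlen(s);
    p = *pos;
    if (p > len || len - p < 4)
        return -1;
    hi = parse_hex4(s + p);
    if (hi < 0)
        return -1;
    p += 4;

    /* A high surrogate followed by \uDC00..\uDFFF is combined into one
     * code point above 0xFFFF. */
    if (hi >= 0xD800 && hi <= 0xDBFF && len - p >= 6 &&
        s[p] == '\\' && s[p + 1] == 'u') {
        lo = parse_hex4(s + p + 2);
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            hi = 0x10000 + (((hi - 0xD800) << 10) | (lo - 0xDC00));
            p += 6;
        }
    }
    /* Otherwise a lone surrogate is kept as-is and the following escape,
     * if any, is left for the caller to process. */
    *out = (uint32_t)hi;
    *pos = p;
    return 0;
}
```

With the input d83d\ude00 the function consumes both escapes and produces U+1F600; with d83d\u0041 it returns the lone surrogate 0xD83D and leaves \u0041 unconsumed.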

encode_obj dispatch and circular reference detection

encoder_listencode_obj is the recursive dispatch hub. It checks the object's type in order: None, True, False, int, float, str, list/tuple, dict. Unknown types call the encoder's default() method.

// CPython: Modules/_json.c:590 encoder_listencode_obj
static int
encoder_listencode_obj(PyEncoderObject *s, _PyUnicodeWriter *writer,
                       PyObject *obj, Py_ssize_t indent_level)
{
    /* singletons are written as fixed ASCII literals */
    if (obj == Py_None)  return _PyUnicodeWriter_WriteASCIIString(writer, "null", 4);
    if (obj == Py_True)  return _PyUnicodeWriter_WriteASCIIString(writer, "true", 4);
    if (obj == Py_False) return _PyUnicodeWriter_WriteASCIIString(writer, "false", 5);
    if (PyLong_Check(obj)) return encoder_encode_long(s, writer, obj);
    if (PyFloat_Check(obj)) return encoder_encode_float(s, writer, obj);
    if (PyUnicode_Check(obj)) {
        /* encoder_encode_string returns a new str; append it to writer */
        PyObject *encoded = encoder_encode_string(s, obj);
        int rc;
        if (encoded == NULL) return -1;
        rc = _PyUnicodeWriter_WriteStr(writer, encoded);
        Py_DECREF(encoded);
        return rc;
    }
    if (PyList_Check(obj) || PyTuple_Check(obj))
        return encoder_listencode_list(s, writer, obj, indent_level);
    if (PyDict_Check(obj))
        return encoder_listencode_dict(s, writer, obj, indent_level);
    /* call s->defaultfn(obj), recurse on result */
    ...
}

Circular reference detection uses _markers, a PyDict mapping id(container) to the container itself. encoder_listencode_list and encoder_listencode_dict both insert the container before recursing and delete it after.

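// CPython: Modules/_json.c:450 encoder_listencode_dict (marker insert)
if (s->markers != Py_None) {
    int has_key;
    PyObject *id = PyLong_FromVoidPtr(dct);
    if (id == NULL) return -1;
    has_key = PyDict_Contains(s->markers, id);
    if (has_key) {
        /* has_key == -1 means PyDict_Contains already set an exception */
        if (has_key > 0)
            PyErr_SetString(PyExc_ValueError, "Circular reference detected");
        Py_DECREF(id);
        return -1;
    }
    if (PyDict_SetItem(s->markers, id, dct) < 0) { Py_DECREF(id); return -1; }
    ...
    /* after encoding the items: remove the marker and release the key */
    if (PyDict_DelItem(s->markers, id) < 0) { Py_DECREF(id); return -1; }
    Py_DECREF(id);
}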
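The insert-before, delete-after discipline is what distinguishes a true cycle from a container that is merely referenced twice. The following standalone C sketch shows the idea under simplifying assumptions: Node, Markers and check_acyclic are hypothetical names, containers are identified by address (as with PyLong_FromVoidPtr), and a small fixed-size array with linear search stands in for the markers dict.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 64

/* Hypothetical container type standing in for a Python list. */
typedef struct Node {
    struct Node **items;
    size_t count;
} Node;

/* Containers currently being encoded, identified by address. */
typedef struct {
    const Node *active[MAX_DEPTH];
    size_t depth;
} Markers;

/* Walk n the way a recursive encoder would.
 * Returns 0 if no container is reachable from itself, -1 on a circular
 * reference or when nesting exceeds MAX_DEPTH. */
static int check_acyclic(const Node *n, Markers *m)
{
    int rc = 0;
    assert(n != NULL && m != NULL);

    /* Marker lookup: is n already on the current encoding path? */
    for (size_t i = 0; i < m->depth; i++) {
        if (m->active[i] == n)
            return -1;                 /* circular reference detected */
    }
    if (m->depth == MAX_DEPTH)
        return -1;

    m->active[m->depth++] = n;         /* insert marker before recursing */
    for (size_t i = 0; i < n->count && rc == 0; i++)
        rc = check_acyclic(n->items[i], m);
    m->depth--;                        /* delete marker after recursing */
    return rc;
}
```

A list that contains the same child twice passes the check, because the child's marker is removed before the second visit. A child that points back at its parent fails, because the parent's marker is still present while the child is being walked.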

On the decoding side, json/decoder.py stores the scanner callable on the decoder as self.scan_once, and raw_decode() calls self.scan_once(s, idx). When the C extension is available, that callable is backed by scan_once_unicode, so an entire document is parsed in C without any Python-level loop overhead.

gopy notes

Status: not yet ported.

Planned package path: module/json/.

The encoder side maps cleanly onto a strings.Builder (or bytes.Buffer) passed through recursive calls. The escape_table becomes a [256]byte package-level variable. scanstring_unicode maps to a function that walks a []rune slice (or the raw string bytes after UTF-8 validation). The circular-reference _markers dict becomes a map[uintptr]struct{} keyed on reflect.ValueOf(v).Pointer().

The main integration point is encoder_listencode_obj's default() fallback: in gopy this dispatches through the objects/protocol.go call machinery, which must be wired up before the module can handle user-defined types.