`Modules/_json.c`

cpython 3.14 @ ab2d84fe1023/Modules/_json.c

_json.c is the C accelerator for json.encoder and json.decoder. The pure-Python module in Lib/json/ imports _json and replaces its slow Python implementations with the C versions when available.
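The fallback wiring can be inspected from Python. This is a minimal sketch, assuming a standard CPython build. On a build without the extension, the `c_*` names are `None` and the pure-Python functions stay in place.

```python
import json.decoder
import json.encoder

# Lib/json binds each public name to the C version when _json imported
# successfully, and to the pure-Python version otherwise.
scan_is_c = json.decoder.scanstring is json.decoder.c_scanstring
escape_is_c = (json.encoder.encode_basestring_ascii
               is json.encoder.c_encode_basestring_ascii)

print("C scanstring in use:", scan_is_c)
print("C encode_basestring_ascii in use:", escape_is_c)
```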

The file has two main halves:

The scanner side: scanstring_unicode, which parses a JSON string literal from a Python str object, handling the short escapes and all \uXXXX escapes, including surrogate pairs.
The encoder side: py_encode_basestring and py_encode_basestring_ascii, which escape a Python str as a JSON string literal, and encoder_listencode_obj, encoder_listencode_dict, and encoder_listencode_list, which recursively serialize Python objects to JSON text in a shared output buffer.

Neither half touches files. Both operate on in-memory Python objects (str input for the scanner, a text buffer for encoder output) and are called from the Python layer in Lib/json/decoder.py and Lib/json/encoder.py.

Map

Lines	Symbol	Role	gopy
1-300	`scanstring_unicode`	JSON string scanner: fast chunk copy, backslash dispatch, `\uXXXX` decode, surrogate-pair join.	`module/json/`
300-700	`py_encode_basestring`, `py_encode_basestring_ascii`	Encode a Python `str` as a JSON string literal; ASCII mode escapes every code point outside printable ASCII (U+0020 to U+007E).	`module/json/`
700-1200	`encoder_listencode_obj`, `encoder_listencode_dict`, `encoder_listencode_list`	Recursive object encoder; appends each piece of output to a shared buffer.	`module/json/`
1200-1500	`Scanner` type, `Encoder` type, `_jsonmodule`, `PyInit__json`	Python-level scanner and encoder objects, module definition, and entry point.	`module/json/`

Reading

`scanstring_unicode` inner loop (lines 1 to 300)

cpython 3.14 @ ab2d84fe1023/Modules/_json.c#L1-300

scanstring_unicode(pystr, end, strict) scans a JSON string starting at position end (the character after the opening "). It returns a (str, new_end) pair.
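The calling convention is visible through `json.decoder.scanstring`, the public name that resolves to the C scanner when it is available. This is a minimal sketch; the sample text is made up for illustration.

```python
from json.decoder import scanstring

text = '"abc" tail'
# end=1 points just past the opening quote at index 0.
value, new_end = scanstring(text, 1)

print(value, new_end)        # abc 5
print(repr(text[new_end:]))  # ' tail'
```

The returned index is the position just past the closing quote, so a caller can resume parsing from there.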

The fast path copies contiguous non-escape characters into a _PyUnicodeWriter. A \ causes a character-dispatch switch:

switch (c) {
case '"':  c = '"';  break;
case '\\': c = '\\'; break;
case '/':  c = '/';  break;
case 'b':  c = '\b'; break;
case 'f':  c = '\f'; break;
case 'n':  c = '\n'; break;
case 'r':  c = '\r'; break;
case 't':  c = '\t'; break;
case 'u':
    /* Read four hex digits; digit[i] is the value (0-15) of the i-th one. */
    c  = (Py_UCS4)digit[0] << 12;
    c |= (Py_UCS4)digit[1] << 8;
    c |= (Py_UCS4)digit[2] << 4;
    c |= (Py_UCS4)digit[3];
    if (Py_UNICODE_IS_HIGH_SURROGATE(c)) {
        /* Peek for a following \uXXXX low surrogate (U+DC00..U+DFFF). */
        if (s[0] == '\\' && s[1] == 'u') {
            Py_UCS4 c2 = /* next \uXXXX */;
            if (Py_UNICODE_IS_LOW_SURROGATE(c2)) {
                c = Py_UNICODE_JOIN_SURROGATES(c, c2);
                s += 6;
            }
            /* Otherwise c is kept as a lone surrogate. */
        }
    }
    break;
default:
    /* Unknown escape: always an error, regardless of strict.
       raise_errmsg builds a JSONDecodeError from json.decoder. */
    raise_errmsg("Invalid \\escape", pystr, end - 2);
    goto bail;
}

A high surrogate that is not followed by a valid low surrogate is not an error: it is kept in the result as a lone surrogate code point. When strict=True (the default), control characters U+0000 through U+001F that appear literally, without escaping, raise JSONDecodeError; with strict=False they are accepted.
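These edge cases can be checked through `json.loads`, which reaches the scanner for every string literal. This is a minimal sketch of the observable behavior, not of the C code path itself.

```python
import json

# A \ud83d\ude00 pair joins into one supplementary-plane code point.
assert json.loads('"\\ud83d\\ude00"') == "\U0001F600"

# A lone high surrogate is accepted and kept as-is.
assert json.loads('"\\ud83d"') == "\ud83d"

# A literal (unescaped) tab is rejected in strict mode only.
raw = '"a\tb"'
try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    print("strict:", exc.msg)
print(json.loads(raw, strict=False) == "a\tb")

# An unknown escape such as \x fails regardless of strict.
for strict in (True, False):
    try:
        json.loads('"\\x"', strict=strict)
    except json.JSONDecodeError as exc:
        print("strict =", strict, "->", exc.msg)
```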

`py_encode_basestring_ascii` (lines 300 to 700)

cpython 3.14 @ ab2d84fe1023/Modules/_json.c#L300-700

py_encode_basestring_ascii(s) is the ensure_ascii=True path. It scans the input str and emits \uXXXX for every code point outside printable ASCII (U+0020 through U+007E) that has no two-character escape, and a \uXXXX\uYYYY surrogate pair for code points above U+FFFF:

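/* Simplified sketch of the ensure_ascii path (ascii_escape_unicode and
   ascii_escape_unichar in the source). emit_hex4 is shorthand for the
   four Py_hexdigits stores, not a real helper. */
static PyObject *
ascii_escape_unicode(PyObject *pystr)
{
    /* ... */
    while (/* chars remain */) {
        Py_UCS4 c = PyUnicode_READ(kind, data, i);
        if (c >= ' ' && c <= '~' && c != '\\' && c != '"') {
            /* Printable ASCII: emit directly. */
            *p++ = (char)c;
        } else if (/* c is one of " \ \b \f \n \r \t */) {
            /* Two-character escape such as \n or \". */
            *p++ = '\\'; *p++ = /* escape letter */;
        } else if (c >= 0x10000) {
            /* Supplementary plane: emit surrogate pair. */
            Py_UCS4 v = c - 0x10000;
            *p++ = '\\'; *p++ = 'u';
            emit_hex4(p, 0xD800 | (v >> 10));  p += 4;
            *p++ = '\\'; *p++ = 'u';
            emit_hex4(p, 0xDC00 | (v & 0x3FF)); p += 4;
        } else {
            /* Other controls, U+007F, and BMP non-ASCII: emit \uXXXX. */
            *p++ = '\\'; *p++ = 'u';
            emit_hex4(p, c); p += 4;
        }
    }
    /* ... */
}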

Control characters U+0000 through U+001F are always escaped, even in non-ASCII mode. The structural characters " and \ are escaped as \" and \\ respectively.
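The escaping rules can be modeled in a few lines of Python and checked against `json.dumps`. This is a minimal sketch: `escape_ascii` and `SHORT` are hypothetical names introduced here, and the function models the output rules rather than the C buffer handling.

```python
import json

# Two-character escapes, taken before any \uXXXX fallback.
SHORT = {'"': '\\"', "\\": "\\\\", "\b": "\\b", "\f": "\\f",
         "\n": "\\n", "\r": "\\r", "\t": "\\t"}

def escape_ascii(s: str) -> str:
    """Hypothetical pure-Python model of the ensure_ascii escaping rules."""
    out = ['"']
    for ch in s:
        cp = ord(ch)
        if ch in SHORT:
            out.append(SHORT[ch])
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)              # printable ASCII passes through
        elif cp >= 0x10000:
            v = cp - 0x10000            # 20-bit offset, split 10 + 10 bits
            out.append("\\u%04x\\u%04x" % (0xD800 | (v >> 10),
                                           0xDC00 | (v & 0x3FF)))
        else:
            out.append("\\u%04x" % cp)  # controls, U+007F, BMP non-ASCII
    out.append('"')
    return "".join(out)

for sample in ["plain", 'q"\\', "tab\t", "\x7f", "\u00e9", "\U0001F600"]:
    assert escape_ascii(sample) == json.dumps(sample)

print(escape_ascii("\u00e9\U0001F600"))  # "\u00e9\ud83d\ude00"
```

For U+1F600 the offset is 0xF600, whose top ten bits give 0xD83D and whose bottom ten give 0xDE00.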

`encoder_listencode_dict` recursive encoding (lines 700 to 1200)

cpython 3.14 @ ab2d84fe1023/Modules/_json.c#L700-1200

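The dict encoder is the recursive core for JSON object serialization. Rather than concatenating strings at each level, it appends every piece of output to a single buffer that is threaded through the recursion: the _PyUnicodeWriter *writer argument in the signature below. The "listencode" in the function names dates from an earlier design that accumulated string chunks in a Python list and left the ''.join(chunks) to the caller.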
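The chunk-then-join shape is still observable from the public API: `JSONEncoder.iterencode` yields pieces of text, and joining them gives the same result as `json.dumps`. This is a minimal sketch. As far as I can tell, `iterencode` takes the pure-Python generator path by default, so it illustrates the output contract rather than the C internals.

```python
import json

obj = {"a": [1, 2], "b": "x"}

# iterencode yields the document as a sequence of text fragments.
chunks = list(json.JSONEncoder().iterencode(obj))
print(chunks)

# Joining the fragments reproduces the one-shot encoding.
assert "".join(chunks) == json.dumps(obj)
```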

Keys that are str are used as-is; int, float, bool, and None keys are converted to their JSON text (for example, True becomes "true"). Any other key type is silently skipped when skipkeys=True and raises TypeError otherwise:

static int
encoder_listencode_dict(PyEncoderObject *s, _PyUnicodeWriter *writer,
                        PyObject *dct, Py_ssize_t indent_level)
{
    PyObject *key = NULL, *value = NULL;
    PyObject *it = PyObject_GetIter(dct); /* iterate over keys */
    if (it == NULL)
        return -1;
    while ((key = PyIter_Next(it)) != NULL) {
        /* PyLong_Check also accepts bool, which subclasses int. */
        if (!PyUnicode_Check(key) && !PyLong_Check(key)
                && !PyFloat_Check(key) && key != Py_None) {
            if (s->skipkeys) { Py_DECREF(key); continue; }
            PyErr_Format(PyExc_TypeError,
                         "keys must be str, int, float, bool or None, "
                         "not %.100s", Py_TYPE(key)->tp_name);
            goto bail;
        }
        /* Non-str keys are converted to their JSON text first (elided). */
        if (encoder_encode_string(s, writer, key) < 0) goto bail;
        /* separator */
        value = PyObject_GetItem(dct, key);
        if (value == NULL) goto bail;
        if (encoder_listencode_obj(s, writer, value, indent_level) < 0)
            goto bail;
        Py_DECREF(value);
        Py_DECREF(key);
    }
    ...
}
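The key handling is easy to confirm from Python. This is a minimal sketch using `json.dumps`; the sample dicts are made up for illustration.

```python
import json

# int, float, bool, and None keys are converted to their JSON text.
print(json.dumps({2: "a", True: "b", None: "c", 1.5: "d"}))
# {"2": "a", "true": "b", "null": "c", "1.5": "d"}

# A tuple key is not a supported key type.
mixed = {("x", "y"): 1, "ok": 2}
print(json.dumps(mixed, skipkeys=True))  # {"ok": 2}

try:
    json.dumps(mixed)
except TypeError as exc:
    print("TypeError:", exc)
```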

When check_circular=True (the default), circular reference detection uses a markers dict keyed on id(obj). Before encoding any container, its id is inserted; after encoding, the id is removed. Finding an id that is already present raises ValueError: Circular reference detected.
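The mark-on-entry, unmark-on-exit pattern can be sketched in a toy encoder. `encode_list` is a hypothetical helper that handles only nested lists of ints. It shows why a container that is shared but not cyclic is fine, while a true cycle is caught.

```python
import json

def encode_list(obj, markers):
    """Hypothetical toy encoder for nested lists of ints."""
    if isinstance(obj, int):
        return str(obj)
    marker_id = id(obj)
    if marker_id in markers:
        raise ValueError("Circular reference detected")
    markers[marker_id] = obj   # mark on entry
    body = ", ".join(encode_list(item, markers) for item in obj)
    del markers[marker_id]     # unmark on exit
    return "[" + body + "]"

# The same list appearing twice is not a cycle: its id is removed
# from markers before the second visit.
shared = [1, 2]
print(encode_list([shared, shared], {}))  # [[1, 2], [1, 2]]

# A list that contains itself is still marked when it is revisited.
loop = [1]
loop.append(loop)
for encode in (lambda o: encode_list(o, {}), json.dumps):
    try:
        encode(loop)
    except ValueError as exc:
        print(exc)
```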

NaN and Infinity float values are encoded as NaN, Infinity, and -Infinity when allow_nan=True (the default). When allow_nan=False they raise ValueError, because those literals are not valid JSON.
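A short check of both settings, as a minimal sketch through `json.dumps` and `json.loads`:

```python
import json

# Default allow_nan=True: non-finite floats use JavaScript-style literals.
print(json.dumps([float("nan"), float("inf"), float("-inf")]))
# [NaN, Infinity, -Infinity]

# allow_nan=False refuses to emit them.
try:
    json.dumps(float("nan"), allow_nan=False)
except ValueError as exc:
    print("ValueError:", exc)

# The decoder accepts the same non-standard literals by default.
print(json.loads("[Infinity, -Infinity]"))  # [inf, -inf]
```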

gopy mirror

module/json/ (pending). ScanString mirrors scanstring_unicode character-by-character with the same surrogate-pair join logic. EncodeString mirrors the ensure_ascii branch. EncodeDictChunks and EncodeListChunks mirror the list-accumulator pattern with the same skipkeys / sort_keys knobs. Circular-reference detection uses a Go map[uintptr]struct{} keyed on object pointer identity.

CPython 3.14 changes

_json.c has been largely stable since 3.1. The _PyUnicodeWriter fast path replaced the older PyObject_CallMethod approach in 3.3. The JSONDecodeError exception (a subclass of ValueError) was introduced in 3.5. Per-interpreter state cleanup arrived in 3.12. No structural changes between 3.12 and 3.14.
