Modules/_io/textio.c
cpython 3.14 @ ab2d84fe1023/Modules/_io/textio.c
textio.c implements TextIOWrapper, the text-mode layer of Python's
I/O stack. A TextIOWrapper sits on top of a BufferedIOBase object and
adds three services: character encoding and decoding via an incremental
codec, newline translation (universal or strict mode), and a seekable tell
position encoded as a "cookie" integer that captures both the underlying
byte position and the incremental codec state.
The file also contains IncrementalNewlineDecoder, a helper type that
wraps any codec and applies newline normalization before handing decoded
text back to TextIOWrapper.
Key design points:
- The pending_bytes buffer accumulates small writes and hands them to the underlying buffer in a single write call, so that many small writes do not each pay for a call into the binary layer.
- readline operates entirely on the decoded string produced by _textiowrapper_decode; it does not call the underlying binary layer one byte at a time.
- tell encodes five pieces of state into a single Python int cookie so that seek can reconstruct the exact codec state at any file position.
- Newline modes: None enables universal newlines (translate \r\n and \r to \n on read); "" enables universal newlines without translation; "\r", "\n", or "\r\n" selects strict mode.
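The effect of pending_bytes can be observed from Python. The following is a minimal sketch, assuming the C implementation of TextIOWrapper and a BytesIO object standing in for the binary layer; the helper name bytes_visible_after_write is hypothetical.

```python
import io


def bytes_visible_after_write(write_through):
    """Return what the binary layer holds right after a small write()
    and again after an explicit flush()."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", write_through=write_through)
    text.write("hi")
    before_flush = raw.getvalue()  # still empty if the write is pending
    text.flush()
    after_flush = raw.getvalue()
    return before_flush, after_flush


# Default mode: the small write stays pending until flush().
buffered = bytes_visible_after_write(write_through=False)
# write_through=True: the write reaches the binary layer immediately.
direct = bytes_visible_after_write(write_through=True)
```

With the default settings the two bytes only show up in the BytesIO after flush; with write_through=True they are visible right after write returns.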
Map
| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-400 | textiowrapper struct, IncrementalNewlineDecoder, _io_TextIOWrapper___init___impl | Type struct and constructor: stash the buffer, codec, and newline mode. | module/io/ |
| 400-900 | _textiowrapper_decode, _io_TextIOWrapper_read_impl, _io_TextIOWrapper_readline_impl | Read path: decode a chunk from the buffer, universal newline scanning, readline with look-ahead. | module/io/ |
| 900-1400 | _io_TextIOWrapper_write_impl, textiowrapper_flush_unlocked, textiowrapper_write_chunk | Write path: accumulate in pending_bytes, write to the underlying buffer on flush or size threshold. | module/io/ |
| 1400-2000 | _io_TextIOWrapper_tell_impl, _io_TextIOWrapper_seek_impl, textiowrapper_parse_cookie, textiowrapper_build_cookie | Seek/tell: cookie integer encoding and codec-state snapshot/restore. | module/io/ |
| 2000-3200 | IncrementalNewlineDecoder type, nldecoder_decode, nldecoder_reset, property accessors, TextIOWrapper method table, PyInit__io glue | Newline decoder type and remaining TextIOWrapper methods (readable, writable, seekable, truncate, close, name, encoding). | module/io/ |
Reading
Tell cookie encoding (lines 1400 to 2000)
cpython 3.14 @ ab2d84fe1023/Modules/_io/textio.c#L1400-2000
tell must return a value that seek can later use to restore the stream
to the exact same state, including the incremental codec's internal
buffering. CPython encodes this as a single large integer whose bits carry
five fields:
/* Cookie layout (from textiowrapper_build_cookie). Field widths follow
 * cookie_type on a build where Py_off_t is 8 bytes and int is 4 bytes:
 *
 * Field          Bits  Description
 * start_pos      64    Byte offset in the underlying binary stream
 *                      at the start of the last decoded chunk.
 * dec_flags      32    Codec state flags from getstate().
 * bytes_to_feed  32    Bytes fed to codec to reproduce the chunk.
 * chars_to_skip  32    Characters from the chunk to skip over.
 * need_eof       8     Whether EOF was passed to the codec.
 */
static PyObject *
textiowrapper_build_cookie(cookie_type *cookie)
{
    unsigned char buffer[COOKIE_BUF_LEN];
    PyObject *result;

    /* Each OFF_* macro is the field's byte offset inside buffer; on a
       little-endian build with the sizes above they are 0, 8, 12, 16, 20. */
    memcpy(buffer + OFF_START_POS, &cookie->start_pos, sizeof(cookie->start_pos));
    memcpy(buffer + OFF_DEC_FLAGS, &cookie->dec_flags, sizeof(cookie->dec_flags));
    memcpy(buffer + OFF_BYTES_TO_FEED, &cookie->bytes_to_feed, sizeof(cookie->bytes_to_feed));
    memcpy(buffer + OFF_CHARS_TO_SKIP, &cookie->chars_to_skip, sizeof(cookie->chars_to_skip));
    memcpy(buffer + OFF_NEED_EOF, &cookie->need_eof, sizeof(cookie->need_eof));

    result = _PyLong_FromByteArray(buffer, sizeof(buffer),
                                   PY_LITTLE_ENDIAN, 0 /* unsigned */);
    return result;
}
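The packing scheme can be sketched in Python with the struct module. This is an illustration only, not CPython's code: it assumes a little-endian layout with an 8-byte start_pos, three 4-byte int fields, and a 1-byte need_eof, and the names build_cookie and parse_cookie are hypothetical. Cookies are opaque values, so real code should never depend on this layout.

```python
import struct

# Assumed layout: int64 start_pos, three int32 fields, one int8 flag,
# little-endian with no padding (21 bytes in total).
_COOKIE = struct.Struct("<qiiib")


def build_cookie(start_pos, dec_flags=0, bytes_to_feed=0,
                 chars_to_skip=0, need_eof=False):
    """Pack the five fields into bytes, then read them as one unsigned int."""
    buf = _COOKIE.pack(start_pos, dec_flags, bytes_to_feed,
                       chars_to_skip, int(need_eof))
    return int.from_bytes(buf, "little", signed=False)


def parse_cookie(cookie):
    """Inverse of build_cookie: split the integer back into five fields."""
    buf = cookie.to_bytes(_COOKIE.size, "little", signed=False)
    start_pos, dec_flags, bytes_to_feed, chars_to_skip, need_eof = \
        _COOKIE.unpack(buf)
    return start_pos, dec_flags, bytes_to_feed, chars_to_skip, bool(need_eof)


simple = build_cookie(42)                 # only start_pos set
with_state = build_cookie(10, 1, 3, 2, True)
```

Because start_pos occupies the lowest bytes, a cookie with every other field zero is numerically equal to the plain byte offset, which is why tell often returns a small, familiar-looking number.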
To seek back to a cookie position, _io_TextIOWrapper_seek_impl:
- Seeks the underlying binary stream to start_pos.
- Resets the codec and feeds it bytes_to_feed bytes with the dec_flags state restored via setstate.
- Discards chars_to_skip characters from the decoded output.
This restores the codec to the precise state it was in when tell was
called, so a subsequent read produces exactly the right characters even
when the codec's internal buffer was non-empty (as happens with UTF-16 when
a chunk ends in the middle of a code unit, for example).
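A round trip through tell and seek shows the guarantee from the caller's side. The sketch below uses UTF-16, whose decoder carries state after consuming the byte-order mark; the helper name tell_seek_roundtrip and the sample text are made up for illustration.

```python
import io


def tell_seek_roundtrip(text, encoding, chars_before_tell):
    """Read a few characters, take a cookie, read the rest, then seek back
    to the cookie and read again. Returns both versions of 'the rest'."""
    data = text.encode(encoding)
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding)
    stream.read(chars_before_tell)
    cookie = stream.tell()          # opaque integer, not a character index
    expected_rest = stream.read()
    stream.seek(cookie)             # restores byte position and codec state
    return expected_rest, stream.read()


expected, actual = tell_seek_roundtrip("h\u00e9llo w\u00f6rld\n", "utf-16", 3)
```

The cookie is only meaningful when passed back to seek on the same stream; treating it as a character count or byte offset is not safe in general.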
readline with newline translation (lines 400 to 900)
cpython 3.14 @ ab2d84fe1023/Modules/_io/textio.c#L400-900
readline does not call read(1) in a loop. Instead it reads chunks from
the underlying buffer, decodes them via _textiowrapper_decode, and then
scans the resulting Python str for the line terminator:
static PyObject *
_io_TextIOWrapper_readline_impl(textio *self, int limit)
{
    PyObject *result = NULL;
    ...
    for (;;) {
        /* Drain self->decoded_chars first (leftover from a previous read). */
        if (self->decoded_chars) {
            Py_ssize_t n = _PyUnicode_FindChar(
                self->decoded_chars, '\n',
                self->decoded_chars_used, ..., 1);
            if (n >= 0) {
                /* Found a newline: slice and return. */
                ...
                break;
            }
            /* No newline yet: append to result and keep reading. */
        }
        /* Read another chunk from the underlying buffer. */
        input_chunk = PyObject_CallMethodObjArgs(
            self->buffer, _PyIO_str_read1, size_obj, NULL);
        ...
        decoded = _textiowrapper_decode(self, input_chunk, 0 /* not eof */);
        ...
    }
    return result;
}
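The chunk-then-scan behavior can be checked from Python by counting how often the text layer asks the binary layer for data. This is a rough sketch: CountingBytesIO and count_binary_reads are hypothetical names, and the exact call count is an implementation detail, so only the order of magnitude matters.

```python
import io


class CountingBytesIO(io.BytesIO):
    """BytesIO that counts how often the text layer asks it for bytes."""

    def __init__(self, data):
        super().__init__(data)
        self.read_calls = 0

    def read1(self, size=-1):
        self.read_calls += 1
        return super().read1(size)

    def read(self, size=-1):
        self.read_calls += 1
        return super().read(size)


def count_binary_reads(line_count):
    """Read every line with readline() and report (lines, binary reads)."""
    data = b"".join(b"line %d\n" % i for i in range(line_count))
    raw = CountingBytesIO(data)
    text = io.TextIOWrapper(raw, encoding="ascii")
    lines = []
    while True:
        line = text.readline()
        if not line:
            break
        lines.append(line)
    return len(lines), raw.read_calls


lines_read, binary_reads = count_binary_reads(200)
```

For input that fits in one chunk, the number of binary reads stays far below the number of lines returned, since most readline calls are served from the already decoded string.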
When universal newlines are enabled (newline=None), _textiowrapper_decode
routes the raw bytes through IncrementalNewlineDecoder.decode before
returning the string. That decoder normalizes \r\n and bare \r to \n
so that readline only ever needs to scan for \n.
In strict newline mode (e.g. newline="\r\n"), the translation step is
skipped and readline scans for the literal "\r\n" sequence.
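The three newline modes can be compared on the same input. The snippet below is a small sketch that reads one byte string containing all three line endings; the helper name lines_for and the sample bytes are chosen for illustration.

```python
import io

# One line ending of each kind: \r\n, bare \r, bare \n.
RAW = b"a\r\nb\rc\n"


def lines_for(newline):
    """Return the lines a TextIOWrapper yields for RAW in a given mode."""
    stream = io.TextIOWrapper(io.BytesIO(RAW), encoding="ascii",
                              newline=newline)
    return stream.readlines()


translated = lines_for(None)      # universal newlines, translated to \n
untranslated = lines_for("")      # universal newlines, endings preserved
strict_crlf = lines_for("\r\n")   # only \r\n terminates a line
```

In the strict mode the bare \r and \n are ordinary characters, so everything after the first \r\n comes back as a single line.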
gopy mirror
module/io/ (pending). The Go port wraps a BufferedReader Go value and
holds the incremental codec state as a Go struct matching the five
cookie_type fields. readline and read follow the same chunk-then-scan
loop. The cookie integer is encoded/decoded with encoding/binary using
little-endian byte order to match CPython's _PyLong_FromByteArray output.
CPython 3.14 changes
The newline="" mode (universal newlines without translation) has been
available since Python 3.0. The cookie format has not changed since 3.1.
Multi-phase module init (Py_mod_exec) was adopted for _io in 3.12.
TextIOWrapper gained a write_through constructor argument in 3.3, which
bypasses the pending_bytes accumulation on every write call.