Modules/_io/textio.c

cpython 3.14 @ ab2d84fe1023/Modules/_io/textio.c

Modules/_io/textio.c implements _io.TextIOWrapper, the encoding layer of CPython's I/O stack. It sits above BufferedReader/BufferedWriter and translates between bytes and str using a codec's incremental encoder and decoder. It also handles universal newline translation ("\r\n" and "\r" are converted to "\n" on read), write_through mode, and the tell/seek protocol that keeps positions valid across multibyte encodings.
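The newline behaviour described above can be observed from Python without reading the C source. The following is a minimal sketch, not taken from textio.c: the helper name read_text and the sample bytes are illustrative assumptions, and only the public io.TextIOWrapper API is used.

```python
import io

# Sample input mixing all three line-ending styles (illustrative data).
DATA = b"one\r\ntwo\rthree\n"


def read_text(data, newline):
    """Decode data through a TextIOWrapper layered over an in-memory buffer."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=newline)
    return wrapper.read()


# newline=None enables universal newlines: "\r\n" and "\r" become "\n".
translated = read_text(DATA, None)

# newline="" still recognises every line ending but leaves it untranslated.
untranslated = read_text(DATA, "")

print(repr(translated))
print(repr(untranslated))
```

The newline="" form matters for the tell/seek discussion later on, because character counts then match the original line endings.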

Map

| Lines | Symbol | Role |
| --- | --- | --- |
| 1-150 | textio struct | buffer, codec, pending string, newline flags |
| 151-400 | textiowrapper_init | Codec lookup, newline mode parsing, BOM handling |
| 401-700 | _textiowrapper_writeflush, textiowrapper_write | Encode str and pass to buffer |
| 701-1100 | textiowrapper_read, _textiowrapper_read_chunk | Decode bytes chunks from buffer |
| 1101-1400 | textiowrapper_readline | Line-oriented read with pending buffer |
| 1401-1700 | textiowrapper_tell, textiowrapper_seek | Position encoding for multibyte codecs |
| 1701-3200 | type object, properties | PyTextIOWrapper_Type, line_buffering, encoding |

Reading

Codec initialization and BOM handling

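textiowrapper_init resolves the encoding through the codec registry (the C-level equivalent of codecs.lookup(encoding), done via _PyCodec_LookupTextEncoding) and builds an incremental decoder and encoder from the resulting codec info. For "utf-8-sig" and the UTF-16 variants, the BOM is consumed by the incremental decoder while it decodes the first chunk, so it never appears in the text returned by the first read() call.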

// CPython: Modules/_io/textio.c:230 textiowrapper_init
static int
textiowrapper_init(textio *self, PyObject *args, PyObject *kwds)
{
    ...
    codec_info = _PyCodec_LookupTextEncoding(encoding, "codecs.open()");
    self->decoder = PyObject_CallMethodObjArgs(codec_info,
                                               _PyIO_str_incrementaldecoder,
                                               errors_obj, NULL);
    self->encoder = PyObject_CallMethodObjArgs(codec_info,
                                               _PyIO_str_incrementalencoder,
                                               errors_obj, NULL);
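The same incremental codec objects are reachable from Python, which makes the BOM behaviour easy to check. This is a sketch under stated assumptions: it uses the public codecs module rather than the C helpers in the excerpt, and the chunk boundaries are chosen arbitrarily to split the BOM across two calls.

```python
import codecs

# Look up the codec the way the init path does conceptually, then build
# the incremental decoder/encoder pair from the CodecInfo.
info = codecs.lookup("utf-8-sig")
decoder = info.incrementaldecoder(errors="strict")
encoder = info.incrementalencoder(errors="strict")

data = "héllo".encode("utf-8-sig")  # three BOM bytes followed by UTF-8 text

# Feed the bytes in pieces so the BOM straddles the first two chunks.
decoded = (
    decoder.decode(data[:2])
    + decoder.decode(data[2:4])
    + decoder.decode(data[4:], final=True)
)

# The encoder emits the BOM only once, on the first call.
first_chunk = encoder.encode("a")
second_chunk = encoder.encode("b")

print(repr(decoded), first_chunk, second_chunk)
```

Because the decoder keeps its own state between calls, the wrapper does not need a separate BOM-stripping pass over decoded text.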

Write path and write_through

textiowrapper_write encodes the string argument with the incremental encoder and passes the resulting bytes to the underlying buffer. In write_through mode the encoded bytes are pushed to the buffer layer immediately, bypassing the write coalescing that TextIOWrapper otherwise performs.

// CPython: Modules/_io/textio.c:460 textiowrapper_write_impl
static PyObject *
textiowrapper_write_impl(textio *self, PyObject *text)
{
    PyObject *b;
    b = PyObject_CallMethodOneArg(self->encoder, _PyIO_str_encode, text);
    ...
    res = PyObject_CallMethodOneArg(self->buffer, _PyIO_str_write, b);
    if (self->write_through) {
        PyObject *flushed = PyObject_CallMethodNoArgs(self->buffer, _PyIO_str_flush);
        ...
    }
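The effect of write_through is visible from Python by checking what the underlying buffer holds right after write(). The sketch below assumes CPython's C implementation of TextIOWrapper, where small writes are held inside the wrapper until a flush; the helper name bytes_seen is hypothetical, and io.BytesIO stands in for the buffer layer.

```python
import io


def bytes_seen(write_through):
    """Return what the underlying buffer holds before and after flush()."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", write_through=write_through)
    text.write("héllo")
    before_flush = raw.getvalue()  # bytes that already reached the buffer
    text.flush()
    after_flush = raw.getvalue()
    text.detach()  # release raw without closing it
    return before_flush, after_flush


coalesced = bytes_seen(write_through=False)
immediate = bytes_seen(write_through=True)

print(coalesced)
print(immediate)
```

In the coalesced case the first element is expected to be empty on CPython, because the encoded bytes are still pending inside the wrapper; with write_through=True they arrive as soon as write() returns.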

tell and multibyte position encoding

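TextIOWrapper.tell() cannot simply return the underlying buffer's byte position: the wrapper reads bytes ahead in chunks, and the decoder may be holding part of a multibyte character that was split across a chunk boundary. The tell implementation therefore works out a byte position and decoder state from which decoding can be resumed, and packs them into a single opaque integer cookie. The byte position occupies the low 64 bits and the decoder state sits above it, i.e. byte_pos | (decoder_state << 64), with further bookkeeping fields in higher bits.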

// CPython: Modules/_io/textio.c:1430 textiowrapper_tell
static PyObject *
textiowrapper_tell(textio *self, PyObject *args)
{
    ...
    /* Pack: position = raw_pos * COOKIE_BASE + dec_flags * ... + ... */
    position = _textiowrapper_set_decoded_chars(self, NULL);
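The cookie packing can be sketched in a few lines of Python. The field layout below follows the pure-Python reference implementation in Lib/_pyio.py (64-bit fields, byte position lowest); the function names pack_cookie and unpack_cookie are illustrative, and the sketch is not a transcription of the C code. The second half only relies on the documented guarantee that a value returned by tell() can be passed back to seek().

```python
import io

FIELD = 1 << 64  # each cookie field is 64 bits wide


def pack_cookie(position, dec_flags=0, bytes_to_feed=0, need_eof=False, chars_to_skip=0):
    """Pack tell() bookkeeping into one integer, byte position in the low bits."""
    return (position
            | (dec_flags << 64)
            | (bytes_to_feed << 128)
            | (chars_to_skip << 192)
            | (int(bool(need_eof)) << 256))


def unpack_cookie(cookie):
    """Inverse of pack_cookie."""
    rest, position = divmod(cookie, FIELD)
    rest, dec_flags = divmod(rest, FIELD)
    rest, bytes_to_feed = divmod(rest, FIELD)
    need_eof, chars_to_skip = divmod(rest, FIELD)
    return position, dec_flags, bytes_to_feed, bool(need_eof), chars_to_skip


fields = (10, 1, 3, False, 2)
round_tripped = unpack_cookie(pack_cookie(*fields))

# Treat the real cookie as opaque: only tell()/seek() round-tripping is relied on.
stream = io.TextIOWrapper(io.BytesIO("héllo wörld\n".encode("utf-8")),
                          encoding="utf-8", newline="")
head = stream.read(2)
cookie = stream.tell()
tail = stream.read()
stream.seek(cookie)
tail_again = stream.read()

print(round_tripped, repr(head), repr(tail_again))
```

A cookie with no decoder state collapses to the plain byte offset, which is why tell() on simple encodings often looks like an ordinary file position.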

gopy notes

Not yet ported. The closest Go building blocks for text I/O are bufio.Scanner and bufio.NewReader combined with golang.org/x/text/encoding codecs. A full TextIOWrapper port targeting module/io/textio is a significant undertaking because it requires the incremental codec protocol, the tell/seek cookie encoding, and newline translation.

CPython 3.14 changes

3.14 added TextIOWrapper.read1() delegation to the underlying buffer. The tell() cookie format gained a version field to distinguish pre-3.14 cookies. Line buffering mode now flushes after every \n in the encoded byte stream rather than the decoded string.