Modules/_io/ (part 2)

Source:

cpython 3.14 @ ab2d84fe1023/Modules/_io/bufferedio.c

This annotation covers the buffered I/O layer. See modules_io_detail for FileIO, RawIOBase, IOBase.__init__, and open().

Map

| Lines | Symbol | Role |
|---|---|---|
| 1-120 | BufferedReader.read | Read n bytes; fill from the buffer or raw stream |
| 121-280 | BufferedReader.readline | Read up to \n using the internal buffer |
| 281-450 | BufferedWriter.write | Write bytes; flush when buffer full |
| 451-620 | BufferedWriter.flush | Write the buffer to the raw stream |
| 621-800 | BufferedRWPair | Combine reader and writer for socket-like bidirectional I/O |
| 801-1100 | TextIOWrapper.readline | Decode bytes, handle universal newlines, track line number |
| 1101-1400 | TextIOWrapper.seek / tell | Position in a text file with encoding-aware accounting |

Reading

BufferedReader.read

// CPython: Modules/_io/bufferedio.c:680 _bufferedreader_read_generic
static PyObject *
_bufferedreader_read_generic(buffered *self, Py_ssize_t n)
{
    /* Case 1: all data is in the buffer */
    if (self->readable_pos + n <= self->read_end) {
        PyObject *res = PyBytes_FromStringAndSize(
            self->buffer + self->readable_pos, n);
        if (res != NULL)
            self->readable_pos += n;
        return res;
    }
    /* Case 2: partial data in buffer + need more from raw */
    Py_ssize_t current_size = self->read_end - self->readable_pos;
    PyObject *res = PyBytes_FromStringAndSize(
        self->buffer + self->readable_pos, current_size);
    if (res == NULL)
        return NULL;
    /* Refill buffer from raw; a negative result signals an error */
    if (_bufferedreader_fill_buffer(self) < 0) {
        Py_DECREF(res);
        return NULL;
    }
    /* Simplified: assumes the refill supplied at least `remaining` bytes */
    Py_ssize_t remaining = n - current_size;
    PyObject *tail = PyBytes_FromStringAndSize(self->buffer, remaining);
    self->readable_pos = remaining;
    /* Append tail to res and release tail; res becomes NULL on failure */
    PyBytes_ConcatAndDel(&res, tail);
    return res;
}

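The default buffer size is 8192 bytes; pass buffer_size to the constructor to override it. read(-1) reads until EOF, accumulating chunks from the raw stream and joining them into a single bytes object.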
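The effect of the two cases can be observed from Python. The following is a minimal sketch, not part of the annotated source: CountingRaw is a hypothetical raw stream invented for illustration, and the 16-byte buffer is chosen only to keep the numbers small. It counts how often the buffered layer asks the raw stream for data.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that counts readinto() calls (illustration only)."""

    def __init__(self, payload: bytes):
        super().__init__()
        self._inner = io.BytesIO(payload)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        return self._inner.readinto(b)


raw = CountingRaw(b"x" * 100)
reader = io.BufferedReader(raw, buffer_size=16)

first = reader.read(4)            # buffer is empty, so the raw stream is consulted
calls_after_first = raw.calls

second = reader.read(4)           # should be served from the already-filled buffer
calls_after_second = raw.calls

rest = reader.read(-1)            # drains everything up to EOF
```

If the second read does not increase the call count, it was satisfied from the buffer alone, which corresponds to Case 1 in the excerpt.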

BufferedWriter.write

// CPython: Modules/_io/bufferedio.c:1080 _bufferedwriter_write
static PyObject *
_bufferedwriter_write(buffered *self, PyObject *args)
{
    Py_buffer data;
    if (!PyArg_ParseTuple(args, "y*", &data))
        return NULL;
    Py_ssize_t written = 0;
    if (self->write_pos + data.len <= self->buffer_size) {
        /* Fast path: fits in buffer */
        memcpy(self->buffer + self->write_pos, data.buf, data.len);
        self->write_pos += data.len;
        written = data.len;
    } else {
        /* Flush, then write (or write directly for large data) */
        PyObject *res = _bufferedwriter_flush_unlocked(self);
        if (res == NULL) {
            PyBuffer_Release(&data);
            return NULL;
        }
        Py_DECREF(res);
        if (data.len > self->buffer_size) {
            written = self->raw->tp_as_buffer->...write(data);
        } else {
            memcpy(self->buffer, data.buf, data.len);
            self->write_pos = data.len;
            written = data.len;
        }
    }
    PyBuffer_Release(&data);
    return PyLong_FromSsize_t(written);
}

Writes larger than the buffer bypass it and go directly to the raw stream. Any pending buffered data is flushed first, so bytes reach the raw stream in the order they were written.
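A small experiment makes the ordering visible. This is a sketch under stated assumptions: RecordingRaw is a hypothetical raw stream written for this illustration, and the 8-byte buffer is deliberately tiny so that a 16-byte write counts as large.

```python
import io


class RecordingRaw(io.RawIOBase):
    """Hypothetical raw stream that records every chunk it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, b):
        self.chunks.append(bytes(b))
        return len(b)


raw = RecordingRaw()
writer = io.BufferedWriter(raw, buffer_size=8)

writer.write(b"abc")                      # fits in the buffer, nothing reaches raw yet
seen_after_small = list(raw.chunks)

writer.write(b"0123456789ABCDEF")         # larger than the buffer
writer.flush()                            # make sure nothing is left pending
stream_contents = b"".join(raw.chunks)
```

The recorded chunks, concatenated, should reproduce the writes in order, with the small pending write ahead of the large one.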

TextIOWrapper.readline

// CPython: Modules/_io/textio.c:1380 _textiowrapper_readline
/* Read a line, decoding bytes chunk by chunk:
     1. Read chunk from buffer
     2. Decode with self->decoder (a codec IncrementalDecoder)
     3. Search decoded string for newline
     4. Handle CR/LF/CRLF via universal newlines
     5. Return line including the newline character
*/
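Step 4 can be seen from the Python side. The sketch below uses only the public io API; the sample payload is made up for illustration. With newline=None the three line-ending styles are translated to "\n", while newline="" still recognizes them but returns them untranslated.

```python
import io

payload = b"one\r\ntwo\rthree\n"

# Universal newlines with translation: every ending becomes "\n".
translated = io.TextIOWrapper(io.BytesIO(payload), encoding="ascii", newline=None)
translated_lines = [translated.readline() for _ in range(3)]

# Universal newlines without translation: endings are kept as found.
untranslated = io.TextIOWrapper(io.BytesIO(payload), encoding="ascii", newline="")
untranslated_lines = [untranslated.readline() for _ in range(3)]
```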

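readline() is often the hot path for line-oriented text protocols. The decoder is called incrementally because a chunk boundary can fall in the middle of a multi-byte character; the incremental decoder holds the partial bytes and completes the character when the next chunk arrives.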
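The chunk-boundary case can be reproduced with the codecs module directly. This is a minimal sketch: the sample string and the split point are chosen for illustration, and it exercises the standard UTF-8 incremental decoder rather than TextIOWrapper itself.

```python
import codecs

data = "naïve café\n".encode("utf-8")

# Split inside the two-byte sequence for "ï": the first chunk ends with
# its lead byte and the second chunk starts with its continuation byte.
first_chunk, second_chunk = data[:3], data[3:]

decoder = codecs.getincrementaldecoder("utf-8")()
part_one = decoder.decode(first_chunk)               # partial character is held back
part_two = decoder.decode(second_chunk, final=True)  # character completed here
```

Decoding each chunk independently with bytes.decode would raise UnicodeDecodeError on the first chunk, which is why the wrapper keeps one stateful decoder per stream.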

TextIOWrapper.tell

// CPython: Modules/_io/textio.c:1680 textiowrapper_tell
/* Return a "cookie" that encodes:
     - raw stream position (bytes)
     - decoder state (for encodings like UTF-16 with BOM)
     - number of chars decoded from partial chunk
     - pending CR (for CRLF mode)
   The cookie can be passed back to seek() to resume exactly. */

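Text mode tell() returns an opaque integer encoding both the byte offset and the decoder state. This allows seeking back to exact positions even in variable-width encodings. Because the value is not a character count, the only offsets that are valid for seek() on a text stream are zero and values previously returned by tell().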
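A round trip through tell() and seek() shows the cookie in use. The following sketch relies only on the public io API; the payload is an invented example with non-ASCII characters so that byte offsets and character counts differ.

```python
import io

payload = "héllo\nwörld\n".encode("utf-8")
text = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")

first_line = text.readline()
cookie = text.tell()        # opaque position; treat it as a token, not a char index

rest = text.read()          # consume the remainder
text.seek(cookie)           # jump back to the saved position
rest_again = text.read()    # same remainder, decoded identically
```

The cookie is deliberately not inspected here; its numeric value is an implementation detail and should only be handed back to seek().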

gopy notes

BufferedReader.read is module/io.BufferedReader.Read in module/io/bufferedreader.go. BufferedWriter.write is module/io.BufferedWriter.Write. TextIOWrapper.readline uses Go's bufio.Reader.ReadLine. TextIOWrapper.tell encodes the cookie using the same bit-packing as CPython.