Modules/_csv.c

cpython 3.14 @ ab2d84fe1023/Modules/_csv.c

The C implementation of the csv module. Lib/csv.py imports _csv (this module) and re-exports reader, writer, Error, and the QUOTE_* constants as the public csv API. The public csv.Dialect class is a thin Python class that validates itself through _csv.Dialect.

Three facilities are implemented here:

  • Dialect: a configuration object holding delimiter, quotechar, escapechar, and quoting mode. Both reader and writer accept a dialect instance or keyword arguments that are merged into a dialect.
  • reader: an iterator that parses rows from a line-iterable using a character-level state machine. Each __next__ call advances through one CSV row and returns a list of field strings.
  • writer: an object (returned by the csv.writer factory) whose writerow method escapes a sequence of fields and writes the formatted row to a file-like object.
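
As a quick orientation, the three facilities can be exercised from Python. The following is a minimal sketch; the helper name round_trip is hypothetical and exists only for this illustration.

```python
import csv
import io


def round_trip(rows, **fmtparams):
    """Write rows with csv.writer, then parse the text back with csv.reader."""
    buf = io.StringIO(newline="")  # newline="" keeps "\r\n" untouched
    csv.writer(buf, **fmtparams).writerows(rows)
    text = buf.getvalue()
    buf.seek(0)
    return text, list(csv.reader(buf, **fmtparams))


rows = [["a", "b,c"], ["1", 'say "hi"']]
text, parsed = round_trip(rows)
print(repr(text))
print(parsed)
```

Fields containing the delimiter or the quote character come back unchanged because the writer quotes them and the reader undoes the quoting.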

Map

| Lines | Symbol | Role | gopy |
| --- | --- | --- | --- |
| 1-80 | QuoteStyle, QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONNUMERIC, QUOTE_NONE | Quoting-mode enum and constants. | module/csv/module.go:QuoteStyle |
| 80-350 | DialectObj, dialect_new, dialect_check_* | Dialect type: delimiter, quotechar, escapechar, doublequote, skipinitialspace, lineterminator, quoting. | module/csv/module.go:Dialect |
| 350-550 | ReaderObj, Reader_iternext (a.k.a. csv_iternext) | reader.__next__: state-machine parser producing one row per call. | module/csv/module.go:Reader |
| 550-800 | WriterObj, csv_writerow, csv_writerows | writer.writerow: escapes fields and writes to the output file. | module/csv/module.go:Writer |
| 800-950 | csv_reader, csv_writer | Factory functions: validate dialect, allocate reader/writer objects. | module/csv/module.go:NewReader |
| 950-1100 | _csv_module_exec | Module init: register Dialect, reader, writer types; define QUOTE_* constants. | module/csv/module.go:Module |
| 1100-1200 | _csvmodule, PyInit__csv | Module definition struct and Python entry point. | module/csv/module.go:Module |

Reading

Dialect field defaults (lines 80 to 350)

cpython 3.14 @ ab2d84fe1023/Modules/_csv.c#L80-350

DialectObj holds eight fields. Each has a hard-coded default:

typedef struct {
    PyObject_HEAD
    int doublequote;            /* default: 1 */
    int skipinitialspace;       /* default: 0 */
    int strict;                 /* default: 0 */
    Py_UCS4 delimiter;          /* default: ',' */
    Py_UCS4 quotechar;          /* default: '"' */
    Py_UCS4 escapechar;         /* default: NOT_SET (no escape character) */
    int quoting;                /* default: QUOTE_MINIMAL */
    PyObject *lineterminator;   /* default: "\r\n" */
} DialectObj;

dialect_new merges three sources, highest priority first: keyword arguments, then the attributes of an explicit dialect argument, then the built-in defaults. A keyword argument therefore overrides the same attribute on the supplied dialect. dialect_check_quoting validates that quoting is one of the legal QUOTE_* values and raises TypeError otherwise.
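
The precedence can be observed from Python. This is a small sketch; the Pipes dialect and the write_row helper are hypothetical names invented for the example.

```python
import csv
import io


class Pipes(csv.Dialect):
    """Hypothetical dialect used only for this illustration."""
    delimiter = "|"
    quotechar = "'"
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL


def write_row(row, *args, **kwargs):
    """Format one row with csv.writer and return the resulting text."""
    buf = io.StringIO(newline="")
    csv.writer(buf, *args, **kwargs).writerow(row)
    return buf.getvalue()


defaults_only = write_row(["a", "b"])                       # built-in defaults
from_dialect = write_row(["a", "b"], Pipes)                 # dialect attributes
keyword_wins = write_row(["a", "b"], Pipes, delimiter=";")  # keyword overrides dialect
print(repr(defaults_only), repr(from_dialect), repr(keyword_wins))
```

In the last call the delimiter comes from the keyword argument while the line terminator still comes from the dialect, which shows that the merge happens per attribute.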

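When quoting is enabled (anything other than QUOTE_NONE) and no quotechar is set, dialect_new raises TypeError immediately. The opposite combination, quoting == QUOTE_NONE with no escapechar, is accepted at construction time; the writer raises csv.Error only when it meets a field that actually needs escaping.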
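
A short sketch of when each validation fires, using only the public csv API. The variable names are illustrative.

```python
import csv
import io

buf = io.StringIO(newline="")

# Constructing a QUOTE_NONE writer without an escapechar succeeds.
writer = csv.writer(buf, quoting=csv.QUOTE_NONE)
writer.writerow(["plain", "text"])  # nothing needs escaping here

# The failure surfaces only when a field contains a special character.
try:
    writer.writerow(["a,b"])
    write_error = None
except csv.Error as exc:
    write_error = str(exc)

# Quoting enabled but no quotechar: rejected while the dialect is built.
try:
    csv.writer(buf, quotechar=None, quoting=csv.QUOTE_MINIMAL)
    construct_error = None
except TypeError as exc:
    construct_error = type(exc).__name__

# With an escapechar, QUOTE_NONE escapes the delimiter instead of quoting.
escaped = io.StringIO(newline="")
csv.writer(escaped, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(["a,b"])

print(write_error, construct_error, repr(escaped.getvalue()))
```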

csv_iternext state machine (lines 350 to 550)

cpython 3.14 @ ab2d84fe1023/Modules/_csv.c#L350-550

reader.__next__ processes one CSV row character by character. The parser lives in a loop over the input line; a state variable drives transitions. The main states:

| State | Meaning |
| --- | --- |
| START_FIELD | Beginning of a new field; no characters consumed yet. |
| IN_FIELD | Inside an unquoted field. |
| IN_QUOTED_FIELD | Inside a quoted field (between opening and closing quotechar). |
| ESCAPED_CHAR | The previous character was escapechar in an unquoted field. |
| ESCAPE_IN_QUOTED_FIELD | The previous character was escapechar inside a quoted field. |
| QUOTE_IN_QUOTED_FIELD | Just saw a quotechar inside a quoted field (possible end or doublequote). |

The ParserState enum also defines START_RECORD, EAT_CRNL, and AFTER_ESCAPED_CRNL for record boundaries and line endings. An abridged excerpt of parse_process_char:
static int
parse_process_char(ReaderObj *self, _csvstate *module_state, Py_UCS4 c)
{
    DialectObj *dialect = self->dialect;

    switch (self->state) {
    case START_FIELD:
        if (c == '\n' || c == '\r' || c == EOL) {
            /* empty field at end of row */
            if (parse_save_field(self) < 0)
                return -1;
            self->state = (c == EOL ? START_RECORD : EAT_CRNL);
        }
        else if (c == dialect->quotechar && dialect->quoting != QUOTE_NONE) {
            /* start of a quoted field */
            self->state = IN_QUOTED_FIELD;
        }
        else if (c == dialect->escapechar) {
            self->state = ESCAPED_CHAR;
        }
        else if (c == dialect->delimiter) {
            /* empty field followed by another field */
            if (parse_save_field(self) < 0)
                return -1;
        }
        else {
            if (parse_add_char(self, module_state, c) < 0)
                return -1;
            self->state = IN_FIELD;
        }
        break;

    case IN_QUOTED_FIELD:
        if (c == dialect->escapechar) {
            self->state = ESCAPE_IN_QUOTED_FIELD;
        }
        else if (c == dialect->quotechar && dialect->quoting != QUOTE_NONE) {
            if (dialect->doublequote)
                self->state = QUOTE_IN_QUOTED_FIELD;  /* "" may follow */
            else
                self->state = IN_FIELD;               /* end of quoted part */
        }
        else {
            if (parse_add_char(self, module_state, c) < 0)
                return -1;
        }
        break;
    /* ... other states ... */
    }
    return 0;
}

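Each call to Reader_iternext resets the parser and then loops: it fetches one line from the underlying iterator, feeds it character by character to parse_process_char, and finally feeds an end-of-line sentinel (EOL), which makes the parser call parse_save_field to append the field in progress to the output list. The loop repeats until the state returns to START_RECORD. A multi-line field (a newline inside a quoted field) is therefore resolved inside a single __next__ call, which keeps pulling lines from the iterator until the closing quotechar is found.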
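
The control flow can be approximated in Python. The sketch below is a deliberately reduced model, not a port of the C code: it assumes the default delimiter and quotechar, doublequote enabled, non-strict mode, and no escapechar, and the names State, parse_row, and parse_all are hypothetical.

```python
import csv
from enum import Enum, auto


class State(Enum):
    START_FIELD = auto()
    IN_FIELD = auto()
    IN_QUOTED_FIELD = auto()
    QUOTE_IN_QUOTED_FIELD = auto()


def parse_row(lines, delimiter=",", quotechar='"'):
    """Pull lines from an iterator until one complete row has been parsed.

    Returns None when the iterator is already exhausted.
    """
    fields, buf = [], []
    state = State.START_FIELD
    consumed = False
    for line in lines:
        consumed = True
        for c in line:
            if state is State.IN_QUOTED_FIELD:
                if c == quotechar:
                    state = State.QUOTE_IN_QUOTED_FIELD
                else:
                    buf.append(c)  # includes embedded newlines
            elif state is State.QUOTE_IN_QUOTED_FIELD and c == quotechar:
                buf.append(c)  # doubled quote means one literal quote
                state = State.IN_QUOTED_FIELD
            elif c == delimiter:
                fields.append("".join(buf))
                buf = []
                state = State.START_FIELD
            elif c in "\r\n":
                break  # end of the row
            elif state is State.START_FIELD and c == quotechar:
                state = State.IN_QUOTED_FIELD
            else:
                buf.append(c)
                state = State.IN_FIELD
        if state is not State.IN_QUOTED_FIELD:
            break  # row complete; otherwise fetch the next line
    if not consumed:
        return None
    if state is State.START_FIELD and not fields:
        return []  # blank line
    fields.append("".join(buf))
    return fields


def parse_all(lines):
    """Collect every row by calling parse_row until the input runs out."""
    it = iter(lines)
    rows = []
    while True:
        row = parse_row(it)
        if row is None:
            return rows
        rows.append(row)


data = ['a,"x\n', 'y",b\n', 'c,"say ""hi"""\n', "\n", "e,\n"]
print(parse_all(data))

# The real reader also consumes two source lines in one __next__ call.
reader = csv.reader(iter(data))
first = next(reader)
print(first, reader.line_num)
```

For this input the sketch agrees with csv.reader. It makes no attempt to cover escapechar, skipinitialspace, strict mode, or field size limits.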

csv_writerow quoting logic (lines 550 to 800)

cpython 3.14 @ ab2d84fe1023/Modules/_csv.c#L550-800

writer.writerow(row) iterates over the fields in row, decides for each field whether quoting is needed, assembles a string, and writes it to the output file:

static int
csv_join_append(WriterObj *self, PyObject *field, int quoted)
{
    DialectObj *dialect = self->dialect;
    int kind = PyUnicode_KIND(field);
    const void *data = PyUnicode_DATA(field);
    Py_ssize_t field_len = PyUnicode_GET_LENGTH(field);
    Py_ssize_t i;
    Py_UCS4 c;

    for (i = 0; i < field_len; i++) {
        int want_escape = 0;

        c = PyUnicode_READ(kind, data, i);
        if (c == dialect->delimiter || c == dialect->escapechar ||
            c == dialect->quotechar || c == '\n' || c == '\r') {
            /* Must quote or escape. */
            if (dialect->quoting == QUOTE_NONE) {
                want_escape = 1;        /* quoting disabled: escape instead */
            }
            else if (c == dialect->quotechar) {
                /* Embedded quotechar: double it, or fall back to escaping. */
                if (dialect->doublequote)
                    csv_join_append_data(self, dialect->quotechar);
                else
                    want_escape = 1;
            }
            else if (c == dialect->escapechar) {
                want_escape = 1;
            }
            if (want_escape) {
                if (dialect->escapechar == NOT_SET) {
                    PyErr_Format(..., "need to escape, but no escapechar set");
                    return -1;
                }
                csv_join_append_data(self, dialect->escapechar);
            }
            else {
                quoted = 1;     /* will wrap with quotechar at the end */
            }
        }
        csv_join_append_data(self, c);
    }
    if (quoted) {
        /* Wrap field with quotechar. */
        ...
    }
    return 0;
}

QUOTE_ALL sets quoted = 1 unconditionally before the loop. QUOTE_NONNUMERIC sets quoted = 1 for any field that fails PyNumber_Check, so int, float, and other numeric objects such as Decimal stay unquoted. QUOTE_MINIMAL (the default) only quotes when a special character is encountered. The final assembled row is written by calling the output object's write method, which the writer looks up once at construction and stores as self->write: PyObject_CallOneArg(self->write, line).
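
The effect of the three modes on one mixed row can be checked directly. A minimal sketch; render is a hypothetical helper, and lineterminator is set to "\n" only to keep the output short.

```python
import csv
import io
from decimal import Decimal


def render(row, **fmtparams):
    """Format one row and return the text the writer produced."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n", **fmtparams).writerow(row)
    return buf.getvalue()


row = ["x", 1, 2.5, Decimal("3"), "a,b"]

minimal = render(row)                                   # default QUOTE_MINIMAL
quote_all = render(row, quoting=csv.QUOTE_ALL)
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)

print(minimal, quote_all, nonnumeric, sep="")
```

Only the field containing the delimiter is quoted under QUOTE_MINIMAL, while QUOTE_NONNUMERIC leaves all three numeric objects bare.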

gopy mirror

module/csv/module.go. Dialect is a Go struct with the same eight fields. Reader wraps a Go string iterator and reproduces the six-state machine. Writer buffers field output in a strings.Builder and follows the same quoting precedence. csv.Error maps to a gopy exception type.

CPython 3.14 changes

The csv module has been in C since Python 2.3. Two quoting modes were added in 3.12: QUOTE_NOTNULL quotes every field except None, which is written as an empty unquoted field, and QUOTE_STRINGS always quotes string fields and likewise writes None as an empty unquoted field. Multi-phase module init was adopted in 3.10. The strict dialect field (raise on bad input) has been present since the module's introduction.
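
The two newer quoting modes can be compared on a row that contains a string, a number, and None. This sketch guards with getattr because the constants do not exist before Python 3.12; render and results are illustrative names.

```python
import csv
import io


def render(row, quoting):
    """Format one row with the given quoting mode."""
    buf = io.StringIO(newline="")
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(row)
    return buf.getvalue()


row = ["a", 1, None]
results = {}
for name in ("QUOTE_STRINGS", "QUOTE_NOTNULL"):
    mode = getattr(csv, name, None)  # None on Python older than 3.12
    if mode is not None:
        results[name] = render(row, mode)

for name, text in results.items():
    print(name, repr(text))
```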