Modules/_sre.c

Source: cpython 3.14 @ ab2d84fe1023/Modules/_sre.c

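Modules/_sre.c implements SRE (Secret Labs' Regular Expression engine), the matcher behind Python's re module. The re package parses and compiles patterns in pure Python (Lib/re/_parser.py and Lib/re/_compiler.py) into SRE bytecode, and _sre.c executes that bytecode. All matching happens in C, including group capture, backtracking, and lookahead/lookbehind assertions. The core match loop is written as a template (sre_lib.h) that is compiled once per character width (1, 2 and 4 bytes).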

Map

| Lines (approx.) | Symbol | Role |
|---|---|---|
| 1-200 | SRE opcode definitions, SRE_STATE | Match state struct |
| 201-800 | sre_match, sre_search | Core match and search entry points |
| 801-1500 | SRE_MATCH bytecode VM | Per-opcode dispatch loop |
| 1501-2000 | SRE_REPEAT, SRE_BRANCH, SRE_AT | Backtracking stack |
| 2001-2800 | SRE_OP_MARK, SRE_OP_GROUPREF | Group recording and backreferences |
| 2801-3500 | Pattern_match, Pattern_search, Pattern_findall, Pattern_sub | Python-visible methods |
| 3501-4200 | match.group, match.span, match.groups | Match result object |

Reading

SRE state machine

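The SRE_STATE struct holds the per-call match context. This includes pointers into the subject string, the current position, and the pos/endpos search window. It also holds the mark array that records group boundaries and, among the elided fields, the data stack used for backtracking. The pattern's program counter is not a field here: the dispatch loop keeps the current pattern pointer in its own context frames.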

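```c
// CPython: Modules/_sre.c:120 SRE_STATE (abridged)
typedef struct SRE_STATE {
    void* ptr;            /* current position */
    void* beginning;      /* start of the subject string */
    void* end;            /* end of the subject string */
    void* string;
    Py_ssize_t string_offset;
    Py_ssize_t pos;       /* start of the search window (pos argument) */
    Py_ssize_t endpos;    /* end of the search window (endpos argument) */
    Py_ssize_t lastindex; /* number of the last group closed so far */
    Py_ssize_t lastmark;  /* highest mark slot written so far */
    /* ... other fields elided ... */
    SRE_MARK mark[SRE_MARK_SIZE]; /* group capture array */
    /* ... other fields elided ... */
} SRE_STATE;
```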
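A reduced model of the state makes its role easier to see. The sketch below is hypothetical: `toy_state`, `toy_state_init` and `TOY_MARKS` are invented for illustration. It uses byte indices where SRE_STATE uses pointers. It clamps pos and endpos into [0, length], which is our reading of what the real state setup does with out-of-range arguments. The sketch does not attempt to reproduce CPython's exact field layout.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MARKS 20  /* hypothetical fixed capacity: 10 groups x 2 slots */

/* Hypothetical, heavily reduced stand-in for SRE_STATE. */
typedef struct {
    const char *string;   /* subject (a byte string, for simplicity) */
    size_t length;        /* subject length */
    size_t pos, endpos;   /* search window after clamping */
    size_t ptr;           /* current position, as an index */
    int lastmark;         /* highest mark slot set so far, -1 if none */
    long mark[TOY_MARKS]; /* group boundaries as indices, -1 = unset */
} toy_state;

/* Clamp v into [lo, hi]. */
static long clamp_index(long v, long lo, long hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Prepare a state for one match/search call. Out-of-range pos/endpos
   are clamped so the matcher never indexes outside the subject. */
void toy_state_init(toy_state *st, const char *s, size_t length,
                    long pos, long endpos)
{
    assert(st != NULL && s != NULL);
    st->string = s;
    st->length = length;
    st->pos = (size_t)clamp_index(pos, 0, (long)length);
    st->endpos = (size_t)clamp_index(endpos, 0, (long)length);
    st->ptr = st->pos;            /* matching starts at the window start */
    st->lastmark = -1;            /* no group boundary recorded yet */
    for (int i = 0; i < TOY_MARKS; i++)
        st->mark[i] = -1;
}
```

A fresh state is created for every call, so a failed attempt leaves nothing behind for the next one.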

Core dispatch loop

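The matcher walks the compiled pattern as an array of integer codes and switches on each opcode. Rather than recursing in C, it keeps its backtracking frames on an explicit data stack. Key opcodes: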

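| Opcode | Meaning |
|---|---|
| SRE_OP_SUCCESS | Match succeeded |
| SRE_OP_FAILURE | Fail the current path, triggering a backtrack |
| SRE_OP_LITERAL | Match one given character |
| SRE_OP_NOT_LITERAL | Match any single character except the given one |
| SRE_OP_AT | Anchor (start, end, word boundary) |
| SRE_OP_CATEGORY | Character class (\d, \w, \s), used inside IN character sets |
| SRE_OP_ANY | `.`: match any character except newline |
| SRE_OP_BRANCH | Try alternatives (`\|`) in order |
| SRE_OP_REPEAT | Greedy/lazy repetition (*, +, ?, {m,n}), closed by MAX_UNTIL or MIN_UNTIL; single-character bodies use REPEAT_ONE / MIN_REPEAT_ONE instead |
| SRE_OP_MARK | Record a group start or end position |
| SRE_OP_GROUPREF | Back-reference to a group |
| SRE_OP_SUBPATTERN | Named/numbered group in the parsed pattern; compiled to a pair of MARK opcodes rather than executed |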
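A toy interpreter shows how a handful of these opcodes fit together. This is a minimal sketch, not SRE's code. The opcode numbers, the `BRANCH a b` and `JUMP t` operand layouts, and the names `run`, `toy_match` and `NMARKS` are all hypothetical. The sketch also recurses at BRANCH, whereas SRE uses its explicit data stack. It does follow SRE's convention of storing group n's boundaries in mark slots 2*(n-1) and 2*(n-1)+1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcode numbering for this sketch only. */
enum { OP_SUCCESS, OP_FAILURE, OP_LITERAL, OP_NOT_LITERAL, OP_ANY,
       OP_MARK, OP_BRANCH, OP_JUMP };

#define NMARKS 10

/* Hand-assembled program for a(b|c)d (group 1 uses slots 0 and 1):
 *   0: LITERAL 'a'     2: MARK 0        4: BRANCH 7 11
 *   7: LITERAL 'b'     9: JUMP 13
 *  11: LITERAL 'c'    13: MARK 1       15: LITERAL 'd'   17: SUCCESS
 */

/* Run the program from pc at position pos. Returns the end position of a
   successful match, or -1. Only BRANCH recurses, to try its first arm. */
static long run(const int *code, size_t pc, const char *s, size_t len,
                size_t pos, long *marks)
{
    for (;;) {
        switch (code[pc]) {
        case OP_SUCCESS:
            return (long)pos;
        case OP_FAILURE:
            return -1;
        case OP_LITERAL:          /* LITERAL c */
            if (pos >= len || (unsigned char)s[pos] != code[pc + 1]) return -1;
            pos++; pc += 2; break;
        case OP_NOT_LITERAL:      /* NOT_LITERAL c */
            if (pos >= len || (unsigned char)s[pos] == code[pc + 1]) return -1;
            pos++; pc += 2; break;
        case OP_ANY:              /* any character except newline */
            if (pos >= len || s[pos] == '\n') return -1;
            pos++; pc += 1; break;
        case OP_MARK:             /* MARK slot: record the current position */
            assert(code[pc + 1] >= 0 && code[pc + 1] < NMARKS);
            marks[code[pc + 1]] = (long)pos;
            pc += 2; break;
        case OP_BRANCH: {         /* BRANCH a b: try a, else fall back to b */
            long saved[NMARKS];
            memcpy(saved, marks, sizeof saved);
            long r = run(code, (size_t)code[pc + 1], s, len, pos, marks);
            if (r >= 0) return r;
            memcpy(marks, saved, sizeof saved);  /* undo marks set by arm a */
            pc = (size_t)code[pc + 2];
            break;
        }
        case OP_JUMP:             /* JUMP target */
            pc = (size_t)code[pc + 1]; break;
        default:
            return -1;            /* unknown opcode */
        }
    }
}

/* Anchored match from position 0, in the spirit of Pattern.match. */
long toy_match(const int *code, const char *s, long *marks)
{
    for (int i = 0; i < NMARKS; i++)
        marks[i] = -1;
    return run(code, 0, s, strlen(s), 0, marks);
}
```

The marks are snapshotted at each BRANCH. Positions recorded inside a failed alternative are therefore rolled back before the next alternative runs, which is the job SRE's saved-mark bookkeeping does during backtracking.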

Group capture

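Each capturing (...) group occupies two mark slots, one for its start and one for its end. During matching, SRE_OP_MARK stores the current position for group n (n ≥ 1) in mark[2*(n-1)] and mark[2*(n-1)+1]. When the Match object is built, these pointers are converted to string indices and shifted up two slots so that slots 0 and 1 hold the whole match. match.group(n) then slices the original string from mark[2*n] to mark[2*n+1]. A group that did not participate is stored as -1 and reported as None.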
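That conversion can be sketched with plain index arrays. The helpers `build_spans` and `group_copy` are hypothetical and work on byte indices rather than pointers. The check `j + 1 <= lastmark` reflects our understanding that slots above the highest mark written so far are treated as unset. A return value of -1 stands in for Python's None.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert engine-side marks (group n in slots 2*(n-1), 2*(n-1)+1; -1 = unset)
   into match-side spans (group n in slots 2*n, 2*n+1; group 0 = whole match).
   out must hold 2*(ngroups+1) entries. Hypothetical helper. */
void build_spans(const long *state_mark, int lastmark, int ngroups,
                 long match_start, long match_end, long *out)
{
    out[0] = match_start;               /* group 0: the whole match */
    out[1] = match_end;
    for (int i = 0, j = 0; i < ngroups; i++, j += 2) {
        if (j + 1 <= lastmark && state_mark[j] >= 0 && state_mark[j + 1] >= 0) {
            out[j + 2] = state_mark[j];
            out[j + 3] = state_mark[j + 1];
        } else {
            out[j + 2] = out[j + 3] = -1;   /* group did not participate */
        }
    }
}

/* Copy group n into buf. Returns its length, or -1 if the group is unset,
   out of range, or does not fit. Mirrors slicing spans[2n]..spans[2n+1]. */
long group_copy(const char *subject, const long *spans, int ngroups, int n,
                char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);
    if (n < 0 || n > ngroups || spans[2 * n] < 0)
        return -1;
    size_t len = (size_t)(spans[2 * n + 1] - spans[2 * n]);
    if (len + 1 > bufsize)
        return -1;
    memcpy(buf, subject + spans[2 * n], len);
    buf[len] = '\0';
    return (long)len;
}
```

For `(a)(x)?b` matched against "ab", the engine-side marks are {0, 1, -1, -1}. The spans become {0, 2, 0, 1, -1, -1}, so group 1 is "a" and group 2 is unset.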

Pattern_sub and replacement

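re.sub(pattern, repl, string) calls Pattern_sub. It iterates over the matches and builds the result from the unmatched text between them plus one replacement per match. If repl is a string without backslashes, it is inserted literally. Otherwise its \1 and \g<name> references are expanded against each match. If repl is callable, it is called with each Match object and its return value is inserted.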
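The per-match template step can be sketched as follows. `expand_template` is a hypothetical helper. It handles only the literal fast path, the single-digit references \1 to \9, and `\\`. Python's real template handling also covers \g<name>, \g<n>, two-digit references, and escapes such as \n. The spans argument uses the match-side layout, with group 0 as the whole match and -1 for an unset group.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical template expander. spans holds 2*(ngroups+1) match-side
   indices. Handles only \1..\9 and "\\"; any other backslash is copied
   verbatim (Python would interpret or reject it). Returns the output
   length, or -1 if out is too small or a group number exceeds ngroups. */
long expand_template(const char *repl, const char *subject,
                     const long *spans, int ngroups,
                     char *out, size_t outsize)
{
    assert(repl != NULL && out != NULL && outsize > 0);
    size_t n = 0;

    /* Fast path: no backslash means the template is literal text. */
    if (strchr(repl, '\\') == NULL) {
        size_t len = strlen(repl);
        if (len + 1 > outsize) return -1;
        memcpy(out, repl, len + 1);
        return (long)len;
    }

    for (const char *p = repl; *p != '\0'; p++) {
        if (p[0] == '\\' && p[1] >= '1' && p[1] <= '9') {
            int g = p[1] - '0';
            if (g > ngroups) return -1;   /* Python raises re.error here */
            p++;                          /* consume the digit */
            long s = spans[2 * g], e = spans[2 * g + 1];
            if (s < 0) continue;          /* unmatched group expands to "" */
            size_t len = (size_t)(e - s);
            if (n + len + 1 > outsize) return -1;
            memcpy(out + n, subject + s, len);
            n += len;
        } else {
            if (p[0] == '\\' && p[1] == '\\') p++;  /* "\\" -> one backslash */
            if (n + 2 > outsize) return -1;
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return (long)n;
}
```

The fast path mirrors the design choice described above: a template without backslashes needs no parsing at all.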

gopy notes

Status: the re module is not yet ported. Because the SRE engine is large, two approaches are suggested. One is to build the existing C engine with cgo, which also means supplying the CPython C-API it depends on. The other is to use Go's regexp/syntax package and translate the Python regex AST into a Go regexp. Go's regexp package uses RE2 semantics (linear time, no backtracking, no backreferences or lookaround), so it is not fully compatible with Python's re. For closer compatibility, github.com/dlclark/regexp2 is one option. It is a backtracking engine modeled on .NET's that supports backreferences and lookaround, though its dialect still differs from Python's in places.
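Backreferences are the clearest gap between RE2-style engines and SRE. Matching \1 compares the remaining input against text captured earlier in the same match, which a finite automaton cannot do in general. Whichever route a port takes, it needs an equivalent of the step below. `groupref_step` is a hypothetical sketch of GROUPREF semantics using the engine-side mark layout. It returns -1 if the group is unset, which is how we understand SRE to treat a reference to a group that has not matched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical GROUPREF step. marks uses the engine-side layout: group n
   (n >= 1) occupies slots 2*(n-1) and 2*(n-1)+1, with -1 meaning unset.
   Returns the position after the repeated text, or -1 on failure. */
long groupref_step(const char *s, size_t len, size_t pos,
                   const long *marks, int n)
{
    assert(n >= 1 && pos <= len);
    long start = marks[2 * (n - 1)];
    long end = marks[2 * (n - 1) + 1];
    if (start < 0 || end < start)
        return -1;                     /* group has not matched: fail */
    size_t glen = (size_t)(end - start);
    if (glen > len - pos)
        return -1;                     /* not enough input left */
    if (memcmp(s + pos, s + start, glen) != 0)
        return -1;                     /* input differs from the capture */
    return (long)(pos + glen);
}
```

If group 1 captured "ab" in "abab", a \1 at position 2 consumes the second "ab" and returns 4. The same step at position 2 of "abac" fails.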