Skip to main content

Modules/_sre.c

Source:

cpython 3.14 @ ab2d84fe1023/Modules/_sre.c

Modules/_sre.c is the C engine that executes compiled regular expressions. The re module's _compiler.py translates Python regex patterns into SRE opcode bytecode; this file interprets that bytecode using a backtracking NFA. The main dispatch loop is in sre_match.

Map

LinesSymbolRole
1-200SRE_STATE, SRE_PATTERNMatch state struct; compiled pattern object
201-500sre_matchMain recursive backtracking match loop
501-900sre_searchAnchoring scan over the string
901-1300opcodesSRE_OP_LITERAL, SRE_OP_ANY, SRE_OP_BRANCH, SRE_OP_REPEAT dispatch
1301-1800group capture, backreferencesSRE_OP_MARK, group extraction
1801-2500Pattern_match, Pattern_search, Pattern_findallPython-visible methods
2501-3000Match objectgroup(), span(), groupdict(), expand()

Reading

SRE_STATE and mark array

SRE_STATE holds the input string bounds, the current position, a stack of backtrack points, and a mark[] array. Each group occupies two consecutive mark entries (start and end position). SRE_OP_MARK n sets state->mark[n] = ptr.

// Modules/_sre.c:1 SRE_STATE
typedef struct {
void *beginning, *start, *end; /* string bounds */
Py_ssize_t *mark; /* group captures */
SRE_REPEAT *repeat; /* current repeat context */
Py_ssize_t lastmark, lastindex;
...
} SRE_STATE;

Backtracking dispatch

sre_match is a large switch loop over opcodes. Branches (SRE_OP_BRANCH) push backtrack entries; on failure the engine pops the stack and tries the next alternative. SRE_OP_REPEAT and SRE_OP_MAX_REPEAT handle greedy and non-greedy quantifiers.

// Modules/_sre.c:201 sre_match (excerpt)
static Py_ssize_t
sre_match(SRE_STATE *state, SRE_CODE *pattern)
{
for (;;) {
switch (*pattern++) {
case SRE_OP_LITERAL:
if (ptr >= end || *ptr != pattern[0]) goto failure;
ptr++; pattern++;
break;
case SRE_OP_BRANCH:
/* push backtrack; try first branch */
...
}
}
failure:
/* pop backtrack stack */
...
}

Match object group extraction

Match.group(index) reads state.mark[2*index] and state.mark[2*index + 1] to get the start and end positions of the captured group, then slices the original string.

// Modules/_sre.c:2501 match_getslice
static PyObject *
match_getslice(MatchObject *self, PyObject *index, PyObject *def)
{
Py_ssize_t i = match_getindex(self, index);
if (self->mark[i] < 0) return Py_NewRef(def);
return PySequence_GetSlice(self->string,
self->mark[i], self->mark[i+1]);
}

gopy notes

Not yet ported. The planned package path is module/re/. A faithful port of the SRE opcode interpreter in Go would require the same backtracking loop and mark array. Alternatively, Go's regexp (RE2) package could be used with documented differences in supported syntax.