Skip to main content

Modules/_sre.c (part 7)

Source:

cpython 3.14 @ ab2d84fe1023/Modules/_sre.c

This annotation covers the backtracking core of the SRE regex engine. See modules_re6_detail for re.compile, pattern opcodes, character class matching, and Match object methods.

Map

LinesSymbolRole
1-80SRE_match entryInitialize state, call SRE_MATCH
81-180MAX_REPEAT (greedy)Greedy quantifier: match as many as possible
181-280MIN_REPEAT (lazy)Lazy quantifier: match as few as possible
281-380SUBPATTERN / groupsCapture group saving
381-600Backtrack stackMark/unmark, backjump

Reading

SRE_match

// CPython: Modules/_sre.c:820 SRE_match
static Py_ssize_t
SRE_match(SRE_STATE *state, const SRE_CODE *pattern)
{
state->ptr = state->start;
do {
state->repeat = NULL;
Py_ssize_t res = SRE_MATCH(state, pattern);
if (res >= 0 || res == SRE_ERROR_FAILURE) break;
/* SRE_ERROR_RETRY: try next position */
state->start++;
} while (...);
return res;
}

SRE_match drives the inner SRE_MATCH loop. When the pattern doesn't match at the current position, it increments start and retries. re.search relies on this retry loop; re.match only tries position 0.

MAX_REPEAT (greedy)

// CPython: Modules/_sre.c:1040 MAX_REPEAT
case SRE_OP_MAX_REPEAT: {
SRE_REPEAT *rep = ...;
rep->count = 0;
/* Greedy: match as many times as possible, then try rest */
while (rep->count < max) {
Py_ssize_t res = SRE_MATCH(state, pattern);
if (res < 0) break;
rep->count++;
state->ptr = state->ptr_after_match;
}
/* Now try the rest of the pattern; backtrack count if needed */
while (rep->count >= min) {
Py_ssize_t res = SRE_MATCH(state, next_pattern);
if (res >= 0) return res;
rep->count--;
state->ptr = ...; /* step back one match */
}
return SRE_ERROR_FAILURE;
}

a*b with aaab first matches aaa greedily, then tries b. If b fails, it backs off to aa and retries ab, then a then b, etc. This is the classic NFA backtracking algorithm.

MIN_REPEAT (lazy)

// CPython: Modules/_sre.c:1120 MIN_REPEAT
case SRE_OP_MIN_REPEAT: {
SRE_REPEAT *rep = ...;
rep->count = 0;
/* Lazy: match minimum first, then try rest */
while (rep->count < min) {
SRE_MATCH(state, pattern);
rep->count++;
}
/* Try rest at min count; if fail, extend by one */
while (1) {
Py_ssize_t res = SRE_MATCH(state, next_pattern);
if (res >= 0) return res;
if (rep->count >= max) return SRE_ERROR_FAILURE;
SRE_MATCH(state, pattern);
rep->count++;
}
}

a*?b with aaab first tries to match b immediately (0 as). Failing that, matches one a and tries b again. Continues until b matches or max repetitions are exhausted.

SUBPATTERN (capture groups)

// CPython: Modules/_sre.c:920 SUBPATTERN
case SRE_OP_MARK: {
/* Save current position as group start/end */
int group_id = *pattern++;
state->mark[group_id] = state->ptr;
DISPATCH;
}

MARK opcodes surround each group's pattern. Two MARK opcodes per group (start and end). On backtracking, the mark positions are restored from the backtrack stack, so groups correctly reflect what the successful match captured.

gopy notes

The SRE engine is in module/re/sre.go in gopy. SRE_MATCH is sre.match. MAX_REPEAT is compiled to a Go loop with a deferred backtrack stack entry. Groups are stored in state.marks []int. MIN_REPEAT uses the same stack but tries the continuation first.