
Lib/re/ (part 3)

Source:

cpython 3.14 @ ab2d84fe1023/Lib/re/__init__.py

This annotation covers the replacement and splitting functions. See lib_re2_detail for re.compile, Pattern.match/search/fullmatch, and the _sre C module.

Map

| Lines | Symbol | Role |
|---|---|---|
| 1-80 | re.sub | Replace matches with a string or function |
| 81-160 | re.subn | Like sub but also returns the replacement count |
| 161-260 | re.split | Split string on pattern matches |
| 261-380 | re.findall | Return all non-overlapping matches as a list |
| 381-500 | re.finditer | Iterator of Match objects |

Reading

re.sub

# CPython: Lib/re/__init__.py:210 sub
def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost non-overlapping
    occurrences of pattern in string by the replacement repl."""
    return _compile(pattern, flags).sub(repl, string, count)

re.sub(r'\d+', '#', 'a1b2c3') returns 'a#b#c#'. When repl is a function, it receives each Match object and its return value is used as the replacement.
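As a small illustration of the callable form, the sketch below doubles every integer in a string. The helper name `double_number` is made up for this example; `count` is passed by keyword, and `re.subn` is shown alongside because it differs only in also returning the number of replacements.

```python
import re

def double_number(match):
    # match is the re.Match for one occurrence of the pattern;
    # the string returned here replaces that occurrence.
    return str(int(match.group(0)) * 2)

doubled = re.sub(r'\d+', double_number, 'a1b2c3')

# count limits how many leftmost matches are replaced (0 means all).
first_only = re.sub(r'\d+', '#', 'a1b2c3', count=1)

# subn returns a (new_string, number_of_replacements) tuple.
replaced, n = re.subn(r'\d+', '#', 'a1b2c3')

print(doubled)      # a2b4c6
print(first_only)   # a#b2c3
print(replaced, n)  # a#b#c# 3
```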

Pattern.sub implementation

# CPython: Lib/re/_compiler.py:280 (effective logic in _sre C module)
# Pattern.sub calls the C-level _sre.SRE_Pattern.sub:
#   1. Use finditer to iterate matches
#   2. Collect interleaved non-match spans and substitutions
#   3. join into a single string
#
# For function repls:
#   result = repl(match)          # call for each match
# For string repls:
#   result = match.expand(repl)   # handles \1, \g<name>, etc.

match.expand(template) substitutes group references in a template string: \1 or \g<1> for the first group, \g<name> for a named group, and \g<0> for the whole match. For a string repl, the C implementation in _sre applies the same substitution to each match during the sub call.
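The template syntax can be tried directly on a Match object. This is a minimal sketch; the pattern and the sample strings are chosen only for illustration.

```python
import re

# Group 1 is named "year"; group 2 is unnamed.
m = re.match(r'(?P<year>\d{4})-(\d{2})', '2024-06')

# \2 refers to group 2 by number, \g<year> refers to group 1 by name.
by_name = m.expand(r'\2/\g<year>')

# \g<2> and \g<1> are the explicit numeric forms.
by_number = m.expand(r'\g<2>/\g<1>')

# The same template syntax is accepted as the repl argument of re.sub.
swapped = re.sub(r'(\w+)@(\w+)', r'\2 at \1', 'user@host')

print(by_name)    # 06/2024
print(by_number)  # 06/2024
print(swapped)    # host at user
```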

re.split

# CPython: Lib/re/__init__.py:260 split
def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern."""
    return _compile(pattern, flags).split(string, maxsplit)

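re.split(r'\s+', 'a b c') returns ['a', 'b', 'c']. If the pattern contains capturing groups, the text matched by each group is also included in the result: re.split(r'(\s+)', 'a b') returns ['a', ' ', 'b']. A nonzero maxsplit limits the number of splits; the unsplit remainder of the string is returned as the final list element.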
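The three behaviours can be compared side by side. The sketch below uses an arbitrary sample string and passes `maxsplit` by keyword.

```python
import re

text = 'a, b;c'

# Without a capturing group the separators are discarded.
plain = re.split(r'[,;]\s*', text)

# With a capturing group the captured separator text is kept.
with_seps = re.split(r'([,;])\s*', text)

# maxsplit=1 stops after one split; the remainder stays unsplit.
limited = re.split(r'[,;]\s*', text, maxsplit=1)

print(plain)      # ['a', 'b', 'c']
print(with_seps)  # ['a', ',', 'b', ';', 'c']
print(limited)    # ['a', 'b;c']
```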

re.findall

# CPython: Lib/re/__init__.py:240 findall
def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string."""
    return _compile(pattern, flags).findall(string)

re.findall(r'\d+', 'a1b22c333') returns ['1', '22', '333']. With two or more capturing groups, each result is a tuple of the group texts: re.findall(r'(\d+)-(\d+)', 'a1-2b3-4') returns [('1', '2'), ('3', '4')]. With exactly one group, the result is a list of strings, not a list of 1-tuples.
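Because the shape of the result depends on the number of capturing groups, it can help to run the three cases against the same input. A minimal sketch:

```python
import re

text = 'a1-2b3-4'

# No groups: each element is the whole match.
no_groups = re.findall(r'\d+-\d+', text)

# One group: each element is the text of that group.
one_group = re.findall(r'(\d+)-\d+', text)

# Two groups: each element is a tuple with one entry per group.
two_groups = re.findall(r'(\d+)-(\d+)', text)

print(no_groups)   # ['1-2', '3-4']
print(one_group)   # ['1', '3']
print(two_groups)  # [('1', '2'), ('3', '4')]
```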

re.finditer

# CPython: Lib/re/__init__.py:248 finditer
def finditer(pattern, string, flags=0):
    """Return an iterator yielding Match objects over all non-overlapping matches."""
    return _compile(pattern, flags).finditer(string)

finditer is memory-efficient: it yields Match objects lazily, one at a time. for m in re.finditer(r'\w+', text): process(m) avoids building the full list that findall would return. Each Match object gives access to the matched text, its position in the string, and the group information.
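The one-at-a-time behaviour can be observed by pulling matches from the iterator by hand. A small sketch with an arbitrary sample string:

```python
import re

text = 'foo bar baz'

words = re.finditer(r'\w+', text)

# Only the first match has been produced at this point.
first = next(words)
first_info = (first.group(), first.span())

# Consuming the iterator yields the remaining matches in order.
rest = [m.group() for m in words]

print(first_info)  # ('foo', (0, 3))
print(rest)        # ['bar', 'baz']
```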

gopy notes

re.sub is module/re.Sub in module/re/module.go. The replacement template expansion (\1, \g<name>) is module/re.ExpandTemplate. re.split is module/re.Split. re.findall collects all match results. re.finditer returns module/re.MatchIterator, which wraps the FindAllStringSubmatchIndex method of Go's regexp.Regexp.