Lib/re/ (part 3)
Source:
cpython 3.14 @ ab2d84fe1023/Lib/re/__init__.py
This annotation covers the replacement, splitting, and match-collection functions: re.sub, re.subn, re.split, re.findall, and re.finditer. See lib_re2_detail for re.compile, Pattern.match/search/fullmatch, and the _sre C module.
Map
| Lines | Symbol | Role |
|---|---|---|
| 1-80 | re.sub | Replace matches with a string or function |
| 81-160 | re.subn | Like sub but also returns the replacement count |
| 161-260 | re.split | Split string on pattern matches |
| 261-380 | re.findall | Return all non-overlapping matches as a list |
| 381-500 | re.finditer | Iterator of Match objects |
Reading
re.sub
# CPython: Lib/re/__init__.py:210 sub
def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost non-overlapping
    occurrences of pattern in string by the replacement repl."""
    return _compile(pattern, flags).sub(repl, string, count)
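re.sub(r'\d+', '#', 'a1b2c3') returns 'a#b#c#'. When repl is a function, it is called once per match with the Match object, and its return value is used as the replacement text. count=0 means every match is replaced. Since Python 3.13, passing count or flags positionally is deprecated, so pass them as keywords.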
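A short sketch of the three call shapes described above: a function repl, a keyword count, and re.subn, which the Map lists but which has no section of its own here. The helper name double and the sample strings are illustrative only.

```python
import re


def double(match: re.Match) -> str:
    # Called once per match; the returned string replaces the matched text.
    return str(int(match.group()) * 2)


# Function repl: every run of digits is replaced by twice its value.
doubled = re.sub(r'\d+', double, 'a1b22c3')

# count passed by keyword: only the first match is replaced.
first_only = re.sub(r'\d+', '#', 'a1b2c3', count=1)

# subn returns a (new_string, number_of_substitutions) tuple.
replaced, n = re.subn(r'\d+', '#', 'a1b2c3')

print(doubled, first_only, replaced, n)
```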
Pattern.sub implementation
# CPython: Modules/_sre/sre.c, pattern_subx (Pattern.sub is implemented in C)
# Pattern.sub dispatches to the C-level _sre.SRE_Pattern.sub, which in effect:
#   1. Scans for successive non-overlapping matches (the same walk finditer performs)
#   2. Collects interleaved non-match spans and substitutions
#   3. Joins the pieces into a single string
#
# For function repls:
#   result = repl(match)          # called once per match
# For string repls:
#   result = match.expand(repl)   # equivalent effect; handles \1, \g<name>, etc.
match.expand(template) substitutes group references: \1 or \g<1> for the first group, \g<name> for named groups. The C implementation in _sre does this substitution during the sub call.
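The three steps can be approximated in pure Python. This is a minimal sketch, assuming that finditer plus match.expand reproduces the observable behaviour; it is not the C code, and sub_sketch is a hypothetical name used only for illustration.

```python
import re


def sub_sketch(pattern, repl, string, count=0):
    """Pure-Python approximation of Pattern.sub (illustrative only)."""
    pieces = []
    last_end = 0
    for n, match in enumerate(re.finditer(pattern, string), start=1):
        pieces.append(string[last_end:match.start()])  # text between matches
        if callable(repl):
            pieces.append(repl(match))          # function repl
        else:
            pieces.append(match.expand(repl))   # resolves \1, \g<name>, ...
        last_end = match.end()
        if count and n >= count:                # count=0 means "no limit"
            break
    pieces.append(string[last_end:])            # trailing text after the last match
    return ''.join(pieces)


# Swap key and value using a numbered and a named group reference.
swapped = sub_sketch(r'(?P<key>\w+)=(\w+)', r'\2=\g<key>', 'a=1 b=2')
print(swapped)
```

On this input the sketch and re.sub agree; the real implementation additionally caches the parsed template rather than re-expanding it for every match.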
re.split
# CPython: Lib/re/__init__.py:260 split
def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern."""
    return _compile(pattern, flags).split(string, maxsplit)
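re.split(r'\s+', 'a b c') returns ['a', 'b', 'c']. If the pattern contains capturing groups, the text they match is included in the result: re.split(r'(\s+)', 'a b') returns ['a', ' ', 'b']. A nonzero maxsplit caps the number of splits, and the unsplit remainder becomes the final element. Since Python 3.13, passing maxsplit or flags positionally is deprecated.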
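The following sketch puts the split variants side by side, with input strings chosen only for illustration. It also shows one behaviour not spelled out above: a separator at the very start or end of the string produces an empty string at that edge.

```python
import re

# Separators are dropped when the pattern has no capturing group.
plain = re.split(r'\s+', 'a b  c')

# A capturing group keeps each separator as its own list element.
with_seps = re.split(r'(\s+)', 'a b  c')

# maxsplit=2: two splits, then the remainder is returned untouched.
limited = re.split(r'\s+', 'a b c d', maxsplit=2)

# Leading and trailing separators yield empty strings at the edges.
edges = re.split(r',', ',a,b,')

print(plain, with_seps, limited, edges)
```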
re.findall
# CPython: Lib/re/__init__.py:240 findall
def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string."""
    return _compile(pattern, flags).findall(string)
re.findall(r'\d+', 'a1b22c333') returns ['1', '22', '333']. With more than one group, each result is a tuple of the groups: re.findall(r'(\d+)-(\d+)', 'a1-2b3-4') returns [('1', '2'), ('3', '4')]. With exactly one group, the result is a list of that group's strings, not a list of 1-tuples.
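Because the result shape depends on the number of capturing groups, a side-by-side comparison may help. The sketch below runs four variants of one pattern against the same sample text; the non-capturing form is included to show how to group without changing the result shape.

```python
import re

text = 'a1-2b3-4'

# No groups: each element is the whole match.
no_group = re.findall(r'\d+-\d+', text)

# One group: each element is that group's text (a str, not a 1-tuple).
one_group = re.findall(r'(\d+)-\d+', text)

# Two groups: each element is a tuple with one entry per group.
two_groups = re.findall(r'(\d+)-(\d+)', text)

# Non-capturing groups do not count, so whole matches come back again.
non_capturing = re.findall(r'(?:\d+)-(?:\d+)', text)

print(no_group, one_group, two_groups, non_capturing)
```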
re.finditer
# CPython: Lib/re/__init__.py:248 finditer
def finditer(pattern, string, flags=0):
    """Return an iterator yielding Match objects over all non-overlapping matches."""
    return _compile(pattern, flags).finditer(string)
finditer is memory-efficient: it yields Match objects lazily, one at a time, so for m in re.finditer(r'\w+', text): process(m) never builds the full list of matches. Each Match object exposes the matched text, its span in the source string, and the captured groups (m.group(), m.span(), m.groups()).
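A brief illustration of the lazy iteration, using a sample string chosen for this example. It pulls one match with next() before consuming the rest, and records spans, which findall cannot report.

```python
import re

text = 'alpha beta gamma'

# The iterator does no matching work until it is advanced.
matches = re.finditer(r'\w+', text)
first = next(matches)

# Remaining matches are consumed on demand; each Match carries its span.
rest = [(m.group(), m.span()) for m in matches]

print(first.group(), first.span(), rest)
```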
gopy notes
re.sub is module/re.Sub in module/re/module.go. The replacement template expansion (\1, \g<name>) is module/re.ExpandTemplate. re.split is module/re.Split. re.findall collects all match results into a list. re.finditer returns module/re.MatchIterator, which wraps the FindAllStringSubmatchIndex method of Go's *regexp.Regexp.