Lib/re/__init__.py
cpython 3.14 @ ab2d84fe1023/Lib/re/__init__.py
re/__init__.py is the pure-Python shell around CPython's _sre C extension. Every public matching function (match, search, sub, findall, etc.) compiles its pattern argument through the internal _compile() helper, which manages a two-tier cache: compiled Pattern objects live in _cache (LRU, capacity 512) and in _cache2 (FIFO, capacity 256). Replacement template expansion for sub/subn is cached separately, by an @lru_cache on _compile_template.
Map
| Lines | Symbol | Role |
|---|---|---|
| 142-156 | RegexFlag | IntFlag enum exposing ASCII, IGNORECASE, LOCALE, MULTILINE, DOTALL, VERBOSE, DEBUG, UNICODE, NOFLAG |
| 159 | PatternError / error | Public exception, aliased from _compiler.PatternError |
| 164-167 | match() | Delegates to _compile(pattern, flags).match(string) |
| 169-172 | fullmatch() | Delegates to _compile(pattern, flags).fullmatch(string) |
| 174-177 | search() | Delegates to _compile(pattern, flags).search(string) |
| 183-208 | sub() | Positional-arg deprecation shim, then delegates to Pattern.sub |
| 211-239 | subn() | Same shim as sub(), returns (new_string, count) |
| 241-268 | split() | Positional-arg deprecation shim, delegates to Pattern.split |
| 270-278 | findall() | Returns list of all non-overlapping matches |
| 280-285 | finditer() | Returns iterator of Match objects |
| 287-289 | compile() | Public wrapper for _compile() |
| 291-295 | purge() | Clears _cache, _cache2, and _compile_template lru_cache |
| 305-313 | escape() | Backslash-escapes regex-special characters in the pattern string (since Python 3.7, other non-alphanumerics are left alone) |
| 322-372 | _compile() | Two-tier LRU/FIFO cache lookup and pattern compilation |
| 374-377 | _compile_template() | @lru_cache(512) for replacement template objects |
| 391-428 | Scanner | Experimental lexical scanner built from a list of (phrase, action) pairs |
Reading
RegexFlag enum
RegexFlag uses the @enum._simple_enum decorator rather than the standard EnumMeta machinery. This shaves import time by skipping metaclass overhead. The @enum.global_enum decorator injects every member into the re module namespace directly, so re.IGNORECASE and re.I are both attributes of the module and members of the flag.
# CPython: Lib/re/__init__.py:142 RegexFlag
@enum.global_enum
@enum._simple_enum(enum.IntFlag, boundary=enum.KEEP)
class RegexFlag:
    NOFLAG = 0
    ASCII = A = _compiler.SRE_FLAG_ASCII
    IGNORECASE = I = _compiler.SRE_FLAG_IGNORECASE
    LOCALE = L = _compiler.SRE_FLAG_LOCALE
    UNICODE = U = _compiler.SRE_FLAG_UNICODE
    MULTILINE = M = _compiler.SRE_FLAG_MULTILINE
    DOTALL = S = _compiler.SRE_FLAG_DOTALL
    VERBOSE = X = _compiler.SRE_FLAG_VERBOSE
    DEBUG = _compiler.SRE_FLAG_DEBUG
    __str__ = object.__str__
    _numeric_repr_ = hex
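As a quick check of the namespace injection described above, the sketch below uses only the public re API. It assumes CPython's current flag values (IGNORECASE is 2, MULTILINE is 8), which are implementation details rather than documented guarantees.

```python
import re

# The short and long names are the same enum member, exported at module level.
same_member = re.I is re.IGNORECASE

# Members are IntFlag instances, so they behave like ints and combine with |.
combined = re.IGNORECASE | re.MULTILINE
is_int = isinstance(combined, int)

# Assumption: CPython's SRE_FLAG_IGNORECASE == 2 and SRE_FLAG_MULTILINE == 8.
combined_value = int(combined)

# A combined flag can be passed anywhere a flags argument is accepted.
matched = re.search(r"^b", "A\nB", combined) is not None

print(same_member, is_int, combined_value, matched)
```

Because the members are plain ints underneath, _compile() can reduce a RegexFlag to flags.value and use it directly in a cache key.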
Two-tier pattern cache
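_compile() first tries the FIFO cache (_cache2, 256 slots), where a hit costs a single dict lookup. On a miss it falls back to the LRU dict (_cache, 512 slots): the key is popped and, once a pattern is in hand, re-inserted so that it becomes the newest entry. Python dicts preserve insertion order, so LRU eviction deletes next(iter(_cache)), the oldest key. Every lookup that reaches the slow path, whether an LRU hit or a fresh compilation, is then copied into the FIFO cache, which gives recently used patterns a fast path that skips the LRU pop/re-insert. Patterns compiled with DEBUG are returned without being cached, as are already compiled Pattern objects.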
# CPython: Lib/re/__init__.py:330 _compile
def _compile(pattern, flags):
    if isinstance(flags, RegexFlag):
        flags = flags.value
    try:
        return _cache2[type(pattern), pattern, flags]
    except KeyError:
        pass
    key = (type(pattern), pattern, flags)
    p = _cache.pop(key, None)
    if p is None:
        if isinstance(pattern, Pattern):
            if flags:
                raise ValueError(
                    "cannot process flags argument with a compiled pattern")
            return pattern
        p = _compiler.compile(pattern, flags)
        if flags & DEBUG:
            return p
        if len(_cache) >= _MAXCACHE:
            try:
                del _cache[next(iter(_cache))]
            except (StopIteration, RuntimeError, KeyError):
                pass
    _cache[key] = p
    if len(_cache2) >= _MAXCACHE2:
        try:
            del _cache2[next(iter(_cache2))]
        except (StopIteration, RuntimeError, KeyError):
            pass
    _cache2[key] = p
    return p
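The lookup order above can be modelled in isolation. The following is a minimal sketch, not the CPython implementation: the class name TwoTierCache, its attributes, and the tiny capacities are hypothetical, and build stands in for _compiler.compile.

```python
class TwoTierCache:
    """Toy model of a FIFO fast path in front of an LRU dict (hypothetical)."""

    def __init__(self, build, lru_size=3, fifo_size=2):
        self.build = build        # stand-in for the real compile step
        self.lru = {}             # insertion-ordered: oldest key first
        self.fifo = {}            # insertion-ordered: oldest key first
        self.lru_size = lru_size
        self.fifo_size = fifo_size
        self.builds = 0           # how many times build() actually ran

    def get(self, key):
        # Fast path: plain dict lookup, no reordering.
        try:
            return self.fifo[key]
        except KeyError:
            pass
        # Slow path: pop so that a hit can be re-inserted as the newest entry.
        value = self.lru.pop(key, None)
        if value is None:
            value = self.build(key)
            self.builds += 1
            if len(self.lru) >= self.lru_size:
                del self.lru[next(iter(self.lru))]   # drop least recently used
        self.lru[key] = value
        if len(self.fifo) >= self.fifo_size:
            del self.fifo[next(iter(self.fifo))]     # drop oldest insertion
        self.fifo[key] = value
        return value


cache = TwoTierCache(str.upper)
for key in ["a", "b", "c", "a"]:
    cache.get(key)

print(cache.builds)        # "a" was rebuilt zero times on its second lookup
print(list(cache.lru))     # "a" moved to the newest position
print(list(cache.fifo))    # only the two most recent slow-path keys remain
```

The second lookup of "a" misses the FIFO tier (it was already evicted there) but hits the LRU tier, so no rebuild happens and the key is refreshed in both tiers.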
purge() and cache invalidation
purge() must clear all three caches in one call. The _compile_template function is decorated with @functools.lru_cache, so it exposes a .cache_clear() method. The pattern caches are plain dicts and are cleared with .clear().
# CPython: Lib/re/__init__.py:291 purge
def purge():
    "Clear the regular expression caches"
    _cache.clear()
    _cache2.clear()
    _compile_template.cache_clear()
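The effect of purge() can be observed from the public API, with one caveat: the sketch below relies on object identity of cached patterns, which is a CPython implementation detail and not something the re documentation promises. The pattern text is an arbitrary placeholder.

```python
import re

# Hypothetical pattern text, chosen only to be unlikely to be cached already.
PATTERN = r"gopy-purge-demo-\d+"

first = re.compile(PATTERN)
again = re.compile(PATTERN)   # served from the pattern cache
re.purge()                    # empties the pattern caches and the template cache
fresh = re.compile(PATTERN)   # compiled from scratch

cached_identity = first is again
recompiled = fresh is not first
print(cached_identity, recompiled)
```

Patterns that callers already hold stay valid after purge(); only the module-level references are dropped.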
sub() positional-argument deprecation shim
sub() accepts count and flags as keyword-only in the canonical signature but also as positional arguments for backward compatibility. It uses *args to capture any positional extras, then unpacks and warns via DeprecationWarning. The _ZeroSentinel type (a subclass of int) lets the function distinguish "caller passed 0 explicitly" from "caller omitted the argument."
# CPython: Lib/re/__init__.py:183 sub
def sub(pattern, repl, string, *args, count=_zero_sentinel, flags=_zero_sentinel):
    if args:
        if count is not _zero_sentinel:
            raise TypeError("sub() got multiple values for argument 'count'")
        count, *args = args
        if args:
            if flags is not _zero_sentinel:
                raise TypeError("sub() got multiple values for argument 'flags'")
            flags, *args = args
            if args:
                raise TypeError("sub() takes from 3 to 5 positional arguments "
                                "but %d were given" % (5 + len(args)))
        import warnings
        warnings.warn(
            "'count' is passed as positional argument",
            DeprecationWarning, stacklevel=2
        )
    return _compile(pattern, flags).sub(repl, string, count)
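From the caller's side, the shim means both spellings return the same result, but only the keyword form is warning-free. A small sketch using the public API follows; whether the positional call actually warns depends on the interpreter including this shim, so the example reports the warning count instead of assuming it.

```python
import re
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # make sure nothing is filtered out

    # Preferred spelling: count passed by keyword.
    keyword_result = re.sub("a", "x", "aaa", count=2)
    keyword_warnings = len(caught)

    # Deprecated spelling: count passed as the fourth positional argument.
    positional_result = re.sub("a", "x", "aaa", 2)
    positional_warnings = len(caught) - keyword_warnings

print(keyword_result, positional_result)
print("warnings from positional call:", positional_warnings)
```

Passing count both ways in one call, such as re.sub("a", "x", "aaa", 2, count=1), is rejected with TypeError, which matches the first branch of the shim.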
gopy notes
- RegexFlag maps cleanly to a Go const block backed by int. The @enum.global_enum injection is replicated by exporting each constant from the re package directly.
- The two-tier cache should be implemented with a read-write mutex. The FIFO tier can be a fixed-size ring buffer; the LRU tier can use an lru.Cache from a third-party Go package (the Go standard library does not ship one) or a hand-rolled doubly-linked-list map.
- The _ZeroSentinel sentinel logic for sub/subn/split needs a Go analog. A *int optional parameter or a dedicated sentinel value works.
- _compile_template wraps _sre.template, which is the C-level template compiler. In gopy that maps to the compiled-replacement path in the sre port.
- The Scanner class is marked experimental upstream and can be deferred.