Lib/re/__init__.py

The top-level re package exposes the public API that most Python code touches. It holds the module-level pattern cache and delegates real work to _compiler and the C extension _sre.

Python 3.11 converted re from a single file into a package (Lib/re/__init__.py, _compiler.py, _parser.py, _constants.py). The public interface did not change.

Map

Lines | Symbol | Role
1–60 | module header, imports | pulls in _compiler, _parser, _constants, _sre
61–120 | flag constants | IGNORECASE, MULTILINE, DOTALL, VERBOSE, etc.
121–160 | _cache | dict keyed by (type, pattern, flags) that deduplicates Pattern objects
161–220 | compile | public entry point; checks the cache, then calls _compiler.compile
221–280 | match, search, fullmatch | thin wrappers around compile(pattern).match/search/fullmatch
281–340 | findall, finditer | iteration helpers
341–380 | sub, subn, split | substitution and splitting helpers
381–400 | error, escape, purge | exception re-export, escaping, cache flush

Reading

The pattern cache

The cache avoids recompiling the same pattern every time a call site runs. The key is the tuple (type(pattern), pattern, flags): the pattern type (str or bytes), the raw pattern, and the flags.

# CPython: Lib/re/__init__.py:121 _cache
_cache = {}
_MAXCACHE = 512

def _compile(pattern, flags):
    try:
        return _cache[type(pattern), pattern, flags]
    except KeyError:
        pass
    p = _compiler.compile(pattern, flags)
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[type(pattern), pattern, flags] = p
    return p
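
The effect of the cache can be observed through the public API alone. The sketch below assumes CPython's behavior of handing back the cached object on a hit; the sample patterns are arbitrary and no private names are touched.

```python
import re

re.purge()  # start from an empty cache so the demonstration is repeatable

first = re.compile(r"\d+")
second = re.compile(r"\d+")
same_object = first is second             # cache hit: the same Pattern object

as_bytes = re.compile(rb"\d+")            # bytes pattern: different type in the key
with_flag = re.compile(r"\d+", re.ASCII)  # different flags: different key
distinct = len({id(first), id(as_bytes), id(with_flag)}) == 3

re.purge()                                # flush the cache
after_purge = re.compile(r"\d+")          # compiled again from scratch
recompiled = after_purge is not first
```

Each of the three booleans should come out True on CPython. Identity of cached Pattern objects is an implementation detail, so code outside of tests should not rely on it.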

compile and flag handling

re.compile is the canonical way to obtain a Pattern object. Flags can be passed as plain integers or as RegexFlag members combined with |; they are normalised to a plain integer before the cache lookup, so both forms share one cache entry.

# CPython: Lib/re/__init__.py:161 compile
def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a Pattern object."
    return _compile(pattern, flags)
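
As a small illustration of flag handling, the sketch below compiles one pattern twice, once with a RegexFlag combination and once with the equivalent plain integer. The pattern and input string are made up for the example.

```python
import re

combined = re.IGNORECASE | re.MULTILINE         # a RegexFlag value
by_enum = re.compile(r"^spam$", combined)
by_int = re.compile(r"^spam$", int(combined))   # same flags as a plain integer

# Both spellings resolve to the same cache key, so the same object comes back.
shares_entry = by_enum is by_int

# IGNORECASE matches both spellings; MULTILINE lets ^ and $ match per line.
lines = by_enum.findall("SPAM\nspam\neggs")     # ['SPAM', 'spam']
```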

Convenience wrappers

Every top-level function (match, search, findall, sub, split) is a one-liner that compiles the pattern and then delegates to the resulting Pattern method. On a cache hit the wrapper costs a cache lookup plus the method call, so repeated calls with the same pattern do not recompile it.

# CPython: Lib/re/__init__.py:221 match
def match(pattern, string, flags=0):
    "Try to apply the pattern at the start of the string."
    return _compile(pattern, flags).match(string)

def search(pattern, string, flags=0):
    "Scan through string looking for a match."
    return _compile(pattern, flags).search(string)
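
The equivalence between a module-level wrapper and the corresponding Pattern method can be checked directly. This is a minimal sketch with an invented input string; it also shows the difference between match, which anchors at the start, and search, which scans forward.

```python
import re

text = "order 66 shipped"

via_module = re.search(r"(\d+)", text)
via_pattern = re.compile(r"(\d+)").search(text)

# Both routes find the same text at the same position.
same_result = (via_module.group(1), via_module.span()) == (
    via_pattern.group(1),
    via_pattern.span(),
)

# match only tries position 0, and the text starts with "order".
anchored = re.match(r"\d+", text)   # None
```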

Flag constants

The integer flag values mirror _sre constants so they can be passed straight through to the C engine.

# CPython: Lib/re/__init__.py:61 flag constants
IGNORECASE = I = RegexFlag.IGNORECASE
MULTILINE = M = RegexFlag.MULTILINE
DOTALL = S = RegexFlag.DOTALL
VERBOSE = X = RegexFlag.VERBOSE
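
For a port that only needs the raw integers, the values can be read off a running interpreter. The sketch below lists the four flags shown above with the values current CPython releases report; it is worth confirming them against the _constants.py of the release being targeted.

```python
import re

# Integer value behind each RegexFlag member.
values = {
    "IGNORECASE": int(re.IGNORECASE),   # 2
    "MULTILINE": int(re.MULTILINE),     # 8
    "DOTALL": int(re.DOTALL),           # 16
    "VERBOSE": int(re.VERBOSE),         # 64
}

# The short names are aliases for the same enum members.
aliases_match = re.I is re.IGNORECASE and re.X is re.VERBOSE

# Combining flags with | is a bitwise OR of the integers.
combined = int(re.IGNORECASE | re.DOTALL)   # 18
```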

gopy notes

  • _cache is a plain dict in CPython; gopy should mirror this with a mutex-guarded map keyed on a struct equivalent to (type, pattern, flags).
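  • _MAXCACHE = 512 is the eviction threshold. The listing above flushes the whole cache with _cache.clear() and does not track recency; recent CPython releases instead drop only the oldest entry when the cache is full, so check the _compile source of the release gopy targets.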
  • The RegexFlag enum wraps the raw integers for 3.6+; gopy only needs the integer values.
  • _sre.compile is the C boundary; gopy replaces it with the Go RE2/NFA backend.
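
The first two notes can be modelled in a few lines. The sketch below is a Python model of the design, not Go code and not part of CPython: PatternCache, compile_fn, and max_size are hypothetical names, and the eviction step mirrors the full flush in the listing above.

```python
import re
import threading


class PatternCache:
    """Bounded, lock-guarded cache keyed on (type, pattern, flags)."""

    def __init__(self, compile_fn, max_size=512):
        self._compile_fn = compile_fn   # hypothetical hook for the regex backend
        self._max_size = max_size
        self._lock = threading.Lock()
        self._entries = {}

    def __len__(self):
        with self._lock:
            return len(self._entries)

    def get(self, pattern, flags=0):
        key = (type(pattern), pattern, int(flags))
        with self._lock:
            try:
                return self._entries[key]
            except KeyError:
                pass
            compiled = self._compile_fn(pattern, flags)
            if len(self._entries) >= self._max_size:
                self._entries.clear()   # full flush, as in the listing above
            self._entries[key] = compiled
            return compiled


cache = PatternCache(re.compile, max_size=2)
digits = cache.get(r"\d+")
hit = cache.get(r"\d+") is digits   # second call is served from the map
cache.get(r"\w+")
cache.get(r"\s+")                   # map was full, so it is flushed first
size_after_flush = len(cache)       # only the newest entry remains
```

Compiling while holding the lock keeps the model short but serialises all misses; a real implementation might compile outside the lock and accept an occasional duplicate compile.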

CPython 3.14 changes

  • No API-breaking changes to __init__.py in 3.14; the package split introduced in 3.11 remains stable.
  • RegexFlag gained no new members between 3.13 and 3.14.
  • The cache threshold (_MAXCACHE = 512 entries) is unchanged.