Lib/string.py

cpython 3.14 @ ab2d84fe1023/Lib/string.py

Lib/string.py is a pure-Python module. It has no C accelerator. Its contents fall into three groups.

The first group is a set of named character-set constants: ascii_letters, ascii_lowercase, ascii_uppercase, digits, hexdigits, octdigits, punctuation, whitespace, and printable. These are plain strings defined at import time.
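As a small illustration of how the constants relate to one another and how they are typically used, the sketch below checks two of the compositions and defines a membership test. The helper name is_hex is hypothetical and not part of the module.

```python
import string

# The composite constants are concatenations of the simpler ones.
assert string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase
assert string.printable == (string.digits + string.ascii_letters
                            + string.punctuation + string.whitespace)


def is_hex(text):
    """Hypothetical helper: True if text is non-empty and all hex digits."""
    return bool(text) and all(ch in string.hexdigits for ch in text)


print(is_hex("ab2d84fe1023"))
print(is_hex("xyz"))
```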

The second group is Formatter, a pure-Python re-implementation of the formatting pipeline behind str.format(). CPython's built-in str.format is implemented in C and never calls Formatter; the class exists so that the same pipeline can be customized by subclassing. format and vformat are the entry points, and parse, get_field, get_value, convert_field, and format_field are the overridable hooks.
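A minimal sketch of that relationship: the same template run through Formatter and through the built-in should give the same string, and a subclass can change one stage of the pipeline. UpperFormatter is a hypothetical subclass written for this example only.

```python
import string

fmt = string.Formatter()
template = "{name!r:>8}|{0:05.1f}"

# Same template, same arguments, two different implementations.
via_class = fmt.format(template, 3.14159, name="pi")
via_builtin = template.format(3.14159, name="pi")
print(via_class == via_builtin)


class UpperFormatter(string.Formatter):
    """Hypothetical subclass: upper-cases every formatted field."""

    def format_field(self, value, format_spec):
        return super().format_field(value, format_spec).upper()


# Only replacement fields pass through format_field; literal text does not.
print(UpperFormatter().format("{} and {}", "spam", "eggs"))
```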

The third group is Template, which provides $-style substitution with user-overridable delimiter and idpattern class attributes. Unlike str.format, it does not support attribute access, indexing, or format specs.
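The difference is easiest to see side by side. In this sketch the SimpleNamespace object and the placeholder names are illustrative assumptions; the point is that Template ends the identifier at the dot, while str.format follows it as an attribute reference.

```python
from string import Template
from types import SimpleNamespace

user = SimpleNamespace(name="ann")

# str.format follows the attribute reference and applies the format spec.
formatted = "{u.name:>5}".format(u=user)

# Template ends the identifier at the dot, so ".name" stays literal text,
# and "$$" is the only escape it knows.
substituted = Template("$u.name owes $$5").substitute(u="ann")

print(repr(formatted))
print(repr(substituted))
```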

Map

| Lines | Symbol | Role | gopy |
| --- | --- | --- | --- |
| 1-60 | capwords, character-set constants | capwords(s, sep=None) splits on sep, calls str.capitalize on each word, and rejoins; constants are plain string literals assigned at module level. | (stdlib pending) |
| 61-130 | Formatter.format, Formatter.vformat | format forwards its positional and keyword args to vformat; vformat calls _vformat with an initial recursion depth of 2, then calls check_unused_args. | (stdlib pending) |
| 130-220 | Formatter._vformat | Iterates parse() output; resolves field names via get_field; converts via convert_field; formats via format_field; recurses for nested braces. | (stdlib pending) |
| 220-300 | Formatter.parse, Formatter.get_field, Formatter.get_value, Formatter.convert_field, Formatter.format_field | parse returns an iterator of (literal_text, field_name, format_spec, conversion) tuples from _string.formatter_parser; get_field splits the field name with _string.formatter_field_name_split and walks attributes and indices; convert_field dispatches !r, !s, !a. | (stdlib pending) |
| 300-380 | Formatter.check_unused_args | Hook called after _vformat completes; base implementation is a no-op; subclasses can raise if any positional or keyword argument was not referenced. | (stdlib pending) |
| 380-450 | Template, Template.substitute, Template.safe_substitute | substitute raises KeyError on missing keys; safe_substitute leaves unrecognized patterns intact; both call pattern.sub with a local convert closure as the replacement function. | (stdlib pending) |

Reading

Formatter._vformat recursion (lines 130 to 220)

cpython 3.14 @ ab2d84fe1023/Lib/string.py#L130-220

    def _vformat(self, format_string, args, kwargs, used_args,
                 recursion_depth, auto_arg_index=0):
        if recursion_depth < 0:
            raise ValueError('Max string formatting recursion exceeded')
        result = []
        for literal_text, field_name, format_spec, conversion in \
                self.parse(format_string):
            if literal_text:
                result.append(literal_text)
            if field_name is not None:
                obj, arg_used = self.get_field(field_name, args, kwargs)
                used_args.add(arg_used)
                obj = self.convert_field(obj, conversion)
                # expand the format spec, possibly recursing
                format_spec, auto_arg_index = self._vformat(
                    format_spec, args, kwargs,
                    used_args, recursion_depth-1,
                    auto_arg_index=auto_arg_index)
                result.append(self.format_field(obj, format_spec))
        return ''.join(result), auto_arg_index

_vformat is recursive because format specs can themselves contain format fields, e.g. '{:{width}}'.format(value, width=10). Each recursive call decrements recursion_depth; the initial call passes 2, which allows one level of nested braces in the format spec. auto_arg_index threads through the recursion so that automatically numbered fields ({}) are assigned indices in left-to-right document order even across recursive calls. used_args accumulates every field name or index that was looked up, so check_unused_args can find omissions afterwards.
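The behaviour described above can be exercised directly. The sketch below shows one level of nesting, auto-numbering carried into the spec, the depth limit being hit, and a check_unused_args override; StrictFormatter is a hypothetical subclass, and the template strings are illustrative.

```python
import string

fmt = string.Formatter()

# One level of nesting: the width is taken from another argument.
padded = fmt.format("{:>{width}}", "abc", width=6)

# Auto-numbering continues inside the spec: the outer {} is 0, the inner is 1.
auto = fmt.format("{:>{}}", "abc", 6)

# A spec nested inside a nested spec runs out of recursion depth.
try:
    fmt.format("{:{:{}}}", "abc", ">", 6)
    too_deep = None
except ValueError as exc:
    too_deep = str(exc)


class StrictFormatter(string.Formatter):
    """Hypothetical subclass that rejects arguments no field referenced."""

    def check_unused_args(self, used_args, args, kwargs):
        # used_args holds ints for positional fields and strs for named ones.
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [k for k in kwargs if k not in used_args]
        if unused:
            raise ValueError(f"unused format arguments: {unused}")


strict = StrictFormatter()
print(repr(padded), repr(auto), too_deep)
print(strict.format("{} {name}", 1, name="x"))
```

Calling strict.format("{}", 1, 2) would raise, because index 1 never enters used_args.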

Template.substitute regex (lines 380 to 450)

cpython 3.14 @ ab2d84fe1023/Lib/string.py#L380-450

    class Template:
        delimiter = '$'
        idpattern = r'(?a:[_a-z][_a-z0-9]*)'
        braceidpattern = None
        flags = _re.IGNORECASE

        def __init_subclass__(cls, /, **kwargs):
            super().__init_subclass__(**kwargs)
            if 'pattern' not in cls.__dict__:
                cls._pattern = cls._compile_pattern(cls.pattern)

        pattern = r"""
        \$(?:
          (?P<escaped>\$)       |   # $$ -> $
          (?P<named>%(id)s)     |   # $identifier
          {(?P<braced>%(bid)s)} |   # ${identifier}
          (?P<invalid>)             # ill-formed $ is invalid
        )
        """

        def substitute(self, mapping={}, /, **kws):
            if mapping:
                mapping = ChainMap(kws, mapping)
            else:
                mapping = kws
            def convert(mo):
                named = mo.group('named') or mo.group('braced')
                if named is not None:
                    return str(mapping[named])
                if mo.group('escaped') is not None:
                    return self.delimiter
                if mo.group('invalid') is not None:
                    self._invalid(mo)
                raise ValueError('Unrecognized named group in pattern',
                                 self.pattern)
            return self.pattern.sub(convert, self.template)

The pattern regex has four named groups, tried left to right. escaped matches $$ and yields a literal $. named matches a bare $identifier. braced matches ${identifier}. invalid is an empty group that matches when a $ is followed by none of the above; substitute then calls _invalid, which raises ValueError with line and column information. safe_substitute differs only in that invalid placeholders and missing keys are left in the output unchanged instead of raising.
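A short sketch of the two methods on the same inputs; the template strings and variable names are illustrative only.

```python
from string import Template

t = Template("$greeting, ${name}! Cost: $$5")

# All placeholders supplied: named, braced, and escaped forms all resolve.
full = t.substitute(greeting="Hi", name="Bo")

# A missing key raises KeyError from substitute...
try:
    t.substitute(greeting="Hi")
    missing = None
except KeyError as exc:
    missing = exc.args[0]

# ...but safe_substitute leaves the placeholder text in place.
partial = t.safe_substitute(greeting="Hi")

# A lone delimiter hits the empty "invalid" group.
stray = Template("Total: $ 5")
try:
    stray.substitute()
    invalid_msg = None
except ValueError as exc:
    invalid_msg = str(exc)

kept = stray.safe_substitute()

print(full)
print(missing, "|", partial)
print(invalid_msg, "|", kept)
```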

Subclasses can change the substitution syntax by overriding delimiter, idpattern, braceidpattern, or pattern. __init_subclass__ recompiles _pattern automatically whenever pattern is not explicitly set on the subclass, so overriding idpattern alone is sufficient for the common case.
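As a sketch of the common case, the two hypothetical subclasses below each override a single class attribute and leave pattern alone. The dotted idpattern is an assumption made for illustration; it contains no whitespace or # characters, so it should be safe inside the verbose-mode pattern.

```python
from string import Template


class PercentTemplate(Template):
    """Hypothetical subclass: uses % instead of $ as the delimiter."""
    delimiter = '%'


class DottedTemplate(Template):
    """Hypothetical subclass: allows dotted keys such as db.host."""
    idpattern = r'(?a:[_a-z][_a-z0-9]*(?:\.[_a-z][_a-z0-9]*)*)'


# "$5" is now ordinary text and "%%" is the escape sequence.
percent = PercentTemplate("%user has $5 and 100%%").substitute(user="ann")

# Dotted names are looked up as plain mapping keys, not as attributes.
dotted = DottedTemplate("$db.host:$db.port").substitute(
    {"db.host": "localhost", "db.port": 5432})

print(percent)
print(dotted)
```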

parse() field iteration (lines 220 to 300)

cpython 3.14 @ ab2d84fe1023/Lib/string.py#L220-300

    def parse(self, format_string):
        return _string.formatter_parser(format_string)

parse returns the iterator produced by the C function _string.formatter_parser; the related helper _string.formatter_field_name_split is used by get_field, not by parse. Each yielded tuple carries: literal_text (the verbatim text before the next {...} block, possibly empty), field_name (the raw field name string, empty for an auto-numbered {}, or None if there is no replacement field), format_spec (everything after : inside the braces, or None if there is no replacement field), and conversion (the single character after !, or None). Subclasses can override parse to accept an entirely different format-string syntax as long as they yield the same 4-tuple shape.
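To make the tuple shape concrete, the sketch below parses one illustrative template and then resolves two field names with get_field. The template string and argument values are assumptions chosen for the example.

```python
import string

fmt = string.Formatter()

# Two replacement fields followed by trailing literal text.
fields = list(fmt.parse("x={0.real!r:>8} y={1} end"))
for literal_text, field_name, format_spec, conversion in fields:
    print(repr(literal_text), field_name, format_spec, conversion)

# get_field splits the name, fetches the first part from args or kwargs,
# then walks the remaining attribute and index lookups.
by_attr = fmt.get_field("0.real", (3 + 4j,), {})
by_index = fmt.get_field("point[x]", (), {"point": {"x": 1}})
print(by_attr, by_index)
```

The second element of each get_field result is the key that ends up in used_args: an int for positional fields and a str for named ones.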

gopy mirror

Formatter depends on _string.formatter_parser and _string.formatter_field_name_split (C helpers exposed by the built-in _string module) for format-string tokenization and field-name splitting. Template depends only on re and collections.ChainMap. The constants group has no dependencies. A gopy port can ship the constants and Template as pure Go, then add Formatter once _string is available.