Lib/glob.py

cpython 3.14 @ ab2d84fe1023/Lib/glob.py

glob provides glob() and iglob() for expanding shell-style wildcards against the real filesystem. Patterns follow Unix shell rules: * matches any sequence of characters within a path component, ? matches exactly one character, and [seq] matches any single character in seq. When recursive=True is passed, the special token ** used as a complete path component matches zero or more directory levels; without that flag it behaves like an ordinary *.
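As a quick illustration of these rules, the sketch below builds a throwaway directory tree and runs a few patterns against it. The file layout and the rel_glob helper are hypothetical and exist only for this example; glob.escape is used so that the temporary directory name is never treated as a pattern.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: a.py, b.txt, pkg/c.py, pkg/sub/d.py
    for rel in ("a.py", "b.txt", "pkg/c.py", "pkg/sub/d.py"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    def rel_glob(pattern, **kwargs):
        # Illustrative helper: sorted matches relative to root, "/"-separated
        hits = glob.glob(os.path.join(glob.escape(root), pattern), **kwargs)
        return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)

    star = rel_glob("*.py")                          # one component only
    question = rel_glob("?.txt")                     # exactly one character
    char_class = rel_glob("[ab].*")                  # one character from a set
    recursive = rel_glob("**/*.py", recursive=True)  # zero or more directories
    non_recursive = rel_glob("**/*.py")              # "**" acts like "*" here

print(star, question, char_class, recursive, non_recursive, sep="\n")
```

Comparing the last two results shows why the recursive flag matters: only the recursive call reaches a.py at the top level and d.py two levels down.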

Internally the path is split into a directory part and a pattern part with os.path.split. If the whole pattern contains no magic characters, the path is yielded as-is, provided it exists. If the directory part itself contains magic characters, iglob recurses into itself to expand it first. The _glob1 function uses os.scandir via _iterdir to list a directory and filters the names with fnmatch.filter.
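The dispatch described above can be sketched as a small classifier. This is an illustration, not CPython's code: the plan function and its strategy labels are hypothetical, and the MAGIC regex assumes that *, ? and [ are the characters glob treats as magic.

```python
import os
import re

# Assumption: "*", "?" and "[" are the magic characters
MAGIC = re.compile(r"[*?\[]")


def plan(pathname, recursive=False):
    """Return (dirname, basename, strategy); the strategy labels are illustrative."""
    dirname, basename = os.path.split(pathname)
    if not MAGIC.search(pathname):
        strategy = "existence check"
    elif recursive and basename == "**":
        strategy = "recursive walk"
    elif MAGIC.search(basename):
        strategy = "list and filter"
    else:
        # Only the directory part has wildcards; the basename is a literal
        strategy = "literal name under expanded dirs"
    return dirname, basename, strategy


for pattern in ("src/main.py", "src/*.py", "*/setup.py"):
    print(pattern, "->", plan(pattern))
print("src/**", "->", plan("src/**", recursive=True))
```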

translate(pattern) converts a glob pattern into a regular expression string suitable for re.compile. It is also used by pathlib._GlobWildcardSelector.

Map

| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-100 | glob, iglob | Public API; split magic from literal directory prefix; dispatch to _glob0 / _glob1 / _glob2. | (stdlib pending) |
| 100-200 | _glob0, _glob1, _iterdir | _glob0 checks exact existence; _glob1 lists a directory and filters with fnmatch; _iterdir wraps os.scandir. | (stdlib pending) |
| 200-300 | _glob2, _rlistdir | Recursive ** implementation: _rlistdir yields every subdirectory depth-first; _glob2 chains with _glob1. | (stdlib pending) |
| 300-350 | translate, escape, has_magic, _ishidden | translate converts a glob pattern to a regex; escape quotes magic chars; has_magic tests via a compiled magic regex. | (stdlib pending) |

Reading

iglob recursive ** walk (lines 1 to 200)

cpython 3.14 @ ab2d84fe1023/Lib/glob.py#L1-200

def iglob(pathname, *, root_dir=None, dir_fd=None, recursive=False,
          include_hidden=False):
    sys.audit("glob.glob", pathname, recursive)
    sys.audit("glob.glob/2", pathname, recursive, root_dir, dir_fd)
    if not pathname:
        return
    dirname, basename = os.path.split(pathname)
    if not has_magic(pathname):
        # No magic: check existence and yield if present
        if basename:
            if os.path.lexists(pathname):
                yield pathname
        else:
            # Trailing separator: only an existing directory can match
            if os.path.isdir(pathname):
                yield pathname
        return
    # "**" is only treated as recursive when the caller asked for it
    if recursive and _isrecursive(basename):
        yield from _glob2(dirname, basename, dironly, ...)
    else:
        dirs = iglob(dirname, ...) if has_magic(dirname) else [dirname]
        glob_in_dir = _glob1 if has_magic(basename) else _glob0
        for dirname in dirs:
            for name in glob_in_dir(dirname, basename, ...):
                yield os.path.join(dirname, name) if dirname else name

iglob works by splitting the pattern with os.path.split, that is, at the last path separator. If the directory part also contains magic characters, it recurses so that /tmp/*/foo/*.py expands both wildcard segments correctly. The basename is then dispatched to one of three helpers: _glob0 for literal names (just an existence test), _glob1 for single-component patterns, and _glob2 for the recursive ** case. Because iglob is a generator, the entire tree walk is lazy. The root_dir and dir_fd parameters change where the search starts without changing the returned paths: results stay relative to the search root and are not prefixed with it.

_glob1 magic vs literal split and translate (lines 100 to 350)

cpython 3.14 @ ab2d84fe1023/Lib/glob.py#L100-350

def _glob1(dirname, pattern, dironly, include_hidden=False):
    names = list(_iterdir(dirname, dironly))
    # Hide dotfiles unless the caller opted in or the pattern itself starts with "."
    if not include_hidden and not _ishidden(pattern):
        names = [x for x in names if not _ishidden(x)]
    return fnmatch.filter(names, pattern)


def _iterdir(dirname, dironly):
    if not dirname:
        dirname = os.curdir
    try:
        with os.scandir(dirname) as it:
            for entry in it:
                try:
                    if not dironly or entry.is_dir():
                        yield entry.name
                except OSError:
                    pass
    except OSError:
        return


def translate(pat):
    """Translate a shell pattern to a regular expression."""
    i, n = 0, len(pat)
    res = ''
    while i < n:
        c = pat[i]
        i += 1
        if c == '*':
            # compress consecutive stars
            j = i
            if j < n and pat[j] == '*':
                res += '.*'
                i = j + 1
            else:
                res += '[^/]*'
        elif c == '?':
            res += '[^/]'
        elif c == '[':
            # character class: copy verbatim until ']'
            ...
        else:
            res += re.escape(c)
    return r'(?s:%s)\Z' % res

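_iterdir wraps os.scandir and absorbs per-entry OSError exceptions (e.g. from broken symlinks or permission errors), so a single bad entry does not abort the entire listing. The outer handler covers the case where the directory itself cannot be opened, in which case nothing is yielded. fnmatch.filter then applies the shell pattern to the collected names; it compiles the pattern through fnmatch's own translate, not the glob-level translate shown above.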
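The list-then-filter flow can be exercised end to end with a small stand-alone sketch. The names iter_names and match_in_dir are hypothetical restatements of the pattern, not CPython's functions, and the dotfile rule is written out inline on the assumption that a hidden name is one starting with ".".

```python
import fnmatch
import os
import tempfile


def iter_names(dirname, dironly=False):
    """Yield entry names; hypothetical restatement of the _iterdir pattern."""
    try:
        with os.scandir(dirname or os.curdir) as entries:
            for entry in entries:
                try:
                    if not dironly or entry.is_dir():
                        yield entry.name
                except OSError:
                    # One unreadable entry is skipped; the rest still come through
                    continue
    except OSError:
        # The directory itself is missing or unreadable: yield nothing
        return


def match_in_dir(dirname, pattern, dironly=False, include_hidden=False):
    """List a directory, drop dotfiles unless asked for, then apply the pattern."""
    names = list(iter_names(dirname, dironly))
    if not include_hidden and not pattern.startswith("."):
        names = [n for n in names if not n.startswith(".")]
    return sorted(fnmatch.filter(names, pattern))


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "pkg"))
    for name in (".env", "a.py", "b.txt"):
        open(os.path.join(root, name), "w").close()

    visible = match_in_dir(root, "*")
    dotfiles = match_in_dir(root, ".*")
    only_dirs = match_in_dir(root, "*", dironly=True)
    missing = match_in_dir(os.path.join(root, "no-such-dir"), "*")

print(visible, dotfiles, only_dirs, missing, sep="\n")
```

A missing directory produces an empty list rather than an exception, which is what lets a glob over a partly unreadable tree return whatever it can reach.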

translate converts * to [^/]* (matching within one path component) and ** to .* (matching across separators). ? becomes [^/]. Character classes ([abc], [!abc]) are copied through with a leading ! rewritten to ^ for regex syntax. The result is wrapped in (?s:...)\Z so that . also matches newlines and the pattern is anchored at the true end of the string.
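To make these rules concrete, here is a runnable version of the simplified translation. It is a sketch of the rules as stated here, not CPython's actual implementation: the name translate_sketch is hypothetical, and the character-class branch (elided in the excerpt) is filled in under the assumption that an unclosed [ is treated as a literal and that ] may appear as the first member of a class. It is not hardened against every edge case.

```python
import re


def translate_sketch(pat):
    """Simplified glob-to-regex translation following the rules described above."""
    i, n = 0, len(pat)
    res = ""
    while i < n:
        c = pat[i]
        i += 1
        if c == "*":
            if i < n and pat[i] == "*":
                res += ".*"      # "**" may cross path separators
                i += 1
            else:
                res += "[^/]*"   # "*" stays inside one component
        elif c == "?":
            res += "[^/]"
        elif c == "[":
            # Assumed handling: locate the closing "]", allowing "]" as first member
            j = i
            if j < n and pat[j] == "!":
                j += 1
            if j < n and pat[j] == "]":
                j += 1
            while j < n and pat[j] != "]":
                j += 1
            if j >= n:
                res += r"\["     # no closing bracket: treat "[" literally
            else:
                body = pat[i:j].replace("\\", "\\\\")
                if body.startswith("!"):
                    body = "^" + body[1:]   # shell negation -> regex negation
                elif body.startswith("^"):
                    body = "\\" + body      # a literal "^" must be escaped
                res += "[" + body + "]"
                i = j + 1
        else:
            res += re.escape(c)
    return r"(?s:%s)\Z" % res


single = re.compile(translate_sketch("*.py"))
deep = re.compile(translate_sketch("**/*.py"))
negated = re.compile(translate_sketch("[!a]?.txt"))

print(bool(single.match("a.py")), bool(single.match("pkg/a.py")))
print(bool(deep.match("pkg/sub/a.py")))
print(bool(negated.match("bc.txt")), bool(negated.match("ac.txt")))
```

Under this simplified scheme *.py rejects pkg/a.py because [^/]* cannot consume the separator, while **/*.py accepts pkg/sub/a.py because .* can.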

gopy mirror

glob depends on os.scandir, fnmatch.filter, and os.path utilities. All three are either already ported or in scope for the stdlib layer. The pure-Python translate function can be vendored directly. The sys.audit calls can be no-ops in gopy until the audit hook infrastructure is in place.