Lib/importlib/_bootstrap_external.py
cpython 3.14 @ ab2d84fe1023/Lib/importlib/_bootstrap_external.py
_bootstrap_external.py is the second frozen bootstrap file. It is loaded immediately after _bootstrap.py and extends the import system with everything needed to load modules from the file system. Where _bootstrap.py defines the abstract machinery (meta-path protocol, ModuleSpec, per-module locks), _bootstrap_external.py provides the concrete implementations: PathFinder (the last entry on the default sys.meta_path), FileFinder (one instance per directory, created by the hook that FileFinder.path_hook places on sys.path_hooks), SourceFileLoader (reads .py, compiles, caches .pyc), and SourcelessFileLoader (reads pre-compiled .pyc only).
The file is also responsible for the .pyc cache invalidation scheme. By default CPython invalidates .pyc files by comparing the source file's mtime and size stored in the cache header against the current file metadata (timestamp-based validation). Python 3.7 (PEP 552) added hash-based validation, where a hash of the source file is embedded in the .pyc header. In checked mode the hash is compared against the source on each load; in unchecked mode the cache is trusted without consulting the source.
Like _bootstrap.py, this module is frozen into the interpreter via Python/frozen.c. It cannot import anything from the standard library before it has finished installing itself, so the code avoids module-level imports of anything that might not be available.
Map
| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-80 | module docstring, platform constants, _path_* helpers | os.sep, os.path.join equivalents for bootstrap use | n/a |
| 81-200 | _path_stat, _path_isfile, _path_isabs, _relax_case | Low-level FS utilities that avoid importing os | Not yet ported |
| 201-380 | _get_supported_file_loaders, _classify_pyc, validate_bytecode_header, _validate_hash_pyc | .pyc header reading and validation | Not yet ported |
| 381-520 | _compile_bytecode, _code_to_timestamp_pyc, _code_to_hash_pyc, cache_from_source | Compilation and .pyc writing | Not yet ported |
| 521-700 | FileLoader, SourceLoader | Base classes: get_data, get_filename, path_stats, set_data | Not yet ported |
| 701-850 | SourceFileLoader | Concrete loader: read .py, compile, write .pyc, exec | Not yet ported |
| 851-950 | SourcelessFileLoader | Load pre-compiled .pyc only | Not yet ported |
| 951-1150 | FileFinder | sys.path_hooks entry: scan a directory for module files | Not yet ported |
| 1151-1350 | PathFinder | sys.meta_path entry: delegate to FileFinder via sys.path | Not yet ported |
| 1351-1500 | _setup, _install, _fix_up_module | Bootstrap wiring invoked from _bootstrap._install_external_importers | Not yet ported |
Reading
.pyc header validation (lines 201 to 380)
cpython 3.14 @ ab2d84fe1023/Lib/importlib/_bootstrap_external.py#L201-380
Every .pyc file begins with a 16-byte header. The first four bytes are the magic number: a 16-bit little-endian value that changes whenever the bytecode format changes, followed by the two bytes \r\n. The next four bytes are a little-endian flags word: 0 for timestamp mode, 1 (0b01) for unchecked-hash mode, 3 (0b11) for checked-hash mode. Bit 0 marks the file as hash-based and bit 1 requests that the source be checked against the stored hash. The remaining eight bytes are either (mtime, source_size), each a 32-bit little-endian integer, in timestamp mode or the 8-byte hash of the source in hash mode.
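The header layout can be decoded with nothing more than struct. The following is a minimal sketch, not CPython's own code: parse_pyc_header is a hypothetical helper name, and the header it decodes is built by hand rather than read from a real cache file.

```python
import importlib.util
import struct


def parse_pyc_header(data: bytes) -> dict:
    """Decode the 16-byte .pyc header into a small dict (illustrative helper)."""
    if len(data) < 16:
        raise EOFError("header is shorter than 16 bytes")
    magic = data[:4]
    (flags,) = struct.unpack("<I", data[4:8])
    info = {
        "magic_ok": magic == importlib.util.MAGIC_NUMBER,
        "hash_based": bool(flags & 0b01),    # bit 0: hash-based .pyc
        "check_source": bool(flags & 0b10),  # bit 1: verify the hash on load
    }
    if info["hash_based"]:
        info["source_hash"] = data[8:16]
    else:
        info["mtime"], info["source_size"] = struct.unpack("<II", data[8:16])
    return info


# Build a timestamp-mode header by hand and decode it again.
header = importlib.util.MAGIC_NUMBER + struct.pack("<III", 0, 1_700_000_000, 1234)
print(parse_pyc_header(header))
```

Passing the first 16 bytes of any file under __pycache__ to the helper should report which invalidation mode that file was written in.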
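validate_bytecode_header, shown here as a condensed view of the checks that CPython splits across _classify_pyc, _validate_timestamp_pyc, and _validate_hash_pyc, reads this header and raises ImportError on any mismatch. The caller catches the exception and recompiles from source, so stale .pyc files are replaced silently rather than causing mysterious failures.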

    def validate_bytecode_header(data, source_stats=None, name=None, path=None):
        magic = data[:4]
        if magic != MAGIC_NUMBER:
            message = f'bad magic number in {name!r}: {magic!r}'
            raise ImportError(message, name=name, path=path)
        flags = _unpack_uint32(data[4:8])
        if flags == 0:
            # timestamp-based: both fields are stored as 32-bit values
            source_mtime = _unpack_uint32(data[8:12])
            source_size = _unpack_uint32(data[12:16])
            if source_stats is not None:
                if source_mtime != (int(source_stats['mtime']) & 0xFFFFFFFF):
                    raise ImportError('bytecode is stale', name=name, path=path)
                if source_size != (source_stats['size'] & 0xFFFFFFFF):
                    raise ImportError('bytecode is stale', name=name, path=path)
        elif flags & 0b11 == 0b11:
            # checked-hash mode: bit 0 = hash-based, bit 1 = check source
            source_hash = data[8:16]
            ...
        return data[16:]  # raw bytecode payload

The flags field is what allows timestamp-mode and hash-mode .pyc files to coexist on the same sys.path without confusion.
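To see the flags word in practice, the public py_compile module can write the same source in each invalidation mode. This is a small sketch under the assumption that a temporary directory is writable; flags_for_each_mode and the file names are made up for illustration.

```python
import os
import py_compile
import struct
import tempfile


def flags_for_each_mode() -> dict:
    """Compile one source file in every invalidation mode; return the flags words."""
    result = {}
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "demo.py")
        with open(source, "w", encoding="utf-8") as f:
            f.write("x = 1\n")
        for mode in py_compile.PycInvalidationMode:
            target = os.path.join(tmp, f"demo.{mode.name.lower()}.pyc")
            py_compile.compile(source, cfile=target, doraise=True,
                               invalidation_mode=mode)
            with open(target, "rb") as f:
                header = f.read(16)
            # Bytes 4..8 of the header hold the little-endian flags word.
            (result[mode.name],) = struct.unpack("<I", header[4:8])
    return result


print(flags_for_each_mode())
```

The three files differ only in their flags word and the eight bytes that follow it, which is how a loader can tell them apart without any out-of-band configuration.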
FileFinder.find_spec: scanning a directory (lines 951 to 1150)
cpython 3.14 @ ab2d84fe1023/Lib/importlib/_bootstrap_external.py#L951-1150
FileFinder instances are created by the path hook that FileFinder.path_hook installs on sys.path_hooks, one finder per sys.path directory. When PathFinder queries a finder for a module name, find_spec takes the last component of the dotted name, refreshes its cached directory listing if the directory's mtime has changed (the listing lives in self._path_cache so that imports do not repeat os.listdir calls), and checks each registered suffix in order. Extension-module suffixes are checked first, then source suffixes (.py), then bytecode suffixes (.pyc). The package check (a directory containing an __init__ file) runs before the single-file check so that packages shadow same-named single-file modules.

    def find_spec(self, name, target=None):
        tail_module = name.rpartition('.')[2]
        try:
            mtime = _path_stat(self.path or _os.getcwd()).st_mtime
        except OSError:
            mtime = -1  # stat failed: use a sentinel instead of aborting the lookup
        if mtime != self._path_mtime:
            self._fill_cache()
            self._path_mtime = mtime
        # pick the cache; tail_module keeps its original case for building paths
        if _relax_case():
            cache = self._relaxed_path_cache
            cache_module = tail_module.lower()
        else:
            cache = self._path_cache
            cache_module = tail_module
        # check for a package (directory with __init__)
        if cache_module in cache:
            base_path = _path_join(self.path, tail_module)
            for suffix, loader_class in self._loaders:
                init_filename = '__init__' + suffix
                full_path = _path_join(base_path, init_filename)
                if _path_isfile(full_path):
                    return self._get_spec(loader_class, name, full_path, [base_path], target)
        # check for a module file
        for suffix, loader_class in self._loaders:
            full_path = _path_join(self.path, tail_module + suffix)
            ...

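The _fill_cache call rebuilds an in-memory set of the directory's entries. On case-insensitive platforms (macOS, Windows) it also builds a lowercased copy, _relaxed_path_cache, which is consulted when _relax_case() returns true (that is, when PYTHONCASEOK is set). The mtime check means that new .py files dropped into a directory become visible on the next import attempt without restarting the interpreter, provided the directory's mtime actually changed. When the filesystem's timestamp granularity hides the change, importlib.invalidate_caches() forces a rescan.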
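The package-before-module ordering can be observed through the public importlib.machinery API. The sketch below is illustrative only: make_finder and package_shadows_module are hypothetical helper names, and the finder is given just the source and bytecode loaders rather than the full default set.

```python
import importlib.machinery as machinery
import os
import tempfile


def make_finder(directory: str) -> machinery.FileFinder:
    """Build a FileFinder that knows only the source and bytecode loaders."""
    return machinery.FileFinder(
        directory,
        (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
        (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
    )


def package_shadows_module() -> tuple:
    """Create shadow.py and shadow/__init__.py side by side; see which one wins."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "shadow.py"), "w", encoding="utf-8") as f:
            f.write("KIND = 'module'\n")
        os.mkdir(os.path.join(tmp, "shadow"))
        init_path = os.path.join(tmp, "shadow", "__init__.py")
        with open(init_path, "w", encoding="utf-8") as f:
            f.write("KIND = 'package'\n")
        spec = make_finder(tmp).find_spec("shadow")
        # A package spec carries submodule_search_locations; a plain module does not.
        is_package = spec.submodule_search_locations is not None
        return os.path.basename(spec.origin), is_package


print(package_shadows_module())
```

The returned spec points at shadow/__init__.py and has submodule_search_locations set, so the sibling shadow.py is never considered.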
SourceFileLoader.get_code: the read-compile-cache path (lines 701 to 850)
cpython 3.14 @ ab2d84fe1023/Lib/importlib/_bootstrap_external.py#L701-850
SourceFileLoader inherits SourceLoader.get_code, which orchestrates the full source-to-code-object pipeline. It stats the source file, looks for a .pyc cache, and, if the cache header validates, unmarshals the cached code object instead of recompiling (in timestamp mode the source is not even read). On a cache miss (or an invalid cache), it reads the source bytes, calls compile(), then attempts to write the new .pyc so future imports can skip compilation. Write failures (e.g. a read-only file system) are silently ignored.

    def get_code(self, fullname):
        source_path = self.get_filename(fullname)
        source_bytes = None
        source_mtime = None
        try:
            bytecode_path = cache_from_source(source_path)
        except NotImplementedError:
            bytecode_path = None
        if bytecode_path is not None:
            try:
                source_stats = self.path_stats(source_path)
            except OSError:
                pass
            else:
                # remember the mtime so a fresh .pyc can be written later
                source_mtime = int(source_stats['mtime'])
                try:
                    data = self.get_data(bytecode_path)
                except OSError:
                    pass
                else:
                    try:
                        bytes_data = validate_bytecode_header(
                            data, source_stats=source_stats,
                            name=fullname, path=bytecode_path)
                        return _compile_bytecode(bytes_data, name=fullname,
                                                 bytecode_path=bytecode_path,
                                                 source_path=source_path)
                    except (ImportError, EOFError):
                        pass  # stale or truncated cache: fall through
        # cache miss: compile from source
        source_bytes = self.get_data(source_path)
        code_object = self.source_to_code(source_bytes, source_path)
        ...
        if bytecode_path is not None and source_mtime is not None:
            data = _code_to_timestamp_pyc(code_object, source_mtime, len(source_bytes))
            self._cache_bytecode(source_path, bytecode_path, data)
        return code_object

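The read-compile-cache path can be exercised from user code through the public SourceFileLoader. The following is a sketch under stated assumptions: compile_and_inspect_cache is a hypothetical helper, it temporarily forces sys.dont_write_bytecode to False so that the cache write is not suppressed by the environment, and it assumes the cache location is writable.

```python
import importlib.machinery
import importlib.util
import os
import struct
import sys
import tempfile


def compile_and_inspect_cache() -> dict:
    """Run SourceFileLoader.get_code on a temp module and inspect the .pyc it writes."""
    saved = sys.dont_write_bytecode
    sys.dont_write_bytecode = False  # assumption: force caching on for the demo
    try:
        with tempfile.TemporaryDirectory() as tmp:
            source = os.path.join(tmp, "cached_demo.py")
            with open(source, "w", encoding="utf-8") as f:
                f.write("VALUE = 42\n")
            loader = importlib.machinery.SourceFileLoader("cached_demo", source)
            code = loader.get_code("cached_demo")  # cache miss: compiles and writes
            pyc = importlib.util.cache_from_source(source)
            with open(pyc, "rb") as f:
                header = f.read(16)
            flags, mtime, size = struct.unpack("<III", header[4:16])
            stat = os.stat(source)
            return {
                "is_code": hasattr(code, "co_code"),
                "flags": flags,
                "mtime_matches": mtime == (int(stat.st_mtime) & 0xFFFFFFFF),
                "size_matches": size == (stat.st_size & 0xFFFFFFFF),
            }
    finally:
        sys.dont_write_bytecode = saved


print(compile_and_inspect_cache())
```

The header written by the import system is a timestamp-mode header whose mtime and size fields mirror the source file's stat result, which is exactly what the next import compares against.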
The separation between get_data (raw bytes) and the validation/compilation logic means subclasses can override get_data (plus path_stats and set_data where bytecode caching should still work) to load sources from non-filesystem locations (zip archives, network paths) while reusing the caching and validation code unchanged.
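As an illustration of that extension point, here is a minimal sketch of a loader that serves source from a dictionary. InMemoryLoader, load_from_memory, and the in_memory/ path prefix are hypothetical names chosen for this example. Because path_stats is not overridden, the inherited get_code skips the .pyc cache entirely and always compiles.

```python
import importlib.abc
import importlib.util


class InMemoryLoader(importlib.abc.SourceLoader):
    """Hypothetical loader serving source text from a dict instead of disk."""

    def __init__(self, sources: dict):
        self._sources = sources  # maps a fake path to source bytes

    def get_filename(self, fullname: str) -> str:
        return f"in_memory/{fullname}.py"

    def get_data(self, path: str) -> bytes:
        try:
            return self._sources[path]
        except KeyError:
            # Any path we do not hold is reported as unreadable.
            raise OSError(f"no in-memory data for {path!r}") from None


def load_from_memory(name: str, source: bytes):
    """Execute `source` as module `name` through the inherited get_code pipeline."""
    loader = InMemoryLoader({f"in_memory/{name}.py": source})
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


greet = load_from_memory("greet", b"def hello(who):\n    return f'hello, {who}'\n")
print(greet.hello("importlib"))
```

Only get_filename and get_data are written here; decoding, compilation, and module execution all come from the inherited SourceLoader methods.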
gopy mirror
Not yet ported. gopy supports only frozen and built-in module imports today, handled in vm/eval_import.go. File-system-based loading (PathFinder, FileFinder, SourceFileLoader) is not implemented. The .pyc cache machinery is similarly absent because gopy compiles Python source to its own bytecode format rather than storing CPython .pyc files. When file-based import is added, the Go analog of FileFinder would likely use os.ReadDir for directory scanning and a sync.Map for the path-importer cache corresponding to sys.path_importer_cache.
CPython 3.14 changes
CPython 3.14 extended the .pyc invalidation options by adding SOURCE_DATE_EPOCH support to _code_to_timestamp_pyc, allowing reproducible builds to pin the mtime stored in cache headers regardless of the actual source file timestamp. FileFinder gained a thread-safety fix for the free-threaded build, protecting _path_cache and _path_mtime with a per-finder lock. PathFinder.find_spec was updated to check sys.path_importer_cache under the interpreter's import lock to prevent two threads from simultaneously creating duplicate FileFinder instances for the same directory.