Modules/zipimport.c

cpython 3.14 @ ab2d84fe1023/Modules/zipimport.c

zipimport is a C extension module that ships as part of CPython's core and provides zipimporter, a fully compliant PEP 302 / importlib finder-loader. It allows Python to treat a zip file on sys.path as a package tree, reading .py source files and .pyc bytecode files directly from the archive without extracting them to disk. The module is bootstrapped very early in interpreter startup so that the standard library itself can be distributed inside a zip.
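As a quick illustration of the behaviour described above, the following sketch builds a throwaway archive, puts it on sys.path, and imports a module straight from it. The archive name `bundle.zip` and the module name `greeting_demo` are made up for the example; only the standard `zipfile`, `importlib`, and `sys` APIs are assumed.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive holding a single module (names are illustrative).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("greeting_demo.py", "def hello():\n    return 'hello from zip'\n")

# Adding the archive to sys.path is enough: the import system selects a
# zipimporter for that path entry, and nothing is extracted to disk.
sys.path.insert(0, archive)
try:
    greeting_demo = importlib.import_module("greeting_demo")
    message = greeting_demo.hello()
    origin = greeting_demo.__file__                        # points inside the archive
    loader_name = type(greeting_demo.__loader__).__name__
finally:
    sys.path.remove(archive)

print(message, origin, loader_name)
```

The module's `__file__` is a path that continues past the archive name, which is how tools can tell that the code came from a zip.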

The implementation maintains a module-level directory cache (zip_directory_cache) that maps each zip path to a dictionary of its entries. The cache is populated on first access and reused for subsequent imports from the same archive, making repeated imports fast even for large zip files. Cache entries are keyed by the full path of the zip file on the filesystem.

The zipimporter class implements the modern finder/loader split interface. It acts as an importlib.abc.PathEntryFinder (it is reached through sys.path_hooks rather than sys.meta_path) and as an importlib.abc.Loader: find_spec returns an importlib.machinery.ModuleSpec, and create_module plus exec_module handle the two-phase load. The legacy load_module method is kept only for compatibility and is deprecated; find_module was removed in Python 3.12.

Map

| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-80 | headers, zip_directory_cache | Module-level cache dict and include block | |
| 81-220 | zipimporter.__new__, zipimporter_init | Constructor: locate zip boundary in path, populate cache | |
| 221-360 | zipimporter_find_spec | Finder: search cache for module, build ModuleSpec | |
| 361-430 | zipimporter_create_module | Loader phase 1: return None for default module creation | |
| 431-560 | zipimporter_exec_module | Loader phase 2: read bytes, compile or unmarshal, exec | |
| 561-650 | get_data, get_filename, get_source, get_code | Auxiliary loader helpers | |
| 651-750 | zip_get_data, read_directory | Low-level zip parsing: central directory walk, cache fill | |
| 751-850 | get_module_path, is_package | Path utilities and package detection | |
| 851-900 | Module def, PyInit_zipimport | Extension init, type registration, exception type | |

Reading

Module-level cache and init (lines 1 to 80)

cpython 3.14 @ ab2d84fe1023/Modules/zipimport.c#L1-80

The file opens with standard C includes and declares zip_directory_cache as a module-level PyObject * (a plain dict). This cache is shared across all zipimporter instances so that two importers pointing at the same archive share the same directory listing.

/* Modules/zipimport.c ~line 55 */
static PyObject *zip_directory_cache = NULL;

static int
zipimport_exec(PyObject *module)
{
    /* one dict, shared by every zipimporter instance */
    zip_directory_cache = PyDict_New();
    if (zip_directory_cache == NULL)
        return -1;
    ...
}

Constructor: locating the zip boundary (lines 81 to 220)

cpython 3.14 @ ab2d84fe1023/Modules/zipimport.c#L81-220

zipimporter_init walks the filesystem path from right to left, stripping one component at each separator until what remains names a regular file that is a valid zip archive (checked by reading the end-of-central-directory signature). The components that were stripped, that is, the part of the path to the right of the zip file, become self->prefix, which allows imports from subdirectories inside the archive.

/* ~line 130 */
while (1) {
    struct stat statbuf;
    if (stat(path_buf, &statbuf) == 0 && S_ISREG(statbuf.st_mode)) {
        if (check_is_zip(path_buf))
            break; /* found the zip boundary */
    }
    /* strip last path component and retry */
    p = strrchr(path_buf, SEP);
    if (p == NULL) { /* not found */ ... }
    *p = '\0';
}

find_spec: searching the cache (lines 221 to 360)

cpython 3.14 @ ab2d84fe1023/Modules/zipimport.c#L221-360

zipimporter_find_spec converts the dotted module name to a relative path inside the archive, then looks it up in the pre-populated cache dict. It tries both a plain .py entry and a __init__.py entry (for packages). On a hit it constructs a ModuleSpec via importlib.util.spec_from_file_location, setting the loader to self.

/* ~line 270 */
key = PyUnicode_FromFormat("%U%c%U", self->archive, SEP_CHAR, subpath);
if (key == NULL)
    return NULL;
item = PyDict_GetItemWithError(files, key);  /* borrowed reference */
Py_DECREF(key);
if (item != NULL) {
    /* build ModuleSpec */
    spec = PyObject_CallMethod(util, "spec_from_file_location",
                               "OOO", fullname, path, self);
}

exec_module: reading and executing code (lines 431 to 560)

cpython 3.14 @ ab2d84fe1023/Modules/zipimport.c#L431-560

zipimporter_exec_module is the heart of the loader. It calls get_code which returns a PyCodeObject either by unmarshaling a .pyc file or by compiling the .py source retrieved from the archive. The code object is then executed in the module's __dict__ with PyEval_EvalCode.

/* ~line 490 */
code = zipimporter_get_code(self, fullname);
if (code == NULL)
    return -1;
/* PyModule_GetDict returns a borrowed reference to the module's __dict__ */
dict = PyModule_GetDict(module);
res = PyEval_EvalCode(code, dict, dict);
Py_DECREF(code);
if (res == NULL)
    return -1;
Py_DECREF(res);
return 0;
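Both code paths, compiling source and unmarshaling bytecode, can be exercised through the public `get_code` method. The sketch below puts one source-only module and one bytecode-only module in an archive and runs each code object in a fresh namespace, roughly what exec_module does with the module's `__dict__`. The file names are hypothetical, and `py_compile` is used here only as a convenient way to produce a valid .pyc.

```python
import os
import py_compile
import tempfile
import types
import zipfile
import zipimport

workdir = tempfile.mkdtemp()

# Produce a .pyc on disk so the archive can hold bytecode without its source.
src = os.path.join(workdir, "bytecode_only.py")
with open(src, "w") as fh:
    fh.write("ORIGIN = 'bytecode'\n")
pyc = py_compile.compile(
    src, cfile=os.path.join(workdir, "bytecode_only.pyc"), doraise=True
)

archive = os.path.join(workdir, "code.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("source_only.py", "ORIGIN = 'source'\n")
    zf.write(pyc, "bytecode_only.pyc")

importer = zipimport.zipimporter(archive)

def run(name):
    """Fetch the code object for `name` and execute it in a new namespace."""
    code = importer.get_code(name)
    if not isinstance(code, types.CodeType):
        raise TypeError(f"expected a code object for {name!r}")
    namespace = {"__name__": name}
    exec(code, namespace)
    return namespace["ORIGIN"]

from_source = run("source_only")      # compiled from the .py entry
from_bytecode = run("bytecode_only")  # unmarshaled from the .pyc entry

print(from_source, from_bytecode)
```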

read_directory: zip central directory walk (lines 651 to 750)

cpython 3.14 @ ab2d84fe1023/Modules/zipimport.c#L651-750

read_directory opens the zip file, seeks to the end-of-central-directory record to find the start of the central directory, then iterates over every file header. For each entry it stores a tuple (data_offset, compress_type, data_size, file_size, file_mtime) into the cache dict under the full path key. After this one-time scan, every subsequent lookup is a single dict access (O(1) on average) instead of another pass over the archive.

/* ~line 700 */
for (i = 0; i < count; i++) {
    /* read 46-byte central directory entry */
    ...
    path = PyUnicode_FromFormat("%U%c%s", archive, SEP_CHAR, name_buf);
    item = Py_BuildValue("(Hhiii)", data_offset,
                         compress, data_size, file_size, mtime);
    if (path != NULL && item != NULL)
        PyDict_SetItem(files, path, item);
    /* PyDict_SetItem does not steal references, so release ours */
    Py_XDECREF(path);
    Py_XDECREF(item);
}
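The on-disk layout that this walk relies on is small enough to parse by hand. The following is a simplified sketch, not the module's code: it locates the end-of-central-directory record, then steps through the 46-byte central directory headers with `struct`. It assumes a single-disk archive with no ZIP64 extensions and no archive comment containing the signature bytes; the function name `read_directory_sketch` and the choice of the local header offset as the first tuple element are illustrative.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<4s4H2LH")        # end-of-central-directory record, 22 bytes
CDIR = struct.Struct("<4s4B4HL2L5H2L")  # central directory file header, 46 bytes

def read_directory_sketch(data: bytes) -> dict:
    """Map entry name -> (header_offset, compress, data_size, file_size, mtime)."""
    eocd_pos = data.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("not a zip file")
    _sig, _disk, _cd_disk, _n_here, count, _cd_size, cd_offset, _comment_len = (
        EOCD.unpack_from(data, eocd_pos)
    )

    entries = {}
    pos = cd_offset
    for _ in range(count):
        fields = CDIR.unpack_from(data, pos)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("bad central directory signature")
        compress, mtime = fields[6], fields[7]
        data_size, file_size = fields[10], fields[11]
        name_len, extra_len, comment_len = fields[12], fields[13], fields[14]
        header_offset = fields[18]
        name_start = pos + CDIR.size
        name = data[name_start:name_start + name_len].decode("utf-8")
        entries[name] = (header_offset, compress, data_size, file_size, mtime)
        # Variable-length name, extra field, and comment follow each header.
        pos = name_start + name_len + extra_len + comment_len
    return entries

# Usage: build a small archive in memory and scan it once.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mod.py", "x = 1\n")
    zf.writestr("pkg/__init__.py", "")
listing = read_directory_sketch(buf.getvalue())
print(sorted(listing))
```

Because each header records the lengths of its own variable-sized tail, the walk never needs to touch the compressed file data, which is why the scan stays cheap even for large archives.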

gopy mirror

Not yet ported.