
Lib/zipfile/__init__.py

Source:

cpython 3.14 @ ab2d84fe1023/Lib/zipfile/__init__.py

Map

| Lines | Symbol | Role |
|---|---|---|
| 1–80 | module header, imports | constants, _CD_* offsets, structCentralDir |
| 81–260 | ZipInfo | per-entry metadata: filename, compress_type, file_size, compress_size, CRC |
| 261–380 | _get_codecs, _compressor, _decompressor | compressor/decompressor registry keyed by compress_type |
| 381–620 | ZipExtFile | read-only file wrapper returned by ZipFile.open() |
| 621–900 | ZipFile.__init__, ZipFile.open | archive open, mode validation, decompressor selection |
| 901–1100 | ZipFile.write, ZipFile.writestr | compression, ZIP64 extension headers |
| 1101–1250 | ZipFile.extract, ZipFile.extractall | path safety via Path.is_relative_to guard |
| 1251–1600 | ZipFile._RealGetContents, _EndRecData | central-directory parsing, ZIP64 end locator |
| 1601–1900 | ZipFile.close, ZipFile._write_end_record | finalise central directory, write end-of-central-directory record |
| 1901–2500 | Path | pathlib-style facade over zip entries |

Reading

ZipFile.open() and decompressor selection

ZipFile.open() is the central read path. It validates the entry, selects a decompressor via _get_codecs, and wraps the raw file object in ZipExtFile.

The compressor registry maps integer compression IDs (stored in ZipInfo.compress_type) to factory callables. _get_codecs raises NotImplementedError for unsupported methods so callers get a clear message rather than a silent no-op.
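The registry idea can be illustrated with a minimal sketch. This is not CPython's implementation: the names DECOMPRESSOR_FACTORIES and get_decompressor are hypothetical, and LZMA is left out because zip entries wrap the LZMA stream in their own small header, which needs extra handling.

```python
import bz2
import zipfile
import zlib

# Hypothetical registry: compress_type -> zero-argument factory.
# Each factory returns an object with a decompress(bytes) method,
# or None for stored entries, which need no codec.
DECOMPRESSOR_FACTORIES = {
    zipfile.ZIP_STORED: lambda: None,
    # wbits=-15 selects a raw deflate stream without the zlib header.
    zipfile.ZIP_DEFLATED: lambda: zlib.decompressobj(-15),
    zipfile.ZIP_BZIP2: bz2.BZ2Decompressor,
}


def get_decompressor(compress_type: int):
    """Look up a factory and fail loudly for unknown compression IDs."""
    try:
        factory = DECOMPRESSOR_FACTORIES[compress_type]
    except KeyError:
        raise NotImplementedError(
            f"compression type {compress_type} is not supported") from None
    return factory()


# Round-trip a raw deflate stream through the registry.
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
blob = compressor.compress(b"abc" * 50) + compressor.flush()
decompressor = get_decompressor(zipfile.ZIP_DEFLATED)
restored = decompressor.decompress(blob) + decompressor.flush()
```

Failing at lookup time means an unsupported entry is rejected before any bytes are read from the archive.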

    # CPython: Lib/zipfile/__init__.py:900 ZipFile.open
    def open(self, name, mode="r", pwd=None, *, force_zip64=False):
        ...
        zinfo = self.getinfo(name)
        ...
        zef_file = _SharedFile(self.fp, zinfo.header_offset, ...)
        ...
        return ZipExtFile(zef_file, mode, zinfo, pwd, close_fileobj)

ZipExtFile buffers compressed data internally and decompresses on demand through the _decompressor object stored at construction time. The wrapper also verifies the CRC-32 checksum when the caller reaches EOF, raising BadZipFile on mismatch.
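The EOF checksum behaviour can be observed with a small experiment that uses only the public zipfile API. The helpers make_archive and read_entry are hypothetical names introduced for this sketch. It stores an entry uncompressed so the payload bytes appear verbatim in the archive, flips one of them, and reads the entry to the end.

```python
import io
import zipfile


def make_archive(name: str, payload: bytes) -> bytes:
    """Build an in-memory archive holding one uncompressed (stored) entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def read_entry(archive: bytes, name: str) -> bytes:
    """Read one entry to EOF through ZipFile.open()."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        with zf.open(name) as fh:  # fh is a ZipExtFile
            return fh.read()


payload = b"hello zipfile " * 8
good = make_archive("data.bin", payload)

# Flip one byte of the stored payload; the recorded CRC-32 is left untouched.
damaged = bytearray(good)
damaged[good.index(payload)] ^= 0xFF

intact = read_entry(good, "data.bin")
try:
    read_entry(bytes(damaged), "data.bin")
    crc_error = None
except zipfile.BadZipFile as exc:
    crc_error = exc  # raised once the read reaches the end of the entry
```

Because the comparison happens only at EOF, a caller that stops reading early never triggers the check.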

ZipInfo: per-entry metadata

ZipInfo holds all per-entry fields decoded from the local file header and the central directory. The two most important size fields are file_size (uncompressed) and compress_size (bytes stored on disk). Both are read from the central-directory record. When a value does not fit in 32 bits, the 32-bit field holds the sentinel 0xFFFFFFFF and the real value is read from the ZIP64 extra field.

    # CPython: Lib/zipfile/__init__.py:81 ZipInfo
    class ZipInfo:
        ...
        compress_type: int
        file_size: int      # uncompressed size
        compress_size: int  # compressed size on disk
        CRC: int            # CRC-32 of uncompressed data
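As a rough illustration of how these fields look from the caller's side, the following sketch writes one deflated entry to an in-memory archive and reads its metadata back. The entry name and payload are arbitrary choices made for the example.

```python
import io
import zipfile
import zlib

payload = b"zipfile metadata example " * 40

# Write a single deflated entry into an in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.txt", payload)

# Reopen for reading; the fields are decoded from the central directory.
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("data.txt")
    fields = {
        "compress_type": info.compress_type,
        "file_size": info.file_size,          # uncompressed size
        "compress_size": info.compress_size,  # bytes stored in the archive
        "CRC": info.CRC,                      # CRC-32 of the uncompressed data
    }

# The stored CRC should match one computed directly over the payload.
crc_matches = fields["CRC"] == zlib.crc32(payload)
```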

The from_file() class method is the canonical way to build a ZipInfo from a real filesystem path. It fills file_size, sets compress_type to ZIP_STORED by default, and leaves compress_size at zero until the entry is actually written.
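A short sketch of that life cycle, assuming a recent Python 3 where ZipInfo initialises compress_size to zero: the metadata is built from a temporary file first, then the same file is written to an archive and the recorded sizes are read back. File names and sizes are arbitrary.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "note.txt")
    with open(src, "wb") as fh:
        fh.write(b"x" * 100)

    # Metadata only: nothing has been written to an archive yet.
    pending = zipfile.ZipInfo.from_file(src, arcname="note.txt")
    before = (pending.file_size, pending.compress_type, pending.compress_size)

    # Writing the entry is what fills in the compressed size.
    archive = os.path.join(tmp, "out.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.write(src, arcname="note.txt")
        written = zf.getinfo("note.txt")
        after = (written.file_size, written.compress_size)
```

With the default ZIP_STORED method the two sizes in after are equal, since the data is copied without compression.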

extractall() path safety

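A zip entry whose filename contains .. segments or an absolute path can cause a directory-traversal write outside the target directory if the name is joined to the destination unchecked. The zipfile documentation describes extract() and extractall() as stripping drive letters, leading slashes, and .. components from member names; the check below expresses the same boundary as an explicit Path.is_relative_to guard: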

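    # Adapted from CPython: Lib/zipfile/__init__.py:1212 ZipFile.extractall
    root = pathlib.Path(path).resolve()
    for member in members:
        ...
        # Resolve before comparing: is_relative_to() is purely lexical, so an
        # unresolved "root/../x" would still pass the check.
        target = (root / member.filename).resolve()
        if not target.is_relative_to(root):
            raise ValueError(
                f"path is outside the target directory: {target!r}")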

The guard is applied per-member before any file is opened on disk, so a malformed archive cannot escape the destination tree even when extractall() is called without inspecting individual entries.
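A self-contained sketch of such a per-member check follows, assuming Python 3.9 or later for Path.is_relative_to. The helpers checked_targets and archive_with are hypothetical names for this example, not part of zipfile, and the check only computes paths without extracting anything. It also shows why both paths should be resolved first: is_relative_to compares path components lexically, so an unresolved .. segment passes.

```python
import io
import pathlib
import tempfile
import zipfile


def checked_targets(zf: zipfile.ZipFile, dest: str) -> list:
    """Return resolved target paths, or raise ValueError on an escape."""
    root = pathlib.Path(dest).resolve()
    targets = []
    for member in zf.infolist():
        # resolve() collapses ".." so the lexical comparison is meaningful.
        target = (root / member.filename).resolve()
        if not target.is_relative_to(root):
            raise ValueError(
                f"path is outside the target directory: {target!r}")
        targets.append(target)
    return targets


def archive_with(name: str) -> zipfile.ZipFile:
    """Build a one-entry in-memory archive and reopen it for reading."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"payload")
    return zipfile.ZipFile(buf)


# Without resolve(), the lexical check lets ".." through.
lexical_ok = (pathlib.PurePosixPath("out") / "../evil.txt").is_relative_to("out")

with tempfile.TemporaryDirectory() as dest:
    safe = checked_targets(archive_with("docs/readme.txt"), dest)
    try:
        checked_targets(archive_with("../evil.txt"), dest)
        escaped = True
    except ValueError:
        escaped = False  # the traversal entry was rejected
```

Running the check over every member before writing anything keeps a rejected archive from leaving partial output behind.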

gopy notes

Status: not yet ported.

Planned package path: module/zipfile/.

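The port will need ZipInfo, ZipFile, ZipExtFile, and Path. The compressor registry (_get_codecs) maps naturally to a Go map[int]func() keyed by the same integer compression constants. Go standard-library coverage of the compression methods listed here (stored, deflate, bzip2, lzma) is partial. Stored entries need no codec. Deflate maps to compress/flate, because zip entries hold raw deflate streams without the zlib wrapper that compress/zlib expects. compress/bzip2 covers reading only, since it provides a decompressor but no writer. LZMA has no standard-library package (compress/lzw implements LZW, a different algorithm), so it requires a third-party module such as github.com/ulikunitz/xz. The path-safety guard in extractall() must be ported exactly, as it is a security boundary.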