Lib/gzip.py

Source:

cpython 3.14 @ ab2d84fe1023/Lib/gzip.py

Map

| Lines | Symbol | Role |
| --- | --- | --- |
| 1–40 | module header, imports | BadGzipFile, constants (FTEXT, FHCRC, FNAME, FCOMMENT, FEXTRA) |
| 41–60 | BadGzipFile | exception raised on malformed or truncated gzip data |
| 61–180 | GzipFile.__init__, GzipFile.close | mode/compresslevel/mtime setup, underlying file ownership |
| 181–260 | _GzipWriter._write_gzip_header | magic bytes, method, flags, mtime, OS byte |
| 261–360 | _GzipReader._read_gzip_header | header parsing, multi-stream detection, flag decoding |
| 361–460 | GzipFile.write | buffer accumulation, zlib.compressobj integration |
| 461–540 | GzipFile.read, GzipFile.read1, GzipFile.peek | decompression, CRC verification, multi-stream stitching |
| 541–600 | open, compress, decompress | module-level convenience wrappers |

Reading

GzipFile.__init__ and _GzipWriter._write_gzip_header

GzipFile.__init__ accepts mode, compresslevel, and mtime (alongside filename and fileobj). In write mode it creates a _GzipWriter internally and immediately writes the 10-byte fixed gzip header, followed by the optional null-terminated filename field.

The mtime parameter sets the 4-byte modification-time field in the header; when it is None, the current time.time() is used. Passing mtime=0 makes the output independent of wall-clock time, which is useful for deterministic builds.
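As a minimal sketch of the reproducibility point, the snippet below compresses the same payload twice into in-memory buffers with mtime=0 and compares the results. The helper name gzip_bytes and the sample payload are illustrative assumptions, not part of Lib/gzip.py.

```python
import gzip
import io


def gzip_bytes(payload, mtime=None):
    """Compress payload into an in-memory gzip member (hypothetical helper)."""
    buf = io.BytesIO()
    # A BytesIO object has no name, so no filename field is written.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


payload = b"deterministic build artifact\n"
first = gzip_bytes(payload, mtime=0)
second = gzip_bytes(payload, mtime=0)

# Header bytes 4..7 hold the little-endian modification time.
mtime_field = first[4:8]
```

With mtime left at None, two runs in different seconds would differ in exactly those four header bytes.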

# CPython: Lib/gzip.py:181 _GzipWriter._write_gzip_header
def _write_gzip_header(self, compresslevel):
    self._fp.write(b'\037\213')  # magic number
    self._fp.write(b'\010')      # method: deflate
    ...
    mtime = self._write_mtime
    if mtime is None:
        mtime = time.time()
    write32u(self._fp, int(mtime))
    if compresslevel == BEST_COMPRESSION:
        self._fp.write(b'\002')
    elif compresslevel == BEST_SPEED:
        self._fp.write(b'\004')
    else:
        self._fp.write(b'\000')
    self._fp.write(b'\377')      # OS: unknown

The OS byte is always 0xff (unknown) in CPython's implementation. RFC 1952 defines values for Unix (0x03), NTFS (0x0b), and others, but CPython does not expose OS selection to callers.
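The fixed header layout can be checked from the outside. The following sketch, which assumes an in-memory buffer and an arbitrary payload, writes a member through GzipFile and unpacks the first 10 bytes according to RFC 1952.

```python
import gzip
import io
import struct

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
    gz.write(b"payload")

# RFC 1952 fixed header, little-endian:
# ID1 ID2 (2 bytes), CM, FLG, MTIME (4 bytes), XFL, OS.
magic, method, flags, mtime, xfl, os_byte = struct.unpack(
    "<2sBBIBB", buf.getvalue()[:10]
)
```

Here os_byte comes back as 255 regardless of the platform the code runs on, matching the hard-coded b'\377' write above.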

_GzipReader._read_gzip_header and multi-stream support

_read_gzip_header parses one gzip member header from the current file position. It checks the two-byte magic, reads the method byte (must be 8 for deflate), then decodes the flags byte to determine which optional header fields are present.

# CPython: Lib/gzip.py:261 _GzipReader._read_gzip_header
def _read_gzip_header(self):
    magic = self._fp.read(2)
    if magic == b'':
        return False  # clean EOF: no further member
    if magic != b'\037\213':
        raise BadGzipFile(f'Not a gzipped file ({magic!r})')
    (method, flag, self._last_mtime) = struct.unpack("<BBIxx",
                                                     self._fp.read(8))
    if method != 8:
        raise BadGzipFile('Unknown compression method')
    if flag & FEXTRA:
        ...
    if flag & FNAME:
        ...  # read null-terminated filename
    if flag & FCOMMENT:
        ...  # read null-terminated comment
    if flag & FHCRC:
        ...  # read and discard the 16-bit header CRC (not verified)
    return True
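To make the flag decoding concrete, here is a standalone sketch that walks the same optional fields over a bytes object. parse_member_header is a hypothetical helper written for illustration, not CPython's code, and it assumes the whole member is already in memory.

```python
import gzip
import io
import struct
import zlib

# Flag bits from RFC 1952 (same values as the constants in Lib/gzip.py).
FHCRC, FEXTRA, FNAME, FCOMMENT = 2, 4, 8, 16


def parse_member_header(raw):
    """Return (mtime, filename, body_offset) for one gzip member.

    Hypothetical helper: operates on bytes rather than a file object.
    """
    magic, method, flags, mtime = struct.unpack_from("<2sBBI", raw, 0)
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip member")
    if method != 8:
        raise ValueError("unknown compression method")
    pos = 10  # XFL and OS follow MTIME and are skipped here
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", raw, pos)
        pos += 2 + xlen
    filename = None
    if flags & FNAME:
        end = raw.index(b"\x00", pos)
        filename = raw[pos:end]
        pos = end + 1
    if flags & FCOMMENT:
        pos = raw.index(b"\x00", pos) + 1
    if flags & FHCRC:
        pos += 2  # skipped, not verified
    return mtime, filename, pos


buf = io.BytesIO()
with gzip.GzipFile(filename="report.txt.gz", fileobj=buf, mode="wb",
                   mtime=1234) as gz:
    gz.write(b"hello header")

raw = buf.getvalue()
mtime, filename, body_offset = parse_member_header(raw)
# Raw deflate data sits between the header and the 8-byte trailer.
body = zlib.decompress(raw[body_offset:-8], wbits=-15)
```

The writer strips a trailing .gz from the stored name, so the filename field read back here is b"report.txt".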

Once the decompressor reports the end of a deflate stream, _read_eof checks the 8-byte trailer (CRC-32 and size) and skips any zero padding; _GzipReader then calls _read_gzip_header again to look for another member. This is how multi-member gzip files (for example, two .gz files concatenated with cat) are transparently decompressed as a single byte stream.
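The behaviour is easy to observe with in-memory data. This sketch concatenates two independently compressed members and reads them back through GzipFile; the payloads are arbitrary, and the second half shows the BadGzipFile path for input that does not start with the gzip magic.

```python
import gzip
import io

member_a = gzip.compress(b"first member\n")
member_b = gzip.compress(b"second member\n")

# Equivalent to `cat a.gz b.gz > both.gz` on disk.
combined = member_a + member_b
with gzip.GzipFile(fileobj=io.BytesIO(combined), mode="rb") as gz:
    data = gz.read()

# Input whose first two bytes are not the gzip magic is rejected.
try:
    with gzip.GzipFile(fileobj=io.BytesIO(b"plain text"), mode="rb") as gz:
        gz.read()
    rejected = False
except gzip.BadGzipFile:
    rejected = True
```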

GzipFile.write() and buffer accumulation

GzipFile.write() feeds data to a zlib.compressobj and writes whatever compressed bytes it returns to the underlying file. It also updates a running CRC-32 and an uncompressed-byte counter; both values are written into the 8-byte gzip trailer on close(), the counter modulo 2**32.

# CPython: Lib/gzip.py:361 GzipFile.write
def write(self, data):
    self._check_not_closed()
    if self.mode != WRITE:
        import errno
        raise OSError(errno.EBADF, "write() on read-only GzipFile object")
    if isinstance(data, (bytes, bytearray)):
        length = len(data)
    else:
        data = memoryview(data)
        length = data.nbytes
    if length > 0:
        self._crc = zlib.crc32(data, self._crc)
        self._size += length
        self._fp.write(self._compress.compress(data))
    return length
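The trailer contents can be verified independently. In this sketch (sample chunks are arbitrary), the CRC-32 and length are recomputed chunk by chunk, the same way write() tracks them, and compared with the last 8 bytes of the output.

```python
import gzip
import io
import struct
import zlib

chunks = [b"alpha ", b"beta ", b"gamma"]

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
    for chunk in chunks:
        gz.write(chunk)

# Recompute what write() tracks incrementally.
crc = 0
size = 0
for chunk in chunks:
    crc = zlib.crc32(chunk, crc)
    size += len(chunk)

# Trailer: CRC-32, then uncompressed size modulo 2**32, both little-endian.
trailer_crc, trailer_size = struct.unpack("<II", buf.getvalue()[-8:])
```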

Calling flush() with zlib.Z_SYNC_FLUSH (the default zlib_mode) pushes all pending compressed data to the underlying file without ending the deflate stream. This is useful for streaming scenarios where a reader must be able to decode everything written so far.
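A small sketch of that streaming property, assuming an in-memory buffer stands in for the network or pipe: after flush(), a separate zlib decompressor configured for gzip framing (wbits=31) can already recover the first chunk, even though the member has no trailer yet.

```python
import gzip
import io
import zlib

buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode="wb", mtime=0)
gz.write(b"first chunk\n")
gz.flush()  # zlib_mode defaults to zlib.Z_SYNC_FLUSH

# Snapshot of what a reader would have received so far.
partial = buf.getvalue()

reader = zlib.decompressobj(wbits=31)  # 31 selects gzip header/trailer handling
decoded = reader.decompress(partial)
stream_finished = reader.eof  # no trailer has been written yet

gz.close()  # writes the final deflate block and the 8-byte trailer
```

Each sync flush adds a few bytes of overhead, so flushing after every small write trades compression ratio for latency.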

gopy notes

Status: not yet ported.

Planned package path: module/gzip/.

The Go standard library's compress/gzip covers the basic read/write path, but a faithful port of the Python GzipFile API must expose mtime, compresslevel, and the BadGzipFile exception type (mapped to a Go error sentinel). compress/gzip's Reader already reads concatenated members by default (Reader.Multistream controls this), so the re-entry logic in _GzipReader is not missing on the Go side. What still has to be checked against CPython, with citations, is whether edge cases such as zero padding between members and the error reported for a bad second header behave the same way; any part that differs must be ported directly rather than delegated to the Go package. The compress, decompress, and open module-level helpers are thin wrappers and can be ported last.