Lib/tarfile.py

Source:

cpython 3.14 @ ab2d84fe1023/Lib/tarfile.py

tarfile reads and writes POSIX and GNU tar archives with optional compression. Unlike zipfile, the tar format has no central directory: an archive is a sequential stream of headers and data blocks, so locating a member means scanning headers from the start. The pax format (POSIX.1-2001) allows arbitrary Unicode filenames and extended attributes.

Map

| Lines | Symbol | Role |
|---|---|---|
| 1-100 | Constants, ENCODING, BLOCKSIZE | Format constants (512-byte blocks) |
| 101-400 | TarInfo | Per-entry metadata: name, size, mtime, mode, uid, gid, type |
| 401-700 | ExFileObject | File-like object for reading a member |
| 701-1100 | TarFile.__init__, open, _proc_* | Archive open; header dispatch |
| 1101-1500 | TarFile.getmembers, getmember, getnames | List contents |
| 1501-1800 | TarFile.extractall, extract, extractfile | Extraction |
| 1801-2200 | TarFile.addfile, add, gettarinfo | Writing entries |
| 2201-2900 | TarFile.close, _create_gnu_long_header, _create_pax_generic_header | Close, GNU/PAX long-name extensions |

Reading

512-byte block structure

Each tar entry consists of a 512-byte header block followed by the file data padded to a multiple of 512 bytes. The header contains a fixed-layout C struct with name, mode, uid, gid, size, mtime, checksum, type, and link fields.
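The block arithmetic can be checked against a small in-memory archive. The following is a minimal sketch, not part of tarfile itself: `padded_size` and `member_offsets` are hypothetical helper names, and the sketch assumes USTAR format with short names so that every header occupies exactly one block.

```python
import io
import tarfile

BLOCK = tarfile.BLOCKSIZE  # 512


def padded_size(size: int) -> int:
    """Round a payload size up to the next multiple of the block size."""
    blocks, remainder = divmod(size, BLOCK)
    if remainder:
        blocks += 1
    return blocks * BLOCK


def member_offsets(payloads: list[bytes]) -> list[tuple[int, int]]:
    """Write the payloads as members; return (header offset, data offset) pairs."""
    buf = io.BytesIO()
    # Assumption: USTAR with short names, so each header is a single block.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for index, payload in enumerate(payloads):
            info = tarfile.TarInfo(f"member{index}.bin")
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    # "r:" opens the archive without any decompression.
    with tarfile.open(fileobj=buf, mode="r:") as tf:
        return [(m.offset, m.offset_data) for m in tf.getmembers()]
```

With a 700-byte member followed by a 10-byte member, the second header should start at 512 + padded_size(700) bytes, since the first member's data is padded out to two full blocks.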

TarInfo.frombuf

Parses a 512-byte header buffer:

# CPython: Lib/tarfile.py:1143 TarInfo.frombuf
@classmethod
def frombuf(cls, buf, encoding, errors):
    tarinfo = cls()
    tarinfo.name = nts(buf[0:100], encoding, errors)
    tarinfo.mode = nti(buf[100:108])
    tarinfo.uid = nti(buf[108:116])
    tarinfo.gid = nti(buf[116:124])
    tarinfo.size = nti(buf[124:136])
    tarinfo.mtime = nti(buf[136:148])
    ...

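nts decodes a NUL-terminated byte field to a string; nti converts a number field to an integer, reading it as octal ASCII digits or, when the first byte marks it, as the GNU base-256 binary encoding.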
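To make the field layout concrete, the sketch below builds a header with the public `TarInfo.tobuf` method, decodes three fields by hand, and parses the same buffer with `TarInfo.frombuf`. The helpers `parse_cstring` and `parse_octal` are hypothetical stand-ins for nts and nti; they cover only the plain octal case and are not the CPython implementations.

```python
import tarfile


def parse_cstring(field: bytes, encoding: str = "utf-8") -> str:
    """Decode a fixed-width field up to its first NUL byte (simplified nts)."""
    end = field.find(b"\0")
    if end != -1:
        field = field[:end]
    return field.decode(encoding, "surrogateescape")


def parse_octal(field: bytes) -> int:
    """Decode an octal ASCII number field (simplified nti, no base-256 support)."""
    text = field.decode("ascii").strip(" \0")
    return int(text or "0", 8)


def build_header(name: str, size: int, mode: int) -> bytes:
    """Serialize a single ustar header block for the given metadata."""
    info = tarfile.TarInfo(name)
    info.size = size
    info.mode = mode
    return info.tobuf(tarfile.USTAR_FORMAT, "utf-8", "surrogateescape")


raw = build_header("dir/file.txt", 1234, 0o640)
parsed = tarfile.TarInfo.frombuf(raw, "utf-8", "surrogateescape")
```

The slices `raw[0:100]`, `raw[100:108]`, and `raw[124:136]` passed through the helpers should agree with `parsed.name`, `parsed.mode`, and `parsed.size`.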

PAX format

PAX adds variable-length extended header records for fields that overflow the fixed-size ustar fields (long filenames, large sizes, high-resolution timestamps). Extended headers are themselves tar entries with type 'x' (local) or 'g' (global).
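A round trip through an in-memory archive shows the extended header at work. This is a minimal sketch; `roundtrip_pax` is a hypothetical helper name, and the example assumes a member name that is both non-ASCII and longer than the 100-byte ustar name field, which forces a `path` record into the 'x' header.

```python
import io
import tarfile


def roundtrip_pax(name: str, payload: bytes) -> tarfile.TarInfo:
    """Write one member in pax format and read its metadata back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tf:
        # The 'x' extended header is consumed during parsing and is not
        # listed as a member; its records are merged into the entry after it.
        return tf.getmembers()[0]


long_name = "données/" + "x" * 120 + ".txt"
member = roundtrip_pax(long_name, b"hi")
```

After the round trip, `member.name` holds the full name and `member.pax_headers` exposes the raw records that carried it.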

Compression detection

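TarFile.open(mode='r:*') selects the decompressor automatically. For 'r' and 'r:*' it tries each registered opener (plain tar, gzip, bz2, xz) in turn until one reads a valid header; the stream mode 'r|*' instead inspects the leading magic bytes of the input.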

# CPython: Lib/tarfile.py:1620 TarFile.open
# Simplified excerpt: the real method dispatches through cls.OPEN_METH.
@classmethod
def open(cls, name=None, mode='r', fileobj=None, bufsize=RECORDSIZE, **kwargs):
    ...
    if ':' in mode:
        filemode, comptype = mode.split(':', 1)
        ...
        if comptype == 'gz':
            fileobj = gzip.GzipFile(...)
        elif comptype == 'bz2':
            fileobj = bz2.BZ2File(...)
        elif comptype == 'xz':
            fileobj = lzma.LZMAFile(...)
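Magic-byte detection of the kind mentioned above can be sketched independently of tarfile. The helper below is hypothetical (`sniff_compression`, `make_archive`, and `list_names` are illustrative names, not CPython functions); it checks the well-known leading bytes of gzip, bzip2, and xz streams, and the example then lets `tarfile.open` with mode 'r:*' pick the decompressor on its own.

```python
import io
import tarfile

# Well-known leading bytes of each compressed stream format.
MAGIC = {
    b"\x1f\x8b": "gz",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
}


def sniff_compression(data: bytes) -> str:
    """Guess the compression type from leading bytes; default to plain tar."""
    for magic, comptype in MAGIC.items():
        if data.startswith(magic):
            return comptype
    return "tar"


def make_archive(mode: str) -> bytes:
    """Build a one-member archive in memory using the given write mode."""
    payload = b"hello"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tf:
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_names(data: bytes) -> list[str]:
    """Open with transparent compression detection and list member names."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
        return tf.getnames()
```

Calling `list_names(make_archive("w:gz"))` should return the same member list as for an uncompressed archive, while `sniff_compression` reports which format the bytes were written in.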

gopy notes

Status: not yet ported. Go's archive/tar reads and writes the ustar, PAX, and GNU formats and resolves long-name and extended-header records internally, so most of the header-dispatch logic in tarfile.py has a ready counterpart. A gopy tarfile port would wrap archive/tar with a Python-compatible TarInfo struct.