Lib/shutil.py

cpython 3.14 @ ab2d84fe1023/Lib/shutil.py

shutil (shell utilities) provides file and directory operations that the os module deliberately omits because they involve policy decisions: whether to copy metadata, how to handle errors during recursive deletion, how to pick a fast path for large transfers.

The file-copy family (copyfileobj, copyfile, copy, copy2) forms a layered stack. copyfileobj runs the raw read/write loop; copyfile opens source and destination and calls copyfileobj or a platform fast path; copy additionally copies the permission bits; copy2 uses copystat to preserve timestamps, flags, and extended attributes as well. On Linux, when os.sendfile is available and both source and destination are regular files, _fastcopy_sendfile bypasses the userspace read/write loop by asking the kernel to move the data directly between the two descriptors.
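The layering can be observed from the outside. The sketch below is illustrative only: compare_copy_functions, demo, and the file names are hypothetical helpers, not part of shutil. It copies one file with each function and reports whether the content, the permission bits, and the modification time carried over.

```python
import os
import shutil
import stat
import tempfile


def compare_copy_functions(workdir):
    """Copy one file with copyfile, copy and copy2; report what was preserved."""
    src = os.path.join(workdir, "src.txt")
    with open(src, "w") as f:
        f.write("payload")
    os.chmod(src, 0o640)
    old = 1_000_000_000  # arbitrary timestamp from 2001, clearly not "now"
    os.utime(src, (old, old))
    src_mode = stat.S_IMODE(os.stat(src).st_mode)

    report = {}
    for fn in (shutil.copyfile, shutil.copy, shutil.copy2):
        dst = os.path.join(workdir, fn.__name__ + ".txt")
        fn(src, dst)
        st = os.stat(dst)
        with open(dst) as f:
            same_content = f.read() == "payload"
        report[fn.__name__] = {
            "same_content": same_content,
            "same_mode": stat.S_IMODE(st.st_mode) == src_mode,
            "same_mtime": int(st.st_mtime) == old,
        }
    return report


def demo():
    # Run the comparison in a throwaway directory.
    with tempfile.TemporaryDirectory() as workdir:
        return compare_copy_functions(workdir)
```

All three functions preserve the content, copy and copy2 report same_mode, and only copy2 reports same_mtime. copyfile can happen to match the mode as well, depending on the process umask.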

rmtree and copytree are the recursive counterparts. rmtree supports an onexc callback (added in 3.12 to replace onerror, which is now deprecated) that lets callers retry or ignore individual removal failures. copytree gained dirs_exist_ok in 3.8, which allows merging into an existing destination directory instead of requiring it to be absent.

The archive subsystem (make_archive, unpack_archive) maintains a registry of format handlers (_ARCHIVE_FORMATS, _UNPACK_FORMATS), allowing third-party packages to register new formats via register_archive_format and register_unpack_format.
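As a rough illustration of the registry, the sketch below registers an unpack handler for single gzip-compressed files. The format name "gz-single" and the helpers unpack_single_gzip, install_gz_unpacker, and demo are hypothetical; only register_unpack_format, get_unpack_formats, and unpack_archive come from shutil.

```python
import gzip
import os
import shutil
import tempfile

FORMAT_NAME = "gz-single"  # hypothetical format name, not part of shutil


def unpack_single_gzip(filename, extract_dir, **kwargs):
    """Decompress one .gz file into extract_dir, dropping the .gz suffix."""
    os.makedirs(extract_dir, exist_ok=True)
    target_name = os.path.basename(filename)[: -len(".gz")]
    target = os.path.join(extract_dir, target_name)
    with gzip.open(filename, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)


def install_gz_unpacker():
    """Register the handler once; registering the same name twice is an error."""
    registered = [name for name, _exts, _desc in shutil.get_unpack_formats()]
    if FORMAT_NAME not in registered:
        shutil.register_unpack_format(
            FORMAT_NAME, [".gz"], unpack_single_gzip,
            description="single gzip-compressed file",
        )


def demo():
    install_gz_unpacker()
    with tempfile.TemporaryDirectory() as workdir:
        archive = os.path.join(workdir, "notes.txt.gz")
        with gzip.open(archive, "wb") as f:
            f.write(b"hello")
        out_dir = os.path.join(workdir, "out")
        # No format argument: the handler is chosen from the file extension.
        shutil.unpack_archive(archive, out_dir)
        with open(os.path.join(out_dir, "notes.txt"), "rb") as f:
            return f.read()
```

The handler accepts **kwargs so that any extra arguments the registry passes along are tolerated. A matching shutil.unregister_unpack_format(FORMAT_NAME) call removes the entry again.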

Map

| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-80 | Module header, SameFileError, Error | Exception hierarchy and import block; SameFileError is raised by copyfile when src and dst refer to the same file. | (stdlib pending) |
| 81-180 | copyfileobj, _samefile, copyfile, _copyfileobj_readinto | Raw byte copy; _fastcopy_sendfile and _fastcopy_fcopyfile (macOS) fast paths; copyfile opens the files and dispatches to the best available copy routine. | (stdlib pending) |
| 181-300 | copymode, copystat, copy, copy2 | copymode applies permission bits only; copystat copies mode, timestamps, flags, and xattrs; copy and copy2 combine copyfile with those. | (stdlib pending) |
| 301-500 | ignore_patterns, copytree, _copytree | copytree walks the source tree with os.scandir and copies each entry; ignore_patterns returns a callback that filters names by glob pattern. | (stdlib pending) |
| 501-600 | rmtree, _rmtree_safe_fd, _rmtree_unsafe | Recursive deletion; _rmtree_safe_fd uses os.open/fstat to avoid TOCTOU races on platforms that support dir_fd; the onexc callback is invoked on errors. | (stdlib pending) |
| 601-800 | move, _destinsrc | move attempts os.rename first; if that fails (cross-device) it copies and then removes the source; _destinsrc guards against moving a tree into itself. | (stdlib pending) |
| 801-1000 | disk_usage, chown, which, get_terminal_size | POSIX and Windows implementations of each; which searches PATH and handles Windows executable extensions (.com/.exe/.bat); on POSIX, disk_usage wraps os.statvfs. | (stdlib pending) |
| 1001-1300 | _ARCHIVE_FORMATS, register_archive_format, unregister_archive_format, make_archive, _make_tarball, _make_zipfile | Archive creation; make_archive calls the registered handler; _make_tarball wraps tarfile; _make_zipfile wraps zipfile. | (stdlib pending) |
| 1300-1700 | _UNPACK_FORMATS, register_unpack_format, unregister_unpack_format, unpack_archive, get_archive_formats, get_unpack_formats | Archive extraction registry; when format is None, unpack_archive picks the handler from the filename extension and raises ReadError if none matches. | (stdlib pending) |

Reading

rmtree onerror/onexc recovery (lines 501 to 600)

cpython 3.14 @ ab2d84fe1023/Lib/shutil.py#L501-600

def rmtree(path, ignore_errors=False, onerror=None, *,
           onexc=None, dir_fd=None):
    ...
    if ignore_errors:
        def onexc(*args):
            pass
    elif onexc is None:
        if onerror is None:
            def onexc(*args):
                # Called from inside an except block: re-raise the active exception
                raise
        else:
            # Translate from onexc signature to onerror signature
            def onexc(func, path, exc):
                if exc is None:
                    exc_info = None, None, None
                else:
                    exc_info = type(exc), exc, exc.__traceback__
                return onerror(func, path, exc_info)
    ...
    try:
        if _use_fd_functions:
            ...
            _rmtree_safe_fd(topfd, path, onexc)
        else:
            _rmtree_unsafe(path, onexc)
    finally:
        ...

The onerror parameter was deprecated in 3.12 in favour of onexc, which receives the exception instance directly rather than a (type, value, traceback) triple. When a caller still passes onerror, the shim at the top wraps it in a function with the onexc signature that rebuilds the triple from the exception before delegating, so the rest of rmtree only ever deals with onexc.
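A minimal sketch of a caller-side handler, assuming the goal is to record failures and keep going rather than abort. make_collecting_handler, remove_tree, and demo are hypothetical names; the version check is only there so the same handler can be driven through onerror on interpreters older than 3.12.

```python
import os
import shutil
import sys
import tempfile


def make_collecting_handler(failures):
    """Return an onexc-style callback that records failures instead of raising."""
    def handler(func, path, exc):
        name = getattr(func, "__name__", repr(func))
        failures.append((name, os.fspath(path), exc))
    return handler


def remove_tree(path):
    """Remove path recursively and return the list of recorded failures."""
    failures = []
    handler = make_collecting_handler(failures)
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=handler)
    else:
        # Older interpreters only accept onerror, which receives the
        # (type, value, traceback) triple; forward the exception instance.
        shutil.rmtree(
            path, onerror=lambda func, p, info: handler(func, p, info[1])
        )
    return failures


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        tree = os.path.join(workdir, "tree")
        os.makedirs(os.path.join(tree, "sub"))
        clean = remove_tree(tree)  # existing tree, expected to succeed
        missing = remove_tree(os.path.join(workdir, "missing"))  # no such path
        return clean, missing, os.path.exists(tree)
```

Removing an existing tree records nothing, while pointing remove_tree at a missing path records at least one FileNotFoundError instead of raising it.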

_rmtree_safe_fd opens each subdirectory with os.open(name, os.O_RDONLY, dir_fd=...) and compares os.fstat of the new descriptor with the stat result taken before the open, which detects a path that was replaced by a symlink between the scandir and the open. This makes the implementation safe against symlink-substitution attacks, which _rmtree_unsafe (the fallback for Windows and other platforms without the required dir_fd support) cannot prevent.
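The check itself can be sketched in a few lines. This is not the CPython code: open_dir_checked and demo are hypothetical, a POSIX platform is assumed, and the sketch only shows the lstat, open, fstat comparison described above.

```python
import os
import stat
import tempfile


def open_dir_checked(name, dir_fd=None):
    """Open a directory and verify it is the same object that lstat saw."""
    orig_st = os.lstat(name, dir_fd=dir_fd)
    if not stat.S_ISDIR(orig_st.st_mode):
        # Symlinks land here: lstat describes the link itself, not its target.
        raise NotADirectoryError(name)
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        # If the entry was swapped after lstat, device/inode will differ.
        if not os.path.samestat(orig_st, os.fstat(fd)):
            raise OSError("%r changed between lstat and open" % (name,))
    except BaseException:
        os.close(fd)
        raise
    return fd


def demo():
    """Return True if a real directory opens and a symlink to it is refused."""
    with tempfile.TemporaryDirectory() as workdir:
        real = os.path.join(workdir, "real")
        link = os.path.join(workdir, "link")
        os.mkdir(real)
        os.symlink(real, link)
        os.close(open_dir_checked(real))
        try:
            open_dir_checked(link)
        except NotADirectoryError:
            return True
        return False
```

Everything after the open works on the returned descriptor, so a later rename or replacement of the path no longer changes which directory is being processed.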

copytree with dirs_exist_ok (lines 301 to 500)

cpython 3.14 @ ab2d84fe1023/Lib/shutil.py#L301-500

def _copytree(entries, src, dst, symlinks, ignore, copy_function,
              ignore_dangling_symlinks, dirs_exist_ok=False):
    if ignore is not None:
        ignored_names = ignore(os.fspath(src), [x.name for x in entries])
    else:
        ignored_names = set()

    os.makedirs(dst, exist_ok=dirs_exist_ok)
    errors = []
    use_srcentry = copy_function is copy2 or copy_function is copy

    for srcentry in entries:
        if srcentry.name in ignored_names:
            continue
        srcname = os.path.join(src, srcentry.name)
        dstname = os.path.join(dst, srcentry.name)
        ...
        try:
            if srcentry.is_symlink():
                ...
            elif srcentry.is_dir():
                # Recurse, forwarding every option to the subdirectory copy
                copytree(srcname, dstname, symlinks, ignore, copy_function,
                         ignore_dangling_symlinks, dirs_exist_ok)
            else:
                copy_function(srcentry if use_srcentry else srcname, dstname)
        except Error as err:
            errors.extend(err.args[0])
        except OSError as why:
            errors.append((srcname, dstname, str(why)))
    ...

dirs_exist_ok=True passes exist_ok=True to os.makedirs, allowing the destination directory to already exist. All errors across the entire tree walk are collected into errors and raised as a single shutil.Error at the end, rather than stopping on the first failure. The copy_function parameter defaults to copy2 but lets callers substitute copy, copyfile, or any function with the same (src, dst) signature.

When copy_function is copy2 or copy, _copytree passes a DirEntry object to avoid a second stat call (the entry's cached stat from scandir is reused).
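A short usage sketch of the merge behaviour, assuming Python 3.8 or later. list_files, demo, and the file names are hypothetical; the example first shows the default refusal when the destination exists, then merges with dirs_exist_ok=True while filtering out *.pyc files.

```python
import os
import shutil
import tempfile


def list_files(root):
    """Return all file paths under root, relative and with forward slashes."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "src")
        dst = os.path.join(workdir, "dst")
        os.makedirs(os.path.join(src, "sub"))
        os.makedirs(dst)  # the destination already exists
        for rel in ("a.txt", "cache.pyc", "sub/c.txt"):
            with open(os.path.join(src, *rel.split("/")), "w") as f:
                f.write(rel)
        with open(os.path.join(dst, "existing.txt"), "w") as f:
            f.write("kept")

        try:
            shutil.copytree(src, dst)  # default: destination must be absent
            refused = False
        except FileExistsError:
            refused = True

        # Merge into the existing directory, skipping compiled files.
        shutil.copytree(src, dst, dirs_exist_ok=True,
                        ignore=shutil.ignore_patterns("*.pyc"))
        return refused, list_files(dst)
```

After the merge the destination holds a.txt, existing.txt, and sub/c.txt: the pre-existing file is kept and cache.pyc is filtered out.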

_fastcopy_sendfile (lines 81 to 180)

cpython 3.14 @ ab2d84fe1023/Lib/shutil.py#L81-180

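def _fastcopy_sendfile(fsrc, fdst):
    """Copy data from one regular mmap-like fd to another by using
    high-performance sendfile(2) syscall.
    This should work on Linux >= 2.6.33 only.
    """
    # ...
    global _USE_CP_SENDFILE
    try:
        infd = fsrc.fileno()
        outfd = fdst.fileno()
    except Exception as err:
        raise _GiveupOnFastCopy(err)  # not a regular file

    blocksize = _determine_linux_fastcopy_blocksize(infd)
    offset = 0
    while True:
        try:
            sent = os.sendfile(outfd, infd, offset, blocksize)
        except OSError as err:
            ...
            if err.errno == errno.ENOTSOCK:
                # sendfile() only supports sockets here; disable the fast path
                _USE_CP_SENDFILE = False
                raise _GiveupOnFastCopy(err)
            ...
            # Give up on first call and if no data was copied.
            if offset == 0 and os.lseek(outfd, 0, os.SEEK_CUR) == 0:
                raise _GiveupOnFastCopy(err)
            raise err
        else:
            if sent == 0:
                break  # EOF
            offset += sent

os.sendfile(outfd, infd, offset, count) transfers bytes directly in the kernel, avoiding the two copies through a userspace buffer that a read/write loop needs. It returns the number of bytes written, and a return value of 0 signals EOF. The block size is not probed through sendfile itself: it is derived from the source file's size as reported by os.fstat, with a floor of 8 MiB and a cap of 1 GiB on 32-bit builds, so most files are copied in a single call. If the call fails with ENOTSOCK (sendfile on this platform only supports sockets), the fast path is disabled for the rest of the process. If it fails on the first call before any data was written, the function raises _GiveupOnFastCopy and copyfile falls back to the ordinary copyfileobj loop.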
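The same loop can be tried outside shutil. The sketch below is a simplified stand-in, not the CPython implementation: copy_with_sendfile and demo are hypothetical, the block size is a plain parameter, and the fallback condition is reduced to "the first call failed", on the assumption that a failed first call has written nothing.

```python
import os
import shutil
import tempfile


def copy_with_sendfile(src_path, dst_path, blocksize=8 * 1024 * 1024):
    """Copy a file with os.sendfile; fall back to copyfileobj if unsupported.

    Returns "sendfile" or "fallback" to show which path did the work.
    """
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        if not hasattr(os, "sendfile"):
            shutil.copyfileobj(fsrc, fdst)
            return "fallback"
        infd, outfd = fsrc.fileno(), fdst.fileno()
        offset = 0
        while True:
            try:
                sent = os.sendfile(outfd, infd, offset, blocksize)
            except OSError:
                if offset == 0:
                    # Nothing was copied yet, so the plain loop can take over.
                    shutil.copyfileobj(fsrc, fdst)
                    return "fallback"
                raise
            if sent == 0:
                return "sendfile"  # EOF reached
            offset += sent


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "src.bin")
        dst = os.path.join(workdir, "dst.bin")
        payload = os.urandom(10_000)
        with open(src, "wb") as f:
            f.write(payload)
        # A small block size forces several sendfile calls.
        how = copy_with_sendfile(src, dst, blocksize=4096)
        with open(dst, "rb") as f:
            return how, f.read() == payload
```

Because sendfile is given an explicit offset, the source file's own position stays untouched, which is what lets the fallback start reading from the beginning.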

gopy mirror

shutil.py is pure Python, but its fast paths depend on os.sendfile (Linux) and fcopyfile (macOS), and disk_usage depends on os.statvfs on POSIX. The plain read/write loop in copyfileobj works without any of them. rmtree, copytree, move, and which can be bundled verbatim once os.scandir, os.makedirs, and the stat functions are available. Archive support requires tarfile and zipfile to be bundled first.