Lib/shutil.py
cpython 3.14 @ ab2d84fe1023/Lib/shutil.py
shutil (shell utilities) provides file and directory operations that the
os module deliberately omits because they involve policy decisions:
whether to copy metadata, how to handle errors during recursive deletion,
how to pick a fast path for large transfers.
The file-copy family (copyfileobj, copyfile, copy, copy2) forms a
layered stack. copyfileobj does the raw byte loop; copyfile opens
source and destination and calls copyfileobj (or a fast path); copy
adds mode bits; copy2 also copies timestamps and extended attributes.
When os.sendfile is available and both source and destination are
regular files, _fastcopy_sendfile bypasses the read/write loop entirely
by asking the kernel to move the data itself.
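The layering is easiest to see by copying one file three ways and checking which metadata survives. The following is a minimal sketch; the helper name compare_copy_layers and the chosen mode and timestamp are illustrative assumptions, not part of shutil.

```python
import os
import shutil
import stat
import tempfile


def compare_copy_layers(workdir):
    """Copy one file with copyfile, copy and copy2; report what was preserved."""
    src = os.path.join(workdir, "src.txt")
    with open(src, "w") as f:
        f.write("payload")
    os.chmod(src, 0o640)                            # distinctive permission bits
    os.utime(src, (1_000_000_000, 1_000_000_000))   # distinctive atime/mtime (2001)

    src_stat = os.stat(src)
    results = {}
    for name, func in [("copyfile", shutil.copyfile),
                       ("copy", shutil.copy),
                       ("copy2", shutil.copy2)]:
        dst = os.path.join(workdir, name + ".out")
        func(src, dst)
        dst_stat = os.stat(dst)
        with open(dst) as f:
            same_bytes = f.read() == "payload"
        results[name] = {
            "same_bytes": same_bytes,
            "same_mode": stat.S_IMODE(dst_stat.st_mode) == stat.S_IMODE(src_stat.st_mode),
            "same_mtime": int(dst_stat.st_mtime) == int(src_stat.st_mtime),
        }
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, flags in compare_copy_layers(d).items():
            print(name, flags)
```

All three copy the bytes; copy should additionally match the mode, and copy2 the mode and the modification time. Whether copyfile happens to match the mode depends on the process umask, so the sketch makes no claim about it.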
rmtree and copytree are the recursive counterparts. rmtree supports
an onexc callback (added in 3.12, superseding onerror) that lets callers
retry or ignore individual removal failures. copytree gained
dirs_exist_ok in 3.8, which allows merging into an existing destination
directory instead of requiring it to be absent.
The archive subsystem (make_archive, unpack_archive) maintains a
registry of format handlers (_ARCHIVE_FORMATS, _UNPACK_FORMATS),
allowing third-party packages to register new formats via
register_archive_format and register_unpack_format.
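A registered handler is just a callable. The sketch below registers a toy "jsonpack" format in both registries and round-trips a directory through it. The format name, the .jsonpack extension, and the helpers _make_jsonpack, _unpack_jsonpack and roundtrip are all hypothetical; only the register/unregister calls, make_archive and unpack_archive are shutil API.

```python
import base64
import json
import os
import shutil
import tempfile


def _make_jsonpack(base_name, base_dir, **kwargs):
    """Toy archiver: store every file under base_dir in one JSON document."""
    # make_archive also passes dry_run, logger, owner and group; this toy ignores them.
    archive_name = base_name + ".jsonpack"
    payload = {}
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, base_dir)
            with open(full, "rb") as f:
                payload[rel] = base64.b64encode(f.read()).decode("ascii")
    with open(archive_name, "w") as f:
        json.dump(payload, f)
    return archive_name          # make_archive returns this value to its caller


def _unpack_jsonpack(filename, extract_dir, **kwargs):
    """Toy extractor: no path sanitising, so only use on trusted input."""
    with open(filename) as f:
        payload = json.load(f)
    for rel, encoded in payload.items():
        target = os.path.join(extract_dir, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as out:
            out.write(base64.b64decode(encoded))


def roundtrip(src_dir, work_dir):
    """Register the toy format, pack src_dir, unpack it again, then unregister."""
    shutil.register_archive_format("jsonpack", _make_jsonpack,
                                   description="toy JSON bundle")
    shutil.register_unpack_format("jsonpack", [".jsonpack"], _unpack_jsonpack)
    try:
        archive = shutil.make_archive(os.path.join(work_dir, "bundle"),
                                      "jsonpack", root_dir=src_dir)
        out_dir = os.path.join(work_dir, "out")
        shutil.unpack_archive(archive, out_dir)   # format chosen by extension
        return archive, out_dir
    finally:
        shutil.unregister_archive_format("jsonpack")
        shutil.unregister_unpack_format("jsonpack")
```

Unregistering in a finally block keeps the process-wide registries clean, which matters because both are module-level dictionaries shared by every caller.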
Map
| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-80 | Module header, SameFileError, Error | Exception hierarchy and import block; SameFileError is raised by copyfile when src and dst refer to the same file. | (stdlib pending) |
| 81-180 | copyfileobj, _samefile, copyfile, _copyfileobj_readinto | Raw byte copy; _fastcopy_sendfile and _fastcopy_fcopyfile (macOS) fast paths; copyfile opens files and dispatches to best available copy. | (stdlib pending) |
| 181-300 | copymode, copystat, copy, copy2 | copymode applies permission bits only; copystat copies mode, timestamps, flags, and xattrs; copy and copy2 combine copyfile with those. | (stdlib pending) |
| 301-500 | ignore_patterns, copytree, _copytree | copytree walks the source tree with os.scandir and copies each entry; ignore_patterns returns a callback that filters by glob. | (stdlib pending) |
| 501-600 | rmtree, _rmtree_safe_fd, _rmtree_unsafe | Recursive deletion; _rmtree_safe_fd uses os.open/fstat to avoid TOCTOU races on platforms that support dir_fd; onexc callback on errors. | (stdlib pending) |
| 601-800 | move, _destinsrc | move attempts os.rename first; if that fails (cross-device) it copies then removes; _destinsrc guards against moving a tree into itself. | (stdlib pending) |
| 801-1000 | disk_usage, chown, which, get_terminal_size | POSIX and Windows implementations of each; which searches PATH with extension handling for Windows .com/.exe/.bat; disk_usage wraps os.statvfs on POSIX. | (stdlib pending) |
| 1001-1300 | _ARCHIVE_FORMATS, register_archive_format, unregister_archive_format, make_archive, _make_tarball, _make_zipfile | Archive creation; make_archive calls the registered handler; _make_tarball wraps tarfile; _make_zipfile wraps zipfile. | (stdlib pending) |
| 1301-1700 | _UNPACK_FORMATS, register_unpack_format, unregister_unpack_format, unpack_archive, get_archive_formats, get_unpack_formats | Archive extraction registry; when format is None, unpack_archive infers the format from the filename extension and raises ReadError if no registered format matches. | (stdlib pending) |
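
The _destinsrc guard in the move row can be observed from the outside: asking move to place a directory beneath itself should raise shutil.Error rather than start copying. A small sketch, with the helper name try_move_into_self and the directory names chosen only for illustration:

```python
import os
import shutil
import tempfile


def try_move_into_self(parent):
    """Attempt to move a directory beneath itself; return the error text or None."""
    src = os.path.join(parent, "tree")
    os.makedirs(os.path.join(src, "child"))
    try:
        # os.rename refuses this, so move reaches its copy fallback,
        # where the destination-inside-source check applies.
        shutil.move(src, os.path.join(src, "child", "moved"))
    except shutil.Error as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(try_move_into_self(d))
```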
Reading
rmtree onerror/onexc recovery (lines 501 to 600)
cpython 3.14 @ ab2d84fe1023/Lib/shutil.py#L501-600
def rmtree(path, ignore_errors=False, onerror=None, *,
           onexc=None, dir_fd=None):
    ...
    if ignore_errors:
        def onexc(*args):
            pass
    elif onexc is None:
        if onerror is None:
            def onexc(*args):
                raise  # re-raise the exception currently being handled
        else:
            # Translate from onexc signature to onerror signature
            def onexc(func, path, exc):
                if exc is None:
                    exc_info = None, None, None
                else:
                    exc_info = type(exc), exc, exc.__traceback__
                return onerror(func, path, exc_info)
    ...
    try:
        if _use_fd_functions:
            ...
            _rmtree_safe_fd(topfd, path, onexc)
        else:
            _rmtree_unsafe(path, onexc)
    finally:
        ...
The onerror parameter was deprecated in 3.12 in favour of onexc,
which receives the exception instance directly rather than a
(type, value, traceback) triple. The shim at the top translates the
new signature to the old one for code that still passes onerror.
_rmtree_safe_fd opens each subdirectory with
os.open(name, os.O_RDONLY, dir_fd=...) and compares os.fstat on the new
descriptor with the entry's earlier stat result to detect a path that was
replaced by a symlink between the scandir and the open. This closes the
symlink-substitution race, which _rmtree_unsafe (the fallback on platforms
without dir_fd support, such as Windows) cannot prevent.
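A typical use of the callback is to repair a permission problem and retry the call that failed. The sketch below is one possible handler, not the library's own recovery logic; make_writable_and_retry and force_rmtree are hypothetical names, and the version check is an assumption made so the same handler also runs on interpreters that only accept onerror.

```python
import os
import shutil
import stat
import sys
import tempfile


def make_writable_and_retry(func, path, exc):
    """onexc-style handler: loosen permissions, then retry the failed call once."""
    if func not in (os.unlink, os.rmdir) or not isinstance(exc, PermissionError):
        raise exc                          # nothing this handler knows how to fix
    if os.name == "nt":
        os.chmod(path, stat.S_IWRITE)      # Windows: clear the read-only flag
    else:
        # POSIX: removing an entry needs write permission on its parent directory.
        os.chmod(os.path.dirname(path), stat.S_IRWXU)
    func(path)                             # a second failure propagates to the caller


def force_rmtree(path):
    """Remove a tree, repairing permission errors along the way."""
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=make_writable_and_retry)
    else:
        # Older interpreters pass a (type, value, traceback) triple to onerror.
        shutil.rmtree(
            path,
            onerror=lambda func, p, exc_info: make_writable_and_retry(
                func, p, exc_info[1]),
        )
```

The handler re-raises anything it does not recognise, so unrelated failures still stop the deletion instead of being silently swallowed.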
copytree with dirs_exist_ok (lines 301 to 500)
cpython 3.14 @ ab2d84fe1023/Lib/shutil.py#L301-500
def _copytree(entries, src, dst, symlinks, ignore, copy_function,
              ignore_dangling_symlinks, dirs_exist_ok=False):
    if ignore is not None:
        ignored_names = ignore(os.fspath(src), [x.name for x in entries])
    else:
        ignored_names = set()
    os.makedirs(dst, exist_ok=dirs_exist_ok)
    errors = []
    use_srcentry = copy_function is copy2 or copy_function is copy
    for srcentry in entries:
        if srcentry.name in ignored_names:
            continue
        srcname = os.path.join(src, srcentry.name)
        dstname = os.path.join(dst, srcentry.name)
        ...
        try:
            if srcentry.is_symlink():
                ...
            elif srcentry.is_dir():
                copytree(srcname, dstname, symlinks, ignore,
                         copy_function, dirs_exist_ok=dirs_exist_ok)
            else:
                copy_function(srcentry if use_srcentry else srcname, dstname)
        except Error as err:
            errors.extend(err.args[0])
        except OSError as why:
            errors.append((srcname, dstname, str(why)))
    ...
dirs_exist_ok=True passes exist_ok=True to os.makedirs, allowing
the destination directory to already exist. All errors across the entire
tree walk are collected into errors and raised as a single shutil.Error
at the end, rather than stopping on the first failure. The copy_function
parameter defaults to copy2 but lets callers substitute copy,
copyfile, or any function with the same (src, dst) signature.
When copy_function is copy2 or copy, _copytree passes a
DirEntry object to avoid a second stat call (the entry's cached
stat from scandir is reused).
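Put together, a merge into an existing directory with a glob filter and aggregated error reporting might look like the sketch below. merge_tree is a hypothetical wrapper and the ignored patterns are arbitrary; the keyword arguments are the documented copytree parameters.

```python
import os
import shutil
import tempfile


def merge_tree(src, dst):
    """Copy src into an existing dst, skipping caches; return per-file failures."""
    try:
        shutil.copytree(
            src,
            dst,
            ignore=shutil.ignore_patterns("*.pyc", "__pycache__"),
            copy_function=shutil.copy,   # mode bits only; the default is copy2
            dirs_exist_ok=True,          # merge instead of raising FileExistsError
        )
    except shutil.Error as exc:
        # exc.args[0] is the aggregated list of (srcname, dstname, reason) tuples.
        return list(exc.args[0])
    return []
```

Files already present in dst that have no counterpart in src are left alone; files with the same relative path are overwritten by the copy function.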
_fastcopy_sendfile (lines 81 to 180)
cpython 3.14 @ ab2d84fe1023/Lib/shutil.py#L81-180
def _fastcopy_sendfile(fsrc, fdst):
    """Copy data from one regular mmap-like fd to another by using
    high-performance sendfile(2) syscall.
    This should work on Linux >= 2.6.33 only.
    """
    # ...
    # Size each request from the source file: whole file, but at least 8 MiB.
    try:
        blocksize = max(os.fstat(infd).st_size, 2 ** 23)
    except OSError:
        blocksize = 2 ** 27  # 128 MiB
    ...
    offset = 0
    while True:
        try:
            sent = os.sendfile(outfd, infd, offset, blocksize)
        except OSError as err:
            ...
            # Give up on the first call if no data was copied; copyfile
            # then falls back to the read/write loop.
            if offset == 0 and os.lseek(outfd, 0, os.SEEK_CUR) == 0:
                raise _GiveupOnFastCopy(err)
            raise err
        else:
            if sent == 0:
                break  # EOF
            offset += sent
os.sendfile(out_fd, in_fd, offset, count) transfers bytes directly in
the kernel, avoiding two copies through userspace buffers. The block size
is derived from the source file's size as reported by os.fstat, with an
8 MiB floor. If the first call fails before any data has been written
(for example on a kernel or filesystem that does not support sendfile
between regular files), the function raises _GiveupOnFastCopy and
copyfile falls back to the ordinary read/write loop transparently. A
failure after some data has been written is re-raised to the caller.
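The same pattern can be reproduced with public APIs only. The following is a simplified sketch rather than the shutil implementation: sendfile_copy, its floor parameter and the returned strategy label are assumptions, it sizes each request from os.fstat, and it falls back to shutil.copyfileobj only when nothing has been written yet.

```python
import os
import shutil
import tempfile


def sendfile_copy(src_path, dst_path, floor=8 * 1024 * 1024):
    """Copy src_path to dst_path with os.sendfile, else a plain buffered loop.

    Returns the name of the strategy that moved the data.
    """
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        if hasattr(os, "sendfile"):
            infd, outfd = fsrc.fileno(), fdst.fileno()
            # Ask for the whole file at once, within a floor and a 1 GiB cap.
            blocksize = min(max(os.fstat(infd).st_size, floor), 2 ** 30)
            offset = 0
            try:
                while True:
                    sent = os.sendfile(outfd, infd, offset, blocksize)
                    if sent == 0:
                        return "sendfile"      # EOF reached
                    offset += sent
            except OSError:
                if offset != 0:
                    raise                      # partial copy: do not hide the error
                # Nothing was written yet, so falling back is safe.
        shutil.copyfileobj(fsrc, fdst)
        return "read/write"
```

Falling back only at offset 0 avoids restarting a copy whose destination already holds part of the data.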
gopy mirror
shutil.py is pure Python, but its fast paths depend on os.sendfile
(Linux) and fcopyfile (macOS), and disk_usage depends on os.statvfs. The
pure read/write loop via copyfileobj works without those. rmtree, copytree, move, and
which can be bundled verbatim once os.scandir, os.makedirs, and the
stat functions are available. Archive support requires tarfile and
zipfile to be bundled first.
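For reference, the portable fallback needs nothing beyond readinto and write on the two file objects. The sketch below follows the readinto-based approach with a single reusable buffer; copy_readinto and its default buffer length are illustrative names and values, not the module's own.

```python
import io


def copy_readinto(fsrc, fdst, length=64 * 1024):
    """Copy between binary file objects while reusing one preallocated buffer."""
    copied = 0
    with memoryview(bytearray(length)) as mv:
        while True:
            n = fsrc.readinto(mv)
            if not n:
                break                     # EOF
            if n < length:
                with mv[:n] as smv:       # short read: write only the filled part
                    fdst.write(smv)
            else:
                fdst.write(mv)
            copied += n
    return copied


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 200_000)
    sink = io.BytesIO()
    print(copy_readinto(source, sink), "bytes copied")
```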