Lib/shutil.py
cpython 3.14 @ ab2d84fe1023/Lib/shutil.py
shutil provides higher-level file utilities that compose the lower-level os and io interfaces. The module detects at import time which fast-copy kernel interfaces are available (sendfile, copy_file_range, fcopyfile) and selects the most efficient path for each copy operation. Archive support is dispatched through a registry of named formats.
Map
| Lines | Symbol | Role |
|---|---|---|
| 56-74 | Module-level feature flags | _USE_CP_SENDFILE, _USE_CP_COPY_FILE_RANGE, _HAS_FCOPYFILE detect kernel fast-copy APIs |
| 75-96 | Exception hierarchy | Error, SameFileError, SpecialFileError, ReadError, RegistryError |
| 98-117 | _fastcopy_fcopyfile | macOS fcopyfile(3) zero-copy path for file-to-file copies |
| 137-177 | _fastcopy_copy_file_range | Linux copy_file_range(2) path; supports reflink on capable filesystems |
| 178-228 | _fastcopy_sendfile | Linux/Android/Solaris sendfile(2) zero-copy path |
| 229-257 | _copyfileobj_readinto / copyfileobj | Portable fallback: chunked readinto / read+write loop |
| 283-354 | copyfile | Copies file data, dispatches to fast-copy paths, rejects named pipes |
| 355-408 | copymode | Copies permission bits; uses lchmod for symlinks when available |
| 409-474 | copystat | Copies mode, atime, mtime, flags, xattrs; uses follow_symlinks=False on symlinks |
| 475-531 | copy / copy2 | copy = copyfile + copymode; copy2 = copyfile + copystat |
| 533-544 | ignore_patterns | Factory returning a callable that filters names against fnmatch patterns |
| 545-609 | _copytree | Internal recursive worker: respects symlinks, ignore, copy_function, and dirs_exist_ok |
| 611-667 | copytree | Public entry: calls os.scandir then delegates to _copytree |
| 669-712 | _rmtree_unsafe | Race-prone deletion using os.walk; used when openat/unlinkat are unavailable |
| 714-809 | _rmtree_safe_fd / _rmtree_safe_fd_step | fd-based deletion immune to symlink-race attacks; uses openat/unlinkat |
| 810-857 | rmtree | Public entry: resolves onerror/onexc callbacks, selects safe vs. unsafe implementation |
| 876-941 | move | Tries os.rename first; on failure copies with copy_function (copy2 by default) and unlinks the source, or uses copytree + rmtree for directories |
| 992-1063 | _make_tarball | Creates .tar, .tar.gz, .tar.bz2, .tar.xz, or .tar.zst archives |
| 1064-1148 | _make_zipfile | Creates .zip archives using zipfile.ZipFile |
| 1150-1182 | get_archive_formats / register_archive_format / unregister_archive_format | Registry for named archive producers |
| 1184-1249 | make_archive | Dispatches to a registered archive function by format name |
| 1251-1370 | get_unpack_formats / register_unpack_format / unpack_archive | Registry and dispatcher for archive extractors |
| 1456-1494 | chown | Changes file owner and group by name or numeric id |
| 1560-1641 | which | Searches PATH directories for an executable matching cmd and mode |
Reading
copyfile: fast-copy dispatch
copyfile is the central dispatch point. It first rejects named pipes, which cannot be copied as regular files, then opens the source and destination and tries the platform-specific fast paths in order: fcopyfile on macOS; copy_file_range, then sendfile, on Linux; and a readinto loop on Windows for non-empty files. If none applies, it falls back to the generic copyfileobj loop. A fast path raises _GiveupOnFastCopy when it cannot handle the copy and nothing has been written yet, so the next option starts from a clean destination; errors that occur mid-copy propagate to the caller.
# CPython: Lib/shutil.py:283 copyfile
def copyfile(src, dst, *, follow_symlinks=True):
    sys.audit("shutil.copyfile", src, dst)
    if _samefile(src, dst):
        raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
    # ... special-file checks omitted for brevity ...
    if not follow_symlinks and _islink(src):
        os.symlink(os.readlink(src), dst)
    else:
        with open(src, 'rb') as fsrc:
            with open(dst, 'wb') as fdst:
                # macOS
                if _HAS_FCOPYFILE:
                    try:
                        _fastcopy_fcopyfile(fsrc, fdst, posix._COPYFILE_DATA)
                        return dst
                    except _GiveupOnFastCopy:
                        pass
                # Linux / Android / Solaris
                elif _USE_CP_SENDFILE or _USE_CP_COPY_FILE_RANGE:
                    if _USE_CP_COPY_FILE_RANGE:
                        try:
                            _fastcopy_copy_file_range(fsrc, fdst)
                            return dst
                        except _GiveupOnFastCopy:
                            pass
                    if _USE_CP_SENDFILE:
                        try:
                            _fastcopy_sendfile(fsrc, fdst)
                            return dst
                        except _GiveupOnFastCopy:
                            pass
                # ... Windows readinto branch omitted from this excerpt ...
                # Portable fallback when no fast path completed the copy
                copyfileobj(fsrc, fdst)
    return dst
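The "try each fast path, then fall back" shape can be sketched independently of shutil. The following is a minimal illustration under stated assumptions, not the module's code: GiveUp, fast_path_unavailable, chunked_copy, and copy_with_fallback are hypothetical names, and the single "fast path" here always declines so that the fallback is exercised.

```python
import io


class GiveUp(Exception):
    """Hypothetical stand-in for shutil's private _GiveupOnFastCopy."""


def fast_path_unavailable(fsrc, fdst):
    # A kernel-level fast path needs real file descriptors. In-memory
    # streams have none, so decline before any data has been written.
    try:
        fsrc.fileno()
        fdst.fileno()
    except Exception as err:
        raise GiveUp(err)
    # Illustrative only: this sketch never performs a real fast copy.
    raise GiveUp("illustrative fast path never copies")


def chunked_copy(fsrc, fdst, bufsize=64 * 1024):
    # Portable fallback: plain read/write loop.
    while chunk := fsrc.read(bufsize):
        fdst.write(chunk)


def copy_with_fallback(fsrc, fdst, fast_paths=(fast_path_unavailable,)):
    for fast in fast_paths:
        try:
            fast(fsrc, fdst)
            return "fast"
        except GiveUp:
            continue  # nothing written yet, so the next strategy is safe
    chunked_copy(fsrc, fdst)
    return "fallback"


src, dst = io.BytesIO(b"hello world"), io.BytesIO()
used = copy_with_fallback(src, dst)
```

The design point is that declining is only safe while the destination is untouched; once bytes have been written, retrying with another strategy would corrupt the output.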
_fastcopy_sendfile: zero-copy on Linux
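The sendfile path calls os.sendfile in a loop, advancing offset by the number of bytes transferred on each iteration, and stops when a call returns 0 (end of file). If the kernel reports ENOTSOCK (older Linux kernels whose sendfile accepts only socket destinations), the module-level flag _USE_CP_SENDFILE is cleared so the path is not tried again in the same process. ENOSPC is re-raised unchanged. Any other error triggers the fallback only if nothing has been written yet (offset is 0 and the destination position is still 0); otherwise it propagates.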
# CPython: Lib/shutil.py:178 _fastcopy_sendfile
def _fastcopy_sendfile(fsrc, fdst):
    global _USE_CP_SENDFILE
    try:
        infd = fsrc.fileno()
        outfd = fdst.fileno()
    except Exception as err:
        raise _GiveupOnFastCopy(err)
    blocksize = _determine_linux_fastcopy_blocksize(infd)
    offset = 0
    while True:
        try:
            sent = os.sendfile(outfd, infd, offset, blocksize)
        except OSError as err:
            err.filename = fsrc.name
            err.filename2 = fdst.name
            if err.errno == errno.ENOTSOCK:
                _USE_CP_SENDFILE = False
                raise _GiveupOnFastCopy(err)
            if err.errno == errno.ENOSPC:
                raise err from None
            if offset == 0 and os.lseek(outfd, 0, os.SEEK_CUR) == 0:
                raise _GiveupOnFastCopy(err)
            raise err
        else:
            if sent == 0:
                break
            offset += sent
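To see the loop shape in isolation, here is a hedged sketch that copies one temporary file to another with os.sendfile where the platform allows file-to-file transfers, and otherwise falls back to a read/write loop. The helper name sendfile_copy and the fixed 64 KiB block size are assumptions made for illustration; the module computes its own block size from the source file.

```python
import os
import tempfile


def sendfile_copy(src_path, dst_path, blocksize=64 * 1024):
    """Copy src_path to dst_path and report which strategy was used.

    Hypothetical helper, not part of shutil.
    """
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        if hasattr(os, "sendfile"):
            infd, outfd = fsrc.fileno(), fdst.fileno()
            offset = 0
            try:
                while True:
                    sent = os.sendfile(outfd, infd, offset, blocksize)
                    if sent == 0:
                        return "sendfile"  # end of file reached
                    offset += sent
            except OSError:
                if offset != 0:
                    # Partial copy: falling back now would duplicate data.
                    raise
        # Fallback: nothing was transferred, so start from the beginning.
        while chunk := fsrc.read(blocksize):
            fdst.write(chunk)
        return "read/write"


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.bin")
    dst = os.path.join(tmp, "dst.bin")
    payload = os.urandom(200_000)
    with open(src, "wb") as f:
        f.write(payload)
    strategy = sendfile_copy(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()
```

Passing an explicit offset means the source file object's own position is left alone, which is why the fallback loop can simply read from the start.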
rmtree: fd-based safe deletion
When the platform supports directory-fd-relative operations (openat/unlinkat), rmtree uses _rmtree_safe_fd, which maintains an explicit stack of open directory file descriptors. Each step opens a subdirectory relative to its parent's fd and compares an lstat taken before the open with an fstat of the new descriptor, so a directory swapped for a symlink mid-walk is detected instead of followed. This makes the walk resistant to TOCTOU symlink-substitution attacks. The unsafe _rmtree_unsafe variant is used on platforms that lack these syscalls.
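One way to get this kind of protection is the lstat/open/fstat pattern: record the entry's identity without following links, open it relative to the parent descriptor, then confirm the opened object is the same one. The sketch below is illustrative only. open_subdir_checked is a hypothetical helper, and it assumes a POSIX platform where os.open and os.lstat accept dir_fd.

```python
import os
import stat
import tempfile


def open_subdir_checked(parent_fd, name):
    """Open directory `name` relative to parent_fd, refusing symlinks.

    Hypothetical helper; assumes dir_fd support (POSIX).
    """
    before = os.lstat(name, dir_fd=parent_fd)  # does not follow symlinks
    if not stat.S_ISDIR(before.st_mode):
        raise NotADirectoryError(name)
    flags = os.O_RDONLY | getattr(os, "O_NONBLOCK", 0)
    fd = os.open(name, flags, dir_fd=parent_fd)
    try:
        # If the entry was swapped between lstat and open, the device and
        # inode numbers no longer match.
        if not os.path.samestat(before, os.fstat(fd)):
            raise OSError(f"{name!r} changed between lstat and open")
    except BaseException:
        os.close(fd)
        raise
    return fd


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "real"))
    os.symlink(os.path.join(tmp, "real"), os.path.join(tmp, "link"))
    parent = os.open(tmp, os.O_RDONLY)
    try:
        fd = open_subdir_checked(parent, "real")
        opened_real = stat.S_ISDIR(os.fstat(fd).st_mode)
        os.close(fd)
        try:
            open_subdir_checked(parent, "link")
            link_rejected = False
        except OSError:
            link_rejected = True
    finally:
        os.close(parent)
```

Because every lookup is relative to an already-open descriptor, renaming or replacing a parent directory elsewhere in the tree cannot redirect the walk.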
# CPython: Lib/shutil.py:810 rmtree
def rmtree(path, ignore_errors=False, onerror=None, *, onexc=None, dir_fd=None):
    sys.audit("shutil.rmtree", path, dir_fd)
    if ignore_errors:
        def onexc(*args):
            pass
    elif onerror is None and onexc is None:
        def onexc(*args):
            raise
    elif onexc is None:
        if onerror is None:
            def onexc(*args):
                raise
        else:
            def onexc(*args):
                func, path, exc = args
                if exc is None:
                    exc_info = None, None, None
                else:
                    exc_info = type(exc), exc, exc.__traceback__
                return onerror(func, path, exc_info)
    _rmtree_impl(path, dir_fd, onexc)
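The adapter in the last branch is what keeps an old-style onerror callback working: it turns the single exception object into the legacy (type, value, traceback) triple. A standalone sketch of that conversion follows. The names adapt_onerror and legacy_handler and the path /tmp/example are hypothetical; the handler is invoked by hand here, so no file is touched.

```python
import os


def adapt_onerror(onerror):
    """Wrap a legacy onerror(func, path, exc_info) as onexc(func, path, exc)."""
    def onexc(func, path, exc):
        if exc is None:
            exc_info = (None, None, None)
        else:
            exc_info = (type(exc), exc, exc.__traceback__)
        return onerror(func, path, exc_info)
    return onexc


seen = []


def legacy_handler(func, path, exc_info):
    # Old-style callbacks unpack a sys.exc_info()-shaped tuple.
    exc_type, exc_value, _tb = exc_info
    seen.append((func.__name__, path, exc_type, str(exc_value)))


handler = adapt_onerror(legacy_handler)
# Simulate what a deletion step would report on failure.
handler(os.unlink, "/tmp/example", PermissionError("denied"))
```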
which: PATH-based executable search
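which splits PATH into directories and probes each one in order. On Windows it additionally tries the extensions listed in PATHEXT; on POSIX the probe is an os.access check against mode. When cmd already contains a directory component, PATH scanning is skipped and only that directory is checked. If no path is given and PATH is unset, the search path falls back to os.confstr("CS_PATH") and then os.defpath; an empty search path returns None.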
# CPython: Lib/shutil.py:1560 which
def which(cmd, mode=os.F_OK | os.X_OK, path=None):
    use_bytes = isinstance(cmd, bytes)
    dirname, cmd = os.path.split(cmd)
    if dirname:
        path = [dirname]
    else:
        if path is None:
            path = os.environ.get("PATH", None)
            if path is None:
                try:
                    path = os.confstr("CS_PATH")
                except (AttributeError, ValueError):
                    path = os.defpath
        if not path:
            return None
        if use_bytes:
            path = os.fsencode(path).split(os.fsencode(os.pathsep))
        else:
            path = os.fsdecode(path).split(os.pathsep)
    # ... PATHEXT handling and final os.access loop follow ...
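A short usage sketch of these behaviours, run against a throwaway directory so it does not depend on what is installed. The script name mytool and the /nonexistent search path are hypothetical, and the example assumes a POSIX system where the executable bit decides the os.access check.

```python
import os
import shutil
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tool = os.path.join(tmp, "mytool")
    with open(tool, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")

    # Not executable yet: the default mode (F_OK | X_OK) rejects it.
    before = shutil.which("mytool", path=tmp)

    os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

    # Explicit path argument: only tmp is searched, not the PATH variable.
    by_path = shutil.which("mytool", path=tmp)

    # cmd with a directory component: the search path is not consulted.
    by_dirname = shutil.which(tool, path="/nonexistent")
```

Passing path explicitly is also a convenient way to make lookups deterministic in tests, since the result then no longer depends on the caller's environment.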
gopy notes
shutil is not yet ported in gopy. The copy family (copyfile, copy, copy2, copystat) is the highest-priority slice because gopy's pathlib.Path.copy implementation will need it. The sendfile and copy_file_range fast paths map directly to golang.org/x/sys/unix calls. rmtree and copytree are straightforward to port once os.scandir and os.walk are stable. Archive dispatch (make_archive, unpack_archive) is low priority and can be deferred until the Go archive/tar and archive/zip bindings are ready.