Lib/shutil.py

cpython 3.14 @ ab2d84fe1023/Lib/shutil.py

shutil provides higher-level file utilities that compose the lower-level os and io interfaces. The module detects at import time which fast-copy kernel interfaces are available (sendfile, copy_file_range, fcopyfile) and selects the most efficient path for each copy operation. Archive support is dispatched through a registry of named formats.
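As a rough illustration of import-time detection, the sketch below probes the public os module for the same primitives. The function name and dictionary keys are hypothetical, and the platform check for macOS is only a stand-in for fcopyfile; shutil's private flags may be computed under different conditions.

```python
import os
import sys


def detect_fast_copy_support():
    """Report which fast-copy primitives this interpreter appears to expose.

    Illustrative only: these are simple public-API probes, not the
    conditions shutil itself evaluates for its private flags.
    """
    return {
        "sendfile": hasattr(os, "sendfile"),
        "copy_file_range": hasattr(os, "copy_file_range"),
        # Assumption: treat "running on macOS" as a proxy for fcopyfile.
        "macos_platform": sys.platform == "darwin",
    }


support = detect_fast_copy_support()
for name, available in sorted(support.items()):
    print(f"{name}: {'yes' if available else 'no'}")
```

The result varies by platform and Python build, so no particular output should be expected.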

Map

| Lines | Symbol | Role |
| --- | --- | --- |
| 56-74 | Module-level feature flags | _USE_CP_SENDFILE, _USE_CP_COPY_FILE_RANGE, _HAS_FCOPYFILE detect kernel fast-copy APIs |
| 75-96 | Exception hierarchy | Error, SameFileError, SpecialFileError, ReadError, RegistryError |
| 98-117 | _fastcopy_fcopyfile | macOS fcopyfile(3) zero-copy path for file-to-file copies |
| 137-177 | _fastcopy_copy_file_range | Linux copy_file_range(2) path; supports reflink on capable filesystems |
| 178-228 | _fastcopy_sendfile | Linux/Android/Solaris sendfile(2) zero-copy path |
| 229-257 | _copyfileobj_readinto / copyfileobj | Portable fallback: chunked readinto / read+write loop |
| 283-354 | copyfile | Copies file data, dispatches to fast-copy paths, rejects named pipes |
| 355-408 | copymode | Copies permission bits; uses lchmod for symlinks when available |
| 409-474 | copystat | Copies mode, atime, mtime, flags, xattrs; uses follow_symlinks=False on symlinks |
| 475-531 | copy / copy2 | copy = copyfile + copymode; copy2 = copyfile + copystat |
| 533-544 | ignore_patterns | Factory returning a callable that filters names against fnmatch patterns |
| 545-609 | _copytree | Internal recursive worker: respects symlinks, ignore, copy_function, and dirs_exist_ok |
| 611-667 | copytree | Public entry: calls os.scandir then delegates to _copytree |
| 669-712 | _rmtree_unsafe | Race-prone deletion using os.walk; used when openat/unlinkat are unavailable |
| 714-809 | _rmtree_safe_fd / _rmtree_safe_fd_step | fd-based deletion immune to symlink-race attacks; uses openat/unlinkat |
| 810-857 | rmtree | Public entry: resolves onerror/onexc callbacks, selects safe vs. unsafe implementation |
| 876-941 | move | Tries os.rename first; falls back to copy2 + rmtree for cross-device moves |
| 992-1063 | _make_tarball | Creates .tar, .tar.gz, .tar.bz2, .tar.xz, or .tar.zst archives |
| 1064-1148 | _make_zipfile | Creates .zip archives using zipfile.ZipFile |
| 1150-1182 | get_archive_formats / register_archive_format / unregister_archive_format | Registry for named archive producers |
| 1184-1249 | make_archive | Dispatches to a registered archive function by format name |
| 1251-1370 | get_unpack_formats / register_unpack_format / unpack_archive | Registry and dispatcher for archive extractors |
| 1456-1494 | chown | Changes file owner and group by name or numeric id |
| 1560-1641 | which | Searches PATH directories for an executable matching cmd and mode |
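The archive rows in the map can be exercised through the public API. The following is a minimal sketch, assuming the uncompressed "tar" format is registered (it does not depend on optional compression modules); the helper name archive_round_trip and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def archive_round_trip(fmt="tar"):
    """Archive a scratch directory with a registered format, then unpack it."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "src")
        os.mkdir(src)
        with open(os.path.join(src, "hello.txt"), "w", encoding="utf-8") as f:
            f.write("hello")

        # make_archive appends the format's extension to the base name.
        archive = shutil.make_archive(os.path.join(work, "bundle"), fmt, root_dir=src)

        dest = os.path.join(work, "dest")
        # unpack_archive selects an extractor from the archive's extension.
        shutil.unpack_archive(archive, dest)
        with open(os.path.join(dest, "hello.txt"), encoding="utf-8") as f:
            return os.path.basename(archive), f.read()


# get_archive_formats returns (name, description) pairs.
registered = [name for name, _description in shutil.get_archive_formats()]
print(registered)
print(archive_round_trip("tar"))
```

Both registries are keyed by name, so make_archive and unpack_archive never need to know which helper does the work.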

Reading

copyfile: fast-copy dispatch

copyfile is the central dispatch point. It rejects named pipes (which cannot be copied as regular files), then opens the source and destination and tries the platform-specific fast paths in order: fcopyfile on macOS; copy_file_range, then sendfile, on Linux; and a readinto loop on Windows for non-empty files. If no fast path applies or succeeds, it falls back to the generic copyfileobj loop. A fast path raises _GiveupOnFastCopy when it cannot handle the copy, which lets the next candidate run. The excerpt omits the special-file checks and the Windows branch.

# CPython: Lib/shutil.py:283 copyfile
def copyfile(src, dst, *, follow_symlinks=True):
    sys.audit("shutil.copyfile", src, dst)
    if _samefile(src, dst):
        raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
    # ... special-file checks omitted for brevity ...
    if not follow_symlinks and _islink(src):
        os.symlink(os.readlink(src), dst)
    else:
        with open(src, 'rb') as fsrc:
            with open(dst, 'wb') as fdst:
                if _HAS_FCOPYFILE:
                    try:
                        _fastcopy_fcopyfile(fsrc, fdst, posix._COPYFILE_DATA)
                        return dst
                    except _GiveupOnFastCopy:
                        pass
                elif _USE_CP_SENDFILE or _USE_CP_COPY_FILE_RANGE:
                    if _USE_CP_COPY_FILE_RANGE:
                        try:
                            _fastcopy_copy_file_range(fsrc, fdst)
                            return dst
                        except _GiveupOnFastCopy:
                            pass
                    if _USE_CP_SENDFILE:
                        try:
                            _fastcopy_sendfile(fsrc, fdst)
                            return dst
                        except _GiveupOnFastCopy:
                            pass
                copyfileobj(fsrc, fdst)
    return dst
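The try-then-fall-back shape of the dispatch can be isolated from the platform details. This is a minimal sketch using in-memory streams; GiveUpOnFastPath, fast_path_unavailable, portable_copy, and copy_with_fallback are hypothetical names standing in for shutil's private helpers, not its actual API.

```python
import io


class GiveUpOnFastPath(Exception):
    """Hypothetical stand-in for shutil's private _GiveupOnFastCopy."""


def fast_path_unavailable(fsrc, fdst):
    # Pretend the platform call refused this pair of file objects.
    raise GiveUpOnFastPath("not supported for these file objects")


def portable_copy(fsrc, fdst, bufsize=64 * 1024):
    # Chunked read/write loop, in the spirit of copyfileobj.
    while chunk := fsrc.read(bufsize):
        fdst.write(chunk)


def copy_with_fallback(fsrc, fdst, fast_paths=(fast_path_unavailable,)):
    """Try each fast path in order; fall back to the portable loop."""
    for fast_path in fast_paths:
        try:
            fast_path(fsrc, fdst)
            return fast_path.__name__
        except GiveUpOnFastPath:
            continue  # this path disqualified itself; try the next one
    portable_copy(fsrc, fdst)
    return portable_copy.__name__


src, dst = io.BytesIO(b"payload"), io.BytesIO()
used = copy_with_fallback(src, dst)
print(used, dst.getvalue())
```

Using a dedicated exception for "this path does not apply" keeps genuine I/O errors distinct: they propagate to the caller instead of silently triggering the fallback.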

_fastcopy_sendfile: zero-copy on Linux

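The sendfile path calls os.sendfile in a loop, advancing offset by the number of bytes transferred on each iteration and stopping when a call returns 0 (end of file). If the kernel reports ENOTSOCK (older Linux kernels whose sendfile accepts only socket destinations), the module-level flag _USE_CP_SENDFILE is cleared so the path is not tried again in the same process. ENOSPC is re-raised as a real error. Any other failure hands over to the fallback only if nothing has been written to the destination yet; otherwise the error propagates.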

# CPython: Lib/shutil.py:178 _fastcopy_sendfile
def _fastcopy_sendfile(fsrc, fdst):
    global _USE_CP_SENDFILE
    try:
        infd = fsrc.fileno()
        outfd = fdst.fileno()
    except Exception as err:
        raise _GiveupOnFastCopy(err)
    blocksize = _determine_linux_fastcopy_blocksize(infd)
    offset = 0
    while True:
        try:
            sent = os.sendfile(outfd, infd, offset, blocksize)
        except OSError as err:
            err.filename = fsrc.name
            err.filename2 = fdst.name
            if err.errno == errno.ENOTSOCK:
                _USE_CP_SENDFILE = False
                raise _GiveupOnFastCopy(err)
            if err.errno == errno.ENOSPC:
                raise err from None
            if offset == 0 and os.lseek(outfd, 0, os.SEEK_CUR) == 0:
                raise _GiveupOnFastCopy(err)
            raise err
        else:
            if sent == 0:
                break
            offset += sent
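The loop can be tried directly against two regular files. This is a simplified sketch that assumes Linux, where os.sendfile accepts a regular file as the destination; sendfile_copy and demo are hypothetical names, the block size is arbitrary, and the error handling shown in the excerpt is deliberately left out.

```python
import os
import sys
import tempfile


def sendfile_copy(src_path, dst_path, blocksize=1024 * 1024):
    """Copy src_path to dst_path with os.sendfile; return bytes copied.

    Assumes a platform where sendfile works between regular files.
    Unlike shutil, this sketch does not fall back on failure.
    """
    if not hasattr(os, "sendfile"):
        raise NotImplementedError("os.sendfile is not available here")
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        offset = 0
        while True:
            sent = os.sendfile(fdst.fileno(), fsrc.fileno(), offset, blocksize)
            if sent == 0:
                break  # end of file
            offset += sent
        return offset


def demo():
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "src.bin")
        dst = os.path.join(work, "dst.bin")
        payload = b"x" * 300_000
        with open(src, "wb") as f:
            f.write(payload)
        # A small block size forces several loop iterations.
        copied = sendfile_copy(src, dst, blocksize=64 * 1024)
        with open(dst, "rb") as f:
            return copied, f.read() == payload


if sys.platform.startswith("linux"):
    print(demo())
```

Passing an explicit offset means the kernel reads from that position without moving the source file's own position, which is why the loop tracks offset itself.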

rmtree: fd-based safe deletion

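When openat/unlinkat are available, rmtree uses _rmtree_safe_fd, which maintains an explicit stack of pending operations on open directory file descriptors. Each step opens a subdirectory relative to its parent directory's fd and checks, by comparing lstat and fstat results, that the object it opened is the one it inspected, which guards the walk against TOCTOU symlink-substitution attacks. The unsafe _rmtree_unsafe variant is used on platforms that lack these syscalls. The public rmtree entry point shown here only normalizes the error callbacks before delegating.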

# CPython: Lib/shutil.py:810 rmtree
def rmtree(path, ignore_errors=False, onerror=None, *, onexc=None, dir_fd=None):
    sys.audit("shutil.rmtree", path, dir_fd)
    if ignore_errors:
        def onexc(*args):
            pass
    elif onerror is None and onexc is None:
        def onexc(*args):
            raise
    elif onexc is None:
        if onerror is None:
            def onexc(*args):
                raise
        else:
            def onexc(*args):
                func, path, exc = args
                if exc is None:
                    exc_info = None, None, None
                else:
                    exc_info = type(exc), exc, exc.__traceback__
                return onerror(func, path, exc_info)
    _rmtree_impl(path, dir_fd, onexc)
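The onerror-to-onexc adaptation in the last branch can be reproduced on its own. The sketch below is an illustration rather than shutil's code: adapt_onerror and legacy_handler are hypothetical names, and the failing os.rmdir call is simply a convenient way to obtain a real exception.

```python
import os
import tempfile


def adapt_onerror(onerror):
    """Wrap a legacy onerror(func, path, exc_info) handler so it can be
    called onexc-style with (func, path, exc)."""
    def onexc(func, path, exc):
        if exc is None:
            exc_info = (None, None, None)
        else:
            # Rebuild the sys.exc_info()-style triple from the exception.
            exc_info = (type(exc), exc, exc.__traceback__)
        return onerror(func, path, exc_info)
    return onexc


seen = []


def legacy_handler(func, path, exc_info):
    # Old-style handlers receive the exception as a three-item tuple.
    seen.append((func.__name__, path, exc_info[0].__name__))


handler = adapt_onerror(legacy_handler)
with tempfile.TemporaryDirectory() as work:
    missing = os.path.join(work, "absent")
    try:
        os.rmdir(missing)  # fails: the directory does not exist
    except OSError as exc:
        handler(os.rmdir, missing, exc)

print(seen)
```

Funnelling every case through one onexc-shaped callable means the deletion code only ever has to call a single handler signature.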

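which: PATH search

which splits PATH into directories and probes each one in order. On Windows it also tries the extensions listed in PATHEXT; on POSIX each candidate needs only an os.access check against mode. When cmd already contains a directory component, PATH scanning is skipped and only that directory is checked. If the PATH environment variable is unset, the search falls back to os.confstr("CS_PATH") and then to os.defpath; an empty PATH returns None.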

# CPython: Lib/shutil.py:1560 which
def which(cmd, mode=os.F_OK | os.X_OK, path=None):
    use_bytes = isinstance(cmd, bytes)
    dirname, cmd = os.path.split(cmd)
    if dirname:
        path = [dirname]
    else:
        if path is None:
            path = os.environ.get("PATH", None)
            if path is None:
                try:
                    path = os.confstr("CS_PATH")
                except (AttributeError, ValueError):
                    path = os.defpath
        if not path:
            return None
        if use_bytes:
            path = os.fsencode(path).split(os.fsencode(os.pathsep))
        else:
            path = os.fsdecode(path).split(os.pathsep)
    # ... PATHEXT handling and final os.access loop follow ...
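Both lookup modes can be observed with a scratch directory instead of the real PATH. This is a small sketch assuming a POSIX system, where executability is decided by permission bits; find_in, demo, and the file name demo-tool are hypothetical.

```python
import os
import shutil
import stat
import tempfile


def find_in(directory, name):
    """Search only `directory` by passing it as the explicit path= value."""
    return shutil.which(name, path=directory)


def demo():
    with tempfile.TemporaryDirectory() as work:
        tool = os.path.join(work, "demo-tool")
        with open(tool, "w", encoding="utf-8") as f:
            f.write("#!/bin/sh\nexit 0\n")

        # The file exists but has no execute bit, so the default mode
        # (F_OK | X_OK) rejects it.
        before_chmod = find_in(work, "demo-tool")

        os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)
        by_search = find_in(work, "demo-tool")

        # A cmd with a directory component bypasses the path search.
        by_direct_path = shutil.which(tool)
        return before_chmod, by_search == tool, by_direct_path == tool


if os.name == "posix":
    print(demo())
```

Supplying path= explicitly keeps the lookup independent of the caller's environment, which is useful in tests and sandboxed tools.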

gopy notes

shutil is not yet ported in gopy. The copy family (copyfile, copy, copy2, copystat) is the highest-priority slice because gopy's pathlib.Path.copy implementation will need it. The sendfile and copy_file_range fast paths map directly to golang.org/x/sys/unix calls. rmtree and copytree are straightforward to port once os.scandir and os.walk are stable. Archive dispatch (make_archive, unpack_archive) is low priority and can be deferred until the Go archive/tar and archive/zip bindings are ready.