
Lib/urllib/parse.py

Source: cpython 3.14 @ ab2d84fe1023/Lib/urllib/parse.py

urllib.parse provides URL parsing and manipulation based on RFC 3986. It is pure Python with no C accelerator module. It is widely used to construct and decompose HTTP URLs, but the parsing functions work for other schemes as well.

Map

| Lines | Symbol | Role |
|---|---|---|
| 1-80 | uses_relative, uses_netloc, uses_query, uses_fragment | Scheme classification lists |
| 81-200 | urlparse, urlsplit | Parse a URL into components |
| 201-320 | ParseResult, SplitResult | Named tuple classes with a geturl() method |
| 321-450 | urlunsplit, urlunparse, urljoin | Reconstruct and resolve URLs |
| 451-600 | quote, quote_plus, unquote, unquote_plus | Percent-encoding and decoding |
| 601-700 | urlencode | Mapping or sequence of pairs to query string |
| 701-850 | parse_qs, parse_qsl | Query string parsing |
| 851-1100 | splittype, splithost, etc. | Legacy split functions (deprecated) |
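To see the difference between the two result classes listed in the map, the snippet below parses one URL both ways using only public urllib.parse calls; the sample URL is an arbitrary assumption. urlparse splits ;params off the last path segment, urlsplit leaves them in path, and both named tuples rebuild the URL with geturl().

```python
from urllib.parse import urlparse, urlsplit

URL = "http://User@Example.com:8080/p;x?q=1#f"  # arbitrary sample URL

p = urlparse(URL)  # 6 fields: ';x' is split off into params
s = urlsplit(URL)  # 5 fields: ';x' stays inside path

print(p.path, p.params)  # /p x
print(s.path)            # /p;x

# Both result types rebuild the original URL with geturl().
print(p.geturl() == URL, s.geturl() == URL)  # True True

# _replace() returns a modified copy, since named tuples are immutable.
print(s._replace(query="q=2").geturl())  # http://User@Example.com:8080/p;x?q=2#f

# Netloc helpers: hostname is lowercased, port is converted to int.
print(p.hostname, p.port)  # example.com 8080
```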

Reading

urlsplit

```python
# CPython: Lib/urllib/parse.py:452 urlsplit
def urlsplit(url, scheme='', allow_fragments=True):
    ...
    netloc = query = fragment = ''
    i = url.find(':')
    if i > 0 and url[0].isalpha() and all(c in scheme_chars for c in url[1:i]):
        ...
        scheme, url = url[:i].lower(), url[i+1:]
    if url[:2] == '//':
        netloc, url = _splitnetloc(url, 2)
    if allow_fragments and '#' in url:
        url, fragment = url.split('#', 1)
    if '?' in url:
        url, query = url.split('?', 1)
    ...
    return SplitResult(scheme, netloc, url, query, fragment)
```

urlsplit is wrapped in functools.lru_cache, so repeated calls with identical arguments return the cached SplitResult instead of re-parsing the URL.
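The order of the steps in the excerpt matters: the fragment is split off before the query, so a '?' that appears after '#' belongs to the fragment. Below is a minimal, hypothetical mini_urlsplit that follows the same order (scheme, netloc, fragment, query) so it can be compared against the real function. It assumes str input and deliberately skips the input sanitizing, netloc validation, default-scheme argument and caching that CPython performs.

```python
from urllib.parse import urlsplit

# Characters allowed in a scheme name (same set as urllib.parse.scheme_chars).
SCHEME_CHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "+-."
)


def mini_urlsplit(url, allow_fragments=True):
    """Hypothetical simplified urlsplit that follows the same order of steps.

    Omitted on purpose: input sanitizing, IPv6/netloc validation, str/bytes
    coercion, the default-scheme argument and the lru_cache.
    """
    scheme = netloc = query = fragment = ""
    # 1. Scheme: starts with an ASCII letter, runs up to the first ':'.
    i = url.find(":")
    if i > 0 and url[0].isascii() and url[0].isalpha() and all(
        c in SCHEME_CHARS for c in url[:i]
    ):
        scheme, url = url[:i].lower(), url[i + 1:]
    # 2. Netloc: after '//', up to the earliest '/', '?' or '#'.
    if url[:2] == "//":
        end = len(url)
        for delim in "/?#":
            pos = url.find(delim, 2)
            if pos >= 0:
                end = min(end, pos)
        netloc, url = url[2:end], url[end:]
    # 3. Fragment first, so a '?' after '#' stays in the fragment.
    if allow_fragments and "#" in url:
        url, fragment = url.split("#", 1)
    # 4. Query: everything after the first remaining '?'.
    if "?" in url:
        url, query = url.split("?", 1)
    return scheme, netloc, url, query, fragment


print(mini_urlsplit("http://h/p#f?q"))  # ('http', 'h', '/p', '', 'f?q')
```

Comparing the tuple against tuple(urlsplit(url)) on simple inputs is a quick way to check that the step order was reproduced.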

urljoin

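Resolves a possibly relative URL against a base URL following RFC 3986. The allow_fragments argument is forwarded to urlparse for both URLs: when it is false, '#' is not recognized as a fragment delimiter, so any '#...' text stays in the path or query instead of being split off.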

```python
# CPython: Lib/urllib/parse.py:530 urljoin
def urljoin(base, url, allow_fragments=True):
    ...
    bscheme, bnetloc, bpath, bparams, bquery, bfragment = urlparse(base, ...)
    scheme, netloc, path, params, query, fragment = urlparse(url, ...)
    if scheme != bscheme or scheme not in uses_relative:
        ...
    if not path and not params:
        path = bpath
        ...
    elif path[:1] == '/':
        ...
    else:
        # Merge: replace last component of base path
        ...
```
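The elided merge branch is where relative paths are resolved. As an independent sketch, here are the RFC 3986 section 5.2.3 "merge" and section 5.2.4 "remove_dot_segments" steps written directly from the spec. The helper names merge, remove_dot_segments and resolve_path are hypothetical, and this is the RFC's algorithm rather than CPython's code, which splits the path on '/' and walks the segment list instead. Only the path is handled; scheme, netloc, query and fragment are assumed to be taken from the reference or the base as in the excerpt.

```python
from urllib.parse import urljoin


def merge(base_path, ref_path, base_has_authority=True):
    """RFC 3986 section 5.2.3: drop the last base segment, append the reference."""
    if base_has_authority and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4: resolve '.' and '..' using an output stack."""
    output = []
    while path:
        if path.startswith("../"):        # rule A
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):      # rule B
            path = "/" + path[3:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):     # rule C: also drop the last output segment
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):         # rule D
            path = ""
        else:                             # rule E: move one segment to the output
            end = path.find("/", 1 if path.startswith("/") else 0)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def resolve_path(base_path, ref_path):
    """Target path for a reference with no scheme or authority of its own."""
    if ref_path.startswith("/"):
        return remove_dot_segments(ref_path)
    return remove_dot_segments(merge(base_path, ref_path))


print(resolve_path("/b/c/d", "../g"))  # /b/g
```

For ordinary references under an http base, prefixing the resolved path with the base scheme and authority should match urljoin's output, which makes this a convenient cross-check.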

quote and unquote

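quote(string, safe='/', encoding=None, errors=None) percent-encodes every character that is neither in safe nor in the always-safe set (ASCII letters, digits and '_.-~'). A str argument is first encoded to bytes (UTF-8 and 'strict' when encoding and errors are None) and passed to quote_from_bytes, which returns early when every byte is already safe and otherwise escapes the unsafe bytes one by one as %XX.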
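A minimal sketch of the same behaviour, assuming str input only. mini_quote and ALWAYS_SAFE are hypothetical names, and the real function also accepts bytes input.

```python
from urllib.parse import quote

# Letters, digits and '_.-~' are never escaped by quote().
ALWAYS_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-~"
)


def mini_quote(string, safe="/", encoding="utf-8", errors="strict"):
    """Hypothetical simplified quote() for str input only."""
    data = string.encode(encoding, errors)
    # Non-ASCII characters in `safe` are dropped, so they can never be kept raw.
    safe_bytes = ALWAYS_SAFE | frozenset(safe.encode("ascii", "ignore"))
    # Early return: every byte is already safe, nothing to escape.
    if all(b in safe_bytes for b in data):
        return data.decode("ascii")
    # Otherwise escape byte by byte, using uppercase hex digits.
    return "".join(chr(b) if b in safe_bytes else f"%{b:02X}" for b in data)


print(mini_quote("café au lait/menu"))  # caf%C3%A9%20au%20lait/menu
```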

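unquote(string, encoding='utf-8', errors='replace') decodes %XX escapes. Consecutive escapes are decoded together as bytes, so a multi-byte UTF-8 character written as several %XX triples (for example %C3%A9 for 'é') becomes a single character. Malformed escapes are left as-is, and with the default errors='replace' an invalid byte sequence becomes U+FFFD. Unlike unquote_plus, it does not turn '+' into a space.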
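A compact way to express this, as a hypothetical sketch: find each run of consecutive %XX escapes, turn the run into bytes and decode it with encoding and errors. CPython's implementation is organized differently, but for UTF-8 the results agree on the inputs checked here.

```python
import re
from urllib.parse import unquote

# One or more consecutive %XX escapes, hex digits in either case.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def mini_unquote(string, encoding="utf-8", errors="replace"):
    """Hypothetical simplified unquote(): decode each run of escapes as one byte string."""
    def decode_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        return raw.decode(encoding, errors)

    # Text outside escape runs (including malformed '%zz' or a trailing '%') is kept.
    return _ESCAPE_RUN.sub(decode_run, string)


print(mini_unquote("caf%C3%A9"))  # café
```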

parse_qs and parse_qsl

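parse_qsl returns a list of (key, value) pairs in input order. parse_qs returns a dict mapping each key to a list of all its values, so repeated keys accumulate. Both decode '+' to a space and unescape %XX sequences, and both drop pairs with empty values unless keep_blank_values=True.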
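The relationship between the two functions can be sketched as a grouping pass over parse_qsl's pairs. mini_parse_qs is a hypothetical name, and only keep_blank_values is forwarded.

```python
from urllib.parse import parse_qs, parse_qsl


def mini_parse_qs(qs, keep_blank_values=False):
    """Hypothetical parse_qs built on parse_qsl: group values by key."""
    result = {}
    for key, value in parse_qsl(qs, keep_blank_values=keep_blank_values):
        result.setdefault(key, []).append(value)
    return result


QS = "a=1&b=2&a=3&c=&d=x+y%21"
print(parse_qsl(QS))      # [('a', '1'), ('b', '2'), ('a', '3'), ('d', 'x y!')]
print(mini_parse_qs(QS))  # {'a': ['1', '3'], 'b': ['2'], 'd': ['x y!']}
```

The empty c= pair disappears in both outputs because keep_blank_values defaults to False.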

gopy notes

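Status: not yet ported. Go's net/url package covers urlparse and urljoin (url.Parse and URL.ResolveReference). url.Values.Encode() replaces urlencode, though it sorts by key while urlencode keeps input order, and url.ParseQuery replaces parse_qs. The main gap is quote/unquote with a caller-supplied safe set: Go's url.QueryEscape and url.PathEscape each use a fixed set of characters that are left unescaped.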
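When porting, the Python functions can act as the reference implementation. The hypothetical helper below records quote and quote_plus output for a few (input, safe) pairs, so a Go quote with a configurable safe set could be checked against the same vectors. For plain ASCII input, quote_plus(text, safe='') appears to behave like url.QueryEscape (space becomes '+', '/' is escaped), but treat that as an assumption to verify.

```python
from urllib.parse import quote, quote_plus


def golden_vectors(cases):
    """Hypothetical helper: (text, safe, quote output, quote_plus output) rows."""
    return [
        (text, safe, quote(text, safe=safe), quote_plus(text, safe=safe))
        for text, safe in cases
    ]


CASES = [
    ("a b/c", "/"),     # quote's default safe set keeps '/'
    ("a b/c", ""),      # empty safe set escapes '/' as well
    ("k=v&x y", "=&"),  # caller-chosen safe set
]

for row in golden_vectors(CASES):
    print(row)
```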