Lib/urllib/parse.py

Source:

cpython 3.14 @ ab2d84fe1023/Lib/urllib/parse.py

urllib.parse provides RFC 3986 URL parsing and manipulation. It is pure Python with no C extension. The module is widely used for constructing and decomposing HTTP URLs.

Map

Lines	Symbol	Role
1-80	`uses_netloc`, `uses_query`, `uses_fragment`	Scheme classification lists
81-200	`urlparse`, `urlsplit`	Parse a URL into components
201-320	`ParseResult`, `SplitResult`	Named tuple classes with `geturl()` method
321-450	`urlunsplit`, `urlunparse`, `urljoin`	Reconstruct and resolve URLs
451-600	`quote`, `quote_plus`, `unquote`, `unquote_plus`	Percent-encoding
601-700	`urlencode`	Dict to query string
701-850	`parse_qs`, `parse_qsl`	Query string parsing
851-1100	`splittype`, `splithost`, etc.	Legacy split functions (deprecated)

Reading

`urlsplit`

# CPython: Lib/urllib/parse.py:452 urlsplit
def urlsplit(url, scheme='', allow_fragments=True):
    ...
    netloc = query = fragment = ''
    i = url.find(':')
    if i > 0 and url[0].isalpha() and all(c in scheme_chars for c in url[1:i]):
        ...
        scheme, url = url[:i].lower(), url[i+1:]
    if url[:2] == '//':
        netloc, url = _splitnetloc(url, 2)
    if allow_fragments and '#' in url:
        url, fragment = url.split('#', 1)
    if '?' in url:
        url, query = url.split('?', 1)
    ...
    return SplitResult(scheme, netloc, url, query, fragment)

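Results are cached with functools.lru_cache, so a repeated call with the same arguments returns the cached SplitResult. The cache key covers scheme and allow_fragments as well as the URL.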
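
As an illustration of the splitting order in the excerpt above, the following sketch runs urlsplit on a few inputs. The URLs are made up for the example; the printed values in the comments follow from the scheme, netloc, fragment, and query steps shown.

```python
from urllib.parse import urlsplit

# Full URL: the scheme is lowercased, the netloc is kept verbatim.
parts = urlsplit("HTTP://user@Example.com:8080/a/b;v=1?x=1&y=2#frag")
print(parts.scheme)    # http
print(parts.netloc)    # user@Example.com:8080
print(parts.path)      # /a/b;v=1  (urlsplit does not split off ;params)
print(parts.query)     # x=1&y=2
print(parts.fragment)  # frag

# Derived attributes normalise what the raw netloc does not.
print(parts.hostname, parts.port)  # example.com 8080

# Without a leading "//" nothing is treated as a netloc.
bare = urlsplit("example.com/path")
print(repr(bare.netloc), bare.path)  # '' example.com/path

# With allow_fragments=False the "#" stays in the preceding component.
nofrag = urlsplit("http://example.com/p?q=1#frag", allow_fragments=False)
print(nofrag.query, repr(nofrag.fragment))  # q=1#frag ''
```

The second case is a common surprise: a host written without "//" ends up in path, not netloc.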

`urljoin`

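Resolves a relative URL against a base URL, following RFC 3986 reference resolution. The allow_fragments parameter is passed through to urlparse for both arguments: when it is false, '#' is not treated as a fragment delimiter and stays part of the preceding component.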

# CPython: Lib/urllib/parse.py:530 urljoin
def urljoin(base, url, allow_fragments=True):
    ...
    bscheme, bnetloc, bpath, bparams, bquery, bfragment = urlparse(base, ...)
    scheme, netloc, path, params, query, fragment = urlparse(url, ...)
    if scheme != bscheme or scheme not in uses_relative:
        ...
    if not path and not params:
        path = bpath
        ...
    elif path[:1] == '/':
        ...
    else:
        # Merge: replace last component of base path
        ...
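
The branches in the excerpt above can be exercised with a handful of references against one base. This is only a sketch: the base URL and the relative references are invented for illustration.

```python
from urllib.parse import urljoin

base = "http://example.com/a/b/c?x=1"

# Empty reference: the base is returned unchanged.
same = urljoin(base, "")
# Query-only reference: base path is kept, the query is replaced.
query_only = urljoin(base, "?y=2")
# Absolute path: replaces the whole base path.
absolute_path = urljoin(base, "/d")
# Relative path: replaces the last component of the base path.
sibling = urljoin(base, "d")
# Dot segments are resolved after merging.
parent = urljoin(base, "../d")
# A different scheme means the reference is returned as is.
other_scheme = urljoin(base, "https://other.org/x")
# A scheme-relative reference inherits only the base scheme.
scheme_relative = urljoin(base, "//cdn.example.org/lib.js")

for result in (same, query_only, absolute_path, sibling,
               parent, other_scheme, scheme_relative):
    print(result)
```

Note that the base query (x=1) is dropped as soon as the reference carries its own path.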

`quote` and `unquote`

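quote(string, safe='/', encoding=None, errors=None) percent-encodes every character that is neither in safe nor in the unreserved set (ASCII letters, digits, and _.-~). A str input is first encoded to bytes (UTF-8 by default) and handed to quote_from_bytes, which returns early when every byte is already safe.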

unquote(string, encoding='utf-8', errors='replace') decodes %XX sequences. Since Python 3.1 it handles multi-byte UTF-8 sequences encoded as multiple %XX triples.
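
A short sketch of how the safe, encoding, and errors parameters change the result. The sample strings are arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote

# Default safe='/' keeps path separators; spaces become %20.
default_quoted = quote("/docs/a b.txt")           # /docs/a%20b.txt
# safe='' escapes the slash as well.
strict_quoted = quote("/docs/a b.txt", safe="")   # %2Fdocs%2Fa%20b.txt
# quote_plus targets form data: space becomes '+', '&' and '=' are escaped.
form_quoted = quote_plus("a b&c=d")               # a+b%26c%3Dd

# Non-ASCII text is encoded as UTF-8, one %XX triple per byte.
euro_quoted = quote("€")                          # %E2%82%AC
euro_back = unquote("%E2%82%AC")                  # €

# %FC alone is not valid UTF-8, so errors='replace' yields U+FFFD.
invalid_utf8 = unquote("%FC")
# The same byte decodes cleanly when the encoding is Latin-1.
latin1 = unquote("%FC", encoding="latin-1")       # ü

print(default_quoted, strict_quoted, form_quoted)
print(euro_quoted, euro_back, latin1)
print(invalid_utf8 == "\ufffd")                   # True
```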

`parse_qs` and `parse_qsl`

parse_qsl returns a list of (key, value) pairs. parse_qs returns a dict mapping each key to a list of all its values (multiple key=val pairs with the same key accumulate).
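
The difference between the two return shapes, and the default handling of blank values, can be seen in a small sketch. The query string is invented for the example.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

query = "a=1&a=2&b=3&empty="

# Ordered pairs; the blank value is dropped by default.
pairs = parse_qsl(query)
print(pairs)        # [('a', '1'), ('a', '2'), ('b', '3')]

# Repeated keys accumulate into one list per key.
grouped = parse_qs(query)
print(grouped)      # {'a': ['1', '2'], 'b': ['3']}

# keep_blank_values=True retains keys whose value is empty.
with_blanks = parse_qs(query, keep_blank_values=True)
print(with_blanks)  # {'a': ['1', '2'], 'b': ['3'], 'empty': ['']}

# Values are percent-decoded and '+' becomes a space.
decoded = parse_qs("name=J%C3%BCrgen+M")
print(decoded)      # {'name': ['Jürgen M']}

# urlencode with doseq=True expands each list back into repeated keys.
rebuilt = urlencode(grouped, doseq=True)
print(rebuilt)      # a=1&a=2&b=3
```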

gopy notes

Status: not yet ported. Go's net/url package covers urlparse/urljoin. url.Values.Encode() replaces urlencode, though it sorts by key where urlencode preserves input order. url.ParseQuery replaces parse_qs. The main gap is quote/unquote with custom safe sets; Go's url.QueryEscape uses a fixed safe set.
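
One way to pin down the behaviour a port would have to match for custom safe sets is a small reference model checked against quote itself. The sketch below is an assumption-laden model, not CPython's implementation: quote_model and ALWAYS_SAFE are hypothetical names, and it covers only str input encoded as UTF-8.

```python
import string
from urllib.parse import quote

# Unreserved characters that quote never escapes (assumed set for this model).
ALWAYS_SAFE = frozenset(
    (string.ascii_letters + string.digits + "_.-~").encode("ascii")
)

def quote_model(text: str, safe: str = "/") -> str:
    """Hypothetical reference model of quote(text, safe=safe) for UTF-8 input."""
    # Non-ASCII characters in safe are ignored, mirroring quote.
    safe_bytes = ALWAYS_SAFE | frozenset(safe.encode("ascii", "ignore"))
    out = []
    for byte in text.encode("utf-8"):
        # Safe bytes pass through; everything else becomes an uppercase %XX.
        out.append(chr(byte) if byte in safe_bytes else f"%{byte:02X}")
    return "".join(out)

# Compare the model with the real function on a few sample inputs.
samples = [("/a b/ü", "/"), ("a/b?c=d", ""), ("k=v&x=y z", "=&"), ("", "/")]
matches = [quote_model(text, safe) == quote(text, safe=safe)
           for text, safe in samples]
print(all(matches))
```

A table of such input and safe-set pairs could serve as a conformance check for whatever escaping routine the port ends up using.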
