Lib/urllib (part 3)

Source:

cpython 3.14 @ ab2d84fe1023/Lib/urllib/request.py

This annotation covers HTTP request dispatching. See lib_urllib2_detail for urllib.parse.urlparse, urlencode, quote, and unquote.

Map

Lines     Symbol                   Role
1-80      urllib.request.urlopen   Top-level HTTP/FTP/file open
81-160    Request class            Encapsulate URL, headers, method
161-240   OpenerDirector           Chain of handlers
241-360   HTTPHandler              Open an HTTP URL
361-500   ProxyHandler             Inject proxy from environment

Reading

urllib.request.urlopen

# CPython: Lib/urllib/request.py:220 urlopen
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
            *, context=None):
    # cafile, capath, and cadefault were removed in Python 3.13.
    global _opener
    if context:
        # Per-call opener; the shared global is left untouched.
        opener = build_opener(HTTPSHandler(context=context))
    elif _opener is None:
        # First call without a context: build and cache the default opener.
        _opener = opener = build_opener()
    else:
        opener = _opener
    return opener.open(url, data, timeout)

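urlopen builds (or reuses) a module-level OpenerDirector and delegates to it. Passing context (an ssl.SSLContext) builds a separate opener with its own HTTPSHandler for that call only; the global is not modified. The module-level _opener is created lazily on the first call and can be replaced with install_opener().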
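The caching behaviour can be observed without any network access. The following is a minimal sketch, assuming a data: URL (served by the built-in DataHandler) is an acceptable stand-in for a real HTTP endpoint. The helper name fetch is hypothetical, and _opener is a private attribute that is read here only for illustration.

```python
import ssl
import urllib.request


def fetch(url, context=None):
    """Read a URL through urlopen; `fetch` is a hypothetical helper name."""
    with urllib.request.urlopen(url, timeout=5, context=context) as resp:
        return resp.read()


# A data: URL is handled by the built-in DataHandler, so nothing touches the network.
body = fetch("data:text/plain,hello")

# _opener is a private module attribute; it is read here only to observe caching.
shared_opener = urllib.request._opener

# Passing a context builds a throwaway opener and leaves the cached one alone.
body_with_context = fetch("data:text/plain,hello", context=ssl.create_default_context())
same_global = urllib.request._opener is shared_opener
```

Code that needs custom handlers on every call would normally build its own opener with build_opener() rather than rely on the private global.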

Request class

# CPython: Lib/urllib/request.py:340 Request.__init__
class Request:
    def __init__(self, url, data=None, headers={}, origin_req_host=None,
                 unverifiable=False, method=None):
        self.full_url = url          # goes through the property setter below
        self.data = data
        self._tunnel_host = None
        self.headers = {}
        for key, value in headers.items():
            self.add_header(key, value)
        ...
        self.unverifiable = unverifiable
        if method:
            self.method = method     # otherwise get_method() picks GET or POST

    @property
    def full_url(self):
        if self.fragment:
            return '{}#{}'.format(self._full_url, self.fragment)
        return self._full_url

    @full_url.setter
    def full_url(self, url):
        self._full_url = unwrap(url)
        self._full_url, self.fragment = _splittag(self._full_url)
        self._parse()                # sets self.type, self.host, self.selector

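Request splits the URL into its components: type (the scheme), host, selector, and fragment. The HTTP method is resolved by get_method(): an explicit method argument wins; otherwise it is POST when data is not None and GET when data is None.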
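A short sketch of the URL splitting and method detection described above. Constructing a Request performs no I/O, and the example.com URLs are placeholders.

```python
from urllib.request import Request

get_req = Request("http://example.com/search?q=1#results")
post_req = Request("http://example.com/search", data=b"q=1")
put_req = Request("http://example.com/item/7", data=b"{}", method="PUT")

# Components filled in by the full_url setter.
parts = (get_req.type, get_req.host, get_req.selector, get_req.fragment)

# get_method() falls back to GET or POST only when no method was given.
methods = (get_req.get_method(), post_req.get_method(), put_req.get_method())

print(parts)    # ('http', 'example.com', '/search?q=1', 'results')
print(methods)  # ('GET', 'POST', 'PUT')
```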

OpenerDirector

# CPython: Lib/urllib/request.py:440 OpenerDirector.open
class OpenerDirector:
    def open(self, fullurl, data=None, timeout=...):
        req = fullurl if isinstance(fullurl, Request) else Request(fullurl, data)
        req.timeout = timeout
        protocol = req.type
        # Pre-process the request (e.g. HTTPCookieProcessor.http_request)
        meth_name = protocol + '_request'
        for processor in self.process_request.get(protocol, []):
            meth = getattr(processor, meth_name)
            req = meth(req)
        # Open: dispatch to the handlers' <protocol>_open methods
        r = self._open(req, data)
        # Post-process the response (e.g. HTTPErrorProcessor.http_response)
        ...
        return r

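Handlers are called in priority order (ascending handler_order, which defaults to 500). Request processors, such as HTTPCookieProcessor or the http_request hook on HTTPHandler that fills in the Host and Content-Length headers, can modify the Request before it is opened. ProxyHandler is not a request processor: it takes part in the open step through its <scheme>_open methods. Response processors inspect the result; HTTPErrorProcessor routes non-2xx responses to the error handlers, which either follow redirects (HTTPRedirectHandler) or raise HTTPError (HTTPDefaultErrorHandler).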
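The three stages (request processing, open, response processing) can be sketched with a single custom handler. The "demo" scheme and the DemoHandler class are hypothetical and exist only for this illustration; OpenerDirector discovers the hooks purely from the method names demo_request, demo_open, and demo_response.

```python
import io
import urllib.request
from urllib.response import addinfourl


class DemoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up 'demo' scheme (not part of urllib)."""

    def demo_request(self, req):
        # Request processor: runs before the URL is opened.
        req.add_header("X-Stage", "pre-processed")
        return req

    def demo_open(self, req):
        # Open step: return any file-like response object.
        # add_header() capitalizes names, so the stored key is "X-stage".
        headers = {"X-Stage": req.get_header("X-stage")}
        body = io.BytesIO(b"path=" + req.selector.encode("ascii"))
        return addinfourl(body, headers, req.full_url, 200)

    def demo_response(self, req, response):
        # Response processor: runs after the open step succeeds.
        response.headers["X-Post"] = "post-processed"
        return response


opener = urllib.request.OpenerDirector()
opener.add_handler(DemoHandler())
with opener.open("demo://host/path") as resp:
    result = (resp.status, resp.headers["X-Stage"], resp.headers["X-Post"], resp.read())

print(result)  # (200, 'pre-processed', 'post-processed', b'path=/path')
```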

HTTPHandler

# CPython: Lib/urllib/request.py:1360 HTTPHandler.http_open
class AbstractHTTPHandler(BaseHandler):
    def do_open(self, http_class, req, **http_conn_args):
        host = req.host
        h = http_class(host, timeout=req.timeout, **http_conn_args)
        ...
        h.request(req.get_method(), req.selector, req.data, headers)
        r = h.getresponse()
        ...
        r.url = req.get_full_url()
        r.msg = r.reason
        return r  # the http.client.HTTPResponse itself, not a wrapper

class HTTPHandler(AbstractHTTPHandler):
    def http_open(self, req):
        return self.do_open(http.client.HTTPConnection, req)

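HTTPHandler.http_open delegates to do_open, which is inherited from AbstractHTTPHandler. do_open creates an http.client.HTTPConnection, sends the request, and returns the http.client.HTTPResponse itself after setting .url on it and copying .reason into .msg. The response is a file-like object with .status, .headers, and .url. addinfourl is the wrapper used by the non-HTTP handlers, such as those for file: and data: URLs.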
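To see what HTTPHandler hands back, the sketch below serves one request from a throwaway server on the loopback interface. It assumes local sockets are available; HelloHandler and fetch_from_local_server are hypothetical names, and ProxyHandler({}) is passed so that proxy settings in the environment do not interfere.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical test server handler that answers every GET with 'hello'."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_from_local_server():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        # An empty proxy mapping disables proxy lookup for this opener.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as resp:
            return url, type(resp), resp.status, resp.url, resp.read()
    finally:
        server.shutdown()
        server.server_close()


url, resp_type, status, resp_url, body = fetch_from_local_server()
```

In this sketch resp_type is http.client.HTTPResponse and resp_url equals the requested URL.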

ProxyHandler

# CPython: Lib/urllib/request.py:760 ProxyHandler
class ProxyHandler(BaseHandler):
    def __init__(self, proxies=None):
        if proxies is None:
            proxies = getproxies()  # reads HTTP_PROXY, HTTPS_PROXY, etc.
        for type, url in proxies.items():
            setattr(self, '%s_open' % type,
                    lambda r, proxy=url, type=type, meth=self.proxy_open:
                        meth(r, proxy, type))

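ProxyHandler reads proxy settings through getproxies() (the HTTP_PROXY/HTTPS_PROXY style environment variables, plus system settings on macOS and Windows) and dynamically creates one <scheme>_open method per configured scheme, for example http_open and https_open. Each generated method calls proxy_open, which rewrites the request with Request.set_proxy() so that it goes through the proxy. In the common case proxy_open then returns None, which lets a later handler such as HTTPHandler make the actual connection.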
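The rewrite can be inspected directly by passing an explicit proxy mapping instead of relying on the environment. This is a minimal sketch: proxy.example:3128 is a placeholder address, no connection is made, and it assumes that no no_proxy or system bypass rule matches example.com.

```python
import urllib.request

# Only an "http" proxy is configured, so only http_open is generated.
handler = urllib.request.ProxyHandler({"http": "http://proxy.example:3128"})
created = (hasattr(handler, "http_open"), hasattr(handler, "https_open"))

req = urllib.request.Request("http://example.com/index.html")
outcome = handler.http_open(req)  # calls proxy_open(req, proxy_url, "http")

# After set_proxy(), the connection target is the proxy and the selector
# becomes the absolute URL that the proxy is asked to fetch.
rewritten = (req.host, req.selector)

print(created)    # (True, False)
print(outcome)    # None
print(rewritten)  # ('proxy.example:3128', 'http://example.com/index.html')
```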

gopy notes

urlopen is module/urllib.URLOpen in module/urllib/module.go. It uses net/http.Get or net/http.Post. Request is module/urllib.Request. HTTPHandler.do_open creates an http.Client with the configured timeout. ProxyHandler uses http.ProxyFromEnvironment.