Lib/email/__init__.py
cpython 3.14 @ ab2d84fe1023/Lib/email/__init__.py
The email package spans roughly a dozen submodules. This annotation
covers the four most load-bearing files:
- email/message.py: the Message class (and its EmailMessage subclass, which the modern policies use as their message factory), the header list, and payload storage.
- email/feedparser.py: the FeedParser / BytesFeedParser state machine that parses an incremental byte or character stream into a Message tree.
- email/generator.py: Generator, BytesGenerator, and DecodedGenerator, which serialize Message objects back to text.
- email/mime/: the MIMEText, MIMEMultipart, MIMEBase, and related convenience constructors.
email/__init__.py itself is a thin re-export layer that exposes the
convenience functions message_from_string, message_from_bytes,
message_from_file, and message_from_binary_file. All real logic lives
in the submodules.
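As a brief illustration of the convenience functions, the sketch below parses the same message from a string, from bytes, and from a binary file object. The sample message text is made up for the example.

```python
import io
from email import message_from_binary_file, message_from_bytes, message_from_string

# Made-up sample message: two headers, a blank line, then the body.
RAW = "From: alice@example.com\nSubject: greetings\n\nHello, world.\n"

from_text = message_from_string(RAW)
from_bytes = message_from_bytes(RAW.encode("ascii"))
from_file = message_from_binary_file(io.BytesIO(RAW.encode("ascii")))

# All three entry points produce an equivalent Message object.
subjects = [m["Subject"] for m in (from_text, from_bytes, from_file)]
payload = from_text.get_payload()
```

Each function also accepts a policy keyword, which is passed through to the underlying parser.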
The policy system (email.policy) controls whether the package operates
in the legacy compat32 mode (which tolerates malformed headers and
returns header values as plain strings) or under the modern EmailPolicy
(RFC 5322, with RFC 6532 support when utf8 is enabled), which returns
structured header objects from email.headerregistry.
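The difference between the two modes is easiest to see by parsing one message twice. The sketch below uses a made-up message; policy.default is the stock EmailPolicy instance shipped in email.policy.

```python
from email import message_from_string, policy

# Made-up sample message with an address-list header.
RAW = "To: Alice <alice@example.com>, bob@example.com\nSubject: hi\n\nbody\n"

legacy = message_from_string(RAW)                         # compat32 is the default
modern = message_from_string(RAW, policy=policy.default)  # an EmailPolicy instance

legacy_to = legacy["To"]   # plain str
modern_to = modern["To"]   # str subclass carrying parsed structure
addresses = [a.addr_spec for a in modern_to.addresses]
```

Under the modern policy the parsed object is an EmailMessage, and address headers expose an addresses tuple; under compat32 the same lookup yields only the raw string.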
Map
| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-30 | message_from_string, message_from_bytes, message_from_file, message_from_binary_file | Thin wrappers around Parser and BytesParser; accept an optional policy keyword. | (pending) |
| message.py 1-150 | Message.__init__, _charset, _payload | Sets up the header list, payload slot, and default policy. | (pending) |
| message.py 150-400 | Message.__str__, as_string, as_bytes, __bytes__, is_multipart, set_unixfrom, get_unixfrom, attach, get_payload, set_payload | Serialization shims, multipart attachment list, and payload accessor. | (pending) |
| message.py 400-700 | set_charset, get_charset, __len__, __contains__, keys, values, items, get, get_all, __setitem__, __delitem__, replace_header | Header dict emulation over an ordered list of (name, value) pairs. | (pending) |
| feedparser.py 1-200 | BufferedSubFile, FeedParser.__init__, feed, _parse_headers | Incremental byte feeder and header-parsing entry. | (pending) |
| feedparser.py 200-450 | _parsegen, _parse_message_delivery_status, _parse_multipart | The main generator-coroutine state machine; multipart boundary detection and nesting. | (pending) |
| generator.py 1-200 | Generator.__init__, flatten, clone, _write, _write_headers, _handle_* | Text serializer; dispatches to per-content-type _handle_ methods. | (pending) |
| generator.py 200-350 | BytesGenerator, DecodedGenerator | Binary serializer (re-encodes headers) and decoded-payload serializer. | (pending) |
| mime/text.py | MIMEText | Convenience constructor: sets Content-Type, charset, and encodes the payload. | (pending) |
| mime/multipart.py | MIMEMultipart | Sets Content-Type: multipart/X and manages the boundary parameter. | (pending) |
| headerregistry.py 1-400 | HeaderRegistry, Address, Group, structured header classes | Parses raw header strings into typed objects under the modern policy. | (pending) |
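The header-dict emulation listed for message.py differs from a real dict in that it sits on top of an ordered list and allows duplicate names. A short sketch of that behavior, using made-up header values and the default compat32 policy:

```python
from email.message import Message

msg = Message()
msg["Received"] = "from a.example"
msg["Received"] = "from b.example"   # appends a second header; does not overwrite
msg["Subject"] = "draft"

first = msg["received"]              # case-insensitive lookup, first match wins
all_received = msg.get_all("Received")
keys = msg.keys()                    # duplicates appear, in insertion order

msg.replace_header("Subject", "final")   # replaces in place, keeping position
del msg["Received"]                      # removes every occurrence
```

Because assignment appends, code that wants to overwrite a header needs replace_header or an explicit del first.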
Reading
FeedParser state machine (feedparser.py lines 200 to 450)
cpython 3.14 @ ab2d84fe1023/Lib/email/feedparser.py#L200-450
class FeedParser:
    def __init__(self, _factory=None, *, policy=compat32):
        self.policy = policy
        self._factory = _factory or policy.message_factory
        self._input = BufferedSubFile()
        self._msgstack = []
        self._parse = self._parsegen().__next__
        self._cur = None
        self._last = None
        self._headersonly = False

    def feed(self, data):
        """Feed more data into the parser."""
        self._input.push(data)
        self._call_parse()

    def _call_parse(self):
        try:
            self._parse()
        except StopIteration:
            pass

    def _parsegen(self):
        # Phase 1: collect the headers.
        for retval in self._parse_headers(self._input):
            yield retval
        # Phase 2: parse the body based on content-type.
        if self._headersonly:
            lines = []
            while True:
                line = self._input.readline()
                if line == NeedMoreData:
                    yield NeedMoreData
                    continue
                if not line:
                    break
                lines.append(line)
            self._cur.set_payload(EMPTYSTRING.join(lines))
            return
        ...
        if self._cur.get_content_maintype() == 'multipart':
            for retval in self._parse_multipart(boundary):
                yield retval
        ...
FeedParser uses a generator-coroutine pattern. _parsegen is a
generator whose __next__ method is stored as self._parse. Each call
to feed() pushes data into _input (a BufferedSubFile) and then
calls self._parse() once, which advances the generator one step.
The generator yields NeedMoreData whenever it calls _input.readline()
and gets back the sentinel NeedMoreData object rather than a real line.
The outer call to _call_parse catches the StopIteration that signals
the generator is done.
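The control flow described above can be reproduced in miniature. The following is a toy sketch, not the CPython code: ToyLineParser, NEED_MORE, and the header-only grammar are all hypothetical stand-ins chosen to show how a stored __next__ plus a sentinel lets a generator pause whenever input runs dry.

```python
from collections import deque

NEED_MORE = object()  # hypothetical stand-in for the NeedMoreData sentinel


class ToyLineParser:
    """Hypothetical, minimal analogue of FeedParser's control flow."""

    def __init__(self):
        self._lines = deque()   # complete lines waiting to be consumed
        self._partial = ""      # trailing text that has no newline yet
        self._closed = False
        self.headers = []
        self._parse = self._parsegen().__next__

    def _readline(self):
        if self._lines:
            return self._lines.popleft()
        return "" if self._closed else NEED_MORE

    def feed(self, data):
        parts = (self._partial + data).splitlines(keepends=True)
        self._partial = ""
        if parts and not parts[-1].endswith("\n"):
            self._partial = parts.pop()   # hold back the incomplete line
        self._lines.extend(parts)
        self._call_parse()

    def close(self):
        self._closed = True
        self._call_parse()
        return self.headers

    def _call_parse(self):
        try:
            self._parse()
        except StopIteration:
            pass

    def _parsegen(self):
        while True:
            line = self._readline()
            if line is NEED_MORE:
                yield NEED_MORE          # suspend until the next feed()
                continue
            if line in ("", "\n"):       # closed input or blank line: done
                return
            name, _, value = line.partition(":")
            self.headers.append((name.strip(), value.strip()))


parser = ToyLineParser()
for chunk in ("Subj", "ect: hi\nTo: bo", "b@example.com\n\n"):
    parser.feed(chunk)
result = parser.close()
```

The parsing logic reads as straight-line code even though it is interrupted between any two lines; all suspended state lives in the generator frame rather than in explicit state variables.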
BufferedSubFile keeps a queue of complete lines plus a buffer for any
trailing partial line. readline() returns the next complete line, or
NeedMoreData if none is buffered yet (and an empty string once the input
has been closed and drained). This design means the caller can feed the
parser arbitrarily sized chunks (TCP fragments, etc.) without worrying
about line boundaries.
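A quick way to check this property is to slice a message at fixed offsets that ignore line boundaries and feed the pieces one by one. The message text and the 7-character chunk size below are arbitrary choices for the example.

```python
from email.feedparser import FeedParser

# Made-up sample message.
RAW = "Subject: hello\nTo: bob@example.com\n\nline one\nline two\n"

parser = FeedParser()
for i in range(0, len(RAW), 7):      # 7-character chunks split lines mid-way
    parser.feed(RAW[i:i + 7])
msg = parser.close()                 # signals end of input, returns the root Message

subject = msg["Subject"]
body = msg.get_payload()
```

FeedParser expects str chunks; BytesFeedParser offers the same feed/close interface for bytes input.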
Message.get_payload encoding (message.py lines 150 to 400)
cpython 3.14 @ ab2d84fe1023/Lib/email/message.py#L150-400
def get_payload(self, i=None, decode=False):
    if self.is_multipart():
        if decode:
            return None
        if i is None:
            return self._payload
        else:
            return self._payload[i]
    # Non-multipart.
    if i is not None and not isinstance(self._payload, list):
        raise TypeError('Expected list, got %s' % type(self._payload))
    payload = self._payload
    cte = str(self.get('content-transfer-encoding', '')).lower()
    if decode:
        if cte == 'quoted-printable':
            return quopri.decodestring(payload.encode('raw-unicode-escape'))
        elif cte == 'base64':
            try:
                return _decode_b64(payload)
            except binascii.Error:
                return payload.encode('raw-unicode-escape')
        elif cte in ('x-uuencode', 'uuencode', 'uue', 'x-uue'):
            ...
        else:
            return payload.encode('raw-unicode-escape')
    return payload
_payload holds either a string (for single-part messages) or a list of
Message objects (for multipart). For a single-part message, the
decode=True path undoes the Content-Transfer-Encoding and returns bytes;
for a multipart message it returns None. The decode=False path returns
the raw (possibly still encoded) string, or the list of sub-parts.
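The two paths can be compared on a small base64 message. The sketch below builds a made-up message whose body is the UTF-8 text "café", then shows that the caller still has to apply the charset after decoding the transfer encoding.

```python
import base64
from email import message_from_string
from email.mime.multipart import MIMEMultipart

# Made-up sample: a base64-encoded UTF-8 body.
encoded = base64.b64encode("café".encode("utf-8")).decode("ascii")
RAW = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n" + encoded + "\n"
)
msg = message_from_string(RAW)

raw = msg.get_payload()                  # str, still base64-encoded
decoded = msg.get_payload(decode=True)   # bytes, transfer encoding removed
text = decoded.decode(msg.get_content_charset("ascii"))  # charset applied by caller

# A multipart container has no single body to decode.
multipart_result = MIMEMultipart().get_payload(decode=True)
```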
The 'raw-unicode-escape' encoding is used as a bridge from Python's str
type back to the 8-bit bytes that arrived from the network: encoding
maps each code point U+0000 to U+00FF to the single byte with the same
value, exactly as Latin-1 would, so the original bit pattern is
preserved. Code points above U+00FF are written as \uXXXX escape
sequences instead of raising an error. Callers that need decoded text
must apply the Content-Type: charset themselves.
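The behavior of the codec itself can be checked in isolation. This small sketch uses two made-up strings, one entirely below U+0100 and one containing a higher code point, to show which inputs survive as single bytes.

```python
latin = "caf\u00e9"      # every code point is below U+0100
wide = "snow \u2603"     # U+2603 lies above U+00FF

latin_bytes = latin.encode("raw-unicode-escape")   # one byte per character
wide_bytes = wide.encode("raw-unicode-escape")     # high code point becomes an escape
round_trip = latin_bytes.decode("latin-1")
```

For strings confined to U+0000 through U+00FF the result matches a Latin-1 encode, which is why the bit pattern of 8-bit payload data is preserved.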
MIME multipart boundaries (generator.py lines 1 to 200)
cpython 3.14 @ ab2d84fe1023/Lib/email/generator.py#L1-200
class Generator:
    def _handle_multipart(self, msg):
        msgtexts = []
        subparts = msg.get_payload()
        for part in subparts:
            s = self._new_buffer()
            g = self.clone(s)
            g.flatten(part, unixfrom=False, linesep=self._NL)
            msgtexts.append(s.getvalue())
        boundary = msg.get_boundary()
        if not boundary:
            boundary = _make_boundary(msgtexts)
            msg.set_boundary(boundary)
        alltext = self._NL.join([
            '--' + boundary,
            *interleave(msgtexts, '--' + boundary),
            '--' + boundary + '--',
        ])
        self.write(alltext)
Generator.flatten is the entry point. It calls _write, which
dispatches to a _handle_ method chosen from the message's content type,
trying the full maintype_subtype name first and then the main type
alone. For multipart/* the handler recursively flattens each sub-part
into its own buffer, then joins the buffers with the boundary delimiter.
The boundary is taken from the Content-Type header parameter. If the
header has no boundary (which can happen when building a message
programmatically), _make_boundary generates one from a random token
padded with '=' characters and checks that it does not appear as a
delimiter line in any sub-part text. On a collision it appends a
numeric suffix (.0, .1, and so on) and retries until the boundary is
unique.
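The retry-until-unique step can be sketched on its own, followed by the real API producing a boundary at flatten time. pick_boundary is a hypothetical helper written for this example and takes a fixed base string so the result is deterministic; the multipart message and its part texts are made up.

```python
import re
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def pick_boundary(text, base):
    """Hypothetical helper: suffix `base` with .0, .1, ... until unused."""
    candidate = base
    counter = 0
    # A collision is a whole line that looks like an opening or closing delimiter.
    while re.search("^--" + re.escape(candidate) + "(--)?$", text, re.MULTILINE):
        candidate = base + "." + str(counter)
        counter += 1
    return candidate


# "abc" and "abc.0" both collide with lines in the text; "abc.1" does not.
chosen = pick_boundary("--abc\nhello\n--abc.0--\n", "abc")

# Real API: a programmatically built container has no boundary at first.
outer = MIMEMultipart()
outer.attach(MIMEText("first part"))
outer.attach(MIMEText("second part"))
before = outer.get_boundary()

flat = outer.as_string()                             # flattening picks a boundary
boundary = message_from_string(flat).get_boundary()  # read it back from the output
lines = flat.splitlines()
```

In the flattened text the boundary appears once before each sub-part and once more, with a trailing "--", as the closing delimiter.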
gopy mirror
The email package is not yet ported. The parser, generator, and MIME
constructors are all pure Python with no C extension of their own, but
the package is large and has many implicit dependencies (quopri, base64,
binascii, and the package's own email.charset and email.encoders
modules). A port would need to ship the FeedParser state machine and the
Message header list as the core, with MIME and header registry support
added incrementally.