Lib/xml/ (part 2)

Source:

cpython 3.14 @ ab2d84fe1023/Lib/xml/etree/ElementTree.py

This annotation covers tree traversal and serialization. See lib_xml_detail for ElementTree.__init__, Element.__init__, the C accelerator _elementtree, and namespace handling.

Map

Lines | Symbol | Role
1-100 | ElementTree.parse | Parse XML into an Element tree
101-220 | Element.find / findall | XPath subset queries on the tree
221-360 | ElementTree.write | Serialize tree to XML bytes or string
361-500 | iterparse | Stream-parse large XML documents event by event
501-700 | xml.dom.minidom | DOM Level 2 parseString, toprettyxml

Reading

ElementTree.parse

# CPython: Lib/xml/etree/ElementTree.py:580 ElementTree.parse
def parse(self, source, parser=None):
    """Load an external XML document into this element tree."""
    close_source = False
    if not hasattr(source, "read"):
        source = open(source, "rb")
        close_source = True
    try:
        if parser is None:
            parser = XMLParser()
        while True:
            data = source.read(65536)
            if not data:
                break
            parser.feed(data)
        self._root = parser.close()
        return self._root
    finally:
        if close_source:
            source.close()

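ElementTree.parse reads the source in 64 KB chunks and feeds each one to XMLParser (backed by expat). The method stores the root Element as self._root and returns it; the module-level ET.parse('file.xml') wraps this call and returns the ElementTree instead. For large documents, iterparse is usually preferred because elements can be processed and cleared as they arrive, rather than holding the whole tree in memory at once.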
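The chunked feed loop can be reproduced with the public XMLParser API. The following is a minimal sketch, not the CPython implementation: parse_in_chunks and the sample document are hypothetical names introduced for illustration, and the small chunk size is chosen only to force several feed() calls.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
XML = b"<root><item id='1'>a</item><item id='2'>b</item></root>"

def parse_in_chunks(source, chunk_size=65536):
    """Feed a binary file-like object to XMLParser one chunk at a time."""
    parser = ET.XMLParser()
    while True:
        data = source.read(chunk_size)
        if not data:
            break
        parser.feed(data)
    return parser.close()  # close() returns the root Element

# A tiny chunk size forces many feed() calls on this short document.
root = parse_in_chunks(io.BytesIO(XML), chunk_size=16)

# The method returns the root Element; the module-level function returns the tree.
tree = ET.ElementTree()
method_root = tree.parse(io.BytesIO(XML))
module_tree = ET.parse(io.BytesIO(XML))
```

Here method_root is the same object as tree.getroot(), while module_tree is an ElementTree whose root must be fetched with getroot().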

Element.findall

# CPython: Lib/xml/etree/ElementTree.py:420 Element.findall
def findall(self, path, namespaces=None):
    """Find all subelements matching the XPath path."""
    return ElementPath.findall(self, path, namespaces)

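root.findall('.//{http://ns}tag') delegates to ElementPath, a minimal XPath subset supporting tag, *, ., .., //, [@attr], [@attr='value'], [tag], and [position]. Prefixed names such as ns:tag are resolved through the optional namespaces mapping. Full XPath is not supported; use lxml for that.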
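A few of the supported path forms, as a small sketch. The document, the namespace URI http://example.com/books, and the prefix b are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace URI and prefix are illustrative only.
root = ET.fromstring(
    "<catalog xmlns:b='http://example.com/books'>"
    "<b:book id='1'><b:title>A</b:title></b:book>"
    "<b:book id='2'><b:title>B</b:title></b:book>"
    "<note>plain</note>"
    "</catalog>"
)

ns = {"b": "http://example.com/books"}

# Descendant search using a prefix resolved through the namespaces mapping.
titles = [t.text for t in root.findall(".//b:title", ns)]

# The same search using the expanded {uri}tag form, no mapping needed.
same = [t.text for t in root.findall(".//{http://example.com/books}title")]

# Attribute-value predicate and 1-based position predicate on direct children.
with_id = root.findall("b:book[@id='2']", ns)
second = root.findall("b:book[2]", ns)

# "*" matches every direct child element, whatever its tag.
child_tags = [c.tag for c in root.findall("*")]
```

Both predicates select the second book here, and child_tags holds expanded {uri}tag names for the namespaced children.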

ElementTree.write

# CPython: Lib/xml/etree/ElementTree.py:660 ElementTree.write
def write(self, file_or_filename, encoding=None, xml_declaration=None,
          default_namespace=None, method=None, *, short_empty_elements=True):
    """Write the element tree to a file."""
    ...
    serialize = _serialize[method]  # 'xml' (used when method is None), 'html', or 'text'
    serialize(write, self._root, qnames, namespaces, short_empty_elements)

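tree.write('out.xml', encoding='unicode') writes text to out.xml and returns None; to get a string, use ET.tostring(root, encoding='unicode'). With encoding='unicode' the target must accept str (a text-mode file object, or a filename that write opens in text mode); any other encoding produces bytes. short_empty_elements=True produces <br /> instead of <br></br>.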
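The str versus bytes distinction is easiest to see with in-memory targets. This is a minimal sketch using io.StringIO and io.BytesIO in place of real files; the two-element tree is made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-element tree: <p> containing an empty <br>.
root = ET.Element("p")
ET.SubElement(root, "br")
tree = ET.ElementTree(root)

# encoding="unicode": the target receives str. write() itself returns None.
text_buf = io.StringIO()
result = tree.write(text_buf, encoding="unicode")

# Any other encoding: the target receives bytes.
byte_buf = io.BytesIO()
tree.write(byte_buf, encoding="utf-8", xml_declaration=True)

# short_empty_elements=False spells out the closing tag of empty elements.
long_buf = io.StringIO()
tree.write(long_buf, encoding="unicode", short_empty_elements=False)

# To get a string directly, serialize with tostring() instead of write().
as_string = ET.tostring(root, encoding="unicode")
```

text_buf then holds <p><br /></p>, long_buf holds <p><br></br></p>, and byte_buf starts with an XML declaration because xml_declaration=True was passed.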

iterparse

# CPython: Lib/xml/etree/ElementTree.py:1020 iterparse
def iterparse(source, events=None, parser=None):
    """Incrementally parse XML. Yields (event, elem) pairs."""
    # events: 'start', 'end', 'start-ns', 'end-ns', 'comment', 'pi'
    close_source = not hasattr(source, "read")
    ...
    pullparser = XMLPullParser(events=events, _parser=parser)
    while True:
        data = source.read(16384)
        if not data:
            break
        pullparser.feed(data)
        yield from pullparser.read_events()
    pullparser.close()

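iterparse can be memory-efficient for large files, but only if elements are discarded as they are processed; left alone, it builds the same full tree as parse. With events=None, only 'end' events are reported. The common pattern is to call clear() on each element once it has been handled: for event, elem in ET.iterparse(f): ... elem.clear().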
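The clear-after-processing pattern might look like the sketch below. The log document and count_by_level are hypothetical, invented for this example. Note that clear() empties an element but leaves it attached to its parent, so the root still accumulates empty children; that is usually acceptable, but very large inputs may also need the processed children removed from the root.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: 1000 <entry> elements with a level attribute of 0, 1 or 2.
XML = b"<log>" + b"".join(
    b"<entry level='%d'>msg %d</entry>" % (i % 3, i) for i in range(1000)
) + b"</log>"

def count_by_level(source):
    """Count entries per level, clearing each element once it is handled."""
    counts = {}
    # 'end' fires when an element is complete, so its attributes and text are set.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            level = elem.get("level")
            counts[level] = counts.get(level, 0) + 1
            elem.clear()  # drop the element's text, attributes and children
    return counts

counts = count_by_level(io.BytesIO(XML))
```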

gopy notes

ElementTree.parse is module/xml/etree.ElementTree.Parse in module/xml/etree/module.go. It uses Go's encoding/xml tokenizer under the hood. Element.findall implements the ElementPath mini-XPath evaluator in module/xml/etree/path.go. ElementTree.write serializes using a recursive visitor. iterparse is a Go channel-based iterator.