Lib/heapq.py
cpython 3.14 @ ab2d84fe1023/Lib/heapq.py
heapq exposes a min-heap that operates directly on a Python list without a
wrapper class. The module ships both a pure-Python implementation (this file)
and a C accelerator in _heapq. At import time the module replaces its own
functions with the C versions when available, so the Python code is the
reference implementation and the fallback for environments without the
extension.
Map
| Lines | Symbol | Role |
|---|---|---|
| 1-60 | module docstring, __all__ | Documents the heap invariant (heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2]) |
| 61-100 | heappush | Appends item, then calls _siftdown to restore the invariant |
| 101-130 | heappop | Swaps root with last element, shrinks list, calls _siftup on position 0 |
| 131-160 | heapreplace | Overwrites root without a pop/push round-trip; the returned old root may be larger than the pushed item |
| 161-185 | heappushpop | Atomic push-then-pop; short-circuits if the new item is already smallest |
| 186-220 | heapify | Floyd's linear-time construction: calls _siftup on every non-leaf in reverse order |
| 221-265 | _siftdown | Bubble-up: walks from leaf toward root while child is smaller than parent |
| 266-315 | _siftup | Trickle-down: walks from root toward leaves, always choosing the smaller child |
| 316-340 | nlargest, nsmallest | Strategy switch: min()/max() for n == 1, sorted() when n >= input length, heap path otherwise |
| 341-350 | merge | k-way merge using a min-heap of [value, order, iterator, next] entries |
Reading
_siftdown and _siftup index arithmetic
The heap is stored as a flat list. For a node at index k, its parent is at
(k - 1) >> 1, its left child at 2*k + 1, and its right child at 2*k + 2.
_siftdown is the bubble-up path used after inserting at the end.
_siftup is the trickle-down path used after removing the root.
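The index arithmetic can be sanity-checked on a concrete list (the values here are arbitrary):

```python
# A valid min-heap stored as a flat list.
h = [1, 3, 2, 7, 4, 9]

# For every non-root node k, the parent at (k - 1) >> 1 is no larger.
for k in range(1, len(h)):
    assert h[(k - 1) >> 1] <= h[k]

# The children of node k sit at 2*k + 1 and 2*k + 2; e.g. node 1
# (value 3) has children 7 and 4 at indices 3 and 4.
assert (h[2 * 1 + 1], h[2 * 1 + 2]) == (7, 4)
```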
# CPython: Lib/heapq.py:221 _siftdown
def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem
# CPython: Lib/heapq.py:266 _siftup
def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)
_siftup first drives newitem all the way to a leaf (taking the smaller
child at each step), then calls _siftdown from that leaf position to place
newitem correctly. This two-phase approach reduces comparisons versus a
naive trickle-down, at the cost of one extra _siftdown call.
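The two paths can be compared directly. The naive_siftup below is an illustrative single-pass trickle-down, not part of the module; with distinct values both versions place newitem on the same min-child path, so the final layout is identical even though the comparison counts differ. (heapq._siftup stays pure-Python even when the _heapq accelerator loads, since the C module only replaces the public functions.)

```python
import heapq

def naive_siftup(h, pos):
    # Illustrative single-pass trickle-down: compare newitem against the
    # smaller child at every level and stop as soon as it fits.
    end = len(h)
    newitem = h[pos]
    child = 2 * pos + 1
    while child < end:
        right = child + 1
        if right < end and h[right] < h[child]:
            child = right
        if not h[child] < newitem:
            break
        h[pos] = h[child]
        pos = child
        child = 2 * pos + 1
    h[pos] = newitem

a = [9, 1, 2, 3, 4, 5, 6]     # root violates the min-heap invariant
b = a.copy()
heapq._siftup(a, 0)           # CPython's two-phase version
naive_siftup(b, 0)
assert a == b                 # same layout, fewer comparisons in _siftup
```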
heappop and heapreplace
heappop avoids a naive "remove index 0 and re-heapify" approach. Instead it
swaps the last element into position 0 and sifts it down. This keeps the
operation O(log n).
# CPython: Lib/heapq.py:101 heappop
def heappop(heap):
    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
        return returnitem
    return lastelt
# CPython: Lib/heapq.py:131 heapreplace
def heapreplace(heap, item):
    returnitem = heap[0]    # raises appropriate IndexError if heap is empty
    heap[0] = item
    _siftup(heap, 0)
    return returnitem
heapreplace is faster than heappop followed by heappush because it sifts
once from the root instead of sifting for the pop and again for the push.
The heap invariant is restored for any item, but the returned value is
always the old root, so it may be larger than the item just pushed;
callers that want min(item, heap contents) should use heappushpop instead.
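The difference shows up when the new item is smaller than the current root; a small illustration with arbitrary values:

```python
import heapq

h = [2, 5, 3]                      # valid min-heap, root == 2
g = h.copy()

# heapreplace always returns the old root, even though 1 is now the
# smallest value in the heap -- the function's docstring warns that
# the returned value may be larger than the pushed item.
assert heapq.heapreplace(h, 1) == 2
assert h[0] == 1

# heappushpop compares first: pushing 1 onto a heap whose root is 2
# short-circuits, returning 1 and leaving the heap untouched.
assert heapq.heappushpop(g, 1) == 1
assert g == [2, 5, 3]
```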
heapify linear-time Floyd construction
Calling heappush n times is O(n log n). heapify uses Floyd's algorithm to
build a valid heap in O(n) by sifting down every non-leaf node in reverse
order. Leaves (indices n//2 and above) are trivially valid 1-element heaps
and need no work.
# CPython: Lib/heapq.py:186 heapify
def heapify(x):
    n = len(x)
    for i in reversed(range(n//2)):
        _siftup(x, i)
The O(n) bound follows from the fact that nodes near the bottom of the tree are the most numerous and also require the fewest sift steps.
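A quick check that the construction restores the invariant everywhere (values arbitrary):

```python
import heapq

data = [9, 4, 7, 1, 2, 6, 3]
heapq.heapify(data)

# Every parent is <= both of its children after construction.
n = len(data)
for k in range(n // 2):            # indices n//2.. are leaves: nothing to check
    for child in (2 * k + 1, 2 * k + 2):
        if child < n:
            assert data[k] <= data[child]

assert data[0] == min(data)        # the root is the global minimum
```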
nlargest and nsmallest strategy switch
Both functions choose between three strategies depending on n relative to
the length of the input iterable.
# CPython: Lib/heapq.py:316 nsmallest
def nsmallest(n, iterable, key=None):
    # Short-cut for n==1 is to use min()
    if n == 1:
        it = iter(iterable)
        sentinel = object()
        result = min(it, default=sentinel, key=key)
        return [] if result is sentinel else [result]

    # When n>=size, it's faster to use sorted()
    try:
        size = len(iterable)
    except (TypeError, AttributeError):
        pass
    else:
        if n >= size:
            return sorted(iterable, key=key)[:n]

    # General case: keep a max-heap of the n smallest seen so far and
    # replace its root whenever a smaller element arrives.
    if key is None:
        it = iter(iterable)
        result = [(elem, i) for i, elem in zip(range(n), it)]
        if not result:
            return result
        heapify_max(result)
        top = result[0][0]
        order = n
        for elem in it:
            if elem < top:
                heapreplace_max(result, (elem, order))
                top, _order = result[0]
                order += 1
        result.sort()
        return [elem for (elem, order) in result]
    ...
For n == 1 the function simply calls min() (max() in nlargest). When the
input's length is knowable and n >= len(iterable), a full sorted() is
cheaper and is used directly. Otherwise the heap strategy runs in
O(m log n), where m is the total input size.
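Each regime can be checked against the sorted() definition given in the module docstring; the sizes below are chosen to hit all three paths:

```python
import heapq
import random

random.seed(4)
data = [random.randrange(1000) for _ in range(200)]

# n == 1 (min() path), intermediate n (heap path), n >= len (sorted() path)
for n in (1, 10, 500):
    assert heapq.nsmallest(n, data) == sorted(data)[:n]

# The key= parameter is threaded through every path as well.
assert heapq.nlargest(3, data, key=lambda v: -v) == sorted(data)[:3]
```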
merge k-way merge
merge accepts any number of sorted iterables and produces a single sorted
output without materialising all inputs at once. It seeds a min-heap with one
item from each iterator and repeatedly pops the minimum, refilling from the
same iterator.
# CPython: Lib/heapq.py:341 merge (core loop)
h = []
h_append = h.append
for order, it in enumerate(map(iter, iterables)):
    try:
        next_ = it.__next__
        h_append([next_(), order, it, next_])
    except StopIteration:
        pass
heapify(h)
while len(h) > 1:
    try:
        while True:
            value, order, it, next_ = s = h[0]
            yield value
            s[0] = next_()
            heapreplace(h, s)
    except StopIteration:
        heappop(h)     # one iterator exhausted; keep merging the rest
if h:
    # Only one iterator remains: bypass the heap and drain it directly.
    value, order, it, next_ = h[0]
    yield value
    yield from it
The order field breaks ties between equal values from different iterables,
preserving stability. The list comparison [value, order, ...] means Python
never reaches the iterator field (which is not orderable) unless both value
and order tie, which cannot happen because each iterator has a unique
order index.
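Stability is observable when equal keys carry a tag showing which input they came from (a small illustration; the runs here are arbitrary):

```python
import heapq

# Two runs sorted by the first field; equal keys are tagged by origin.
run_a = [(1, "a"), (3, "a")]
run_b = [(1, "b"), (2, "b")]

merged = list(heapq.merge(run_a, run_b, key=lambda pair: pair[0]))

# The two key-1 entries keep input order: run_a's comes first because
# the order counter breaks the tie in favour of earlier iterables.
assert merged == [(1, "a"), (1, "b"), (2, "b"), (3, "a")]
```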
gopy notes
- heapq is not yet ported to gopy. The functions are pure algorithmic code with no CPython internals, so a port under module/heapq/ would be straightforward once the module loader (vm/eval_import.go) can serve it.
- _siftdown and _siftup operate on a *objects.List equivalent. gopy's list type (objects/list.go) exposes indexed get/set at O(1), matching the assumption these functions rely on.
- heapreplace and heappushpop are the two functions most worth porting early because Counter.most_common calls heapq.nlargest, which in turn calls heapreplace in its inner loop.
- merge uses next_ stored directly in the heap element to avoid a per-iteration attribute lookup. gopy's iterator protocol goes through objects/protocol.go Next(), which has similar cost to a Python slot call.
- The strategy switch in nlargest/nsmallest involves len(iterable), which requires consuming a tee or knowing the length upfront. gopy's Sized check in objects/protocol.go mirrors the hasattr(it, '__len__') guard CPython uses.