
Python/flowgraph.c

Python/flowgraph.c is the control-flow graph layer that sits between the compiler's instruction-sequence emitter and the assembler. It slices the flat instruction stream into basic blocks, runs a sequence of optimization passes including dead-code elimination and peephole rewrites, and linearizes the result back into an ordered block list for the assembler.

Source:

cpython 3.14 @ ab2d84fe1023/Python/flowgraph.c

Map

Lines  Symbol                          Role
~80    _PyCfgBuilder                   Mutable builder holding the block graph under construction
~200   _PyCfg_FromInstructionSequence  Entry point: converts flat instruction sequence to CFG
~450   mark_reachable                  DFS from the entry block, marks every reachable block
~520   _PyCfg_OptimizeCodeUnit         Top-level pass runner: calls all sub-passes in order
~600   remove_redundant_jumps          Drops unconditional jumps that fall through to the next block
~680   push_cold_blocks                Moves exception handler blocks after the hot path
~800   optimize_basic_block            Per-block peephole: LOAD_CONST folding, NOP removal
~1200  _PyCfgBuilder_Finish            Linearizes the CFG into a sorted block sequence

Reading

_PyCfgBuilder and basic block graph

_PyCfgBuilder is a heap-allocated struct that owns all blocks via a _PyArena-backed pool. Each basic block (cfg_block) holds an inline array of cfg_instr entries plus a pointer to its fall-through successor and a pointer to its jump-target successor. Blocks also carry a b_visited flag used by DFS passes and a b_cold flag set by push_cold_blocks.

// CPython: Python/flowgraph.c:80 _PyCfgBuilder
typedef struct _PyCfgBuilder {
    struct _PyCfg cfg;      /* the graph (block list + entry block) */
    cfg_block *g_curblock;  /* block currently being filled */
    PyArena *g_arena;       /* memory arena for block/instr allocation */
} _PyCfgBuilder;

Instruction emission calls _PyCfgBuilder_Addop, which appends to g_curblock. A new block is opened with cfg_builder_new_block whenever a jump is emitted or a jump target is encountered. The split point is set retroactively: the instruction sequence carries offset labels and the builder resolves them to block pointers in a second pass.
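The retroactive label resolution can be sketched as a standalone model. This is a minimal sketch under assumed, simplified types (block, resolve_labels are illustrative names, not the real ones); the actual builder resolves labels carried in the instruction stream to cfg_block pointers:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified shapes; the real code works on
 * cfg_block / cfg_instr with labels embedded in instructions. */
typedef struct block {
    int label;             /* label id that starts this block, or -1 */
    int jump_label;        /* label id this block's jump targets, or -1 */
    struct block *target;  /* resolved jump-target block pointer */
} block;

/* Second pass: rewrite label ids into block pointers. */
static void resolve_labels(block *blocks, int n)
{
    for (int i = 0; i < n; i++) {
        blocks[i].target = NULL;
        if (blocks[i].jump_label < 0) {
            continue;  /* no jump out of this block */
        }
        for (int j = 0; j < n; j++) {
            if (blocks[j].label == blocks[i].jump_label) {
                blocks[i].target = &blocks[j];
                break;
            }
        }
    }
}
```

The point of the two-pass shape is that a jump can be emitted before its target block exists; by the time the second pass runs, every label has a block.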

_PyCfg_OptimizeCodeUnit and dead-code elimination

_PyCfg_OptimizeCodeUnit is the master pass runner. It calls the sub-passes in a fixed order and iterates the whole sequence until no pass changed anything. The dead-code elimination sub-pass (eliminate_dead_code) relies on mark_reachable to identify which blocks are live.

mark_reachable is a simple iterative DFS. It starts from the entry block and follows both the fall-through and jump-target edges of every block it reaches. Blocks not marked after the DFS are dropped by the cleanup pass.

// CPython: Python/flowgraph.c:450 mark_reachable
static int
mark_reachable(cfg_builder *g)
{
    cfg_block **stack = NULL;
    int nstack = 0;
    g->g_entryblock->b_visited = 1;
    PUSH_BLOCK(stack, nstack, g->g_entryblock);
    while (nstack > 0) {
        cfg_block *b = POP_BLOCK(stack, nstack);
        for (int i = 0; i < b->b_iused; i++) {
            cfg_instr *instr = &b->b_instr[i];
            if (instr->i_target && !instr->i_target->b_visited) {
                instr->i_target->b_visited = 1;
                PUSH_BLOCK(stack, nstack, instr->i_target);
            }
        }
        if (b->b_next && !b->b_next->b_visited) {
            b->b_next->b_visited = 1;
            PUSH_BLOCK(stack, nstack, b->b_next);
        }
    }
    PyMem_Free(stack);
    return SUCCESS;
}

After marking, any block with b_visited == 0 is unlinked from the block list and its instructions are discarded. The pass also trims instructions after an unconditional RETURN_VALUE or RAISE_VARARGS within a reachable block.
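The trimming step amounts to cutting the instruction array after the first unconditional terminator. A minimal sketch with assumed simplified types and stand-in opcodes (the real pass operates on cfg_instr arrays and the full opcode table):

```c
#include <assert.h>

/* Stand-in opcodes for illustration; real values come from opcode.h. */
enum { OP_NOP, OP_LOAD, OP_RETURN_VALUE, OP_RAISE_VARARGS };

typedef struct {
    int used;    /* like b_iused: number of live instructions */
    int ops[8];  /* opcode per slot */
} bblock;

/* Truncate the block after its first unconditional terminator:
 * everything past a RETURN/RAISE in the same block is unreachable. */
static void trim_after_terminator(bblock *b)
{
    for (int i = 0; i < b->used; i++) {
        if (b->ops[i] == OP_RETURN_VALUE || b->ops[i] == OP_RAISE_VARARGS) {
            b->used = i + 1;
            return;
        }
    }
}
```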

remove_redundant_jumps and peephole optimizations

remove_redundant_jumps scans every block for an unconditional JUMP whose target is the block's immediate fall-through successor. Such a jump can be deleted because control falls through naturally without it. The pass also handles a related case: a conditional jump whose taken target is the block's fall-through successor, so both outcomes land in the same block. That jump is replaced with a POP_TOP that discards the condition value.

// CPython: Python/flowgraph.c:600 remove_redundant_jumps
static int
remove_redundant_jumps(cfg_builder *g)
{
    for (cfg_block *b = g->g_entryblock; b != NULL; b = b->b_next) {
        cfg_instr *last = _PyCfg_BasicBlockLastInstr(b);
        if (last == NULL) {
            continue;
        }
        if (IS_UNCONDITIONAL_JUMP_OPCODE(last->i_opcode)) {
            if (last->i_target == b->b_next) {
                last->i_opcode = NOP;
            }
        }
    }
    return SUCCESS;
}

The per-block peephole pass optimize_basic_block handles two main rewrites. First, adjacent LOAD_CONST pairs followed by a foldable binary operation are collapsed into a single LOAD_CONST carrying the pre-computed result. Second, NOP instructions produced by earlier passes are compacted out of the instruction array so that the block's b_iused count decreases. This keeps the block arrays tight before linearization.

// CPython: Python/flowgraph.c:800 optimize_basic_block
static int
optimize_basic_block(PyObject *const_cache, cfg_block *bb, PyObject *consts)
{
    cfg_instr *inst;
    for (int i = 0; i < bb->b_iused; i++) {
        inst = &bb->b_instr[i];
        switch (inst->i_opcode) {
            case NOP:
                continue; /* removed in compaction below */
            case LOAD_CONST: {
                int nextop = i+1 < bb->b_iused ? bb->b_instr[i+1].i_opcode : 0;
                if (nextop == LOAD_CONST) {
                    /* potential constant-fold opportunity; handled by caller */
                }
                break;
            }
            /* ... other cases ... */
        }
    }
    return fold_nops(bb);
}
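The compaction that the fold_nops call above performs is an in-place stable filter: copy every non-NOP instruction down over the gaps and shrink the used count. A minimal sketch with assumed simplified types (bblock and its fields are illustrative, not the real cfg_block layout):

```c
#include <assert.h>

/* Stand-in opcodes for illustration. */
enum { OP_NOP = 0, OP_LOAD = 1, OP_ADD = 2 };

typedef struct {
    int used;    /* like b_iused */
    int ops[8];
} bblock;

/* Compact NOPs out of the instruction array in place.
 * Relative order of the surviving instructions is preserved. */
static void compact_nops(bblock *b)
{
    int w = 0;  /* write cursor */
    for (int r = 0; r < b->used; r++) {
        if (b->ops[r] != OP_NOP) {
            b->ops[w++] = b->ops[r];
        }
    }
    b->used = w;
}
```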

push_cold_blocks moves every block whose b_cold flag is set (exception handlers, unreachable-but-retained cleanup code) to the end of the block list. This keeps the hot path contiguous in the instruction stream, which improves instruction cache locality and simplifies branch prediction for common cases.
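Moving cold blocks to the end is a stable partition of the singly linked block list: hot blocks keep their relative order, cold blocks keep theirs and are spliced on after. A minimal sketch under assumed simplified types (cblock and push_cold_blocks here are illustrative stand-ins for the real pass):

```c
#include <assert.h>
#include <stddef.h>

typedef struct cblock {
    int cold;             /* like the b_cold flag */
    int id;
    struct cblock *next;  /* like the b_next list link */
} cblock;

/* Stable partition of the block list: hot blocks first,
 * cold blocks appended at the end in their original order. */
static cblock *push_cold_blocks(cblock *head)
{
    cblock *hot = NULL, **hot_tail = &hot;
    cblock *cold = NULL, **cold_tail = &cold;
    for (cblock *b = head; b != NULL; b = b->next) {
        if (b->cold) {
            *cold_tail = b;
            cold_tail = &b->next;
        }
        else {
            *hot_tail = b;
            hot_tail = &b->next;
        }
    }
    *hot_tail = cold;   /* splice the cold list after the hot list */
    *cold_tail = NULL;  /* terminate the combined list */
    return hot;
}
```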

_PyCfgBuilder_Finish, linearization

After all passes converge, _PyCfgBuilder_Finish walks the block list in order and emits each block's instructions into a fresh _PyInstructionSequence. Jump targets are converted from block pointers back to instruction offsets using the accumulated byte-offset of each block's start position. The result is the flat instruction array that the assembler (Python/assemble.c) consumes to produce the PyCodeObject.

// CPython: Python/flowgraph.c:1200 _PyCfgBuilder_Finish
int
_PyCfgBuilder_Finish(cfg_builder *g, _PyInstructionSequence *seq)
{
    int offset = 0;
    for (cfg_block *b = g->g_entryblock; b != NULL; b = b->b_next) {
        b->b_offset = offset;
        offset += b->b_iused;
    }
    for (cfg_block *b = g->g_entryblock; b != NULL; b = b->b_next) {
        for (int i = 0; i < b->b_iused; i++) {
            cfg_instr *instr = &b->b_instr[i];
            if (instr->i_target) {
                instr->i_oparg = instr->i_target->b_offset;
            }
            if (_PyInstructionSequence_Addop(seq, instr->i_opcode,
                                             instr->i_oparg,
                                             &instr->i_loc) < 0) {
                return ERROR;
            }
        }
    }
    return SUCCESS;
}

Block offsets are computed in a first linear scan, then a second scan resolves all jump operands using those offsets. The two-pass design avoids forward-reference patching during the first scan.

gopy notes

Status: partially ported.

The flow graph is implemented in compile/flowgraph.go and its companion files (compile/flowgraph_passes.go, compile/flowgraph_jumps.go, compile/flowgraph_stackdepth.go, compile/flowgraph_except.go). The basic block struct and builder are ported. mark_reachable, remove_redundant_jumps, and the NOP-compaction portion of optimize_basic_block are shipped. push_cold_blocks, constant-folding peephole rewrites, and the full _PyCfg_OptimizeCodeUnit iteration loop are pending.

Planned package path: compile (all flow-graph files).