Python/compile.c (part 4)

Source:

cpython 3.14 @ ab2d84fe1023/Python/compile.c

This annotation covers the backend of the CPython compiler: how basic blocks are assembled into bytecode, how jump targets are resolved, and how the exception table (co_exceptiontable) is encoded.

Map

| Lines | Symbol | Role |
|---|---|---|
| 1-300 | basicblock, cfg_builder | Basic block and control-flow graph types |
| 301-600 | assemble_jump_offsets | First pass: assign PC offsets to basic blocks |
| 601-900 | assemble_instructions | Second pass: emit instruction bytes |
| 901-1200 | build_exception_table | Construct co_exceptiontable from block annotations |
| 1201-1500 | optimize_basic_block, remove_redundant_jumps | Peephole optimizations |
| 1501-1800 | write_location_entry | Encode co_linetable entries |

Reading

Basic block structure

// CPython: Python/compile.c:188 basicblock
typedef struct basicblock_ {
    struct basicblock_ *b_list;   /* linked list of all blocks */
    struct basicblock_ *b_next;   /* fall-through successor */
    struct cfg_instr *b_instr;    /* instruction array */
    int b_iused;                  /* instructions used */
    int b_ialloc;                 /* instructions allocated */
    int b_label;                  /* target label (if any) */
    int b_offset;                 /* bytecode offset (assigned in pass 1) */
    unsigned b_startdepth;        /* stack depth at block entry */
    unsigned b_reachable;         /* 1 if reachable from the entry block */
} basicblock;

Jump offset resolution

// CPython: Python/compile.c:480 assemble_jump_offsets
// Simplified pseudocode: the "for each" lines stand in for the real loops.
static int
assemble_jump_offsets(struct assembler *a, struct compiler *c)
{
    /* Assign an offset to each basic block.
       Iterate until stable (a jump may grow when its oparg needs EXTENDED_ARG). */
    int again = 1;
    while (again) {
        again = 0;
        /* Lay out the blocks using the current instruction sizes. */
        int offset = 0;
        for each block b:
            b->b_offset = offset;
            for each instr in b:
                offset += instr_size(instr);
        /* Recompute jump opargs. If a jump's encoded size changed,
           the offsets assigned above are stale, so go around again. */
        for each block b:
            for each jump instr in b:
                old_size = instr_size(instr);
                instr->i_oparg = target_offset(instr);
                if instr_size(instr) != old_size:
                    again = 1;
    }
    return 0; /* success */
}

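Each instruction carries a single byte of oparg, so CPython emits EXTENDED_ARG prefixes for any oparg larger than 255, jump opargs included. Every prefix lengthens the bytecode and shifts all later blocks, so widening one jump can push another jump's oparg across the 255 threshold. The iterative approach handles these rare cases by repeating the layout until no instruction changes size.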
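The fixed-point iteration described above can be sketched as a small standalone program. This is a toy model, not CPython's code: the `toy_*` names are hypothetical, sizes are counted in code units, and a jump's oparg is assumed to be the absolute offset of its target block (CPython's real jumps are more involved).

```c
#include <assert.h>

/* Toy instruction: a plain instruction or a jump to a block index. */
typedef struct {
    int is_jump;   /* nonzero if this instruction is a jump */
    int target;    /* index of the target block (jumps only) */
    int oparg;     /* for jumps: resolved offset of the target block */
} toy_instr;

typedef struct {
    toy_instr *instrs;
    int n_instrs;
    int offset;    /* assigned by toy_resolve_jumps, in code units */
} toy_block;

/* One code unit for the instruction itself, plus one EXTENDED_ARG
   prefix for every additional byte the oparg needs. */
static int
toy_instr_size(int oparg)
{
    assert(oparg >= 0);
    if (oparg > 0xFFFFFF) return 4;
    if (oparg > 0xFFFF) return 3;
    if (oparg > 0xFF) return 2;
    return 1;
}

/* Lay out blocks and resolve jump opargs until nothing changes size.
   Returns the number of passes that were needed. */
static int
toy_resolve_jumps(toy_block *blocks, int n_blocks)
{
    int passes = 0;
    int again;
    do {
        again = 0;
        passes++;
        /* Assign offsets using the sizes implied by the current opargs. */
        int offset = 0;
        for (int i = 0; i < n_blocks; i++) {
            blocks[i].offset = offset;
            for (int j = 0; j < blocks[i].n_instrs; j++) {
                offset += toy_instr_size(blocks[i].instrs[j].oparg);
            }
        }
        /* Update jump opargs; a size change invalidates the layout. */
        for (int i = 0; i < n_blocks; i++) {
            for (int j = 0; j < blocks[i].n_instrs; j++) {
                toy_instr *in = &blocks[i].instrs[j];
                if (!in->is_jump) {
                    continue;
                }
                int old_size = toy_instr_size(in->oparg);
                in->oparg = blocks[in->target].offset;
                if (toy_instr_size(in->oparg) != old_size) {
                    again = 1;
                }
            }
        }
    } while (again);
    return passes;
}
```

With every oparg starting at 0, sizes can only grow from one pass to the next and are capped at four code units, so the loop terminates. A jump over a block of 255 one-unit instructions needs two passes: the first pass finds the target at offset 256, the jump grows by one unit, and the second pass settles the target at 257.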

Exception table encoding

// CPython: Python/compile.c:940 build_exception_table
// Simplified pseudocode.
static int
build_exception_table(struct assembler *a, struct compiler *c)
{
    /* For each (start, end, handler, stack_depth, lasti) interval: */
    for each except_block:
        write_varint(start_offset);   /* first byte also carries the entry-start marker bit */
        write_varint(size);           /* end - start */
        write_varint(handler_offset);
        write_varint((stack_depth << 1) | lasti);   /* low bit: push lasti onto the stack? */
    return 0; /* success */
}

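The exception table replaces the old SETUP_FINALLY/POP_BLOCK opcode pairs (removed in 3.11), so code that raises no exception executes no block-setup instructions. When an exception is raised, the interpreter looks up the offset of the failing instruction in co_exceptiontable to find the handler. Entries are ordered by start offset and the first byte of each entry is marked, which is what makes a binary search over the variable-length encoding possible.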
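The entry layout can be illustrated with a short encoder and decoder. This is a sketch under stated assumptions rather than CPython's implementation: the `toy_*` names are hypothetical, and the byte format assumed here is 6-bit chunks written most significant first, with bit 6 as a continuation flag, bit 7 set only on the first byte of an entry, and stack depth and lasti packed into one value.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CONTINUATION_BIT 0x40u  /* more 6-bit chunks follow */
#define TOY_ENTRY_START_BIT  0x80u  /* set on the first byte of an entry */

/* Append value to buf as 6-bit chunks, most significant chunk first.
   msb is OR-ed into the first byte only. Returns the new length. */
static size_t
toy_write_varint(unsigned char *buf, size_t len, unsigned value, unsigned msb)
{
    assert(value < (1u << 30));
    int shift = 24;
    /* Skip leading chunks that are zero. */
    while (shift > 0 && (value >> shift) == 0) {
        shift -= 6;
    }
    for (; shift > 0; shift -= 6) {
        buf[len++] = (unsigned char)(((value >> shift) & 0x3Fu)
                                     | TOY_CONTINUATION_BIT | msb);
        msb = 0;
    }
    buf[len++] = (unsigned char)((value & 0x3Fu) | msb);
    return len;
}

/* Decode one varint starting at buf[pos]; returns the next position. */
static size_t
toy_read_varint(const unsigned char *buf, size_t pos, unsigned *out)
{
    unsigned b = buf[pos++];
    unsigned val = b & 0x3Fu;
    while (b & TOY_CONTINUATION_BIT) {
        b = buf[pos++];
        val = (val << 6) | (b & 0x3Fu);
    }
    *out = val;
    return pos;
}

/* Append one (start, size, handler, depth, lasti) entry. */
static size_t
toy_write_entry(unsigned char *buf, size_t len, unsigned start, unsigned size,
                unsigned handler, unsigned depth, int lasti)
{
    len = toy_write_varint(buf, len, start, TOY_ENTRY_START_BIT);
    len = toy_write_varint(buf, len, size, 0);
    len = toy_write_varint(buf, len, handler, 0);
    len = toy_write_varint(buf, len, (depth << 1) | (lasti ? 1u : 0u), 0);
    return len;
}
```

Under these assumptions, an entry with start 20, size 8, handler 100, depth 3, and lasti false takes five bytes: 148 (128 + 20), 8, 65 (64 + 1, the high chunk of 100), 36 (the low chunk, since 100 = 64 + 36), and 6 (3 shifted left by one). Because only the first byte of an entry has bit 7 set, a reader that lands in the middle of the table can step backwards to the nearest entry boundary.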

Peephole: LOAD_CONST + RETURN_VALUE

// CPython: Python/compile.c:1240 optimize_basic_block
/* Fold: LOAD_CONST x; RETURN_VALUE -> RETURN_CONST x */
if (instr->i_opcode == RETURN_VALUE &&
    prev->i_opcode == LOAD_CONST) {
    prev->i_opcode = NOP;
    instr->i_opcode = RETURN_CONST;
    instr->i_oparg = prev->i_oparg;
}
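To see the fold applied to a whole block, here is a minimal standalone sketch. The `toy_*` opcodes and types are hypothetical stand-ins for the compiler's own instruction representation, and the array is assumed to hold the instructions of a single basic block, so no jump can land between the two instructions being merged.

```c
#include <assert.h>

/* Hypothetical opcode set, just large enough for the example. */
enum toy_opcode {
    TOY_NOP,
    TOY_LOAD_CONST,
    TOY_POP_TOP,
    TOY_RETURN_VALUE,
    TOY_RETURN_CONST
};

typedef struct {
    enum toy_opcode opcode;
    int oparg;
} toy_op;

/* Rewrite every adjacent LOAD_CONST x; RETURN_VALUE pair in one basic
   block as NOP; RETURN_CONST x. Returns the number of pairs folded. */
static int
toy_fold_return_const(toy_op *code, int n)
{
    assert(n >= 0);
    int folds = 0;
    for (int i = 1; i < n; i++) {
        if (code[i].opcode == TOY_RETURN_VALUE &&
            code[i - 1].opcode == TOY_LOAD_CONST) {
            /* The return takes over the constant index... */
            code[i].opcode = TOY_RETURN_CONST;
            code[i].oparg = code[i - 1].oparg;
            /* ...and the load becomes a NOP for a later pass to remove. */
            code[i - 1].opcode = TOY_NOP;
            code[i - 1].oparg = 0;
            folds++;
        }
    }
    return folds;
}
```

Leaving a NOP behind instead of deleting the instruction keeps the array indices stable during the scan; a separate cleanup step can then drop the NOPs in one sweep.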

gopy notes

The Go compiler performs the same two-pass assembly in compile/flowgraph.go and compile/flowgraph_jumps.go, with jump offset resolution living in the latter. The exception table is constructed in compile/flowgraph_except.go. The encoding format is identical to CPython 3.11+ so that .pyc files produced by gopy remain compatible with CPython.