Skip to main content

Python/ast.c

Source:

cpython 3.14 @ ab2d84fe1023/Python/ast.c

Python/ast.c runs immediately after the parser returns an AST and before the symbol table or compiler walk it. It has two distinct responsibilities. The first is structural validation (_PyAST_Validate): it checks invariants the ASDL grammar cannot express, such as non-negative line numbers, consistent Load/Store/Del contexts, and the absence of meaningless node combinations. The second is optimization (_PyAST_Optimize): it performs a source-level constant-folding pass that collapses binary operations on literals, folds comparison chains, and merges adjacent string constants. The file also contains _Py_Mangle, the implementation of Python's name-mangling rule for __name identifiers inside class bodies.

Map

LinesSymbolRole
1-40includes, forward declarationspycore_ast.h, pycore_compile.h, static prototypes
41-80_PyAST_ValidatePublic entry point; sets up the recursion-depth counter
81-140validate_modDispatch over Module, Interactive, Expression, FunctionType
141-420validate_stmtRecursive validator for all statement node kinds
421-720validate_exprRecursive validator for all expression node kinds
721-780validate_argumentsChecks on function argument lists (duplicate names, etc.)
781-850validate_patternStructural-pattern node validator (3.10+)
851-950_Py_MangleName-mangling for __name identifiers in class bodies
951-1050fold_binopConstant-folds BinOp nodes whose both sides are literals
1051-1150fold_compareConstant-folds Compare nodes with literal operands
1151-1230fold_tupleConverts Tuple(Load) of constants to a single Constant
1231-1380_PyAST_ExprConstantValueCheck if an expression reduces to a constant
1381-1500_PyAST_OptimizeTop-level constant-folding driver; walks the full AST

Reading

Validation entry point and recursion guard

_PyAST_Validate is the sole public entry point for the validation pass. CPython calls it from Python/pythonrun.c right after PyParser_ASTFromStringObject returns, and also from the compile() built-in when it receives a pre-built AST object. The recursion counter mirrors the one in ceval.c and prevents deeply nested AST nodes from exhausting the C stack before they reach Python's recursion limit.

// CPython: Python/ast.c:41 _PyAST_Validate
int
_PyAST_Validate(mod_ty mod)
{
int res = -1;
PyThreadState *tstate = _PyThreadState_GET();
int recursion_limit = C_RECURSION_LIMIT;
int starting_recursion_depth;
if (_Py_EnterRecursiveCallTstate(tstate, " validating AST")) {
return -1;
}
starting_recursion_depth = tstate->c_recursion_remaining;
res = validate_mod(mod);
tstate->c_recursion_remaining = starting_recursion_depth;
_Py_LeaveRecursiveCallTstate(tstate);
return res;
}

The counter is decremented on every recursive validate_expr and validate_stmt call. If it hits zero, the function raises RecursionError and returns -1. This guard is separate from the Python-level recursion guard because AST nodes are a C-level structure.

validate_stmt: structural vs. semantic errors

validate_stmt distinguishes two error classes. Pure structural violations (wrong node kind in a position, a missing required field) raise SystemError because they indicate a bug in the parser or in external code that built AST nodes programmatically. Semantic violations (augmented assignment to a non-store target, return with a value in a generator with yield from) raise SyntaxError with a proper filename and line number attached. The context argument (Load / Store / Del) threads through all validate_expr calls to reject constructs like (a + b) = c without a separate tree walk.

// CPython: Python/ast.c:141 validate_stmt
static int
validate_stmt(struct validator *state, stmt_ty stmt)
{
int ret = -1;
if (!_Py_EnterRecursiveCall(" validating AST")) {
return -1;
}
switch (stmt->kind) {
case AugAssign_kind:
ret = validate_expr(state, stmt->v.AugAssign.target, Store)
&& validate_expr(state, stmt->v.AugAssign.value, Load);
break;
case Return_kind:
ret = !stmt->v.Return.value
|| validate_expr(state, stmt->v.Return.value, Load);
break;
/* ... */
default:
PyErr_SetString(PyExc_SystemError, "unexpected statement");
ret = -1;
}
_Py_LeaveRecursiveCall();
return ret;
}

_Py_Mangle: name-mangling for class bodies

Python's reference manual specifies that any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) that appears inside a class body is textually transformed to _classname__spam. This transformation happens at the AST level, not at parse time, so that tools that build ASTs programmatically also benefit from it. _Py_Mangle is called from the compiler (not from the validator) for every name that passes through a class scope.

// CPython: Python/ast.c:851 _Py_Mangle
PyObject *
_Py_Mangle(PyObject *privateobj, PyObject *ident)
{
/* privateobj is the class name; ident is the raw identifier. */
if (privateobj == NULL || !PyUnicode_Check(privateobj)) {
return Py_NewRef(ident);
}
const char *name = PyUnicode_AsUTF8(ident);
if (name == NULL || name[0] != '_' || name[1] != '_') {
return Py_NewRef(ident);
}
/* strip leading underscores from the class name */
/* build "_classname__ident" */
/* ... */
}

The strip of leading underscores from the class name is intentional: ___Foo becomes Foo for the purpose of mangling, so __bar in ___Foo still mangles to _Foo__bar. At most one trailing underscore is preserved in the identifier.

Constant folding: fold_binop and fold_tuple

_PyAST_Optimize is the top-level driver for the constant-folding pass. It calls astfold_expr on every expression node, which in turn calls the specialized folders. The folding happens at the AST level so that the resulting simplified AST is what the compiler and the symbol table both see.

fold_binop checks whether both operands of a BinOp node are Constant nodes and whether the operator is one of the foldable set (Add, Sub, Mult, Div, Mod, Pow, FloorDiv, BitAnd, BitOr, BitXor, LShift, RShift, MatMult). If so, it evaluates the operation at compile time and replaces the BinOp with a new Constant node.

// CPython: Python/ast.c:951 fold_binop
static int
fold_binop(expr_ty node, PyArena *arena, _PyOptimizerObject *optimizer)
{
expr_ty lhs = node->v.BinOp.left;
expr_ty rhs = node->v.BinOp.right;
if (lhs->kind != Constant_kind || rhs->kind != Constant_kind) {
return 1; /* nothing to fold */
}
PyObject *lv = lhs->v.Constant.value;
PyObject *rv = rhs->v.Constant.value;
PyObject *result = binary_ops[node->v.BinOp.op](lv, rv);
if (result == NULL) {
/* overflow or type error: clear and leave node untouched */
PyErr_Clear();
return 1;
}
COPY_NODE(node, _PyAST_Constant(result, NULL, node->lineno,
node->col_offset, node->end_lineno,
node->end_col_offset, arena));
Py_DECREF(result);
return 1;
}

fold_tuple handles the common case where a Tuple node in Load context contains only Constant children. It converts the whole tuple to a single Constant holding a Python tuple, which avoids runtime allocation for literal tuple expressions like (1, 2, 3).

fold_compare folds Compare nodes such as 1 < 2 or "a" == "a" where all operands are constants and all operators are equality or ordering comparisons.

gopy notes

_PyAST_Validate is not yet ported in gopy. The planned home is compile/astvalidate. The Go equivalent will be Validate(mod ast.Mod) error, using a validator struct with a depth int field for the recursion guard.

_PyAST_Optimize maps to a future compile/astopt package. The constant-folding logic for fold_binop and fold_tuple can reuse Go's math/big for integer overflow detection. fold_compare is straightforward for numeric types; string comparisons need care around interning.

_Py_Mangle is partially replicated in compile/compiler.go via the mangle helper. The Go version should be reconciled against the CPython definition to ensure it handles the edge cases (leading underscores stripped from the class name, single trailing underscore preserved).