
Python/Python-ast.c

cpython 3.14 @ ab2d84fe1023/Python/Python-ast.c

Python-ast.c is not written by hand. It is generated by Parser/asdl_c.py from the grammar description in Parser/Python.asdl. The generator emits one C constructor for every node kind in the ASDL grammar, plus the recursive validator _PyAST_Validate.
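The main payoff of generating the file is that one description drives every per-kind artifact. As an analogy only, the same "one description, one constructor per kind" shape can be written in plain C with an X-macro. All names here (TOY_NODE_KINDS, ToyNode, Toy_Name and the rest) are hypothetical. This is not how asdl_c.py works: that script is Python code that writes C source text, and its constructors also take source locations and an arena.

```c
#include <stdlib.h>

/* Hypothetical single description of node kinds, playing the role that
   Parser/Python.asdl plays: each entry is X(Kind, FieldType, field_name). */
#define TOY_NODE_KINDS(X)                  \
    X(Name,     const char *,     id)      \
    X(Constant, long,             value)   \
    X(Await,    struct ToyNode *, operand)

/* Expansion 1: one enum tag per kind (Name_kind, Constant_kind, Await_kind). */
typedef enum {
#define X(Kind, Type, field) Kind##_kind,
    TOY_NODE_KINDS(X)
#undef X
} ToyKind;

/* Expansion 2: one union member per kind, holding that kind's payload. */
typedef struct ToyNode {
    ToyKind kind;
    union {
#define X(Kind, Type, field) struct { Type field; } Kind;
        TOY_NODE_KINDS(X)
#undef X
    } v;
} ToyNode;

/* Expansion 3: one constructor per kind (Toy_Name, Toy_Constant, Toy_Await).
   malloc stands in for arena allocation here; callers free the result. */
#define X(Kind, Type, field)                  \
    static ToyNode *Toy_##Kind(Type field)    \
    {                                         \
        ToyNode *p = malloc(sizeof *p);       \
        if (!p)                               \
            return NULL;                      \
        p->kind = Kind##_kind;                \
        p->v.Kind.field = field;              \
        return p;                             \
    }
TOY_NODE_KINDS(X)
#undef X
```

Adding a node kind means adding one line to TOY_NODE_KINDS; the tag, the payload struct, and the constructor all follow from it, which is the maintenance property Python-ast.c gets from being regenerated rather than edited by hand.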

The file serves three audiences:

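  1. The PEG parser (Parser/parser.c, generated from Grammar/python.gram) calls the constructors from its grammar actions while building the AST; the compiler (Python/compile.c) then consumes that tree.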
  2. The ast stdlib module re-exports the same node types as Python classes so user code can inspect or transform trees.
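  3. Third-party tools (linters, formatters, type checkers) normally reach the same node types through the ast module. The _PyAST_* constructors themselves are internal (declared in Include/internal/pycore_ast.h) and are not part of the stable C API.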

Key groups of symbols:

| Group | Example symbols | Notes |
| --- | --- | --- |
| Sequence allocator | _Py_asdl_seq_new, _Py_asdl_generic_seq_new | Arena-backed; no individual frees needed |
| Module constructors | _PyAST_Module, _PyAST_Interactive, _PyAST_Expression | Top-level mod_ty nodes |
| Statement constructors | _PyAST_FunctionDef, _PyAST_AsyncFunctionDef, _PyAST_ClassDef, _PyAST_Return, ... | All 28 stmt_ty variants in 3.14 |
| Expression constructors | _PyAST_BoolOp, _PyAST_BinOp, _PyAST_UnaryOp, _PyAST_Lambda, _PyAST_Constant, ... | All 29 expr_ty variants in 3.14 |
| Pattern constructors | _PyAST_MatchValue, _PyAST_MatchOr, ... | PEP 634 structural pattern matching |
| Misc constructors | _PyAST_alias, _PyAST_arg, _PyAST_keyword, _PyAST_withitem | Helper node types |
| Validator | _PyAST_Validate | Recursive sanity check; called before compilation |

Map

| Lines | Symbol | Role | gopy |
| --- | --- | --- | --- |
| 1-200 | (includes, arena helpers) | _Py_asdl_seq_new, _Py_asdl_int_seq_new; arena glue | - |
| 201-500 | Module / Interactive / Expression | Top-level mod_ty constructors | - |
| 501-1800 | Statement constructors | _PyAST_FunctionDef through _PyAST_Continue (all 28 stmt kinds) | - |
| 1801-4500 | Expression constructors | _PyAST_BoolOp through _PyAST_Slice (all 29 expr kinds) | - |
| 4501-5800 | Pattern constructors | _PyAST_MatchValue through _PyAST_MatchOr | - |
| 5801-6500 | Misc node constructors | _PyAST_alias, _PyAST_arg, _PyAST_keyword, _PyAST_withitem, _PyAST_match_case | - |
| 6501-7500 | Python type objects | PyAST_type, per-node PyTypeObject definitions for the ast module | - |
| 7501-8000 | _PyAST_Validate | Recursive validator; checks invariants before compile.c consumes the tree | - |

Reading

Arena-backed sequence allocation

All ASDL sequences are allocated from the PyArena that owns the AST being built. The allocator returns an asdl_seq * whose header and element slots live in arena memory, so the whole sequence is released when the arena is freed, with no per-item cleanup:

```c
/* Python/Python-ast.c:42 _Py_asdl_seq_new */
asdl_seq *
_Py_asdl_seq_new(Py_ssize_t size, PyArena *arena)
{
    asdl_seq *seq = NULL;
    size_t n;
    if (size == 0) {
        n = sizeof(asdl_seq);
    } else {
        /* check for overflow */
        if ((size_t)size > (SIZE_MAX - sizeof(asdl_seq)) / sizeof(void *)) {
            PyErr_NoMemory();
            return NULL;
        }
        n = sizeof(asdl_seq) + (size_t)size * sizeof(void *);
    }
    seq = (asdl_seq *)PyArena_Malloc(arena, n);
    if (!seq) {
        PyErr_NoMemory();
        return NULL;
    }
    seq->size = size;
    return seq;
}
```

Because every allocation is arena-owned, the compiler can build an entire AST without tracking individual node lifetimes.
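A minimal sketch of that ownership model follows, using hypothetical names (ToyArena, toy_arena_malloc, toy_seq_new) rather than CPython's API. To stay short, the toy arena mallocs each request as its own chunk and links it into a list, whereas CPython's PyArena carves allocations out of larger blocks. The sequence allocator reuses the header-plus-pointer-slots layout and the overflow guard from the excerpt above, and additionally rejects negative sizes explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy arena (not CPython's PyArena): each allocation is its own
   malloc'd chunk pushed onto a list, so a single call frees everything. */
typedef struct Chunk {
    struct Chunk *next;
    max_align_t payload[];  /* flexible member keeps returned memory aligned */
} Chunk;

typedef struct {
    Chunk *head;
    size_t count;           /* live allocations, tracked for the demo */
} ToyArena;

static void *toy_arena_malloc(ToyArena *a, size_t n)
{
    if (n > SIZE_MAX - sizeof(Chunk))
        return NULL;
    Chunk *c = malloc(sizeof(Chunk) + n);
    if (!c)
        return NULL;
    c->next = a->head;
    a->head = c;
    a->count++;
    return c->payload;
}

static void toy_arena_free(ToyArena *a)
{
    assert(a != NULL);
    Chunk *c = a->head;
    while (c) {
        Chunk *next = c->next;
        free(c);
        c = next;
    }
    a->head = NULL;
    a->count = 0;
}

/* Toy sequence: a size header followed by pointer slots, allocated in one
   piece from the arena. */
typedef struct {
    ptrdiff_t size;
    void *elements[];
} ToySeq;

static ToySeq *toy_seq_new(ptrdiff_t size, ToyArena *a)
{
    /* reject negative sizes and sizes whose byte count would overflow */
    if (size < 0 ||
        (size_t)size > (SIZE_MAX - sizeof(ToySeq)) / sizeof(void *))
        return NULL;
    ToySeq *s = toy_arena_malloc(a, sizeof(ToySeq) + (size_t)size * sizeof(void *));
    if (!s)
        return NULL;
    s->size = size;
    return s;
}
```

A caller builds the sequence, hangs arena-allocated nodes off its slots, and later releases the sequence and every node with one toy_arena_free call, which is the "no per-item cleanup" property the compiler relies on.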

A representative statement constructor: _PyAST_FunctionDef

Every stmt_ty constructor follows the same pattern: check that required fields are non-NULL, allocate a struct _stmt from the arena, fill in its kind tag and union fields, copy in the source location, and return. Beyond those NULL checks, no validation is performed here; structural checks are left to _PyAST_Validate.

```c
/* Python/Python-ast.c:560 _PyAST_FunctionDef */
stmt_ty
_PyAST_FunctionDef(identifier name, arguments_ty args, asdl_stmt_seq *body,
                   asdl_expr_seq *decorator_list, expr_ty returns,
                   string type_comment, asdl_type_param_seq *type_params,
                   int lineno, int col_offset, int end_lineno,
                   int end_col_offset, PyArena *arena)
{
    stmt_ty p;
    /* required (non-optional, non-sequence) fields are checked first */
    if (!name) {
        PyErr_SetString(PyExc_ValueError,
                        "field 'name' is required for FunctionDef");
        return NULL;
    }
    if (!args) {
        PyErr_SetString(PyExc_ValueError,
                        "field 'args' is required for FunctionDef");
        return NULL;
    }
    p = (stmt_ty)PyArena_Malloc(arena, sizeof(*p));
    if (!p)
        return NULL;
    p->kind = FunctionDef_kind;
    p->v.FunctionDef.name = name;
    p->v.FunctionDef.args = args;
    p->v.FunctionDef.body = body;
    p->v.FunctionDef.decorator_list = decorator_list;
    p->v.FunctionDef.returns = returns;
    p->v.FunctionDef.type_comment = type_comment;
    p->v.FunctionDef.type_params = type_params;
    p->lineno = lineno;
    p->col_offset = col_offset;
    p->end_lineno = end_lineno;
    p->end_col_offset = end_col_offset;
    return p;
}
```

Required fields (non-sequence, non-optional) are checked for NULL before the arena allocation, and a missing one raises ValueError. For trees built by the parser this signals a bug in the parser or its actions, not an error in the user's source code.
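To make the constructor pattern concrete outside CPython, here is a self-contained miniature with two hypothetical node kinds (ToyName and ToyBinOp). It keeps the generated code's shape: a kind tag plus a union of per-kind payloads, required-field NULL checks before any allocation, then the tag, fields, and location filled in. The fixed-size ToyPool and its error string are assumptions standing in for PyArena and PyErr_SetString.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the generated layout: kind tag + union of
   per-kind payloads + source location. Not CPython's struct _expr. */
typedef struct ToyExpr *toy_expr_ty;

typedef enum { ToyName_kind, ToyBinOp_kind } ToyExprKind;

struct ToyExpr {
    ToyExprKind kind;
    union {
        struct { const char *id; } Name;
        struct { toy_expr_ty left; char op; toy_expr_ty right; } BinOp;
    } v;
    int lineno, col_offset;
};

/* Fixed-size pool standing in for PyArena; `error` stands in for the
   pending Python exception that PyErr_SetString would set. */
#define TOY_POOL_SLOTS 16
typedef struct {
    struct ToyExpr slots[TOY_POOL_SLOTS];
    size_t used;
    const char *error;
} ToyPool;

static toy_expr_ty pool_alloc(ToyPool *pool)
{
    assert(pool != NULL);
    if (pool->used == TOY_POOL_SLOTS) {
        pool->error = "pool exhausted";
        return NULL;
    }
    return &pool->slots[pool->used++];
}

static toy_expr_ty
Toy_Name(const char *id, int lineno, int col_offset, ToyPool *pool)
{
    if (!id) {                  /* required field: checked before allocating */
        pool->error = "field 'id' is required for Name";
        return NULL;
    }
    toy_expr_ty p = pool_alloc(pool);
    if (!p)
        return NULL;
    p->kind = ToyName_kind;
    p->v.Name.id = id;
    p->lineno = lineno;
    p->col_offset = col_offset;
    return p;
}

static toy_expr_ty
Toy_BinOp(toy_expr_ty left, char op, toy_expr_ty right,
          int lineno, int col_offset, ToyPool *pool)
{
    if (!left) {
        pool->error = "field 'left' is required for BinOp";
        return NULL;
    }
    if (!right) {
        pool->error = "field 'right' is required for BinOp";
        return NULL;
    }
    toy_expr_ty p = pool_alloc(pool);
    if (!p)
        return NULL;
    p->kind = ToyBinOp_kind;
    p->v.BinOp.left = left;
    p->v.BinOp.op = op;
    p->v.BinOp.right = right;
    p->lineno = lineno;
    p->col_offset = col_offset;
    return p;
}
```

Because the NULL checks run before pool_alloc, a rejected call leaves the pool untouched, mirroring how the generated constructors fail before touching the arena.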

The AST validator

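_PyAST_Validate walks the entire tree and checks structural invariants that the constructors cannot enforce: for example, that a BoolOp has at least two operands, that Name, Attribute, Subscript, Starred, List, and Tuple nodes carry the expression context (Load, Store, Del) their position requires, and that statement bodies are not empty. It runs on trees built from Python objects (for instance when compile() is given an ast.AST, after PyAST_obj2mod converts it) and, in debug builds, on parser output. Scope rules such as rejecting a return outside a function are not its job; the compiler reports those later as SyntaxError: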

```c
/* Python/Python-ast.c:7530 validate_expr (inner helper) */
static int
validate_expr(struct validator *state, expr_ty exp, expr_context_ty ctx)
{
    /* recursion guard */
    if (++state->recursion_depth > state->recursion_limit) {
        PyErr_SetString(PyExc_RecursionError, "AST is too deeply nested");
        return 0;
    }
    int ret = -1;
    switch (exp->kind) {
    case BoolOp_kind:
        ret = validate_exprs(state, exp->v.BoolOp.values, Load, 0);
        break;
    case BinOp_kind:
        ret = validate_expr(state, exp->v.BinOp.left, Load) &&
              validate_expr(state, exp->v.BinOp.right, Load);
        break;
    case Constant_kind:
        ret = validate_constant(exp->v.Constant.value);
        break;
    /* ... remaining expr kinds ... */
    default:
        PyErr_Format(PyExc_SystemError, "unknown expr kind: %d", exp->kind);
        ret = 0;
    }
    --state->recursion_depth;
    return ret;
}
```

The recursion guard uses a struct validator carrying both current depth and a limit derived from sys.getrecursionlimit(), preventing stack overflows on adversarially nested sources.
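A toy version of the depth-guarded walk follows, assuming a hypothetical three-kind expression type (VConst, VBoolOp, VNot) and a string field in place of a raised exception. It shows the validator's two jobs: enforcing an invariant the constructors do not check (a BoolOp needs at least two operands) and refusing to recurse past a configurable limit. Unlike the excerpt above, this sketch also decrements the depth on the error path, so the counter stays balanced however validation ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression type; not CPython's expr_ty. */
typedef enum { VConst_kind, VBoolOp_kind, VNot_kind } VKind;

typedef struct VExpr {
    VKind kind;
    size_t nvalues;              /* VBoolOp: number of operands */
    struct VExpr **values;       /* VBoolOp: operand array */
    struct VExpr *operand;       /* VNot: single required operand */
} VExpr;

typedef struct {
    int recursion_depth;
    int recursion_limit;
    const char *error;           /* stands in for the pending exception */
} Validator;

static int validate_vexpr(Validator *state, const VExpr *e)
{
    assert(e != NULL);
    if (++state->recursion_depth > state->recursion_limit) {
        state->error = "AST is too deeply nested";
        --state->recursion_depth;
        return 0;
    }
    int ret = 0;
    switch (e->kind) {
    case VConst_kind:
        ret = 1;
        break;
    case VBoolOp_kind:
        if (e->nvalues < 2) {    /* invariant the constructor did not check */
            state->error = "BoolOp with less than 2 values";
            break;
        }
        ret = 1;
        for (size_t i = 0; ret && i < e->nvalues; i++)
            ret = validate_vexpr(state, e->values[i]);
        break;
    case VNot_kind:
        if (!e->operand) {
            state->error = "Not with no operand";
            break;
        }
        ret = validate_vexpr(state, e->operand);
        break;
    default:
        state->error = "unknown expr kind";
        break;
    }
    --state->recursion_depth;
    return ret;
}

/* Entry point: fresh depth counter per tree, limit chosen by the caller. */
static int validate_tree(const VExpr *root, int limit, const char **error)
{
    Validator state = {0, limit, NULL};
    int ok = validate_vexpr(&state, root);
    *error = state.error;
    return ok;
}
```

A fuller validator would also thread an expected expression context through the walk, as validate_expr does with its ctx parameter.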

gopy mirror

gopy uses its own AST node types defined in the parser/ package (populated by the PEG parser). Those types are idiomatic Go structs rather than arena-allocated C unions, and they do not correspond one-to-one to CPython's Python-ast.c constructors.

Python-ast.c has not been ported, and a direct port is not planned. The file is machine-generated boilerplate; the meaningful logic lives in Parser/Python.asdl (the grammar) and in _PyAST_Validate (the validator). If a gopy AST validator is ever needed, it would be modeled on the validator section of this file.

CPython 3.14 changes

  • FunctionDef, AsyncFunctionDef, and ClassDef constructors take a type_params argument for PEP 695 type parameter syntax (def f[T](...)). The field was introduced in 3.12 and carries over unchanged into 3.14.
  • Pattern-matching node constructors (MatchValue, MatchOr, etc.) were stabilized; a few field names changed between 3.10 and 3.14.
  • _PyAST_Validate gained the struct validator recursion-depth tracking (replacing a bare integer passed through every call frame) to support accurate limit checking in sub-interpreters with independent recursion limits.
  • _Py_asdl_generic_seq_new was introduced as a typed alias over _Py_asdl_seq_new to give static analysis tools better type information for the untyped void * sequence slots used by pattern nodes.