1643. gopy parser errors and helpers
What we are porting
Three small-to-medium files plus the token table:
- Parser/pegen_errors.c (~1500 lines): every SyntaxError text the parser emits. Most of CPython's user-visible parser quality lives here: "expected ':'", "cannot assign to ...", "did you mean :=?".
- Parser/action_helpers.c (~1500 lines): the helpers the generated parser calls in rule actions to build AST nodes. _PyPegen_singleton_seq, _PyPegen_seq_insert_in_front, _PyPegen_join_names_with_dot, etc.
- Parser/peg_api.c (~200 lines): the public C entry points (PyPegen_ASTFromString, PyPegen_ASTFromFile). Most of this becomes one Go function on parser.Parse (1642), but the diagnostic-flush logic lands here.
- Parser/token.c (~250 lines): the token name table. CPython generates this from Grammar/Tokens; we already generate the matching tokenize/types_gen.go (1665), so this file becomes a formality.
SyntaxError panel (the bulk of the work)
Every diagnostic in pegen_errors.c becomes one entry in
parser/errors/messages.go:
// MsgInvalidAssignment is emitted when the LHS of `=` is not an
// assignable target. Mirrors _PyPegen_raise_syntax_error_known_location
// from pegen_errors.c at the "cannot assign to %s" call site.
const MsgInvalidAssignment = "cannot assign to %s"
The point is byte parity: when a Python user feeds gopy a broken program, the SyntaxError they see should be indistinguishable from CPython's. The set is closed for a given CPython release (the error text is frozen once the release ships), so we transcribe it once and pin it via golden tests.
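A minimal sketch of how such a constant might be formatted and checked against a pinned copy. The `Format` and `Drift` helpers, the `messages` and `golden` maps, and the "literal" argument are illustrative assumptions, not the actual contents of parser/errors; only `MsgInvalidAssignment` comes from the text above.

```go
package main

import (
	"fmt"
	"sort"
)

// MsgInvalidAssignment is the constant shown above.
const MsgInvalidAssignment = "cannot assign to %s"

// messages maps constant names to their current text (hypothetical registry).
var messages = map[string]string{
	"MsgInvalidAssignment": MsgInvalidAssignment,
}

// golden stands in for the pinned CPython text (hypothetical; a real
// panel would load this from a golden file).
var golden = map[string]string{
	"MsgInvalidAssignment": "cannot assign to %s",
}

// Format fills a message template with its arguments.
func Format(msg string, args ...any) string {
	return fmt.Sprintf(msg, args...)
}

// Drift returns the sorted names in want whose text differs in got.
func Drift(got, want map[string]string) []string {
	var names []string
	for name, text := range want {
		if got[name] != text {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(Format(MsgInvalidAssignment, "literal"))
	fmt.Println("drifted constants:", len(Drift(messages, golden)))
}
```

Comparing whole strings rather than substrings is what makes the check a byte-parity check: a changed quote character or a trailing space shows up as drift.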
Action helpers
The generated parser calls helpers like:
// SingletonSeq wraps one node into a one-element list.
// Mirrors _PyPegen_singleton_seq from action_helpers.c.
func SingletonSeq(n ast.Node) []ast.Node
// JoinNamesWithDot turns ["a", "b", "c"] into "a.b.c".
// Mirrors _PyPegen_join_names_with_dot.
func JoinNamesWithDot(names []*ast.Name) string
// SetExprContext walks a Name/Tuple/List/Starred tree and stamps
// each node's expr_context to Store/Del. Mirrors
// _PyPegen_set_expr_context.
func SetExprContext(e ast.Expr, ctx ast.ExprContext) ast.Expr
These are mechanical translations: each helper is one or two dozen lines of Go. About 60 helpers total.
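To give a sense of the size of these translations, here is a sketch of the three helpers above over a tiny stand-in AST. The `Expr`, `Name`, `Tuple`, `Starred` and `ExprContext` types are hypothetical substitutes for the real ast package, and returning copies instead of mutating in place is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// ExprContext and the node types below are hypothetical stand-ins
// for the real ast package.
type ExprContext int

const (
	Load ExprContext = iota
	Store
	Del
)

type Expr interface{ isExpr() }

type Name struct {
	ID  string
	Ctx ExprContext
}

type Tuple struct {
	Elts []Expr
	Ctx  ExprContext
}

type Starred struct {
	Value Expr
	Ctx   ExprContext
}

func (*Name) isExpr()    {}
func (*Tuple) isExpr()   {}
func (*Starred) isExpr() {}

// SingletonSeq wraps one node into a one-element list.
func SingletonSeq(n Expr) []Expr { return []Expr{n} }

// JoinNamesWithDot turns ["a", "b", "c"] into "a.b.c".
func JoinNamesWithDot(names []*Name) string {
	parts := make([]string, len(names))
	for i, n := range names {
		parts[i] = n.ID
	}
	return strings.Join(parts, ".")
}

// SetExprContext returns a copy of e with ctx stamped on every
// Name, Tuple and Starred node; other node kinds pass through.
func SetExprContext(e Expr, ctx ExprContext) Expr {
	switch v := e.(type) {
	case *Name:
		return &Name{ID: v.ID, Ctx: ctx}
	case *Tuple:
		elts := make([]Expr, len(v.Elts))
		for i, el := range v.Elts {
			elts[i] = SetExprContext(el, ctx)
		}
		return &Tuple{Elts: elts, Ctx: ctx}
	case *Starred:
		return &Starred{Value: SetExprContext(v.Value, ctx), Ctx: ctx}
	default:
		return e
	}
}

func main() {
	target := &Tuple{Elts: []Expr{&Name{ID: "a"}, &Starred{Value: &Name{ID: "b"}}}}
	stored := SetExprContext(target, Store).(*Tuple)
	fmt.Println(stored.Ctx == Store, stored.Elts[0].(*Name).Ctx == Store)
	fmt.Println(JoinNamesWithDot([]*Name{{ID: "a"}, {ID: "b"}, {ID: "c"}}))
	fmt.Println(len(SingletonSeq(target)))
}
```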
Token table
Parser/token.c defines _PyParser_TokenNames[]. We already
generate this in tokenize/types_gen.go (1665). The duplication is
intentional: the parser side reads its own copy so the lexer and
parser do not share runtime state.
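Keeping two copies only works if they cannot drift apart. A sketch of how a parser-side table and a consistency check might look; the variable names and the `<UNKNOWN>` fallback are assumptions, and the tables are truncated to the first seven entries of Grammar/Tokens for brevity.

```go
package main

import "fmt"

// parserTokenNames is a hypothetical, truncated parser-side copy of
// the token name table.
var parserTokenNames = []string{
	"ENDMARKER", "NAME", "NUMBER", "STRING", "NEWLINE", "INDENT", "DEDENT",
}

// lexerTokenNames stands in for the generated lexer-side table.
var lexerTokenNames = []string{
	"ENDMARKER", "NAME", "NUMBER", "STRING", "NEWLINE", "INDENT", "DEDENT",
}

// TokenName looks up a token kind in the parser-side copy only.
func TokenName(kind int) string {
	if kind < 0 || kind >= len(parserTokenNames) {
		return "<UNKNOWN>"
	}
	return parserTokenNames[kind]
}

// FirstMismatch returns the first index where the two tables differ,
// or -1 when they are identical.
func FirstMismatch(a, b []string) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

func main() {
	fmt.Println(TokenName(1))
	fmt.Println(FirstMismatch(parserTokenNames, lexerTokenNames))
}
```

A check like `FirstMismatch` would typically run in a unit test, so the two packages stay independent at runtime while the build still catches divergence.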
Error injection points
CPython's parser invokes the error builder at these moments:
- Forced expect (expect_forced_token): "expected ','"-style.
- Invalid rule fallback: when call_invalid_rules is on, the second pass produces the user-friendly text.
- Tokenizer surfacing: lexer errors flow through _PyPegen_tokenize_full_source_to_check_for_errors then into the same SyntaxError builder.
- Indent/dedent issues: pegen_errors.c:_PyPegen_check_tokenizer_errors.
The Go side groups these into one parser/errors/builder.go with
a small constructor table.
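One way the builder and its constructor table could be shaped, as a sketch only. The `SyntaxError` fields, the `Pos` type, the `Kind` enum and the `constructors` map are all hypothetical; the names `RaiseAt` and `RaiseRange` are taken from the checklist below, but their signatures here are assumptions.

```go
package main

import "fmt"

// Pos is a hypothetical source position (1-based line, 0-based column).
type Pos struct{ Line, Col int }

// SyntaxError is a hypothetical error value carrying a source range.
type SyntaxError struct {
	Msg                        string
	Line, Col, EndLine, EndCol int
}

func (e *SyntaxError) Error() string {
	return fmt.Sprintf("SyntaxError: %s (line %d, column %d)", e.Msg, e.Line, e.Col)
}

// RaiseRange builds an error spanning start..end.
func RaiseRange(start, end Pos, format string, args ...any) *SyntaxError {
	return &SyntaxError{
		Msg:     fmt.Sprintf(format, args...),
		Line:    start.Line,
		Col:     start.Col,
		EndLine: end.Line,
		EndCol:  end.Col,
	}
}

// RaiseAt builds an error anchored at a single position.
func RaiseAt(p Pos, format string, args ...any) *SyntaxError {
	return RaiseRange(p, p, format, args...)
}

// Kind names the four injection points listed above.
type Kind int

const (
	ForcedExpect Kind = iota
	InvalidRule
	Tokenizer
	Indent
)

// constructors is the small table: one builder per injection point.
var constructors = map[Kind]func(p Pos, detail string) *SyntaxError{
	ForcedExpect: func(p Pos, d string) *SyntaxError { return RaiseAt(p, "expected '%s'", d) },
	InvalidRule:  func(p Pos, d string) *SyntaxError { return RaiseAt(p, "%s", d) },
	Tokenizer:    func(p Pos, d string) *SyntaxError { return RaiseAt(p, "%s", d) },
	Indent:       func(p Pos, d string) *SyntaxError { return RaiseAt(p, "%s", d) },
}

func main() {
	err := constructors[ForcedExpect](Pos{Line: 3, Col: 7}, ",")
	fmt.Println(err)
}
```

Routing every injection point through the same two constructors keeps position handling in one place, so each call site only decides which message and which token to blame.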
File mapping
| C source | Go target |
|---|---|
| Parser/pegen_errors.c | parser/errors/messages.go, parser/errors/builder.go, parser/errors/invalid_rules.go |
| Parser/action_helpers.c | parser/pegen/actions.go |
| Parser/peg_api.c | folded into parser/parser.go (1642) |
| Parser/token.c | folded into tokenize/types_gen.go (1665) |
Checklist
Status legend: [x] shipped, [ ] pending, [~] partial / scaffold,
[n] deferred / not in scope this phase.
Files
- parser/errors/messages.go: every SyntaxError string from pegen_errors.c as a Msg* constant. ~120 entries.
- parser/errors/builder.go: Raise, RaiseAt, RaiseRange helpers that wrap a *SyntaxError with the right position.
- parser/errors/invalid_rules.go: the second-pass user-friendly diagnostics (call_invalid_rules mode).
- parser/errors/tokenizer_errors.go: lexer-side error lift (indent inconsistency, unexpected EOF in string, etc.).
- [~] parser/pegen/actions.go plus arguments.go, comprehension.go, extras.go, fstring.go: the action helper surface. ~35 functions across the five files cover the panel the v0.5.5 grammar exercises (sequences, name joins, expr-context, f-string assembly, arguments). The remaining _PyPegen_* / _PyAST_* helpers the generator references are emitted as panic-stubs in parser_gen.go and replaced one-by-one as the typed AST surface fills in (gated by parser/pegen/action_helpers_gen.go's exclusion list).
- parser/errors/messages_test.go: golden panel pinning every Msg* to its CPython text. Refresh via go test -update.
SyntaxError byte-parity panel
- "expected ':'" family (forced expect).
- "cannot assign to ..." family (invalid LHS).
- "did you mean := ?" hint.
- Indent / dedent inconsistency.
- Unmatched / mismatched paren.
- Unterminated string literal (single-line and triple-quoted).
- f-string nesting errors.
- PEP 695 type-param errors.
- Match-statement pattern errors.
- Walrus inside comprehension iterable.
- Star-expression placement.
Action helper panel
- SingletonSeq, SeqInsertInFront, SeqAppend, SeqFlatten.
- JoinNamesWithDot, SeqLastItem, SeqFirstItem.
- SetExprContext over Name / Tuple / List / Starred / Attribute / Subscript.
- MakeArguments, EmptyArguments.
- KeyValuePairs, KeywordOrStarred splitter.
- ConcatenateStrings (joins adjacent string literals).
- f-string / t-string assembly helpers.
- Comprehension-from-generators helpers.
- [~] Per-rule actionAst* / actionPgen* helpers that build typed AST nodes. actionPgenMakeModule ships (the entry point for the file rule); the rest are panic-stubs emitted by the generator and filled in lazily as the corpus coverage gate (parser/corpus_test.go) demands.
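The panic-stub scheme can be sketched as a small generator step: every helper the grammar references gets a stub unless it appears on the exclusion list of hand-written implementations. The `EmitStubs` function, the stub signature, and the helper name `actionAstName` are hypothetical and only illustrate the gating idea.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// implemented is a hypothetical exclusion list: helpers that already
// have a hand-written implementation and must not get a stub.
var implemented = map[string]bool{
	"actionPgenMakeModule": true,
}

// EmitStubs returns Go source for a panic-stub per referenced helper
// that is not on the exclusion list, in sorted order.
func EmitStubs(referenced []string, implemented map[string]bool) string {
	names := append([]string(nil), referenced...)
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		if implemented[name] {
			continue
		}
		fmt.Fprintf(&b, "func %s(args ...any) any { panic(%q) }\n", name, name+": not implemented")
	}
	return b.String()
}

func main() {
	// actionAstName is a made-up helper name used for illustration.
	fmt.Print(EmitStubs([]string{"actionPgenMakeModule", "actionAstName"}, implemented))
}
```

Filling in a helper then amounts to writing it by hand and adding its name to the exclusion list, at which point the generator stops emitting the stub and the corpus gate exercises the real code.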
Out of scope
- Suggestion-style "did you mean ...?" beyond what pegen_errors.c already produces. The richer suggestion surface (Levenshtein keyword search) lives in errors/suggest (1611).