Parser/

The tokenizer and PEG parser. Source for CPython 3.14 lives at cpython/Parser/. The directory has three layers:

  • tokenizer/: input adapters (file, string, readline, utf-8).
  • lexer/: the state machine that turns characters into tokens.
  • pegen.c / parser.c / action_helpers.c: the PEG parser itself. parser.c is generated from Grammar/python.gram by Tools/peg_generator/.
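
The split between tokenizing and parsing can be observed from Python itself. The sketch below uses only the standard-library `tokenize` and `ast` modules to print the token stream and the resulting top-level AST node for one statement. It shows what each stage produces, not how the C code in Parser/ is organized; the helper names `token_names` and `top_level_nodes` are made up for this example.

```python
import ast
import io
import tokenize

SOURCE = "x = 1 + 2\n"


def token_names(source):
    """Return (token kind name, token text) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


def top_level_nodes(source):
    """Return the class names of the top-level AST statements."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in tree.body]


if __name__ == "__main__":
    # Stage 1: characters -> tokens.
    for name, text in token_names(SOURCE):
        print(f"{name:10} {text!r}")
    # Stage 2: tokens -> AST.
    print(top_level_nodes(SOURCE))
```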

File inventory

| File | Lines | Page | gopy port |
|---|---|---|---|
| Parser/lexer/lexer.c | 1635 | lexer | parser/lexer/ |
| Parser/lexer/state.c | 151 | (pending) | parser/lexer/state.go |
| Parser/lexer/buffer.c | 76 | (pending) | parser/lexer/buffer.go |
| Parser/tokenizer/file_tokenizer.c | 493 | (pending) | parser/tokenizer/ |
| Parser/tokenizer/helpers.c | 581 | (pending) | parser/tokenizer/ |
| Parser/tokenizer/string_tokenizer.c | 148 | (pending) | parser/tokenizer/ |
| Parser/tokenizer/utf8_tokenizer.c | 55 | (pending) | parser/tokenizer/ |
| Parser/tokenizer/readline_tokenizer.c | 134 | (pending) | n/a |
| Parser/pegen.c | 1083 | pegen | parser/pegen/ |
| Parser/pegen.h | 388 | (pending) | parser/pegen/types.go |
| Parser/pegen_errors.c | 462 | (pending) | parser/pegen/errors.go |
| Parser/action_helpers.c | 1953 | action-helpers | parser/pegen/action_helpers_gen.go |
| Parser/string_parser.c | 339 | (pending) | parser/string_parser.go |
| Parser/parser.c | 38073 | parser-generated | parser/parser_gen.go |
| Parser/token.c | 250 | token | parser/token/ |
| Parser/peg_api.c | 39 | (trivial wrapper) | n/a |
| Parser/myreadline.c | 437 | (host-only) | n/a |
Parser/myreadline.c437(host-only)n/a

Total: 23 files, ~46k lines (38k of which is the generated parser.c).
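
Figures like these can be rechecked against a local checkout. The sketch below counts lines per file under a directory; the path `cpython/Parser` and the choice to count only `.c` and `.h` files are assumptions, so the result may not match the totals above exactly if the inventory was counted differently.

```python
from pathlib import Path


def count_lines(root, suffixes=(".c", ".h")):
    """Map each matching file under root (as a relative path) to its line count."""
    root = Path(root)
    counts = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # Read as bytes so odd encodings do not affect the count.
            with path.open("rb") as fh:
                counts[path.relative_to(root).as_posix()] = sum(1 for _ in fh)
    return counts


if __name__ == "__main__":
    # Assumed location of a CPython checkout; adjust as needed.
    counts = count_lines("cpython/Parser")
    for name, lines in counts.items():
        print(f"{lines:6d}  {name}")
    print(f"{len(counts)} files, {sum(counts.values())} lines")
```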

Reading order

  1. token.c: the token kind enum and pretty-printer.
  2. lexer.c: the byte-level scanner.
  3. pegen.c: the parser runtime (memo table, error recovery, top-level entry).
  4. action_helpers.c: AST construction helpers invoked from the generated parser.
  5. parser.c: the generated PEG parser, a recursive-descent parser with one C function per grammar rule.
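
To make the "memo table" in step 3 concrete, here is a minimal packrat-style sketch: a toy PEG parser for sums of integers that caches each rule's result per token position, so a rule retried at the same position by a later alternative is not re-run. Everything in it (the grammar, `MiniPeg`, `lex`, `parse`) is hypothetical and illustrates the general technique only; it is not how pegen.c lays out its memo data or how parser.c is structured.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[+()])")


def lex(text):
    """Split text into number, '+', and parenthesis tokens (toy lexer)."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class MiniPeg:
    """Toy grammar:  expr <- atom '+' expr / atom
                     atom <- NUMBER / '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.memo = {}  # (rule name, token index) -> (value, next index) or None
        self.calls = 0  # number of rule bodies actually executed

    def _memoized(self, name, body, pos):
        key = (name, pos)
        if key not in self.memo:  # failures (None) are cached too
            self.calls += 1
            self.memo[key] = body(pos)
        return self.memo[key]

    def _peek(self, pos):
        return self.tokens[pos] if pos < len(self.tokens) else None

    def expr(self, pos):
        return self._memoized("expr", self._expr, pos)

    def atom(self, pos):
        return self._memoized("atom", self._atom, pos)

    def _expr(self, pos):
        # Alternative 1: atom '+' expr
        left = self.atom(pos)
        if left is not None and self._peek(left[1]) == "+":
            right = self.expr(left[1] + 1)
            if right is not None:
                return left[0] + right[0], right[1]
        # Alternative 2: atom (served from the memo table, not re-parsed)
        return self.atom(pos)

    def _atom(self, pos):
        tok = self._peek(pos)
        if tok is not None and tok.isdigit():
            return int(tok), pos + 1
        if tok == "(":
            inner = self.expr(pos + 1)
            if inner is not None and self._peek(inner[1]) == ")":
                return inner[0], inner[1] + 1
        return None


def parse(text):
    """Parse and evaluate a sum; raise SyntaxError on leftover or bad input."""
    tokens = lex(text)
    result = MiniPeg(tokens).expr(0)
    if result is None or result[1] != len(tokens):
        raise SyntaxError("invalid expression")
    return result[0]
```

For the single token `7`, `expr` tries its first alternative, fails to find `+`, and falls back to the second; the second request for `atom` at position 0 is answered from `memo`, so only two rule bodies run in total.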