Parser/token.c
cpython 3.14 @ ab2d84fe1023/Parser/token.c
Auto-generated by Tools/build/generate_token.py from
Grammar/Tokens. The file holds the token-kind string table and three
operator-lookup helpers. Tiny at 250 lines, but it is the canonical
list of token kinds the rest of the front end depends on.
Map
| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 8-79 | _PyParser_TokenNames[] | String table indexed by token kind. | parser/token/names.go:Names |
| 83-113 | _PyToken_OneChar | Single-char operator dispatch. | parser/token/lookup.go:OneChar |
| 115-197 | _PyToken_TwoChars | Two-char operator dispatch. | parser/token/lookup.go:TwoChars |
| 199-250 | _PyToken_ThreeChars | Three-char operator dispatch. | parser/token/lookup.go:ThreeChars |
Reading
_PyParser_TokenNames (lines 8 to 79)
cpython 3.14 @ ab2d84fe1023/Parser/token.c#L8-79
A flat const char * array indexed by the token-kind enum from
Include/internal/pycore_token.h. The order matters: the lexer
returns indices into this array, and tokenize.py uses the same
ordering when round-tripping source.
const char * const _PyParser_TokenNames[] = {
"ENDMARKER",
"NAME",
"NUMBER",
"STRING",
"NEWLINE",
"INDENT",
"DEDENT",
...
"EXCLAMATION", /* line 63: f-string conversion suffix (3.12+) */
...
"FSTRING_START", /* line 68 */
"FSTRING_MIDDLE",
"FSTRING_END",
"TSTRING_START", /* line 71: PEP 750 template strings */
"TSTRING_MIDDLE",
"TSTRING_END",
"COMMENT",
"NL",
"<ERRORTOKEN>",
"<ENCODING>",
"<N_TOKENS>",
};
The three sentinels at the tail are not real tokens. <ENCODING> is
emitted at most once, as the first token from the file tokenizer, and
carries the source encoding name. <ERRORTOKEN> is what the lexer
returns when it cannot recognise the next character. <N_TOKENS> is
the count and never appears in token streams.
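A minimal Go sketch of how a mirrored name table can expose the same indexing, with the sentinels at the tail. The abbreviated entries and the exported name Names are illustrative; the real gopy table is generated, not hand-written.

```go
package main

import "fmt"

// Kind indexes Names exactly like the C token-kind enum would.
type Kind int

// Abbreviated table: the real ordering comes from pycore_token.h.
var Names = []string{
	"ENDMARKER", "NAME", "NUMBER", "STRING", "NEWLINE",
	// ... remaining real tokens elided ...
	"<ERRORTOKEN>", "<ENCODING>", "<N_TOKENS>", // sentinels, not ordinary tokens
}

func main() {
	fmt.Println(Names[1]) // NAME
	// The final entry is the count sentinel and never appears in a stream.
	fmt.Println(Names[len(Names)-1]) // <N_TOKENS>
}
```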
_PyToken_OneChar (lines 83 to 113)
cpython 3.14 @ ab2d84fe1023/Parser/token.c#L83-113
int
_PyToken_OneChar(int c1)
{
switch (c1) {
case '!': return EXCLAMATION;
case '%': return PERCENT;
case '&': return AMPER;
case '(': return LPAR;
case ')': return RPAR;
...
case '~': return TILDE;
}
return OP;
}
Twenty-four single-character operators map straight through. Anything
not listed (digits, letters, whitespace, quote characters) falls into
return OP, which the caller treats as "no one-char match". Note that
multi-char leads such as < and * do appear here: each is a valid
one-char operator in its own right, and the lexer only widens the
match when a longer lookup also succeeds.
_PyToken_TwoChars (lines 115 to 197)
cpython 3.14 @ ab2d84fe1023/Parser/token.c#L115-197
Nested switch on c1 then c2. Three cases worth singling out:
case '<':
switch (c2) {
case '<': return LEFTSHIFT;
case '=': return LESSEQUAL;
case '>': return NOTEQUAL; /* line 166: Py2 <> kept for embedders */
}
break;
case '-':
switch (c2) {
case '=': return MINEQUAL;
case '>': return RARROW; /* line 148: return-annotation arrow */
}
break;
case ':':
switch (c2) {
case '=': return COLONEQUAL; /* line 159: walrus, PEP 572 */
}
break;
<> returns NOTEQUAL for Python 2 compatibility. The lexer rejects
<> upstream, but the helper keeps the case for embedders that call
it directly.
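The nested-switch shape, including the Py2 compatibility case, might look like this in Go. Again the Kind constants are illustrative stand-ins for the generated values:

```go
package main

import "fmt"

type Kind int

const (
	OP Kind = iota
	LEFTSHIFT
	LESSEQUAL
	NOTEQUAL
	RARROW
	COLONEQUAL
)

// twoChars mirrors _PyToken_TwoChars: dispatch on the first byte,
// then the second, falling through to OP when no pair matches.
func twoChars(c1, c2 byte) Kind {
	switch c1 {
	case '<':
		switch c2 {
		case '<':
			return LEFTSHIFT
		case '=':
			return LESSEQUAL
		case '>':
			return NOTEQUAL // Py2 <>, kept for direct callers
		}
	case '-':
		if c2 == '>' {
			return RARROW
		}
	case ':':
		if c2 == '=' {
			return COLONEQUAL
		}
	}
	return OP
}

func main() {
	fmt.Println(twoChars('<', '>') == NOTEQUAL)   // true
	fmt.Println(twoChars(':', '=') == COLONEQUAL) // true
}
```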
_PyToken_ThreeChars (lines 199 to 250)
cpython 3.14 @ ab2d84fe1023/Parser/token.c#L199-250
Five three-character operators in Python 3.14: **=, ..., //=,
<<=, >>=. The function is structured as a triple-nested switch.
case '*':
switch (c2) {
case '*':
switch (c3) {
case '=': return DOUBLESTAREQUAL;
}
break;
}
break;
case '.':
switch (c2) {
case '.':
switch (c3) {
case '.': return ELLIPSIS;
}
break;
}
break;
The lexer widens the match greedily: it reads one character, peeks a
second and calls the two-char lookup; if that returns something other
than OP, it peeks a third and tries the three-char lookup. The longest
non-OP match wins, and unconsumed lookahead is pushed back.
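The escalation can be sketched with a hypothetical probe helper. The lookup functions below cover only the < family and are illustrative, not the real parser/lexer/op.go code:

```go
package main

import "fmt"

type Kind int

const (
	OP Kind = iota
	LESS
	LESSEQUAL
	LEFTSHIFT
	LEFTSHIFTEQUAL
)

// Toy lookup tables restricted to the '<' family.
func oneChar(c1 byte) Kind {
	if c1 == '<' {
		return LESS
	}
	return OP
}

func twoChars(c1, c2 byte) Kind {
	if c1 == '<' {
		switch c2 {
		case '=':
			return LESSEQUAL
		case '<':
			return LEFTSHIFT
		}
	}
	return OP
}

func threeChars(c1, c2, c3 byte) Kind {
	if c1 == '<' && c2 == '<' && c3 == '=' {
		return LEFTSHIFTEQUAL
	}
	return OP
}

// probe widens the match while the wider lookup still returns
// something other than OP, reporting the longest hit and the number
// of bytes it consumed.
func probe(src []byte) (Kind, int) {
	tok, n := oneChar(src[0]), 1
	if len(src) > 1 {
		if t2 := twoChars(src[0], src[1]); t2 != OP {
			tok, n = t2, 2
			if len(src) > 2 {
				if t3 := threeChars(src[0], src[1], src[2]); t3 != OP {
					tok, n = t3, 3
				}
			}
		}
	}
	return tok, n
}

func main() {
	tok, n := probe([]byte("<<="))
	fmt.Println(tok == LEFTSHIFTEQUAL, n) // true 3
}
```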
gopy mirror
The Go port preserves the table layout and switch shape. Token kinds
become a type Kind int with Stringer; the string array is
generated at package init from pycore_token.h. The escalating
one/two/three-char probes live in parser/lexer/op.go, not the token
package, so the token package stays purely tabular.
Regeneration
Edit Grammar/Tokens in CPython and rerun
Tools/build/generate_token.py. gopy mirrors the same generator
output. Do not hand-edit token.c.