Parser/token.c

cpython 3.14 @ ab2d84fe1023/Parser/token.c

Auto-generated by Tools/build/generate_token.py from Grammar/Tokens. The file holds the token-kind string table and three operator-lookup helpers. Tiny at 250 lines, but it is the canonical list of token kinds the rest of the front end depends on.

Map

Lines      Symbol                    Role                                  gopy
8-79       _PyParser_TokenNames[]    String table indexed by token kind.   parser/token/names.go:Names
83-113     _PyToken_OneChar          Single-char operator dispatch.        parser/token/lookup.go:OneChar
115-197    _PyToken_TwoChars         Two-char operator dispatch.           parser/token/lookup.go:TwoChars
199-250    _PyToken_ThreeChars       Three-char operator dispatch.         parser/token/lookup.go:ThreeChars

Reading

_PyParser_TokenNames (lines 8 to 79)

cpython 3.14 @ ab2d84fe1023/Parser/token.c#L8-79

A flat const char * array indexed by the token-kind enum from Include/internal/pycore_token.h. The order matters: the lexer returns indices into this array, and tokenize.py uses the same ordering when round-tripping source.

const char * const _PyParser_TokenNames[] = {
"ENDMARKER",
"NAME",
"NUMBER",
"STRING",
"NEWLINE",
"INDENT",
"DEDENT",
...
"EXCLAMATION", /* line 63: f-string conversion suffix (3.12+) */
...
"FSTRING_START", /* line 68 */
"FSTRING_MIDDLE",
"FSTRING_END",
"TSTRING_START", /* line 71: PEP 750 template strings */
"TSTRING_MIDDLE",
"TSTRING_END",
"COMMENT",
"NL",
"<ERRORTOKEN>",
"<ENCODING>",
"<N_TOKENS>",
};

The three sentinels at the tail are not real tokens. <ERRORTOKEN> is what the lexer returns when it cannot recognise the next character. <ENCODING> is emitted at most once, as the first token from the file tokenizer, and carries the source encoding name. <N_TOKENS> is the count and never appears in token streams.

_PyToken_OneChar (lines 83 to 113)

cpython 3.14 @ ab2d84fe1023/Parser/token.c#L83-113

int
_PyToken_OneChar(int c1)
{
switch (c1) {
case '!': return EXCLAMATION;
case '%': return PERCENT;
case '&': return AMPER;
case '(': return LPAR;
case ')': return RPAR;
...
case '~': return TILDE;
}
return OP;
}

The single-character operators map straight through. Anything not listed (digits, letters, whitespace, quote characters) falls into return OP, which the caller treats as "no one-char match". Note that multi-char lead characters such as = and < appear here too, because each is also a complete operator on its own.

_PyToken_TwoChars (lines 115 to 197)

cpython 3.14 @ ab2d84fe1023/Parser/token.c#L115-197

Nested switch on c1 then c2. Three cases worth singling out:

case '<':
switch (c2) {
case '<': return LEFTSHIFT;
case '=': return LESSEQUAL;
case '>': return NOTEQUAL; /* line 166: Py2 <> kept for embedders */
}
break;
case '-':
switch (c2) {
case '=': return MINEQUAL;
case '>': return RARROW; /* line 148: return-annotation arrow */
}
break;
case ':':
switch (c2) {
case '=': return COLONEQUAL; /* line 159: walrus, PEP 572 */
}
break;

<> returns NOTEQUAL for Python 2 compatibility. The lexer rejects <> upstream, but the helper keeps the case for embedders that call it directly.

_PyToken_ThreeChars (lines 199 to 250)

cpython 3.14 @ ab2d84fe1023/Parser/token.c#L199-250

Five three-character operators in Python 3.14: **=, ..., //=, <<=, >>=. The function is structured as a triple-nested switch.

case '*':
switch (c2) {
case '*':
switch (c3) {
case '=': return DOUBLESTAREQUAL;
}
break;
}
break;
case '.':
switch (c2) {
case '.':
switch (c3) {
case '.': return ELLIPSIS;
}
break;
}
break;

The lexer probes optimistically: it classifies one character, then tries to extend to two, then three, keeping the longest non-OP match.

gopy mirror

The Go port preserves the table layout and switch shape. Token kinds become a type Kind int with Stringer; the string array is generated at package init from pycore_token.h. The escalating one/two/three-char probes live in parser/lexer/op.go, not the token package, so the token package stays purely tabular.

Regeneration

Edit Grammar/Tokens in CPython and rerun Tools/build/generate_token.py. gopy mirrors the same generator output. Do not hand-edit token.c.