Lib/keyword.py (and Lib/token.py)
cpython 3.14 @ ab2d84fe1023/Lib/keyword.py
This page covers two small reference files that feed the tokenizer and parser. They are documented together because both are auto-generated from grammar sources and both are consumed by the same stage of compilation.
Lib/keyword.py (65 lines) lists the 35 hard keywords and 4 soft keywords
of Python 3.14, and provides iskeyword and issoftkeyword predicate
functions. It is regenerated by
PYTHONPATH=Tools/peg_generator python3 -m pegen.keywordgen Grammar/python.gram Grammar/Tokens Lib/keyword.py
(or make regen-keyword).
Lib/token.py (145 lines) lists 69 token-type constants, builds the
tok_name reverse dict, and provides EXACT_TOKEN_TYPES (operator strings
to token-type numbers). It is regenerated by Tools/build/generate_token.py.
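Both files are importable stdlib modules, so their contents are easy to sanity-check. Only the lowest token numbers have been stable across versions, so the snippet below asserts identities rather than version-specific values:

```python
import token

# The first few token numbers have not changed in decades.
assert token.ENDMARKER == 0 and token.NAME == 1

# tok_name reverses the number-to-name mapping for diagnostics.
assert token.tok_name[token.NAME] == "NAME"

# EXACT_TOKEN_TYPES resolves an operator's spelling to its precise type.
assert token.EXACT_TOKEN_TYPES["**"] == token.DOUBLESTAR
```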
Map
| Lines | Symbol | Role | gopy |
|---|---|---|---|
| 1-16 | module docstring, __all__ | Documents the make regen-keyword command; declares iskeyword, issoftkeyword, kwlist, softkwlist. | parser/token/ |
| 18-54 | kwlist | Alphabetically sorted list of all 35 hard keywords. Includes False, None, True (which are keywords, not just name constants, since Python 3). | parser/token/ |
| 56-61 | softkwlist | Four soft keywords: _, case, match, type. These are valid identifiers outside a match/case/type statement; the parser context, not the lexer, decides whether they are keywords. | parser/token/ |
| 63-64 | iskeyword, issoftkeyword | Each is frozenset(list).__contains__, giving O(1) membership tests. | parser/token/ |
| 1-79 | Token type constants | 69 token-type constants, ENDMARKER = 0 through ENCODING = 68, plus the sentinels N_TOKENS = 69 and NT_OFFSET = 256. New in 3.14: TSTRING_START (62), TSTRING_MIDDLE (63), TSTRING_END (64) for template strings. | token/types_gen.go |
| 81-84 | tok_name | Dict built from globals() at module load: {0: 'ENDMARKER', 1: 'NAME', ...}. __all__ is extended with every name in the dict. | token/token.go |
| 86-135 | EXACT_TOKEN_TYPES | Dict mapping operator strings to token-type numbers, e.g. {'==': EQEQUAL, '->': RARROW, '...': ELLIPSIS}. 53 entries total. Used by the tokenize module and the peg parser action helpers. | token/types_gen.go |
| 137-144 | ISTERMINAL, ISNONTERMINAL, ISEOF | Predicate functions comparing token numbers against NT_OFFSET (256). Terminal tokens are below 256; non-terminal grammar symbols are at or above. | token/token.go |
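The three predicates in the last row are one-line comparisons against NT_OFFSET:

```python
import token

assert token.ISTERMINAL(token.NAME)          # every real token is < 256
assert token.ISNONTERMINAL(token.NT_OFFSET)  # grammar symbols start at 256
assert token.ISEOF(token.ENDMARKER)          # only ENDMARKER (0) is EOF
assert not token.ISEOF(token.NAME)
```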
Reading
kwlist completeness (lines 18 to 54)
cpython 3.14 @ ab2d84fe1023/Lib/keyword.py#L18-54
```python
kwlist = [
    'False',
    'None',
    'True',
    'and',
    'as',
    'assert',
    'async',
    'await',
    'break',
    'class',
    'continue',
    'def',
    'del',
    'elif',
    'else',
    'except',
    'finally',
    'for',
    'from',
    'global',
    'if',
    'import',
    'in',
    'is',
    'lambda',
    'nonlocal',
    'not',
    'or',
    'pass',
    'raise',
    'return',
    'try',
    'while',
    'with',
    'yield'
]
```
All 35 entries are in sorted order (ASCII sort, which is why the capitalized
False, None, and True come first). The list is the single source of truth for
iskeyword. The peg generator reads Grammar/python.gram and extracts every
quoted token that is an all-lowercase identifier; it then adds False, None,
and True explicitly, since they are capitalized and would be missed by that
rule.
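The extraction step can be sketched with a regex over a toy grammar fragment. This is illustrative only; the real pegen.keywordgen walks the parsed grammar rather than regex-scanning the file, and the grammar string here is a made-up excerpt:

```python
import re

grammar = """
if_stmt: 'if' namedexpr_test ':' block elif_stmt?
return_stmt: 'return' star_expressions?
"""

# Quoted, all-lowercase identifiers become hard keywords;
# quoted punctuation like ':' does not match.
found = set(re.findall(r"'([a-z]+)'", grammar))

# False/None/True are added explicitly because they are capitalized.
found |= {"False", "None", "True"}

assert {"if", "return", "False"} <= found
```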
softkwlist is separate:
```python
softkwlist = [
    '_',
    'case',
    'match',
    'type'
]
```
_ is soft because it doubles as the wildcard pattern in case clauses while
remaining an ordinary identifier (and conventional throwaway name) everywhere
else. match and case became soft keywords in 3.10 (PEP 634) and type in 3.12
(PEP 695); making them soft avoided breaking existing code that used them as
variable names. The lexer always produces a NAME token for these; the peg
parser grammar uses context to decide whether to treat them as keywords.
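The difference is easy to observe with compile(): a soft keyword parses as a plain name wherever the grammar expects one, while a hard keyword never does (requires Python 3.10+, where match exists as a soft keyword):

```python
# Soft keywords remain legal assignment targets.
compile("match = 5", "<demo>", "exec")
compile("type = list", "<demo>", "exec")

# A hard keyword in the same position is rejected by the parser.
try:
    compile("for = 5", "<demo>", "exec")
except SyntaxError:
    pass
else:
    raise AssertionError("hard keyword accepted as a name")
```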
Token type constants and tok_name (lines 1 to 84)
cpython 3.14 @ ab2d84fe1023/Lib/token.py#L1-84
```python
ENDMARKER = 0
NAME = 1
NUMBER = 2
STRING = 3
NEWLINE = 4
INDENT = 5
DEDENT = 6
# ... operator tokens 7-54 ...
OP = 55
TYPE_IGNORE = 56
TYPE_COMMENT = 57
SOFT_KEYWORD = 58
FSTRING_START = 59
FSTRING_MIDDLE = 60
FSTRING_END = 61
TSTRING_START = 62
TSTRING_MIDDLE = 63
TSTRING_END = 64
COMMENT = 65
NL = 66
ERRORTOKEN = 67
ENCODING = 68
N_TOKENS = 69
NT_OFFSET = 256

tok_name = {value: name
            for name, value in globals().items()
            if isinstance(value, int) and not name.startswith('_')}
```
The constants are pinned to Grammar/Tokens via generate_token.py; do
not renumber them. The tok_name dict is built from globals() at
module-load time, which means it also picks up NT_OFFSET and N_TOKENS.
Four token groups are worth noting:
- NEWLINE (4) ends a logical line; NL (66) is a physical newline inside a multi-line expression and does not end a statement.
- FSTRING_START/FSTRING_MIDDLE/FSTRING_END (59-61) were added in 3.12 when the C tokenizer grew native f-string support.
- TSTRING_START/TSTRING_MIDDLE/TSTRING_END (62-64) were added in 3.14 for PEP 750 template strings.
- ENCODING (68) is emitted as the very first token by tokenize.py to announce the source encoding; it is never produced by the C tokenizer.
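The NEWLINE/NL distinction shows up immediately when tokenizing a parenthesized continuation:

```python
import io
import tokenize

src = "total = (1 +\n         2)\n"
types = [t.type for t in tokenize.generate_tokens(io.StringIO(src).readline)]

# The newline inside the parentheses is NL; the one ending the statement is NEWLINE.
assert tokenize.NL in types and tokenize.NEWLINE in types
assert types.index(tokenize.NL) < types.index(tokenize.NEWLINE)
```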
In gopy, token/types_gen.go is generated by tools/tokens_go from
Grammar/Tokens and Include/internal/pycore_token.h. Numeric values
match CPython exactly. token/token.go provides the Type.String() method
(equivalent to tok_name[t]).
EXACT_TOKEN_TYPES (lines 86 to 135)
cpython 3.14 @ ab2d84fe1023/Lib/token.py#L86-135
```python
EXACT_TOKEN_TYPES = {
    '(': LPAR,
    ')': RPAR,
    '[': LSQB,
    ']': RSQB,
    ':': COLON,
    ',': COMMA,
    '+': PLUS,
    '-': MINUS,
    '*': STAR,
    '**': DOUBLESTAR,
    '/': SLASH,
    '//': DOUBLESLASH,
    '==': EQEQUAL,
    '!=': NOTEQUAL,
    '->': RARROW,
    '...': ELLIPSIS,
    ':=': COLONEQUAL,
    # ... 35 more entries ...
}
```
EXACT_TOKEN_TYPES has 53 entries. It is used by the tokenize module's
TokenInfo.exact_type property: when a token is OP, the property looks
up the string value in EXACT_TOKEN_TYPES to return the precise type such
as PLUS or DOUBLESTAR. The peg parser's action helpers also use it when
they need to emit a specific punctuation token from an expected string.
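exact_type in action, resolving a generic OP token to its precise operator type:

```python
import io
import token
import tokenize

toks = list(tokenize.generate_tokens(io.StringIO("x ** 2\n").readline))
star = next(t for t in toks if t.string == "**")

assert star.type == token.OP                    # generic operator type
assert star.exact_type == token.DOUBLESTAR      # resolved via EXACT_TOKEN_TYPES
assert token.tok_name[star.exact_type] == "DOUBLESTAR"
```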
Notable entries:
- The three-character ... maps to ELLIPSIS (52).
- -> maps to RARROW (51) for return-type annotations.
- := maps to COLONEQUAL (53) for the walrus operator.
- ! maps to EXCLAMATION (54), used inside f-string/t-string format specs.
In gopy, the same mapping lives in token/types_gen.go as ExactTokenTypes map[string]Type.
gopy mirror
token/types_gen.go is generated by tools/tokens_go and mirrors both
Lib/token.py and Include/internal/pycore_token.h. It exports:
- All 69 token-type constants as const ( ENDMARKER Type = 0 ... ).
- ExactTokenTypes map[string]Type, matching EXACT_TOKEN_TYPES.
- tokenNames [69]string, used by Type.String() in token/token.go.
token/token.go exposes ISTERMINAL, ISNONTERMINAL, and ISEOF as
methods on Type, plus String() for formatting.
The keyword predicate equivalents are in the peg parser. iskeyword is
implemented inline in the lexer by checking token.KEYWORD after an
identifier is scanned. issoftkeyword is tested in the grammar actions
that handle match/case/type statements.
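A minimal Python sketch of that lexer-side strategy (illustrative only; gopy's actual implementation is in Go, and classify is a hypothetical helper, not a gopy function):

```python
import keyword

def classify(ident: str) -> str:
    """Decide a scanned identifier's token kind as described above:
    hard keywords are promoted at lexing time; soft keywords stay NAME
    and are only reinterpreted by the parser when grammar context demands it."""
    return "KEYWORD" if keyword.iskeyword(ident) else "NAME"

assert classify("while") == "KEYWORD"
assert classify("match") == "NAME"   # soft keyword: the parser's job, not the lexer's
```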