Lib/keyword.py (and Lib/token.py)

cpython 3.14 @ ab2d84fe1023/Lib/keyword.py

This page covers two small reference files that feed the tokenizer and parser. They are documented together because both are auto-generated from grammar sources and both are consumed by the same stage of compilation.

Lib/keyword.py (65 lines) lists the 35 hard keywords and 4 soft keywords of Python 3.14, and provides iskeyword and issoftkeyword predicate functions. It is regenerated by PYTHONPATH=Tools/peg_generator python3 -m pegen.keywordgen Grammar/python.gram Grammar/Tokens Lib/keyword.py (or make regen-keyword).

Lib/token.py (145 lines) lists 69 token-type constants, builds the tok_name reverse dict, and provides EXACT_TOKEN_TYPES (operator strings to token-type numbers). It is regenerated by Tools/build/generate_token.py.

Map

Lib/keyword.py:

  • Lines 1-16 (module docstring, __all__): Documents the make regen-keyword command; declares iskeyword, issoftkeyword, kwlist, softkwlist. gopy: parser/token/
  • Lines 18-54 (kwlist): Sorted list of all 35 hard keywords. Includes False, None, True (which are keywords, not just name constants, since Python 3). gopy: parser/token/
  • Lines 56-61 (softkwlist): Four soft keywords: _, case, match, type. These are valid identifiers outside a match/case/type statement; the parser context, not the lexer, decides whether they are keywords. gopy: parser/token/
  • Lines 63-64 (iskeyword, issoftkeyword): Each is frozenset(list).__contains__, giving O(1) membership tests. gopy: parser/token/

Lib/token.py:

  • Lines 1-79 (token type constants): 69 integer constants from ENDMARKER = 0 to N_TOKENS = 69, plus NT_OFFSET = 256. New in 3.14: TSTRING_START (62), TSTRING_MIDDLE (63), TSTRING_END (64) for template strings. gopy: token/types_gen.go
  • Lines 81-84 (tok_name): Dict built from globals() at module load: {0: 'ENDMARKER', 1: 'NAME', ...}. __all__ is extended with every name in the dict. gopy: token/token.go
  • Lines 86-135 (EXACT_TOKEN_TYPES): Dict mapping operator strings to token-type numbers, e.g. {'==': EQEQUAL, '->': RARROW, '...': ELLIPSIS}. 53 entries total. Used by the tokenize module and the peg parser action helpers. gopy: token/types_gen.go
  • Lines 137-144 (ISTERMINAL, ISNONTERMINAL, ISEOF): Predicate functions comparing token numbers against NT_OFFSET (256). Terminal tokens are below 256; non-terminal grammar symbols are at or above. gopy: token/token.go
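The last row can be verified directly against the stdlib token module, independent of this repo:

```python
import token

# Token types below NT_OFFSET (256) are terminals the tokenizer can emit.
print(token.ISTERMINAL(token.NAME))          # True
# Anything at or above 256 is a non-terminal grammar symbol.
print(token.ISNONTERMINAL(token.NT_OFFSET))  # True
# ISEOF is true only for ENDMARKER (0).
print(token.ISEOF(token.ENDMARKER))          # True
```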

Reading

kwlist completeness (lines 18 to 54)

cpython 3.14 @ ab2d84fe1023/Lib/keyword.py#L18-54

kwlist = [
'False',
'None',
'True',
'and',
'as',
'assert',
'async',
'await',
'break',
'class',
'continue',
'def',
'del',
'elif',
'else',
'except',
'finally',
'for',
'from',
'global',
'if',
'import',
'in',
'is',
'lambda',
'nonlocal',
'not',
'or',
'pass',
'raise',
'return',
'try',
'while',
'with',
'yield'
]

All 35 entries are sorted in ASCII order, which is why the capitalized False, None, and True come first. The list is the single source of truth for iskeyword. The peg generator reads Grammar/python.gram and extracts every quoted token that is an all-lowercase identifier; it then adds False, None, and True explicitly (they are not lowercase).

softkwlist is separate:

softkwlist = [
'_',
'case',
'match',
'type'
]

_ is soft because it is valid in ordinary assignment. match and case were added as soft keywords in 3.10, and type in 3.12, to avoid breaking existing code that used them as variable names. The lexer always produces a NAME token for these; the peg parser grammar uses context to decide whether to treat them as keywords.
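The "lexer always produces NAME" claim is observable with the stdlib tokenize module; a sketch that tokenizes an ordinary assignment to match:

```python
import io
import token
import tokenize

# 'match' used as a plain variable name: the tokenizer emits NAME, never a
# dedicated keyword token; only the parser's context makes it special.
toks = list(tokenize.generate_tokens(io.StringIO("match = 1\n").readline))
first = toks[0]
print(first.type == token.NAME, first.string)   # True match
```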

Token type constants and tok_name (lines 1 to 84)

cpython 3.14 @ ab2d84fe1023/Lib/token.py#L1-84

ENDMARKER = 0
NAME = 1
NUMBER = 2
STRING = 3
NEWLINE = 4
INDENT = 5
DEDENT = 6
# ... operator tokens 7-54 ...
OP = 55
TYPE_IGNORE = 56
TYPE_COMMENT = 57
SOFT_KEYWORD = 58
FSTRING_START = 59
FSTRING_MIDDLE = 60
FSTRING_END = 61
TSTRING_START = 62
TSTRING_MIDDLE = 63
TSTRING_END = 64
COMMENT = 65
NL = 66
ERRORTOKEN = 67
ENCODING = 68
N_TOKENS = 69
NT_OFFSET = 256

tok_name = {value: name
for name, value in globals().items()
if isinstance(value, int) and not name.startswith('_')}

The constants are pinned to Grammar/Tokens via generate_token.py; do not renumber them. The tok_name dict is built from globals() at module-load time, which means it also picks up NT_OFFSET and N_TOKENS. Four token groups are worth noting:

  • NEWLINE (4) ends a logical line; NL (66) is a physical newline inside a multi-line expression and does not end a statement.
  • FSTRING_START/FSTRING_MIDDLE/FSTRING_END (59-61) were added in 3.12 when the C tokenizer grew native f-string support.
  • TSTRING_START/TSTRING_MIDDLE/TSTRING_END (62-64) were added in 3.14 for PEP 750 template strings.
  • ENCODING (68) is emitted as the very first token by tokenize.py to announce the source encoding; it is never produced by the C tokenizer.
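The NEWLINE/NL distinction shows up as soon as an expression spans physical lines; a short illustration using only the stdlib tokenize module:

```python
import io
import token
import tokenize

src = "x = (1 +\n     2)\n"
types = [t.type for t in tokenize.generate_tokens(io.StringIO(src).readline)]

# The newline inside the parentheses is NL (no logical-line end); the one
# after the closing paren is NEWLINE and terminates the statement.
print(token.NL in types)                                    # True
print(token.NEWLINE in types)                               # True
print(types.index(token.NL) < types.index(token.NEWLINE))   # True
```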

In gopy, token/types_gen.go is generated by tools/tokens_go from Grammar/Tokens and Include/internal/pycore_token.h; the numeric values match CPython exactly. token/token.go provides the Type.String() method (equivalent to tok_name[t]).

EXACT_TOKEN_TYPES (lines 86 to 135)

cpython 3.14 @ ab2d84fe1023/Lib/token.py#L86-135

EXACT_TOKEN_TYPES = {
'(': LPAR,
')': RPAR,
'[': LSQB,
']': RSQB,
':': COLON,
',': COMMA,
'+': PLUS,
'-': MINUS,
'*': STAR,
'**': DOUBLESTAR,
'/': SLASH,
'//': DOUBLESLASH,
'==': EQEQUAL,
'!=': NOTEQUAL,
'->': RARROW,
'...': ELLIPSIS,
':=': COLONEQUAL,
# ... 36 more entries ...
}

EXACT_TOKEN_TYPES has 53 entries. It is used by the tokenize module's TokenInfo.exact_type property: when a token is OP, the property looks up the string value in EXACT_TOKEN_TYPES to return the precise type such as PLUS or DOUBLESTAR. The peg parser's action helpers also use it when they need to emit a specific punctuation token from an expected string.
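The exact_type refinement is easy to demonstrate with the stdlib tokenize module: the operator arrives as a generic OP token, and the property narrows it via the string value:

```python
import io
import token
import tokenize

toks = list(tokenize.generate_tokens(io.StringIO("a ** b\n").readline))
op = next(t for t in toks if t.type == token.OP)

print(op.type == token.OP)                                 # True: generic type
print(op.exact_type == token.DOUBLESTAR)                   # True: refined lookup
print(token.EXACT_TOKEN_TYPES["**"] == token.DOUBLESTAR)   # True: the table itself
```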

Notable entries:

  • Three-character ... maps to ELLIPSIS (52).
  • -> maps to RARROW (51) for return-type annotations.
  • := maps to COLONEQUAL (53) for the walrus operator.
  • ! maps to EXCLAMATION (54), used inside f-string/t-string format specs.

In gopy the same mapping is in token/types_gen.go as ExactTokenTypes map[string]Type.

gopy mirror

token/types_gen.go is generated by tools/tokens_go and mirrors both Lib/token.py and Include/internal/pycore_token.h. It exports:

  • All 69 token-type constants as const ( ENDMARKER Type = 0 ... ).
  • ExactTokenTypes map[string]Type matching EXACT_TOKEN_TYPES.
  • tokenNames [69]string used by Type.String() in token/token.go.

token/token.go exposes ISTERMINAL, ISNONTERMINAL, and ISEOF as methods on Type, plus String() for formatting.

The keyword predicate equivalents are in the peg parser. iskeyword is implemented inline in the lexer by checking token.KEYWORD after an identifier is scanned. issoftkeyword is tested in the grammar actions that handle match/case/type statements.