pycore_global_strings.h

Pre-interned string storage for CPython's runtime. Every identifier or special literal that the interpreter needs repeatedly (dunder names, keyword strings, codec names) is stored once in the statically allocated _PyRuntime.static_objects.singletons.strings and accessed through the _Py_ID() or _Py_STR() macros. No allocation happens at call sites; callers borrow a reference to a singleton that lives for the lifetime of the runtime.

Map

| Lines | Symbol | Role |
| --- | --- | --- |
| 18–26 | STRUCT_FOR_ASCII_STR, STRUCT_FOR_STR, STRUCT_FOR_ID | Layout macros that embed a PyASCIIObject plus inline data |
| 31–823 | struct _Py_global_strings | Aggregate of literals and identifiers sub-structs, auto-generated |
| 33–56 | literals | Named special strings: <module>, utf-8, <lambda>, etc. |
| 58–814 | identifiers | Dunder names and other identifiers: __init__ through zstd_dict |
| 815–823 | ascii[128], latin1[128] | Fast single-character string table |
| 830–831 | _Py_ID(NAME) | Macro naming the embedded PyObject of a known identifier; callers take its address (&_Py_ID(NAME)) to get a borrowed PyObject* |
| 832–833 | _Py_STR(NAME) | Macro naming the embedded PyObject of a known literal; used the same way as _Py_ID |
| 834–837 | _Py_LATIN1_CHR(CH) | Macro returning a PyObject* to a pre-interned single-character string |
| 849 | _Py_DECLARE_STR(name, str) | Documentation-only macro; expands to nothing |

Reading

Struct layout macros (lines 18–26)

Each string is stored as an anonymous struct embedding the full PyASCIIObject header followed by inline character data. Because header and data are members of the same struct, the compiler lays the string body out immediately after its header in static storage, with no extra allocation.

// CPython: Include/internal/pycore_global_strings.h:18 STRUCT_FOR_ASCII_STR
#define STRUCT_FOR_ASCII_STR(LITERAL) \
    struct { \
        PyASCIIObject _ascii; \
        uint8_t _data[sizeof(LITERAL)]; \
    }
#define STRUCT_FOR_STR(NAME, LITERAL) \
    STRUCT_FOR_ASCII_STR(LITERAL) _py_ ## NAME;
#define STRUCT_FOR_ID(NAME) \
    STRUCT_FOR_ASCII_STR(#NAME) _py_ ## NAME;

STRUCT_FOR_ID stringifies NAME so the struct's _data array is sized exactly to the identifier text including the NUL terminator.
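The sizing rule can be checked in isolation. The following is a minimal sketch, not CPython code: demo_header is a hypothetical stand-in for PyASCIIObject, and demo_identifiers, demo_ids, and the helper functions are names invented for illustration. Only the shape of the two macros is taken from the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PyASCIIObject; the real header has more fields. */
typedef struct {
    size_t length;
    uint32_t flags;
} demo_header;

/* Same shape as the header's layout macros, using the stand-in header. */
#define STRUCT_FOR_ASCII_STR(LITERAL) \
    struct { \
        demo_header _ascii; \
        uint8_t _data[sizeof(LITERAL)]; \
    }
#define STRUCT_FOR_ID(NAME) \
    STRUCT_FOR_ASCII_STR(#NAME) _py_ ## NAME;

/* A two-entry identifiers struct, written the way the generated list is. */
struct demo_identifiers {
    STRUCT_FOR_ID(__init__)
    STRUCT_FOR_ID(co_name)
};

/* Statically initialised: the text lives inside the struct, not on the heap. */
static struct demo_identifiers demo_ids = {
    ._py___init__ = { { 8, 0 }, "__init__" },
    ._py_co_name  = { { 7, 0 }, "co_name" },
};

/* sizeof("__init__") is 9: eight characters plus the NUL terminator. */
static_assert(sizeof(demo_ids._py___init__._data) == 9,
              "_data holds the identifier text and its NUL");

/* Size in bytes of the inline buffer generated for __init__. */
size_t init_data_size(void) {
    return sizeof(demo_ids._py___init__._data);
}

/* Nonzero if the inline buffer for co_name holds exactly that text. */
int co_name_text_matches(void) {
    return strcmp((const char *)demo_ids._py_co_name._data, "co_name") == 0;
}

/* Nonzero if _data starts exactly sizeof(demo_header) bytes into the entry. */
int data_follows_header(void) {
    const uint8_t *entry = (const uint8_t *)&demo_ids._py___init__;
    return entry + sizeof(demo_header) == demo_ids._py___init__._data;
}
```

Because uint8_t has an alignment of one byte, no padding can appear between the header and _data, which is why the last helper holds for any header type.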

Access macros (lines 830–837)

// CPython: Include/internal/pycore_global_strings.h:830 _Py_ID
#define _Py_ID(NAME) \
    (_Py_SINGLETON(strings.identifiers._py_ ## NAME._ascii.ob_base))
#define _Py_STR(NAME) \
    (_Py_SINGLETON(strings.literals._py_ ## NAME._ascii.ob_base))
#define _Py_LATIN1_CHR(CH) \
    ((CH) < 128 \
     ? (PyObject*)&_Py_SINGLETON(strings).ascii[(CH)] \
     : (PyObject*)&_Py_SINGLETON(strings).latin1[(CH) - 128])

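_Py_SINGLETON(NAME) expands to _PyRuntime.static_objects.singletons.NAME, so _Py_ID(__init__) is a direct field access into the global runtime struct, not a hash-table lookup. The macro names the embedded PyObject itself rather than a pointer to it, so call sites take its address: &_Py_ID(__init__).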
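The token-pasting access pattern can be sketched with a mock runtime struct. Everything prefixed demo_ or DEMO_ below is hypothetical and much simpler than the real CPython types; the sketch only aims to show that the macro resolves to a fixed address and that the text can be found directly behind the header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for PyObject and PyASCIIObject. */
typedef struct { int refcnt; } demo_object;
typedef struct { demo_object ob_base; int length; } demo_ascii;

/* Mock of the nested runtime struct, reduced to two identifiers. */
struct demo_runtime {
    struct {
        struct {
            struct { demo_ascii _ascii; uint8_t _data[9]; } _py___init__;
            struct { demo_ascii _ascii; uint8_t _data[8]; } _py___len__;
        } identifiers;
    } strings;
};

static struct demo_runtime demo_rt = {
    .strings.identifiers._py___init__ = { { { 1 }, 8 }, "__init__" },
    .strings.identifiers._py___len__  = { { { 1 }, 7 }, "__len__" },
};

/* Same shape as _Py_SINGLETON and _Py_ID, pointed at the mock runtime. */
#define DEMO_SINGLETON(NAME) demo_rt.NAME
#define DEMO_ID(NAME) \
    (DEMO_SINGLETON(strings.identifiers._py_ ## NAME._ascii.ob_base))

/* The macro names an lvalue, so the call site takes its address. */
demo_object *get_init_name(void) { return &DEMO_ID(__init__); }
demo_object *get_len_name(void)  { return &DEMO_ID(__len__); }

/* ob_base is the first member of the header and the header is the first
   member of the entry, so the text starts right after the header. */
const char *demo_text(demo_object *op) {
    return (const char *)((demo_ascii *)op + 1);
}

/* Nonzero if the singleton behind op carries exactly the given text. */
int demo_name_equals(demo_object *op, const char *text) {
    return strcmp(demo_text(op), text) == 0;
}
```

Each call to get_init_name() returns the same pointer, computed from a constant offset into demo_rt, which is the property the real macros rely on.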

Identifier list structure (lines 58–814)

The identifiers sub-struct is generated by Tools/build/generate_global_objects.py. Each entry uses STRUCT_FOR_ID, which stringifies the C token to produce the Python identifier string. The list covers the dunder method names the interpreter refers to (__abs__ through __xor__), co_* code-object attribute names, and common keyword-argument names.

Single-character tables (lines 815–823)

// CPython: Include/internal/pycore_global_strings.h:815 ascii
struct {
    PyASCIIObject _ascii;
    uint8_t _data[2];
} ascii[128];
struct {
    PyCompactUnicodeObject _latin1;
    uint8_t _data[2];
} latin1[128];

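Characters 0–127 use PyASCIIObject; characters 128–255 use PyCompactUnicodeObject because they are not ASCII: their UTF-8 encoding differs from the stored Latin-1 byte, so the object needs the utf8 and utf8_length fields that the compact ASCII header omits. _Py_LATIN1_CHR selects between the two arrays with a conditional expression; when CH is a compile-time constant the compiler can fold the branch away, otherwise it is an ordinary runtime comparison.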
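The two-table dispatch can be sketched as follows. The header types, demo_tab, demo_init, and DEMO_LATIN1_CHR are all hypothetical; in particular, this sketch fills the tables at run time for brevity, which is an assumption of the example and not a description of how CPython initialises them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PyASCIIObject and PyCompactUnicodeObject. */
typedef struct { int is_ascii; } demo_ascii_hdr;
typedef struct { demo_ascii_hdr base; int utf8_length; } demo_compact_hdr;

/* Two 128-entry tables, one per header type, as in the real struct. */
struct demo_strings {
    struct { demo_ascii_hdr _ascii; uint8_t _data[2]; } ascii[128];
    struct { demo_compact_hdr _latin1; uint8_t _data[2]; } latin1[128];
};

static struct demo_strings demo_tab;

/* Fill both tables: one character plus a NUL terminator per entry. */
void demo_init(void) {
    for (int i = 0; i < 128; i++) {
        demo_tab.ascii[i]._ascii.is_ascii = 1;
        demo_tab.ascii[i]._data[0] = (uint8_t)i;
        demo_tab.ascii[i]._data[1] = 0;

        demo_tab.latin1[i]._latin1.base.is_ascii = 0;
        /* U+0080..U+00FF each take two bytes when encoded as UTF-8. */
        demo_tab.latin1[i]._latin1.utf8_length = 2;
        demo_tab.latin1[i]._data[0] = (uint8_t)(i + 128);
        demo_tab.latin1[i]._data[1] = 0;
    }
}

/* Same shape as _Py_LATIN1_CHR, returning void* in place of PyObject*. */
#define DEMO_LATIN1_CHR(CH) \
    ((CH) < 128 \
     ? (void *)&demo_tab.ascii[(CH)] \
     : (void *)&demo_tab.latin1[(CH) - 128])

/* Runtime argument: the comparison is evaluated on every call. */
void *demo_char(unsigned char ch) { return DEMO_LATIN1_CHR(ch); }

/* Constant argument: the compiler is free to fold the branch away. */
void *demo_letter_a(void) { return DEMO_LATIN1_CHR('a'); }
```

Both helpers return an address inside demo_tab, so looking up a single-character string never allocates.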

gopy notes

gopy stores interned strings in objects/str.go using a Go sync.Map keyed by string content. The _Py_ID pattern maps naturally to a package-level var holding a pre-interned *StrObject, initialized in an init() function. The single-character Latin-1 table corresponds to the latin1 array in objects/str.go.

The _Py_DECLARE_STR macro (line 849) is a no-op in CPython; gopy has no equivalent and does not need one.

CPython 3.14 changes

Python 3.11 introduced this header as a replacement for the older _Py_Identifier linked-list mechanism. By 3.14 the identifier list had been extended with __annotate__, __annotate_func__, __annotations_cache__, __conditional_annotations__, __firstlineno__, __static_attributes__, _strptime_datetime_date, _strptime_datetime_time, and zstd_dict, reflecting new language features and stdlib additions in 3.12 through 3.14.