Include/internal/pycore_global_strings.h

cpython 3.14 @ ab2d84fe1023/Include/internal/pycore_global_strings.h

Every Python attribute lookup, function call, and import involves comparing string identifiers. To avoid repeated heap allocation and hash computation for the most common names, CPython pre-interns a fixed table of strings at interpreter startup. This header declares that table.

There are two families:

  • _Py_ID(name) expands to &_PyRuntime.cached_objects.interned_strings.id.name, a pointer to an immortal PyUnicodeObject whose value is "name". Used for identifiers like __name__, __init__, append, write, etc.
  • _Py_STR(name) expands to &_PyRuntime.cached_objects.interned_strings.str.name, used for a small set of special strings that are not valid Python identifiers ("", "\n", "<anonymous>", etc.).

Both families are initialized once by _PyUnicode_InitStaticStrings during Py_Initialize. After that, internal C code can compare a name against them with _PyUnicode_EqualToASCIIId or, when the name is known to be interned, with plain pointer equality, since each interned value has exactly one canonical object in the process.

In gopy, these singletons are replaced by Go string constants or a startup-initialized map from name to *objects.Str. Pointer-equality shortcuts are replaced by string equality on Go string values, which is O(len) but avoids heap allocation.
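The map-based variant mentioned above can be sketched as follows. This is a minimal illustration, not gopy's actual code: the Str type, the three constants, and the helper names buildInterned and LookupInterned are all hypothetical stand-ins for whatever the objects package really defines.

```go
package main

// Str is a hypothetical stand-in for gopy's objects.Str; only the
// Value field matters for this sketch.
type Str struct {
	Value string
}

// A few names that would be pre-interned; a real table would be much longer.
const (
	IdName   = "__name__"
	IdInit   = "__init__"
	IdAppend = "append"
)

// internedStrs maps each pre-interned name to its single *Str.
// It is filled once during package initialization and only read afterwards.
var internedStrs = buildInterned(IdName, IdInit, IdAppend)

// buildInterned allocates exactly one Str per name.
func buildInterned(names ...string) map[string]*Str {
	m := make(map[string]*Str, len(names))
	for _, n := range names {
		m[n] = &Str{Value: n}
	}
	return m
}

// LookupInterned returns the shared *Str for name, or nil when the
// name is not part of the startup table.
func LookupInterned(name string) *Str {
	return internedStrs[name]
}
```

Every call to LookupInterned with the same name returns the same pointer, so callers that need an object rather than a bare Go string never allocate one per use.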

Map

| Symbol | Kind | Purpose |
| --- | --- | --- |
| _Py_ID(name) | macro | Yields a *PyUnicodeObject for the pre-interned identifier name |
| _Py_STR(name) | macro | Yields a *PyUnicodeObject for a pre-interned special string |
| struct _Py_global_strings | struct | Holds two anonymous structs: id (one field per identifier) and str (one field per special string) |
| _PyUnicode_EqualToASCIIId | function (declared elsewhere) | Compares any PyObject* against a pre-interned id without computing a hash |
| _PyUnicode_InitStaticStrings | function (declared elsewhere) | Populates every field of _Py_global_strings at startup |

The id struct contains roughly 260 fields in CPython 3.14, covering names like __abs__, __add__, __all__, ..., zip. The str struct contains around a dozen entries.

Reading

Macro expansion and struct layout

The two macros resolve to struct-member addresses inside _PyRuntime:

/* Include/internal/pycore_global_strings.h (simplified) */
#define _Py_ID(name) \
    (&_PyRuntime.cached_objects.interned_strings.id.name)

#define _Py_STR(name) \
    (&_PyRuntime.cached_objects.interned_strings.str.name)

struct _Py_global_strings {
    struct {
        /* One field per pre-interned identifier. Each object is stored
           inline, so taking a field's address yields the object pointer. */
        PyUnicodeObject __abs__;
        PyUnicodeObject __add__;
        PyUnicodeObject __all__;
        /* ... roughly 260 fields in total ... */
        PyUnicodeObject __name__;
        PyUnicodeObject append;
        PyUnicodeObject write;
        /* ... */
    } id;
    struct {
        /* special strings that are not identifiers */
        PyUnicodeObject anon_string;  /* "<anonymous>" */
        PyUnicodeObject empty;        /* "" */
        PyUnicodeObject newline;      /* "\n" */
        /* ... */
    } str;
};

Because every field sits inside the single _PyRuntime struct, the whole table occupies one contiguous block of memory, needs no per-string heap allocation, and keeps the same addresses for the interpreter's lifetime.
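The same layout idea can be modeled in Go: one package-level struct value with the string objects stored inline, and accessor functions that return field addresses in the way the two macros do. This is only a sketch under stated assumptions; strObject, idStrings, specialStrings, rt, idInit, and strAnon are hypothetical names, and the real PyUnicodeObject carries far more state than a single string field.

```go
package main

// strObject is a hypothetical, heavily simplified string object.
type strObject struct {
	value string
}

// idStrings holds one inline object per pre-interned identifier.
type idStrings struct {
	dunderName strObject
	dunderInit strObject
	appendName strObject
}

// specialStrings holds the strings that are not valid identifiers.
type specialStrings struct {
	empty strObject
	anon  strObject
}

// globalStrings mirrors the two-family layout of _Py_global_strings.
type globalStrings struct {
	id  idStrings
	str specialStrings
}

// rt plays the role of _PyRuntime: a single package-level value that
// exists for the whole life of the program.
var rt = globalStrings{
	id: idStrings{
		dunderName: strObject{value: "__name__"},
		dunderInit: strObject{value: "__init__"},
		appendName: strObject{value: "append"},
	},
	str: specialStrings{
		empty: strObject{value: ""},
		anon:  strObject{value: "<anonymous>"},
	},
}

// idInit models _Py_ID(__init__): it returns the address of a field
// inside rt, so every call yields the same pointer.
func idInit() *strObject { return &rt.id.dunderInit }

// strAnon models _Py_STR(anon_string) in the same way.
func strAnon() *strObject { return &rt.str.anon }
```

Because the objects are struct fields rather than separately allocated values, their addresses are fixed as soon as rt exists, which is the property the pointer-equality fast paths rely on.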

Pointer-equality fast paths in the attribute machinery

The primary consumers of _Py_ID are Objects/object.c and the type machinery. A typical use:

/* Objects/typeobject.c (representative pattern) */
if (name == _Py_ID(__init__)) {
    /* fast path: name is the canonical interned object */
    ...
}
else if (_PyUnicode_EqualToASCIIString(name, "__init__")) {
    /* fallback for non-interned names: compare contents */
    ...
}

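The pointer comparison name == _Py_ID(__init__) compiles to a single address compare. It is valid because PyUnicode_InternInPlace replaces the string it is given with the canonical object whenever an equal string has already been interned. Code that calls PyUnicode_InternInPlace on a string received from Python code can therefore test for a match with pointer equality instead of a character-by-character comparison. A pointer mismatch proves inequality only when both strings are interned, which is why the content-comparing fallback is still needed for arbitrary names.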
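The interplay between interning and the two-step comparison can be sketched with a small model. It is written in Go only for illustration and assumes a hypothetical Str type, a package-level interned map, and helper names Intern and IsInit; it does not reproduce CPython's actual intern table.

```go
package main

// Str is a hypothetical string object used only for this sketch.
type Str struct {
	Value string
}

// interned holds the canonical object for each interned value.
var interned = map[string]*Str{}

// Intern returns the canonical *Str for s.Value, registering s itself
// when no object with that value has been interned yet. This models the
// redirect that PyUnicode_InternInPlace performs through its PyObject **
// argument.
func Intern(s *Str) *Str {
	if canon, ok := interned[s.Value]; ok {
		return canon
	}
	interned[s.Value] = s
	return s
}

// idInit is the pre-interned "__init__" object.
var idInit = Intern(&Str{Value: "__init__"})

// IsInit reports whether name is "__init__". The pointer check settles
// the interned case; the content check handles names that were never
// interned.
func IsInit(name *Str) bool {
	if name == idInit {
		return true // fast path: same canonical object
	}
	return name.Value == "__init__" // fallback: compare contents
}
```

A freshly built "__init__" object fails the pointer check and is caught by the fallback; once it has been passed through Intern, the returned pointer is idInit and the fast path applies.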

_Py_STR for non-identifier constants

_Py_STR covers strings whose values are not valid identifiers and therefore cannot double as C field names, such as the empty string and the anonymous-function display name:

/* Objects/funcobject.c */
if (func->func_qualname == NULL) {
    func->func_qualname = Py_NewRef(_Py_STR(anon_string));
}

This avoids allocating a new "<anonymous>" string object every time a function is created without an explicit qualified name. The empty string is shared in the same way through _Py_STR(empty).
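The shared-default pattern is easy to see in a small model. The sketch below is illustrative only and assumes hypothetical Str and Function types plus a hypothetical strAnon singleton; it is not taken from gopy or CPython.

```go
package main

// Str is a hypothetical string object used only for this sketch.
type Str struct {
	Value string
}

// strAnon is the single shared "<anonymous>" object.
var strAnon = &Str{Value: "<anonymous>"}

// Function is a hypothetical function object with just a qualified name.
type Function struct {
	Qualname *Str
}

// NewFunction falls back to the shared strAnon when no qualified name is
// supplied, so unnamed functions never allocate a fresh name object.
func NewFunction(qualname *Str) *Function {
	if qualname == nil {
		qualname = strAnon
	}
	return &Function{Qualname: qualname}
}
```

Any number of unnamed functions created this way point at the same name object, while explicitly named functions keep the object they were given.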

gopy mirror

gopy replaces the entire _Py_global_strings mechanism with Go-level string constants. There is no _PyRuntime struct and no immortality mechanism; instead:

// objects/global_strings.go (representative)
const (
	IdDunder__name__ = "__name__"
	IdAppend         = "append"
	IdWrite          = "write"
	StrAnon          = "<anonymous>"
	StrEmpty         = ""
)

Where CPython uses pointer equality after PyUnicode_InternInPlace, gopy uses Go's built-in string equality (==), which compares content and is O(len) in the worst case. Strings of different lengths are typically rejected before any bytes are read, and for the common short identifiers (under 32 bytes) the byte comparison costs only a few machine instructions. No hash computation is required.

_PyUnicode_EqualToASCIIId has no direct gopy analogue; call sites are translated to plain s == IdFoo comparisons.
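A translated call site might look like the sketch below. The constants IdDunder__init__ and IdDunder__new__ follow the naming pattern of the representative file above but are assumptions, as are the functions IsInitName and SpecialMethodKind and the labels they return.

```go
package main

// Hypothetical constants following the IdDunder naming pattern.
const (
	IdDunder__init__ = "__init__"
	IdDunder__new__  = "__new__"
	IdAppend         = "append"
)

// IsInitName is the direct translation of a pointer-equality or
// _PyUnicode_EqualToASCIIId check: a plain content comparison.
func IsInitName(name string) bool {
	return name == IdDunder__init__
}

// SpecialMethodKind dispatches on several names at once. A switch over
// string constants reads the same as a chain of s == IdFoo tests.
func SpecialMethodKind(name string) string {
	switch name {
	case IdDunder__init__:
		return "initializer"
	case IdDunder__new__:
		return "constructor"
	default:
		return "ordinary"
	}
}
```

Because the comparison is by content, it gives the same answer whether the name came from a constant or was assembled at run time, so no interning step is needed before the check.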

CPython 3.14 changes

  • The table grew by roughly 20 entries in 3.13-3.14 to cover new dunder names introduced by the type parameter syntax (__type_params__, __value__) and the new warnings module hooks.
  • In 3.12, strings in _Py_ID and _Py_STR were made immortal (reference count set to _Py_IMMORTAL_REFCNT) so that Py_INCREF / Py_DECREF on them become no-ops. This is part of the broader immortalization work that also covers True, False, None, and small integers.
  • The struct fields were sorted alphabetically in 3.13 to make future binary-search or SIMD optimizations easier; the sort order is now enforced by a CI script.
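The effect of immortalization on Py_INCREF / Py_DECREF can be modeled with a sentinel reference count. The sketch is written in Go only to match the other examples on this page; gopy itself needs no such mechanism. The Object type, the immortalRefcnt value, and the function names are assumptions, and CPython's real _Py_IMMORTAL_REFCNT value and checks differ by platform and version.

```go
package main

// immortalRefcnt is a hypothetical sentinel value chosen for this model.
const immortalRefcnt int64 = 1 << 62

// Object is a hypothetical object header holding only a reference count.
type Object struct {
	Refcnt int64
}

// NewImmortal returns an object whose count is pinned at the sentinel.
func NewImmortal() *Object {
	return &Object{Refcnt: immortalRefcnt}
}

// Incref adds a reference, except on immortal objects, where it is a no-op.
func Incref(o *Object) {
	if o.Refcnt == immortalRefcnt {
		return
	}
	o.Refcnt++
}

// Decref drops a reference and reports whether the object should now be
// freed. Immortal objects are never freed and their count never changes.
func Decref(o *Object) bool {
	if o.Refcnt == immortalRefcnt {
		return false
	}
	o.Refcnt--
	return o.Refcnt == 0
}
```

An unbalanced sequence of Incref and Decref calls leaves an immortal object untouched, which is what lets shared strings be handed out without reference-count bookkeeping.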