Author ncoghlan
Recipients eric.snow, ncoghlan, petr.viktorin, shihai1991
Date 2020-01-27.14:28:30
Message-id <1580135310.72.0.978238650594.issue39465@roundup.psfhosted.org>
Content
Both https://github.com/python/cpython/pull/18066 (collections module) and https://github.com/python/cpython/pull/18032 (asyncio module) ran into the problem where porting them to multi-phase initialisation involves replacing their usage of the `_Py_IDENTIFIER` macro with some other mechanism.

When _posixsubprocess was ported, the replacement was a relatively ad hoc combination of string interning and the interpreter-managed module-specific state: https://github.com/python/cpython/commit/5a7d2e11aaea2dd32878dc5c6b1aae8caf56cb44

I'm wondering if we may be able to devise a comparable struct-field based system that replaces the `_Py_IDENTIFIER` local static variable declaration macro and the `PyId_<name>` lookup convention with a combination like (using the posix subprocess module conversion as an example):

    // Identifier usage declaration (replaces _Py_IDENTIFIER)
    _Py_USE_CACHED_IDENTIFIER(_posixsubprocessstate(m), disable);

    // Identifier usage remains unchanged, but uses a regular local variable
    // rather than the static variable declared by _Py_IDENTIFIER
    result = _PyObject_CallMethodIdNoArgs(gc_module, &PyId_disable);

And then the following additional state management macros would be needed to handle the string interning and reference counting:

    // Module state struct declaration
    typedef struct {
        // This would declare an initialised array of _Py_Identifier structs
        // under a name like __cached_identifiers__. The end of the array
        // would be indicated by a struct with "value" set to NULL.
        _Py_START_CACHED_IDENTIFIERS;
        _Py_CACHED_IDENTIFIER(disable);
        _Py_CACHED_IDENTIFIER(enable);
        _Py_CACHED_IDENTIFIER(isenabled);
        _Py_END_CACHED_IDENTIFIERS;
    } _posixsubprocessstate;

    // Module m_traverse implementation
    _Py_VISIT_CACHED_IDENTIFIERS(_posixsubprocessstate(m));

    // Module m_clear implementation (also called by m_free)
    _Py_CLEAR_CACHED_IDENTIFIERS(_posixsubprocessstate(m));
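
To make the sentinel-terminated array idea concrete, here is a minimal standalone sketch of what the visit and clear macros might boil down to. It is a toy model only, not CPython code: `cached_identifier`, `module_state`, `visit_fn`, `state_init`, `state_traverse`, `state_clear` and `count_visit` are all hypothetical names, a heap-allocated C string stands in for the interned `PyObject *`, and the end-of-array marker is assumed to be a NULL `string` field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for _Py_Identifier (NOT the CPython struct). */
typedef struct {
    const char *string;  /* identifier text; NULL marks the end of the table */
    char *object;        /* cached object; NULL until it has been created */
} cached_identifier;

/* Hypothetical module state holding three identifiers plus the sentinel. */
typedef struct {
    cached_identifier cached_identifiers[4];
} module_state;

/* Callback type modelled on visitproc: a nonzero return aborts traversal. */
typedef int (*visit_fn)(const char *object, void *arg);

/* Set up the table with empty caches and a terminating sentinel entry. */
static void
state_init(module_state *state)
{
    static const char *const names[] = {"disable", "enable", "isenabled", NULL};
    size_t i;
    for (i = 0; i < 4; i++) {
        state->cached_identifiers[i].string = names[i];
        state->cached_identifiers[i].object = NULL;
    }
}

/* Role of _Py_VISIT_CACHED_IDENTIFIERS: visit every object cached so far. */
static int
state_traverse(module_state *state, visit_fn visit, void *arg)
{
    cached_identifier *id;
    for (id = state->cached_identifiers; id->string != NULL; id++) {
        if (id->object != NULL) {
            int rc = visit(id->object, arg);
            if (rc != 0) {
                return rc;
            }
        }
    }
    return 0;
}

/* Role of _Py_CLEAR_CACHED_IDENTIFIERS: release every cached object. */
static void
state_clear(module_state *state)
{
    cached_identifier *id;
    for (id = state->cached_identifiers; id->string != NULL; id++) {
        char *object = id->object;
        id->object = NULL;  /* reset the slot before releasing, as Py_CLEAR does */
        free(object);
    }
}

/* Example visitor that counts the objects it is shown. */
static int
count_visit(const char *object, void *arg)
{
    (void)object;
    *(int *)arg += 1;
    return 0;
}
```

Because both loops stop at the sentinel, adding another identifier to the table would not require touching the traverse or clear code, which is the main attraction over hand-written per-field `Py_VISIT`/`Py_CLEAR` calls.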

With the requirement to declare usage of the cached identifiers, they could be lazily initialized the same way the existing static variables are (even re-using the same struct declaration).
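
As a rough illustration of the lazy initialisation, the following standalone sketch creates each cached object on first use, similar in spirit to how `_PyUnicode_FromId` populates an existing `_Py_Identifier`. Everything here is an assumption made for illustration: `cached_identifier`, `find_identifier` and `identifier_object` are hypothetical names, a heap copy of the string stands in for the interned string object, and the lookup by name is a simplification (a real macro would presumably resolve the struct field at compile time).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for _Py_Identifier (NOT the CPython struct). */
typedef struct {
    const char *string;  /* identifier text; NULL marks the end of the table */
    char *object;        /* cached object; NULL until first use */
} cached_identifier;

/* Find the entry for `name` in a sentinel-terminated table.
   Returns NULL if the identifier was never declared. */
static cached_identifier *
find_identifier(cached_identifier *table, const char *name)
{
    cached_identifier *id;
    for (id = table; id->string != NULL; id++) {
        if (strcmp(id->string, name) == 0) {
            return id;
        }
    }
    return NULL;
}

/* Return the cached object, creating it on first use.
   Returns NULL if the allocation fails. */
static const char *
identifier_object(cached_identifier *id)
{
    if (id->object == NULL) {
        size_t size = strlen(id->string) + 1;
        char *copy = malloc(size);
        if (copy == NULL) {
            return NULL;
        }
        memcpy(copy, id->string, size);
        id->object = copy;
    }
    return id->object;
}
```

Identifiers that are declared but never used stay NULL in the table, so a module only pays for the strings it actually looks up.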

Note: this is just a draft of one possible design. The intent of this issue is to highlight that this problem has now come up multiple times, and it would be good to have a standard answer available.