Author ncoghlan
Recipients eric.snow, ncoghlan, petr.viktorin, shihai1991, vstinner
Date 2020-02-08.12:52:39
Message-id <1581166359.8.0.337239419828.issue39465@roundup.psfhosted.org>
Content
As Petr notes, as long as all subinterpreters share the GIL and share str instances, the existing `_Py_IDENTIFIER` mechanism will work fine for both single-phase and multi-phase initialisation.

However, that constraint also goes the other way: as long as we have modules that use the existing `_Py_IDENTIFIER` mechanism, subinterpreters *must* share str instances, and hence *must* share the GIL.
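To make that coupling concrete, here is a simplified, self-contained model. The `identifier` struct loosely mirrors the shape of `_Py_Identifier` (a C string plus a lazily filled object pointer). `fake_str`, `intern_in` and `from_id` are hypothetical stand-ins for a str object, interning, and `_PyUnicode_FromId`, not the real CPython code. Because the cache slot is a process global, whichever interpreter touches it first fills it, and every other interpreter then receives that interpreter's object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a str object owned by one interpreter. */
typedef struct {
    int owner_interp;      /* interpreter that allocated this object */
    const char *text;
} fake_str;

/* Simplified model of _Py_Identifier: a C string plus a cache pointer
   (a PyObject * in CPython). */
typedef struct {
    const char *string;
    fake_str *object;
} identifier;

/* Hypothetical stand-in for interning a string in a given interpreter. */
static fake_str *intern_in(int interp_id, const char *s) {
    fake_str *obj = malloc(sizeof *obj);
    if (obj == NULL) abort();
    obj->owner_interp = interp_id;
    obj->text = s;
    return obj;   /* never freed: this is only a model */
}

/* Lookup in the style of _PyUnicode_FromId: fill the cache on first use,
   then hand the cached object to every later caller. */
static fake_str *from_id(identifier *id, int interp_id) {
    if (id->object == NULL) {
        id->object = intern_in(interp_id, id->string);
    }
    return id->object;
}

/* Roughly what _Py_IDENTIFIER(append) expands to: a static in the
   extension's C code, so every interpreter sees the same cache slot. */
static identifier PyId_append = { "append", NULL };
```

Calling `from_id(&PyId_append, 1)` and then `from_id(&PyId_append, 2)` returns the same pointer, owned by interpreter 1, which is only safe if the two interpreters may share objects.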

Hence the "enhancement" classification: nothing is broken right now, but if we're ever going to achieve the design goal of using subinterpreters to exploit multiple CPU cores without the overhead of running multiple full interpreter processes, we're going to need a different way of handling these identifier lookups.

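Something to keep in mind with `_Py_IDENTIFIER` and any replacement API: the baseline for performance comparisons is simply calling `PyUnicode_InternFromString()` (https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_InternFromString), so any replacement needs to beat interning the string on each lookup.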
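As a rough illustration of what that baseline costs, the sketch below models interning with a hypothetical fixed-size, open-addressing table (`intern_table`, `intern_from_string` and the djb2 hash are illustrative choices, not CPython's implementation). Every call hashes the C string and probes the table, which is exactly the per-call work a cached identifier pays only once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 64

/* Hypothetical intern table; no resizing, so the sketch assumes fewer
   than TABLE_SIZE distinct strings. */
typedef struct {
    char *slots[TABLE_SIZE];
    unsigned long lookups;   /* how many times the table was searched */
} intern_table;

/* djb2 string hash, chosen only for brevity. */
static unsigned long hash_str(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Baseline in the spirit of PyUnicode_InternFromString: hash and probe on
   every call, inserting a private copy if the string is not present. */
static const char *intern_from_string(intern_table *t, const char *s) {
    t->lookups++;
    unsigned long i = hash_str(s) % TABLE_SIZE;
    while (t->slots[i] != NULL) {
        if (strcmp(t->slots[i], s) == 0) return t->slots[i];
        i = (i + 1) % TABLE_SIZE;   /* linear probing */
    }
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL) abort();
    strcpy(copy, s);
    t->slots[i] = copy;
    return copy;
}
```

Interning `"append"` twice returns the same pointer but records two lookups; a cached identifier would record one.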

The reason multi-phase initialisation makes this more complicated is that it means we can't use the memory addresses of C process globals as unique identifiers any more, since more than one module object may be created from the same C shared library.

However, if we assume we've moved to per-module state storage (to get unique memory addresses back), then we can largely re-use the existing `_Py_IDENTIFIER` machinery to make the lookup as fast as possible, while still avoiding conflicts between subinterpreters.
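A minimal sketch of that idea, assuming identifier caches move into a per-module state block. In a real extension that block would be reached via `PyModule_GetState()`; here `module_state`, `module_exec`, `state_from_id`, `fake_str` and `intern_in` are all hypothetical names. The fast path is unchanged (one NULL check, then a cached pointer), but two module objects created from the same C code now get separate cache slots, so neither interpreter ever sees the other's str object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a str object owned by one interpreter. */
typedef struct {
    int owner_interp;
    const char *text;
} fake_str;

/* Same shape as before: C string plus lazily filled cache pointer. */
typedef struct {
    const char *string;
    fake_str *object;
} identifier;

/* Hypothetical per-module state: the identifier cache lives here rather
   than in a process global, so each module object has its own slot. */
typedef struct {
    int interp_id;           /* interpreter that created this module */
    identifier id_append;
} module_state;

static fake_str *intern_in(int interp_id, const char *s) {
    fake_str *obj = malloc(sizeof *obj);
    if (obj == NULL) abort();
    obj->owner_interp = interp_id;
    obj->text = s;
    return obj;   /* never freed: this is only a model */
}

/* Stand-in for a multi-phase exec step: builds a fresh state block. */
static module_state *module_exec(int interp_id) {
    module_state *st = malloc(sizeof *st);
    if (st == NULL) abort();
    st->interp_id = interp_id;
    st->id_append.string = "append";
    st->id_append.object = NULL;
    return st;
}

/* Same fast path as _Py_IDENTIFIER, but the slot belongs to the module. */
static fake_str *state_from_id(module_state *st, identifier *id) {
    if (id->object == NULL) {
        id->object = intern_in(st->interp_id, id->string);
    }
    return id->object;
}
```

Here `module_exec(1)` and `module_exec(2)` model the same extension loaded into two subinterpreters; each resolves `append` to an object owned by its own interpreter.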