Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide convenient C API for storing per-interpreter state #80305

Closed
ncoghlan opened this issue Feb 26, 2019 · 9 comments
Closed

Provide convenient C API for storing per-interpreter state #80305

ncoghlan opened this issue Feb 26, 2019 · 9 comments
Assignees
Labels
3.8 only security fixes type-feature A feature request or enhancement

Comments

@ncoghlan
Copy link
Contributor

BPO 36124
Nosy @arigo, @ncoghlan, @vstinner, @encukou, @ericsnowcurrently
PRs
  • bpo-36124: Add PyInterpreterState.dict. #12132
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ericsnowcurrently'
    closed_at = <Date 2019-03-15.23:49:00.271>
    created_at = <Date 2019-02-26.13:46:42.361>
    labels = ['type-feature', '3.8']
    title = 'Provide convenient C API for storing per-interpreter state'
    updated_at = <Date 2019-03-15.23:49:00.270>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2019-03-15.23:49:00.270>
    actor = 'eric.snow'
    assignee = 'eric.snow'
    closed = True
    closed_date = <Date 2019-03-15.23:49:00.271>
    closer = 'eric.snow'
    components = []
    creation = <Date 2019-02-26.13:46:42.361>
    creator = 'ncoghlan'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36124
    keywords = ['patch']
    message_count = 9.0
    messages = ['336670', '336695', '336950', '336997', '337089', '337501', '337506', '338043', '338044']
    nosy_count = 5.0
    nosy_names = ['arigo', 'ncoghlan', 'vstinner', 'petr.viktorin', 'eric.snow']
    pr_nums = ['12132']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue36124'
    versions = ['Python 3.8']

    @ncoghlan
    Copy link
    Contributor Author

    (New issue derived from https://bugs.python.org/issue35886#msg336501 )

    cffi needs a generally available way to get access to a caching dict for the currently active subinterpreter. Currently, they do that by storing it as an attribute in the builtins namespace: https://bitbucket.org/cffi/cffi/src/07d1803cb17b230571e3155e52082a356b31d44c/c/call_python.c?fileviewer=file-view-default

    As a result, they had to amend their code to include the CPython internal headers in 3.8.x, in order to regain access to the "builtins" reference.

    Armin suggested that a nicer way for them to achieve the same end result is if there was a PyInterpreter_GetDict() API, akin to https://docs.python.org/3/c-api/init.html#c.PyThreadState_GetDict

    That way they could store their cache dict in there in 3.8+, and only use the builtin dict on older Python versions.

    @ncoghlan ncoghlan added 3.8 only security fixes type-feature A feature request or enhancement labels Feb 26, 2019
    @ericsnowcurrently
    Copy link
    Member

    +1 from me

    @armin, thanks to Nick I understand your request better now. I'll put up a PR by the end of the week if no one beats me to it.

    @ericsnowcurrently
    Copy link
    Member

    Thinking about this, what is the key difference with the existing PyModule_GetState() function? Is it just the return type (module-defined void * vs. a regular dict)? Certainly it provides a C-only namespace that all extensions can share (like PyThreadState_Get() does), but I'm not sure that's desirable.

    Anyway, I'd rather not add PyInterpreterState_GetDict() if it is essentially equivalent to PyModule_GetState().

    @ericsnowcurrently ericsnowcurrently self-assigned this Mar 1, 2019
    @arigo
    Copy link
    Mannequin

    arigo mannequin commented Mar 2, 2019

    PyModule_GetState() requires having the module object that corresponds to the given interpreter state. I'm not sure how a C extension module is supposed to get its own module object corresponding to the current interpreter state, without getting it from the caller in some way.

    The mess in cffi's call_python.c would be much reduced if we had bpo-36124 (fixed to call Py_CLEAR(), see comment in bpo-36124).

    If you want to point out a different approach that might work too, that's OK too. It's just that the current approach was arrived at after multiple generations of crash reports, which makes me uneasy about changing it in more subtle ways than just killing it in favor of a careful PyInterpreterState_GetDict().

    If you want to see details of the current hacks, I can explain https://bitbucket.org/cffi/cffi/src/d765c36df047cf9d5e766777049c4107e1f4cb00/c/call_python.c :

    The goal is that we are given a finite (but unknown at compile-time) number of 'externpy' data structures, and for each pair (externpy, interp) the user can assign a callable 'PyObject *'. The annoying part of the logic is that we have a C-exposed callback function (line 204) which is called with a pointer to one of these 'externpy' structures, and we need to look up the right 'PyObject *' to call.

    At line 255 we just got the GIL and need to check if the 'PyThreadState_GET()->interp' is equal to the one previously seen (an essential optimization: we can't do complicated logic in the fast path). We hack by checking for 'interp->modules' because that's a PyObject. The previous time this code was invoked, we stored a reference to 'interp->modules' in the C structure 'externpy', with an incref. So this fast-path pointer comparison is always safe (no freed object whose address can be accidentally reused). This test will quickly pass if this function is called in the same 'interp' many times in a row.

    The slow path is in _update_cache_to_call_python(), which calls _get_interpstate_dict(), whose only purpose is to return a dictionary that depends on 'interp'. Note how we need to be very careful about various cases, like shutdown. _get_interpstate_dict() can fail and return NULL, but it cannot give a fatal error. That's why we couldn't call, say, PyImport_GetModuleDict(), because this gives a fatal error if 'interp' is being shut down at the moment.

    Overall, the logic uses both 'interp->modules' and 'interp->builtins'. The 'modules' is used only for the pointer equality check, because that's an object that is not supposed to be freed until the very last moment. The 'builtins' is used to store the special name "__cffi_backend_extern_py" in it, because we can't store that in 'interp->modules' directly without crashing various 3rd-party Python code if this special key shows up in 'sys.modules'. The value corresponding to this special name is a dictionary {PyLong_FromVoidPtr(externpy): infotuple-describing-the-final-callable}.

    @encukou
    Copy link
    Member

    encukou commented Mar 4, 2019

    PyModule_GetState() gives you *per-module* state, not per-interpreter state.

    Module objects are shared across subinterpreters, unless you use multi-phase initialization.

    PyModule_GetState() requires having the module object that corresponds
    to the given interpreter state. I'm not sure how a C extension module
    is supposed to get its own module object corresponding to the current
    interpreter state, without getting it from the caller in some way.

    This is the problem described in PEP-573: you don't always have access to your own module object. That keeps some more complex modules from switching to multi-phase init.

    Unless this issue can wait for when PEP-580, PEP-573, and possibly some fallout of unknown unknowns are solved, let's add PyInterpreterState_GetDict for now.

    @ericsnowcurrently
    Copy link
    Member

    On Sat, Mar 2, 2019 at 12:33 AM Armin Rigo <report@bugs.python.org> wrote:

    PyModule_GetState() requires having the module object that corresponds
    to the given interpreter state. I'm not sure how a C extension module is
    supposed to get its own module object corresponding to the current
    interpreter state, without getting it from the caller in some way.

    Fair enough. :)

    If you want to point out a different approach that might work too, that's OK too.

    As Petr noted, the preferred solution isn't feasible yet (pending
    several PEPs) and depends on using multi-phase extension module
    initialization (PEP-489). Furthermore, that assumes that the
    preferred solution would meet your performance needs. If you think it
    wouldn't then this is a great chance to speak up. :)

    It's just that the current approach was arrived at after multiple generations of
    crash reports, which makes me uneasy about changing it in more subtle ways
    than just killing it in favor of a careful PyInterpreterState_GetDict().

    Understood.

    Thanks for the detailed explanation on why you are using
    "interp->dict", and how you need to avoid fatal errors (e.g. from
    "PyImport_GetModuleDict()" during shutdown).

    Your solution seems reasonable, since every interpreter will have it's
    own "modules" object. However, note that "interp->modules" can get
    swapped out with a different object at any moment. The use case is
    temporarily setting a different import state (e.g. isolating a
    module's import). Currently this isn't very common (especially since
    "interp->modules" is currently not sync'ed with "sys.modules"), but we
    have plans for making this easier to do from Python code in the
    not-distant future.

    Regardless, I agree that PyInterpreterState_GetDict() will solve
    several problems for you. I'm sorry we didn't provide this solution
    for you sooner.

    @ericsnowcurrently
    Copy link
    Member

    Also, while PyThreadState_GetDict() is the inspiration here, we don't have to copy it exactly. For instance, PyInterpreterState_GetDict() takes a PyInterpreterState* argument, whereas PyThreadState_GetDict() takes no arguments and gets the PyThreadState* from thread-local storage.

    Is there anything else that would make sense to do differently with PyInterpreterState_GetDict()? It's pretty basic, so I'm guessing "no". :)

    @ericsnowcurrently
    Copy link
    Member

    New changeset d2fdd1f by Eric Snow in branch 'master':
    bpo-36124: Add PyInterpreterState.dict. (gh-12132)
    d2fdd1f

    @ericsnowcurrently
    Copy link
    Member

    @arigo, thanks for nudging us here. :) Let me know if there's anything else that would help here.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants