Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removal of _PyImport_FindExtensionObject() in 3.10 limits custom extension module loaders #89470

Closed
indygreg mannequin opened this issue Sep 28, 2021 · 11 comments
Closed
Labels
3.10 only security fixes topic-C-API type-feature A feature request or enhancement

Comments

@indygreg
Copy link
Mannequin

indygreg mannequin commented Sep 28, 2021

BPO 45307
Nosy @vstinner, @encukou, @ambv, @serhiy-storchaka, @indygreg, @pablogsal
PRs
  • [3.10] bpo-45307: Restore private C API function _PyImport_FindExtensionObject() #28594
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-09-29.12:11:01.518>
    created_at = <Date 2021-09-28.01:01:15.653>
    labels = ['expert-C-API', 'type-feature', '3.10']
    title = 'Removal of _PyImport_FindExtensionObject() in 3.10 limits custom extension module loaders'
    updated_at = <Date 2021-10-04.19:18:43.413>
    user = 'https://github.com/indygreg'

    bugs.python.org fields:

    activity = <Date 2021-10-04.19:18:43.413>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-09-29.12:11:01.518>
    closer = 'pablogsal'
    components = ['C API']
    creation = <Date 2021-09-28.01:01:15.653>
    creator = 'indygreg'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45307
    keywords = ['patch']
    message_count = 6.0
    messages = ['402754', '402761', '402776', '402777', '402807', '403168']
    nosy_count = 6.0
    nosy_names = ['vstinner', 'petr.viktorin', 'lukasz.langa', 'serhiy.storchaka', 'indygreg', 'pablogsal']
    pr_nums = ['28594']
    priority = None
    resolution = None
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue45307'
    versions = ['Python 3.10']

    @indygreg
    Copy link
    Mannequin Author

    indygreg mannequin commented Sep 28, 2021

    https://bugs.python.org/issue41994 / commit 4db8988 removed _PyImport_FindExtensionObject() from the C API and it is no longer available in CPython 3.10 after alpha 5.

    At least py2exe and PyOxidizer rely on this API for implementing a custom module loader for extension modules. Essentially, both want to implement a custom Loader.create_module() so they can use a custom mechanism for injecting a shared library into the process. While the details shouldn't be important beyond "they can't use imp.create_dynamic()," both use a similar technique that hooks LoadLibrary() on Windows to enable them to load a DLL from memory (as opposed to a file).

    While I don't have the extension module loading mechanism fully paged in to my head at the moment, I believe the reason that _PyImport_FindExtensionObject() (now effectively import_find_extension()) is important for py2exe and PyOxidizer is because they need to support at most once initialization, including honoring the multi-phase initialization semantics. Since the state of extension modules is stored in static PyObject *extensions and the thread state (which are opaque to the C API), the loss of _PyImport_FindExtensionObject() means there is no way to check for and use an existing extension module module object from the bowels of the importer machinery. And I think this means it isn't possible to implement well-behaved alternate extension module loaders any more.

    I'm aware the deleted API was "private" and probably shouldn't have been used in the first place. And what py2exe and PyOxidizer are doing here is rather unorthodox.

    In my mind the simplest path forward is restoring _PyImport_FindExtensionObject(). But a properly designed public API is probably a better solution.

    Until 3.10 makes equivalent functionality available or another workaround is supported, PyOxidizer won't be able to support loading extension modules from memory on Windows on Python 3.10. This is unfortunate. But probably not a deal breaker and I can probably go several months shipping PyOxidizer with this regression without too many complaints.

    @indygreg indygreg mannequin added 3.10 only security fixes topic-C-API type-feature A feature request or enhancement labels Sep 28, 2021
    @serhiy-storchaka
    Copy link
    Member

    Gregory, could you please build Python 3.10 with PR 28594 applied and test whether py2exe and PyOxidizer work well with it? The restored function no longer used in the CPython code, so it is not tested now.

    @vstinner
    Copy link
    Member

    Until 3.10 makes equivalent functionality available or another workaround is supported,

    I don't understand which long term solution do you propose.

    a properly designed public API is probably a better solution

    Can you propose a public API?

    @pablogsal
    Copy link
    Member

    I don't understand which long term solution do you propose.

    A kind reminder that for 3.10 we cannot add new APIs so this proposal still *makes sense* for the short term (3.10).

    @ambv
    Copy link
    Contributor

    ambv commented Sep 28, 2021

    New changeset ec4e2ec by Serhiy Storchaka in branch '3.10':
    [3.10] bpo-45307: Restore private C API function _PyImport_FindExtensionObject() (GH-28594)
    ec4e2ec

    @pablogsal
    Copy link
    Member

    New changeset 2ca4ab8 by Pablo Galindo (Serhiy Storchaka) in branch '3.10':
    [3.10] bpo-45307: Restore private C API function _PyImport_FindExtensionObject() (GH-28594)
    2ca4ab8

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @albertosottile
    Copy link

    py2exe maintainer here. In the end, did you define/propose a public API or any replacement API for this elimination? @indygreg

    @vstinner
    Copy link
    Member

    py2exe maintainer here. In the end, did you define/propose a public API or any replacement API for this elimination? @indygreg

    If nobody asks for an API, no API is added.

    @indygreg
    Copy link
    Contributor

    I don't believe I did. Since the deleted/private API was restored, it made my problem go away. I hadn't noticed that it was deleted in 3.11. So this will likely bit me/us again :/

    @encukou
    Copy link
    Member

    encukou commented Sep 20, 2022

    While I don't have the extension module loading mechanism fully paged in to my head at the moment, I believe the reason that _PyImport_FindExtensionObject() (now effectively import_find_extension()) is important for py2exe and PyOxidizer is because they need to support at most once initialization, including honoring the multi-phase initialization semantics. Since the state of extension modules is stored in static PyObject *extensions and the thread state (which are opaque to the C API), the loss of _PyImport_FindExtensionObject() means there is no way to check for and use an existing extension module module object from the bowels of the importer machinery. And I think this means it isn't possible to implement well-behaved alternate extension module loaders any more.

    As far as I understand your use case, I think you should have the necessary API for it. Please help me understand what I'm missing.

    I'm not clear on what “at most once initialization” means in your case. Is it at least once within a process? If so, would refusing repeated initialization within a process not work for you? I understand that for a generic module loader a simple global bool won't work, but a {(name, filename): bool} mapping might. Perhaps you might need to store some process-global state instead of a bool (if you move beyond at most once initialization, you'll probably want to keep the module def or the PyInit_* function here).

    Python stores already-imported modules in sys.modules, so you shouldn't need import_find_extension's storage. Of course, sys.modules has much looser semantics: users can manipulate it, meaning that you should e.g. use PyModule_GetDef to check what you get from sys.modules is the module you're looking for.
    Users can also delete a module from sys.modules, which they can't do with import_find_extension's storage, but at that point they're actively shooting themselves in the foot. Do your projects need to protect against that?

    @albertosottile
    Copy link

    albertosottile commented Sep 20, 2022

    If nobody asks for an API, no API is added.

    I am not that familiar with Python internal procedures, could you tell me what process we should follow to ask for a new API?

    I don't believe I did. Since the deleted/private API was restored, it made my problem go away. I hadn't noticed that it was deleted in 3.11. So this will likely bit me/us again :/

    Last year I did not start working on 3.10 in time, so I only started after the release and I did not even notice it was gone, thanks to your previous efforts (I did notice the deprecation warning, but it does not specify deletion in 3.11).

    Since we are again so late in the 3.11 release process, would it be absolutely preposterous to reintroduce _PyImport_FindExtensionObject as it was done last year?


    Now a few notes about proposing a new API. In general, it is quite hard for me to advocate for it, as I am not entirely sure it is needed and what its purpose should be.

    Let's start with a recap of the py2exe use case. The purpose of the affected code is to dynamically load an extension module in Python from memory. There is some external code that provides the right pointer to PyInit_{modname}, which is then processed as in CPython according to PEP 489 or single-phase initialization. To the best of my knowledge, _PyImport_FindExtensionObject is only used in the beginning to check if the extension module was already loaded: if that is the case, the whole importing mechanism is just skipped. We do not make any attempt to re-use the returned module.

    This is my current understanding of the code but, please keep in mind this code was written 8 years ago, by somebody else, and even back then was largely inspired by the CPython implementation of the time (3.4). Also back then, the code contained a lot of calls to private CPython API so, it might be some decisions were taken based on the reference import implementation and never questioned, more than out of necessity. On this regard, it is worth to mention that this comment

    /* Magic for extension modules (built-in as well as dynamically
       loaded). 
       ...
    

    is also present in the py2exe affected codebase.

    That being said, if we trust my understanding of the code, the use case could be as well satisfied by looking for modname with PyMapping_HasKeyString in the return of PyImport_GetModuleDict. Hence, there should be no need for a new API.

    EDIT: @encukou I realize now you proposed a similar approach while I was writing this paper, this kinda makes me more confident...

    albertosottile added a commit to py2exe/py2exe that referenced this issue Sep 22, 2022
    This is due to the removal of _PyImport_FindExtensionObject
    from Python 3.11. As discussed in
    python/cpython#89470
    albertosottile added a commit to py2exe/py2exe that referenced this issue Sep 22, 2022
    This is due to the removal of _PyImport_FindExtensionObject
    from Python 3.11. As discussed in
    python/cpython#89470
    albertosottile added a commit to py2exe/py2exe that referenced this issue Oct 26, 2022
    This is due to the removal of _PyImport_FindExtensionObject
    from Python 3.11. As discussed in
    python/cpython#89470
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes topic-C-API type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants