classification
Title: Trouble when reloading extension modules.
Type: crash
Stage:
Components: Extension Modules, Interpreter Core
Versions: Python 3.8

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: brett.cannon, christoph.wiedemann, cschramm, eric.snow, ncoghlan, petr.viktorin, scoder
Priority: normal
Keywords:

Created on 2018-08-01 17:36 by christoph.wiedemann, last changed 2018-08-08 07:24 by christoph.wiedemann.

Messages (14)
msg322876 - Author: chris (christoph.wiedemann) Date: 2018-08-01 17:36
I'm linking an issue from numpy here: https://github.com/numpy/numpy/issues/8097

Embedding python currently lacks a way to reliably reset the state of the python interpreter. For my use case, I noticed this when using numpy across Py_Initialize() / Py_Finalize() cycles:

Py_Initialize();
// call code importing numpy
Py_Finalize();
Py_Initialize();
// call same code again

The above code will result in a crash.

One of the comments in the referenced issue is that Py_Finalize doesn't unload loaded DLLs or shared objects. Doing that would probably fix the issues.

As of now, embedding python is fundamentally broken for applications which want to embed non-trivial scientific python scripts involving user-adapted python code, because

a) Py_Finalize cannot be used reliably
b) There is no possibility to reliably reset the python interpreter otherwise (because the sub-interpreters are also not working reliably, which is stated in the documentation)
c) manually reloading changed modules via importlib.reload is not a feasible solution

The possibility to reset an embedded python interpreter to an initial state is a strong requirement for such applications.
msg322881 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-08-01 19:32
The matter of unloading extension modules is partly covered in bpo-401713.  However, note that a few things have changed in the last 18 years. :)  I think it would be worth revisiting the decision in that issue at this point.

If that were sorted out would there be other issues to address?

Regardless, if I understood right, your only objective here is to completely reset Python in the current process to the initial state.  You mention "embedded python interpreter", but it sounds more like you mean "embedded python runtime" (particularly since you are using Py_Initialize and Py_Finalize).  Why is completely resetting Python "a strong requirement"?
msg322882 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-08-01 19:43
Regarding your 3 points:

> a) Py_Finalize cannot be used reliably

Note that unloading extension modules is not the only thing that Py_Finalize isn't doing that it probably should be.  I expect that you would find a number of memory leaks and potentially cases that crash the process when calling Py_Initialize() after Py_Finalize().

As bpo-401713 indicates, calling Py_Initialize/Py_Finalize multiple times isn't a well supported pattern.  The docs indicate likewise (see https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx).  That said, I'm sure everyone agrees that it *should* work and that we should fix issues there when we can.  It's mostly a matter of finding folks motivated enough to work on it. :)

> b) There is no possibility to reliably reset the python interpreter otherwise

At the level of the interpreter (main or otherwise), the C-API doesn't really provide anything like that, though I suppose it could.  However, it's mostly simpler to just make a new interpreter.  So I doubt we'd introduce an explicit concept of resetting an interpreter.  What would be the advantage?

At the level of the *runtime* the C-API likewise doesn't offer anything for reset, other than what you've already tried.  It would really help to understand why resetting is needed.

>  (because the sub-interpreters are also not working reliably,
> which is stated in the documentation)

This is something I'm actively working on, including improving isolation between interpreters (minimizing global state) and improving the extension module experience when using subinterpreters.  The latter is tricky, particularly with extension modules that use libraries that have their own C globals.

Note that isolation between interpreters will never be perfect.  If you need perfect isolation then use multiple processes.  All interpreters in the same process will always share the runtime state, as well as process-wide data.

FWIW, even if we were to completely isolate the Python runtime, allowing more than one runtime in a process, the runtimes would still share the process-wide data.

> c) manually reloading changed modules via importlib.reload is not a
> feasible solution

Yeah, while I suppose it could work for some extensions, I expect it would not work for many.
msg322886 - Author: chris (christoph.wiedemann) Date: 2018-08-01 20:37
Thanks for your comments and the link to the issue from the year 2000.

> You mention "embedded python interpreter", but it sounds more like you mean "embedded python runtime"

Yes that's right. Sorry for imprecise wording.

> Why is completely resetting Python "a strong requirement"?

Because otherwise, if this is not an option, we need to restart the embedding C/C++ application whenever a python module is changed and needs to be reloaded. This occurs frequently in our use case, and it is the main reason we want to embed python for rapid development purposes. (The application itself is a development environment for computer vision algorithms, where python is only one of multiple interacting, configurable plugins.)

Being forced to restart the whole application on code changes compromises the advantage of using a scripting language without edit/compile/link steps to a degree which questions the whole idea. I'm not very confident in hacking reload(...) stuff in the modules. 

Interestingly enough, our use case matches exactly the one which has been described as unlikely in the original issue (https://bugs.python.org/issue401713#msg34524): the python runtime is dynamically loaded at runtime as a plugin :) I haven't tried it, but I suppose that we get the very same issues when we reload the plugin, because the dynamic libraries of extension modules are still loaded with an invalid state.

Maybe using processes and some kind of socket / shared memory communication would suit our needs better, but this is also much more complicated and error-prone to implement than simply embedding python into the main process.
msg322890 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-08-01 21:47
Ah, thanks for clarifying.  So which of these is the main thing you really want:

1. reload extension modules
2. completely restart Python

It sounds like #1.  If that's the case then there are a number of issues to resolve to make it work.  However, there are some serious technical challenges to overcome. :/

So if it's #1 the problem space is relatively focused so a solution (if possible) would be tractable in the "short" term, so it *could* happen. :)  However, if it's #2 then a lot of things will have to be fixed and so realistically it might never happen.

FYI, either way none of it will be backported, so the functionality would not be available on Python 3.7 or earlier.

---------------

FWIW, in VS Code they run their plugins (extensions) in a separate process.  Their docs give some insight into plugin system design. :)

   https://code.visualstudio.com/docs/extensionAPI/patterns-and-principles#_stability-extension-isolation
msg322894 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-08-01 22:10
Also, part of the motivation for PEP 489 (extension module initialization) was to help with reloading extension modules.
msg322896 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-08-01 22:15
Also, PEP 3121 provides a good summary of some of the issues at hand.
msg322931 - Author: chris (christoph.wiedemann) Date: 2018-08-02 09:01
Okay, completely restarting python is not really necessary. Being able to reliably unload and later on re-import python modules (extension modules as well as pure python modules) in an embedded python runtime would solve my problems. 

One way to achieve that is currently Py_Initialize / Py_Finalize, but there are the drawbacks already mentioned. Another possibility is using sub-interpreters. 

If either of these could be fixed for extension modules (possibly with unloading the shared objects / DLLs :) ) I'd be fine.

I completely understand your point about backporting and it is not an issue.
msg322943 - Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2018-08-02 10:53
PEP 489 (Multi-phase extension module initialization) makes it possible/easy to unload/reimport extension modules, in the sense of destroying/recreating the module object. The problem is that the module needs to opt in to supporting this, which is not always easy (e.g. the module needs to not use C globals, or use them carefully), and sometimes it's still nearly impossible (see the in-progress PEP 573).

Unloading the actual shared library is another matter, though. That's not currently planned. There's no good way to ensure that there are no remaining objects that could reference the shared library's code.
Instead, your best bet is probably to name the new .so/DLL differently, and load an extra copy. (PEP 489 makes it possible to make the .so/DLL contain a module with a different name.) If you do go this way, I'd welcome feedback.
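
A minimal sketch of the "name the file differently and load an extra copy" idea, following the loader recipe shown in PEP 489. The helper name load_extension_copy and the file name in the usage note are hypothetical. It assumes the renamed .so/DLL still exports a PyInit_<module_name> function for the requested module name; whether the extra copy really gets independent C-level state depends on the platform's dynamic loader.

```python
import importlib.machinery
import importlib.util


def load_extension_copy(module_name, shared_lib_path):
    """Load the extension module `module_name` from an arbitrary file path.

    Hypothetical helper: the file may be named differently from the module
    (e.g. "mymod_v2.so"), as long as it exports PyInit_<module_name>.
    """
    loader = importlib.machinery.ExtensionFileLoader(module_name, shared_lib_path)
    spec = importlib.util.spec_from_loader(module_name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    # Deliberately not stored in sys.modules: the caller decides where the
    # extra copy lives and when to drop its reference to it.
    return module
```

For example, load_extension_copy("mymod", "/path/to/mymod_v2.so") would import the rebuilt library next to the copy that is already loaded. A missing or unloadable file raises ImportError.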
msg322970 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-08-02 15:38
@chris, I can't promise that anything will happen right away, but I'll be sure to look into this further when I work on improving extension module usage in subinterpreters in the next few months.
msg322972 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-08-02 15:56
I've changed the issue title to reflect where things stand.

Hmm, doing so reminded me of an important consideration here.  A module object is effectively a fairly light wrapper around a dict.  When you call importlib.reload() the loader from the module's spec is used to re-execute the module's existing dict. [1][2]  A new module is not created and the existing module namespace is not reset.  So during reload the module is responsible for deleting anything in its namespace that wouldn't get replaced when re-executed (including attributes that were added to the namespace externally).  For most modules this isn't an issue.  However, it's something to consider when reloading a module.  See the docs for more explanation and caveats. [3]

[1] https://github.com/python/cpython/blob/master/Lib/importlib/__init__.py#L169
[2] https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap.py#L610
[3] https://docs.python.org/3/library/importlib.html#importlib.reload
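
The reload behavior described above can be sketched with a throwaway pure-Python module. The module name reload_demo_mod and the helper function are made up for illustration, and bytecode writing is switched off only so that a cached .pyc cannot hide the quick edit.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_keeps_stale_names():
    """Reload an edited module; return (same_object, old_name_survives, new_name)."""
    saved_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # avoid a cached .pyc masking the edit
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "reload_demo_mod.py"  # hypothetical module name
        source.write_text("old_name = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("reload_demo_mod")
            # "Edit" the module: old_name is gone from the source file.
            source.write_text("new_name = 'two'\n")
            reloaded = importlib.reload(module)
            return (
                reloaded is module,             # same module object, same dict
                hasattr(reloaded, "old_name"),  # stale attribute is still there
                reloaded.new_name,
            )
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("reload_demo_mod", None)
            sys.dont_write_bytecode = saved_flag
```

Calling reload_keeps_stale_names() returns (True, True, 'two'): the module object is reused, and old_name survives the reload even though the edited source no longer defines it.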
msg322978 - Author: Stefan Behnel (scoder) * Date: 2018-08-02 16:22
a) Probably not something to fix in released versions any more, so increasing version from 3.5 to 3.8.

b) Regarding shared library unloading and the problems mentioned, I'm also not sure if there is a way to safely unload transitively imported libraries, e.g. if the extension module is a wrapper for an external C library (which then might come with its own dependencies again, which might still be in use by other extension modules, etc.).
msg323112 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-08-04 16:06
As others have noted, dynamically reloading CPython extension modules is akin to dynamically reloading any other C/C++ shared library, so it has enough opportunities for things to go wrong that we consider allowing the shared state to persist across initialize/finalize cycles the less problematic of two problematic options (at least for now).

Reliable hot reloading support is typically a property of pure Python modules (and even at that higher level, inter-module dependencies can still cause problems at runtime).
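
As a rough illustration of the inter-module dependency problem, the sketch below reloads one pure-Python module while a second module still holds a name imported from it. The module names hot_provider and hot_consumer are hypothetical, and bytecode writing is disabled only to keep a cached .pyc from hiding the quick edit.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_with_dependent_module():
    """Return (provider_result, consumer_result) after reloading only the provider."""
    saved_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # avoid a cached .pyc masking the edit
    names = ("hot_provider", "hot_consumer")  # hypothetical module names
    with tempfile.TemporaryDirectory() as tmp:
        provider_src = Path(tmp) / "hot_provider.py"
        provider_src.write_text("def version():\n    return 'v1'\n")
        (Path(tmp) / "hot_consumer.py").write_text(
            "from hot_provider import version\n"
        )
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            consumer = importlib.import_module("hot_consumer")
            provider = sys.modules["hot_provider"]
            provider_src.write_text("def version():\n    return 'version 2'\n")
            importlib.reload(provider)
            # The consumer still holds the function object it imported earlier.
            return provider.version(), consumer.version()
        finally:
            sys.path.remove(tmp)
            for name in names:
                sys.modules.pop(name, None)
            sys.dont_write_bytecode = saved_flag
```

Here reload_with_dependent_module() returns ('version 2', 'v1'): the reloaded provider runs the new code, while the consumer keeps calling the old function until it is reloaded as well.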

(FWIW, this problem is currently also the main reason we don't offer an in-REPL package installation command - while PEP 489 offers significant opportunities for improvement, it's likely to be years before we see widespread adoption of that, so we prefer to advise folks to run an installer outside the REPL, then restart and replay their interactive session)

If subinterpreters are an option though, then yeah, that has far more potential to be viable. It wouldn't be trivial, as we'd need to add dlmopen support (thanks Stack Overflow [1]) to give the subinterpreter a truly independent copy of the shared library (and also work out whatever the equivalent to dlmopen might be on other platforms), but going down that path could also potentially provide a way around the known problems with global state leaking between subinterpreters via extension modules.

[1] https://stackoverflow.com/questions/48864659/loading-shared-library-twice/48888598#48888598
msg323265 - Author: chris (christoph.wiedemann) Date: 2018-08-08 07:24
For the short to mid term, we have now decided to start python as a separate process and interact with it via some kind of IPC. That leads to a limited interaction model between python and the embedding app, but it has the advantage that unloading is possible (by simply restarting python).
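
A minimal sketch of the separate-process approach, using the standard subprocess module with stdout as a stand-in for the IPC channel (an assumption; the actual IPC mechanism is not specified here). The helper names and the script name user_algorithm.py are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_fresh_python(script):
    """Run `script` in a brand-new Python process and return its stdout."""
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo():
    """Run a user script, edit it, and run it again in a fresh process."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "user_algorithm.py"  # hypothetical user script
        script.write_text("print('version 1')\n")
        first = run_in_fresh_python(script)
        # The user edits the script; the next run starts from a clean runtime,
        # so no reload logic is needed in the host application.
        script.write_text("print('version 2')\n")
        second = run_in_fresh_python(script)
    return first, second
```

Each call starts from a freshly initialized runtime, so extension modules such as numpy are loaded anew every time, at the cost of process startup and of passing all data across the process boundary.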

Hopefully, some day python will have better support for unloading / reloading extension modules, but as some pointed out, this will take time, also because extension modules first need to adopt the new API discussed in the PEPs.

Thanks for the discussion and information!
History
Date                 User                 Action  Args
2018-08-08 07:24:40  christoph.wiedemann  set     messages: + msg323265
2018-08-04 16:06:27  ncoghlan             set     messages: + msg323112
2018-08-02 16:22:14  scoder               set     messages: + msg322978
                                                  versions: + Python 3.8, - Python 3.5
2018-08-02 15:56:19  eric.snow            set     messages: + msg322972
                                                  title: Embedding Python; Py_Initialize / Py_Finalize cycles -> Trouble when reloading extension modules.
2018-08-02 15:38:11  eric.snow            set     messages: + msg322970
2018-08-02 10:53:05  petr.viktorin        set     messages: + msg322943
2018-08-02 09:01:36  christoph.wiedemann  set     messages: + msg322931
2018-08-02 07:53:00  cschramm             set     nosy: + cschramm
2018-08-01 22:15:22  eric.snow            set     messages: + msg322896
2018-08-01 22:10:52  eric.snow            set     nosy: + petr.viktorin
                                                  messages: + msg322894
2018-08-01 21:47:28  eric.snow            set     messages: + msg322890
2018-08-01 20:55:24  scoder               set     nosy: + scoder
2018-08-01 20:37:02  christoph.wiedemann  set     messages: + msg322886
2018-08-01 19:43:01  eric.snow            set     messages: + msg322882
2018-08-01 19:32:57  eric.snow            set     nosy: + brett.cannon, ncoghlan, eric.snow
                                                  messages: + msg322881
2018-08-01 17:36:13  christoph.wiedemann  create