This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: Request for multi-phase initialization API to run code after importlib init
Type: enhancement
Components: C API
Versions: Python 3.9, Python 3.8

Process
Status: open
Nosy List: eric.snow, indygreg, ncoghlan, steve.dower, vstinner
Priority: normal

Created on 2020-04-19 20:10 by indygreg, last changed 2022-04-11 14:59 by admin.

Messages (4)
msg366802 - Author: Gregory Szorc (indygreg) - Date: 2020-04-19 20:10
I'm porting PyOxidizer to the PEP 587 APIs. So far, it is mostly straightforward.

I was looking forward to adopting the multi-phase initialization API because I thought it would enable me to get rid of the very ugly hack PyOxidizer uses to inject its custom meta path importer. This hack is described in gory detail at https://docs.rs/pyembed/0.7.0/pyembed/technotes/index.html#our-importing-mechanism and the tl;dr is we use a modified version of the `importlib._bootstrap_external` module to configure a builtin extension module on sys.meta_path so it can handle all imports during initialization.
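To make the mechanism concrete: a meta path importer is any object with a find_spec() method that sits on sys.meta_path. The sketch below is a hypothetical, pure-Python stand-in (InMemoryFinder and install() are illustrative names, not PyOxidizer's code, which is a compiled extension module) that serves module source from a dict and registers itself ahead of the default finders.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader that serves module source from a dict.

    Illustrative only; this is not PyOxidizer's importer.
    """

    def __init__(self, sources):
        # Maps fully qualified module name -> Python source text.
        self._sources = dict(sources)

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._sources:
            return None  # let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(fullname, self, origin="memory")

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, "<memory:" + module.__name__ + ">", "exec")
        exec(code, module.__dict__)


def install(finder):
    # Index 0 means this finder is consulted before BuiltinImporter,
    # FrozenImporter and PathFinder.
    sys.meta_path.insert(0, finder)
```

Usage: `install(InMemoryFinder({"greeting": "MESSAGE = 'hi'"}))` followed by `import greeting` resolves the module from the dict without touching the filesystem.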

The good news is the multi-phase initialization API gives us an injection point between "core" and "main." I _think_ I would be able to import the built-in extension here and register it on sys.meta_path.

However, the new multi-phase initialization API doesn't give us the total control that we need. Specifically, PyOxidizer's importer is leveraging a few functions in importlib._bootstrap_external as part of its operation. So it needs this module to be available before it can be loaded and installed on sys.meta_path. It also wants total control over sys.meta_path. So we don't want importlib._bootstrap_external to be mucking with sys.meta_path and imports being performed before PyOxidizer has a chance to readjust state.

The critical feature that PyOxidizer needs is the ability to muck with sys.meta_path and importlib *after* importlib externals are initialized and *before* any non-builtin, non-frozen import is attempted. In the current state of the initialization code, we need to run custom code between init_importlib_external() and when the first non-builtin, non-frozen import is attempted (currently during _PyUnicode_InitEncodings()).
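The builtin/frozen distinction matters because importlib._bootstrap installs BuiltinImporter and FrozenImporter on sys.meta_path, while everything else depends on what importlib._bootstrap_external adds later. A small illustrative helper (import_origin() is a hypothetical name) that asks those two importers in turn:

```python
import importlib.machinery


def import_origin(name):
    """Report which default importer can satisfy `name`.

    'builtin' and 'frozen' modules do not need importlib._bootstrap_external;
    'other' means a path-based or custom finder would be required.
    """
    if importlib.machinery.BuiltinImporter.find_spec(name) is not None:
        return "builtin"
    if importlib.machinery.FrozenImporter.find_spec(name) is not None:
        return "frozen"
    return "other"
```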

Would it be possible to get a multi-phase initialization API that stops after init_importlib_external()?

If not, could we break up PyConfig._install_importlib into 2 pieces to allow disabling of just importlib._bootstrap_external and provide a supported mechanism to initialize the external mechanism between "core" and "main" initialization? (Although I'm not sure if this is possible, since "main" finishes initializing aspects of "sys" before init_importlib_external() and I'm not sure if it is safe to initialize importlib externals before this is done. I'm guessing there is a reason that code runs before importlib is fully initialized.)

I suppose I could change PyOxidizer's functionality a bit to work around the lack of an importlib._bootstrap_external module between "core" and "main" initialization. I'm pretty sure I could make this work. But my strong preference is to inject code after importlib external support is fully initialized but before any imports are performed with it.

Overall the PEP 587 APIs are terrific and a substantial improvement over what came before. Thank you for all your work on this feature, Victor!
msg366831 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2020-04-20 14:29
The current workaround is to use PyConfig._init_main=0, call Py_InitializeFromConfig(), tune Python, and then call _Py_InitializeMain() to finish the "main" initialization:

https://docs.python.org/dev/c-api/init_config.html#multi-phase-initialization-private-provisional-api

You can execute code between init_importlib() and init_importlib_external().

I understand that it's not enough for you. As I wrote, it's a workaround.

--

Well, your use case is to fully control the "Python Path Configuration":
https://docs.python.org/dev/c-api/init_config.html#path-configuration

This part was deliberately left out of PEP 587 (so that the completed part could land in Python 3.8), but it is covered by PEP 432.

It was proposed to rewrite getpath.c and getpathp.c in pure Python so that embedders could fully override this code with their own Python implementation. But nobody has been available to implement this feature.
msg367284 - Author: Gregory Szorc (indygreg) - Date: 2020-04-25 21:23
Having approached this with a fresh brain, I was able to port PyOxidizer to use the multi-phase initialization API with minimal regressions!

The relevant code exists at https://github.com/indygreg/PyOxidizer/blob/b5aa2b3a96dbd01e9d78857e124f1052f42f86c6/pyembed/src/interpreter.rs#L550.

Here's the sequence:

1) Initialize core
2) Import our custom built-in extension module
3) Run some more code to initialize our extension module (update sys.meta_path, etc)
4) Initialize main
5) Remove PathFinder if filesystem imports are disabled

#5 isn't ideal: I would prefer an API to disable the registration of that importer completely. But PyOxidizer's importer is first on sys.meta_path and PathFinder shouldn't come into play. So it should be mostly harmless.
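At the Python level, steps 3 and 5 boil down to reordering and pruning sys.meta_path. A sketch assuming a hypothetical configure_meta_path() helper and a filesystem_imports flag (neither is a PyOxidizer or CPython API; PyOxidizer does the equivalent from Rust in interpreter.rs). It takes an optional list so it can be tried on a copy of sys.meta_path:

```python
import importlib.machinery
import sys


def configure_meta_path(custom_finder, filesystem_imports=True, meta_path=None):
    """Hypothetical helper mirroring steps 3 and 5 at the Python level."""
    if meta_path is None:
        meta_path = sys.meta_path
    # Step 3: make the custom importer the first finder consulted.
    if custom_finder in meta_path:
        meta_path.remove(custom_finder)
    meta_path.insert(0, custom_finder)
    # Step 5: importlib._bootstrap_external._install() appends PathFinder
    # during "main" init; drop it when filesystem imports are disabled.
    if not filesystem_imports:
        meta_path[:] = [f for f in meta_path
                        if f is not importlib.machinery.PathFinder]
    return meta_path
```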

A super minor paper cut is the lack of a PyConfig_SetBytesString() variant for PyWideStringList_Append(). It was slightly annoying having to convert a POSIX char* path to a wchar_t* since paths on POSIX are bytes.
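In C, the existing building block for this conversion is Py_DecodeLocale(), which returns a wchar_t* that must be released with PyMem_RawFree(). The Python-level analogue is os.fsdecode(), which on POSIX uses the filesystem encoding with the 'surrogateescape' handler so undecodable bytes survive a round trip through os.fsencode(). A sketch with a hypothetical paths_to_text() helper, e.g. for building a list like module_search_paths:

```python
import os


def paths_to_text(raw_paths):
    """Decode raw bytes paths to str the way CPython decodes filesystem paths."""
    # os.fsdecode() is a no-op for str input and decodes bytes otherwise.
    return [os.fsdecode(p) for p in raw_paths]
```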

Another potential area for improvement is around error handling before main is initialized. I'd like to represent PyErr raised during initialization as a Rust String, as PyErr isn't appropriate since there isn't a fully initialized Python interpreter. However, I discovered that serializing PyErr to strings is a bit brittle before main is initialized. e.g. https://docs.python.org/3/faq/extending.html#id11 says to use PyErr_Print() and replace sys.stdout. But sys.stdout isn't initialized yet and I'm scared to touch it. It also appears various functions in traceback rely on facilities from an initialized interpreter (such as finding sources). It would be useful if there were some kind of PyErr API that returned a PyString (or PyStatus) and was guaranteed to work before main is initialized.
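For comparison, at the Python level an exception can be rendered to a str without touching sys.stdout or sys.stderr by using the traceback module's format functions instead of printing. A sketch with a hypothetical exception_to_text() helper; format_exception_only() needs no source lookups, whereas format_exception() also renders the traceback and reads source lines through linecache:

```python
import traceback


def exception_to_text(exc, with_traceback=True):
    """Render an exception as a str without writing to any stream."""
    if with_traceback:
        # Also formats stack frames; source lines come from linecache.
        parts = traceback.format_exception(type(exc), exc, exc.__traceback__)
    else:
        # Only the final "Type: message" line; no file access needed.
        parts = traceback.format_exception_only(type(exc), exc)
    return "".join(parts)
```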

Overall, the new code in PyOxidizer is much, much cleaner! Thanks again for the new API!
msg367421 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2020-04-27 13:40
> 5) Remove PathFinder if filesystem imports are disabled

Extract of importlib._bootstrap_external._install():

def _install(_bootstrap_module):
    ...
    sys.meta_path.append(PathFinder)

PathFinder is always registered. So you are not only asking for an API to customize sys.path, but also to customize sys.meta_path, right?
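The resulting order is easy to observe in a running interpreter. A small sketch (meta_path_names() is a hypothetical helper) that lists the finders by name; in a stock CPython the default entries are BuiltinImporter, FrozenImporter and PathFinder in that order, though tools such as setuptools or pytest may insert their own finders ahead of them:

```python
import sys


def meta_path_names():
    """Return the names of the finders on sys.meta_path, in lookup order."""
    # Default finders are classes; third-party hooks are often instances.
    return [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path]
```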


> A super minor paper cut is the lack of a PyConfig_SetBytesString() variant for PyWideStringList_Append(). It was slightly annoying having to convert a POSIX char* path to a wchar_t* since paths on POSIX are bytes.

Would you mind opening a separate issue for this feature request?


> It would be useful if there were some kind of PyErr API that returned a PyString (or PyStatus) and was guaranteed to work before main is initialized.

Are you asking to format the current exception as a string? Something like traceback.format_exc() but as a C function?


> Overall, the new code in PyOxidizer is much, much cleaner! Thanks again for the new API!

You're welcome. PyOxidizer is a good use case for PEP 587 (PyConfig). Sadly, you have to drop support for Python 3.8 and older, or maintain two code paths. I have seen many projects that maintain two code paths: one for Python 3 (using Unicode and a few other changes) and one for Python 2 (using bytes).
History
Date                 User      Action  Args
2022-04-11 14:59:29  admin     set     github: 84513
2020-04-27 13:40:17  vstinner  set     messages: + msg367421
2020-04-25 21:23:15  indygreg  set     messages: + msg367284
2020-04-20 14:30:06  vstinner  set     nosy: + ncoghlan, eric.snow, steve.dower
2020-04-20 14:29:42  vstinner  set     messages: + msg366831
2020-04-19 20:10:46  indygreg  set     type: enhancement
2020-04-19 20:10:33  indygreg  create