Author vstinner
Recipients eric.snow, grahamd, ncoghlan, pitrou, steve.dower, vstinner
Date 2019-02-13.23:04:31
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1550099072.15.0.587753992836.issue22213@roundup.psfhosted.org>
In-reply-to
Content
> It seems that the disagreement about the design is fundamentally a disagreement between a "quick, painful but complete fix" and "slow, careful improvements with a transition period". Both are valid approaches, and since Victor is putting actual effort in right now he gets to "win", but I do think we can afford to move faster.

Technically, the API already exists and is exposed as a private API:

* "_PyCoreConfig" structure
* "_PyInitError _Py_InitializeFromConfig(const _PyCoreConfig *config)" function
* "void _Py_FatalInitError(_PyInitError err)" function (should be called on failure)

I'm not really sure of the benefit compared to the current initialization API using Py_xxx global configuration variables (ex: Py_IgnoreEnvironmentFlag) and Py_Initialize().

_PyCoreConfig basically exposed *all* input parameters used to initialize Python, much more than jsut global configuration variables and the few function that can be called before Py_Initialize():
https://docs.python.org/dev/c-api/init.html


> Currently PEP 432 is the best description we have, and it looks like Victor has been heading in that direction too (deliberately? I don't know :) ).

Well, it's a strange story. At the beginning, I had a very simple use case... it took me more or less one year to implement it :-) My use case was to add... a new -X utf8 command line option:

* parsing the command line requires to decode bytes using an encoding
* the encoding depends on the locale, environment variable and options on the command line
* environment variables depend on the command line (-E option)

If the utf8 mode is enabled (PEP 540), the encoding must be set to UTF-8, all configuration must be removed and the whole configuration (env vars, cmdline, etc.) must be read again from scratch :-)

To be able to do that, I had to collect *every single* thing which has an impact on the Python initialization: all things that I moved into _PyCoreConfig.

... but I didn't want to break the backward compatibility, so I had to keep support for Py_xxx global configuration variables... and also the few initialization functions like Py_SetPath() or Py_SetStandardStreamEncoding().

Later it becomes very dark, my goal became very unclear and I looked at the PEP 432 :-)

Well, I wanted to expose _PyCoreConfig somehow, so I looked at the PEP 432 to see how it can be exposed.


> By necessity, it touches a lot of people's contributions to Python, but it also has the potential to seriously improve even more people's ability to _use_ Python (for example, I know an app that you all would recognize the name of who is working on embedding Python right now and would _love_ certain parts of this side of things to be improved).

_PyCoreConfig "API" makes some things way simpler. Maybe it was already possible to do them previously but it was really hard, or maybe it was just not possible.

If a _PyCoreConfig field is set: it has the priority over any other way to initialize the field. _PyCoreConfig has the highest prioririty.

For example, _PyCoreConfig allows to completely ignore the code which computes sys.path (and related variables) by setting directly the "path configuration":

* nmodule_search_path, module_search_paths: list of sys.path paths
* executable: sys.executable */
* prefix: sys.prefix
* base_prefix: sys.base_prefix
* exec_prefix: sys.exec_prefix
* base_exec_prefix sys.base_exec_prefix
* (Windows only) dll_path: Windows DLL path

The code which initializes these fields is really complex. Without _PyCoreConfig, it's hard to make sure that these fields are properly initialized as an embedder would like.




> Nick, Victor, Eric, (others?) - are you interested in having a virtual whiteboard session to brainstorm how the "perfect" initialization looks? And probably a follow-up to brainstorm how to get there without breaking the world? I don't think we're going to get to be in the same room anytime before the language summit, and it would be awesome to have something concrete to discuss there.

Sorry, I'm not sure of the API / structures, but when I discussed with Eric Snow at the latest sprint, we identified different steps in the Python initialization:

* only use bytes (no encoding), no access to the filesystem (not needed at this point)
* encoding defined, can use Unicode
* use the filesystem
* configuration converted as Python objects
* Python is fully initialized

--

Once I experimented to reorganize _PyCoreConfig and _PyMainInterpreterConfig to avoid redundancy: add a _PyPreConfig which contains only fields which are needed before _PyMainInterpreterConfig. With that change, _PyMainInterpreterConfig (and _PyPreConfig) *contained* _PyCoreConfig.

But it the change became very large, I wasn't sure that it was a good idea, I abandonned my change.

* https://github.com/python/cpython/pull/10575
* https://bugs.python.org/issue35266
* I have a more advanced version in this branch of my fork: https://github.com/vstinner/cpython/commits/pre_config_next

--

Ok, something else. _PyCoreConfig (and _PyMainInterpreterConfig) contain memory allocated on the heap. Problem: Python initialization changes the memory allocator. Code using _PyCoreConfig requires some "tricks" to ensure that the memory is *freed* with the same allocator used to *allocate* memory.

I created bpo-35265 "Internal C API: pass the memory allocator in a context" to pass a "context" to a lot of functions, context which contains the memory allocator but can contain more things later.

The idea of "a context" came during the discussion about a new C API: stop to rely on any global variable or "shared state", but *explicitly* pass a context to all functions. With that, it becomes possible to imagine to have two interpreters running in the same threads "at the same time".

Honestly, I'm not really sure that it's fully possible to implement this idea... Python has *so many* "shared state", like *everywhere*. It's really a giant project to move these shared states into structures and pass pointers to these structures.

So again, I abandonned my experimental change:
https://github.com/python/cpython/pull/10574

--

Memory allocator, context, different structures for configuration... it's really not an easy topic :-( There are so many constraints put into a single API!

The conservation option at this point is to keep the API private.

... Maybe we can explain how to use the private API but very explicitly warn that this API is experimental and can be broken anytime... And I plan to break it, to avoid redundancy between core and main configuration for example.

... I hope that these explanations give you a better idea of the big picture and the challenges :-)
History
Date User Action Args
2019-02-13 23:04:32vstinnersetrecipients: + vstinner, ncoghlan, pitrou, grahamd, eric.snow, steve.dower
2019-02-13 23:04:32vstinnersetmessageid: <1550099072.15.0.587753992836.issue22213@roundup.psfhosted.org>
2019-02-13 23:04:32vstinnerlinkissue22213 messages
2019-02-13 23:04:31vstinnercreate