classification
Title: Make pyvenv style virtual environments easier to configure when embedding Python
Type: enhancement Stage:
Components: Versions: Python 3.8
process
Status: open Resolution:
Dependencies: 22257 Superseder:
Assigned To: Nosy List: eric.snow, grahamd, inada.naoki, ncoghlan, pitrou, pyscripter, steve.dower, vstinner
Priority: normal Keywords:

Created on 2014-08-17 11:42 by grahamd, last changed 2019-10-17 23:01 by pyscripter.

Messages (26)
msg225434 - Author: Graham Dumpleton (grahamd) Date: 2014-08-17 11:42
In an embedded system, the 'python' executable itself is not run and the Python interpreter is initialised in-process explicitly using Py_Initialize(). To find the location of the Python installation, an elaborate sequence of checks is run, as implemented in calculate_path() of Modules/getpath.c.

The primary mechanism is usually to search for a 'python' executable on PATH and use that as a starting point. From there it backtracks up the file system from the bin directory to arrive at what would be the perceived equivalent of PYTHONHOME. The lib/pythonX.Y directory under that, for the matching version X.Y of the Python being initialised, is then used.

Problems can often occur with the way this search is done though.

For example, suppose someone is not using the system Python installation but has installed a different version of Python under /usr/local. At run time the correct Python shared library is loaded from /usr/local/lib, but because the 'python' executable is found in /usr/bin, /usr is used as sys.prefix instead of /usr/local.

This can cause two distinct problems.

The first is that there is no Python installation at all under /usr corresponding to the embedded Python version, with the result that the 'site' module cannot be imported and initialisation fails.

The second is that there is a Python installation of the same major/minor version but potentially a different patch revision, or one compiled with different binary API flags or a different Unicode character width. The Python interpreter in this case may well be able to start up, but the mismatch between the Python modules or extension modules and the core Python library that was actually linked can cause odd errors or crashes.

Anyway, that is the background.

For an embedded system, the way this problem was overcome was to use Py_SetPythonHome() to forcibly override the value used for PYTHONHOME, so that the correct installation was found and used at runtime.

Now this would work quite happily even for Python virtual environments constructed using 'virtualenv', allowing the embedded system to run in that separate virtual environment, distinct from the main Python installation it was created from.

Although this works for Python virtual environments created using 'virtualenv', it doesn't work if the virtual environment was created using pyvenv.

One can easily illustrate the problem without even using an embedded system.

$ which python3.4
/Library/Frameworks/Python.framework/Versions/3.4/bin/python3.4

$ pyvenv-3.4 py34-pyvenv

$ py34-pyvenv/bin/python
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.prefix
'/private/tmp/py34-pyvenv'
>>> sys.path
['', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload', '/private/tmp/py34-pyvenv/lib/python3.4/site-packages']

$ PYTHONHOME=/tmp/py34-pyvenv python3.4
Fatal Python error: Py_Initialize: unable to load the file system codec
ImportError: No module named 'encodings'
Abort trap: 6

The basic problem is that in a pyvenv virtual environment nothing from lib/pythonX.Y is duplicated; the only thing in there is the site-packages directory.

When you start up the 'python' executable directly from the pyvenv virtual environment, the startup checks detect this and consult pyvenv.cfg to extract the:

home = /Library/Frameworks/Python.framework/Versions/3.4/bin

setting and from that derive where the actual run time files are.
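To make that step concrete, here is a minimal Python sketch of reading the `home` setting. It is a simplified stand-in for getpath.c's find_env_config_value(), not a port of it: the function names are hypothetical, it assumes plain `key = value` lines with the first match winning, and it assumes the common layout where `home` names the base installation's bin directory, so that its parent is the prefix.

```python
import os


def read_pyvenv_cfg(path):
    """Parse a pyvenv.cfg file into a dict of 'key = value' settings.

    Simplified sketch: lines without '=' are skipped, keys and values are
    stripped, and the first occurrence of a key wins.
    """
    settings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep:
                settings.setdefault(key.strip(), value.strip())
    return settings


def base_prefix_from_home(home):
    """'home' names the base interpreter's bin directory; its parent is the prefix."""
    return os.path.dirname(os.path.normpath(home))
```

For the pyvenv.cfg shown above, base_prefix_from_home() would give /Library/Frameworks/Python.framework/Versions/3.4.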

When PYTHONHOME or Py_SetPythonHome() is used, then the getpath.c checks blindly believe that is the authoritative value:

 * Step 2. See if the $PYTHONHOME environment variable points to the
 * installed location of the Python libraries.  If $PYTHONHOME is set, then
 * it points to prefix and exec_prefix.  $PYTHONHOME can be a single
 * directory, which is used for both, or the prefix and exec_prefix
 * directories separated by a colon.

    /* If PYTHONHOME is set, we believe it unconditionally */
    if (home) {
        wchar_t *delim;
        wcsncpy(prefix, home, MAXPATHLEN);
        prefix[MAXPATHLEN] = L'\0';
        delim = wcschr(prefix, DELIM);
        if (delim)
            *delim = L'\0';
        joinpath(prefix, lib_python);
        joinpath(prefix, LANDMARK);
        return 1;
    }

Because of this, the proper runtime directories aren't included in sys.path, which is why the example above fails: the 'encodings' module cannot even be found.

What I believe should occur is that PYTHONHOME should not be believed unconditionally. Instead there should be a check to see if that directory contains a pyvenv.cfg file and if there is one, realise it is a pyvenv style virtual environment and do the same sort of adjustments which would be made based on looking at what that pyvenv.cfg file contains.
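A minimal Python sketch of that proposal, assuming a POSIX-style lib/pythonX.Y layout; resolve_python_home() and the other helpers are hypothetical names, not CPython functions. has_stdlib() models the landmark check (getpath.c's LANDMARK is os.py) that a pyvenv directory fails, and resolve_python_home() shows the suggested redirect to the base installation named in pyvenv.cfg.

```python
import os

LANDMARK = "os.py"  # getpath.c looks for lib/pythonX.Y/os.py


def has_stdlib(prefix, version):
    """True if prefix/lib/python<version>/os.py exists (POSIX layout assumed)."""
    return os.path.isfile(os.path.join(prefix, "lib", "python" + version, LANDMARK))


def pyvenv_home(prefix):
    """Return the 'home' value from prefix/pyvenv.cfg, or None if there is none."""
    cfg = os.path.join(prefix, "pyvenv.cfg")
    if not os.path.isfile(cfg):
        return None
    with open(cfg, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "home":
                return value.strip()
    return None


def resolve_python_home(home):
    """Proposed behaviour: don't believe PYTHONHOME unconditionally.

    Returns (stdlib_prefix, venv_prefix). If home is a pyvenv-style
    environment, the standard library comes from the base installation
    named in pyvenv.cfg and the venv only supplies site-packages.
    """
    venv_home = pyvenv_home(home)
    if venv_home is not None:
        # 'home' is the base bin directory; its parent holds lib/pythonX.Y
        return os.path.dirname(os.path.normpath(venv_home)), home
    return home, None
```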

For the record, this issue is affecting Apache/mod_wsgi, and right now the only workaround I have is to tell people that, in addition to setting the configuration option corresponding to PYTHONHOME, they should use configuration settings that have the same effect as:

PYTHONPATH=/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload

so that the correct runtime files are found.
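The PYTHONPATH value above follows a regular pattern, so it could be generated rather than typed by hand. A sketch, assuming the POSIX/OS X layout used by Python 3.4 (the zip entry, the stdlib directory, an optional plat-<name> directory, and lib-dynload); stdlib_pythonpath() is a hypothetical helper:

```python
import os


def stdlib_pythonpath(base_prefix, version, plat=None):
    """Build a PYTHONPATH value covering the base install's stdlib directories.

    `plat` is e.g. "darwin" for the plat-darwin directory present in the
    3.4 layout above; leave it as None where no such directory exists.
    """
    lib = os.path.join(base_prefix, "lib")
    stdlib = os.path.join(lib, "python" + version)
    entries = [os.path.join(lib, "python" + version.replace(".", "") + ".zip"), stdlib]
    if plat:
        entries.append(os.path.join(stdlib, "plat-" + plat))
    entries.append(os.path.join(stdlib, "lib-dynload"))
    return os.pathsep.join(entries)
```

Calling stdlib_pythonpath("/Library/Frameworks/Python.framework/Versions/3.4", "3.4", plat="darwin") reproduces the value shown above.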

I am still trying to work out a more permanent workaround that I can add to the mod_wsgi code itself, since I can't rely on a fix for existing Python versions with pyvenv support.

The only other option is to tell people not to use pyvenv and to use virtualenv instead.

Right now I can offer no actual patch, as that getpath.c code is scary enough that I'm not even sure at this point where the check should be incorporated or how.

The only thing I can surmise is that the placement of the current pyvenv.cfg check, before the search for the prefix, means that it isn't consulted.

    /* Search for an environment configuration file, first in the
       executable's directory and then in the parent directory.
       If found, open it for use when searching for prefixes.
    */

    {
        wchar_t tmpbuffer[MAXPATHLEN+1];
        wchar_t *env_cfg = L"pyvenv.cfg";
        FILE * env_file = NULL;

        wcscpy(tmpbuffer, argv0_path);

        joinpath(tmpbuffer, env_cfg);
        env_file = _Py_wfopen(tmpbuffer, L"r");
        if (env_file == NULL) {
            errno = 0;
            reduce(tmpbuffer);
            reduce(tmpbuffer);
            joinpath(tmpbuffer, env_cfg);
            env_file = _Py_wfopen(tmpbuffer, L"r");
            if (env_file == NULL) {
                errno = 0;
            }
        }
        if (env_file != NULL) {
            /* Look for a 'home' variable and set argv0_path to it, if found */
            if (find_env_config_value(env_file, L"home", tmpbuffer)) {
                wcscpy(argv0_path, tmpbuffer);
            }
            fclose(env_file);
            env_file = NULL;
        }
    }

    pfound = search_for_prefix(argv0_path, home, _prefix, lib_python);
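The lookup order in that block can be sketched in Python as follows. This assumes argv0_path is the directory containing the executable, with no trailing separator, so that os.path.dirname() plays the role of the second reduce() call; find_pyvenv_cfg() is a hypothetical name, and it checks that the file exists rather than opening it.

```python
import os


def find_pyvenv_cfg(argv0_path):
    """Try argv0_path/pyvenv.cfg, then the parent directory's pyvenv.cfg.

    Returns the path of the first pyvenv.cfg found, or None.
    """
    for directory in (argv0_path, os.path.dirname(argv0_path)):
        candidate = os.path.join(directory, "pyvenv.cfg")
        if os.path.isfile(candidate):
            return candidate
    return None
```

Note that nothing here looks at home: in the search_for_prefix() excerpt quoted earlier, a set PYTHONHOME returns before argv0_path is used at all.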
msg225436 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-17 11:58
Yeah, PEP 432 (my proposal to redesign the startup sequence) could just as well be subtitled "getpath.c hurts my brain" :P

One tricky part here is going to be figuring out how to test this - perhaps adding a new test option to _testembed and then running it both inside and outside a venv.
msg225437 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-17 12:18
Graham pointed out that setting PYTHONHOME ends up triggering the same control flow through getpath.c as calling Py_SetPythonHome, so this can be tested just with pyvenv and a suitably configured environment.

It may still be a little tricky though, since we normally run the pyvenv tests in isolated mode to avoid spurious failures due to bad environment settings...
msg225739 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-23 10:24
Some more experiments, comparing an installed vs uninstalled Python. One failure mode is that setting PYTHONHOME just plain breaks running from a source checkout (setting PYTHONHOME to the checkout directory also fails):

$ ./python -m venv --without-pip /tmp/issue22213-py35

$ /tmp/issue22213-py35/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /usr/local

$ PYTHONHOME=/usr/local /tmp/issue22213-py35/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted (core dumped)

Trying after running "make altinstall" (which I had previously done for 3.4) is a bit more enlightening:

$ python3.4 -m venv --without-pip /tmp/issue22213-py34

$ /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /usr/local

$ PYTHONHOME=/usr/local /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /usr/local

$ PYTHONHOME=/tmp/issue22213-py34 /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted (core dumped)

$ PYTHONHOME=/tmp/issue22213-py34:/usr/local /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted (core dumped)
[ncoghlan@lancre py34]$ PYTHONHOME=/usr/local:/tmp/issue22213-py34/bin /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /tmp/issue22213-py34/bin

I think what this is actually showing is that there's a fundamental conflict between mod_wsgi's expectation of being able to set PYTHONHOME to point to the virtual environment, and the way PEP 405 virtual environments actually work.

With PEP 405, all the operations in getpath.c expect to execute while pointing to the *base* environment: where the standard library lives. It is then up to site.py to adjust the prefix location later, as demonstrated by the fact that pyvenv.cfg isn't processed if processing of the site module is disabled:

$ /tmp/issue22213-py34/bin/python -c "import sys; print(sys.prefix, sys.exec_prefix)"
/tmp/issue22213-py34 /tmp/issue22213-py34
$ /tmp/issue22213-py34/bin/python -S -c "import sys; print(sys.prefix, sys.exec_prefix)"
/usr/local /usr/local
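The comparison above can be scripted. A minimal sketch that runs an interpreter with and without -S and reports its prefixes; child_prefixes() is a hypothetical helper, it needs Python 3.7+ for capture_output, and the values it reports depend on the Python version being tested, since startup behaviour has changed over time.

```python
import subprocess

_PROBE = "import sys; print(sys.prefix); print(sys.exec_prefix)"


def child_prefixes(python, no_site=False):
    """Run the given interpreter and return its (sys.prefix, sys.exec_prefix).

    no_site=True adds -S, so the site module is not imported.
    """
    cmd = [python] + (["-S"] if no_site else []) + ["-c", _PROBE]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    prefix, exec_prefix = result.stdout.splitlines()
    return prefix, exec_prefix
```

For example, child_prefixes("/tmp/issue22213-py34/bin/python", no_site=True) corresponds to the second command above.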

At this point in time, there isn't an easy way for an embedding application to say "here's the standard library, here's the virtual environment with user packages" - it's necessary to just override the path calculations entirely.

Allowing that kind of more granular configuration is one of the design goals of PEP 432, so adding that as a dependency here.
msg225742 - Author: Graham Dumpleton (grahamd) Date: 2014-08-23 11:48
It is actually very easy for me to work around and I released a new mod_wsgi version today which works.

When I get a Python home option, instead of calling Py_SetPythonHome() with it, I append '/bin/python' to it and call Py_SetProgramName() instead.
msg225771 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-24 00:04
Excellent! If I recall correctly, that works because we resolve the symlink when looking for the standard library, but not when looking for the venv configuration file.
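A small model of that explanation, with hypothetical helper names: program_name_for_home() builds the program name the way Graham describes, and lookup_dirs() contrasts the unresolved directory (where the venv configuration is looked for) with the symlink-resolved one (where the standard library search starts). It encodes Nick's recollection rather than the actual getpath.c or site.py code, and assumes, as in a typical pyvenv on UNIX, that venv/bin/python is a symlink to the base interpreter.

```python
import os


def program_name_for_home(python_home):
    """Graham's workaround: derive a program name from the configured home."""
    return os.path.join(python_home, "bin", "python")


def lookup_dirs(program_name):
    """Return (venv_dir, stdlib_start_dir) for a program name (hypothetical model).

    The venv configuration is looked for next to the program name as given,
    while the standard library search starts where the symlink points.
    """
    venv_dir = os.path.dirname(program_name)
    stdlib_start_dir = os.path.dirname(os.path.realpath(program_name))
    return venv_dir, stdlib_start_dir
```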

I also suspect this is all thoroughly broken on Windows - there are so many configuration operations and platform specific considerations that need to be accounted for in getpath.c these days that it has become close to incomprehensible :(

One of my main goals with PEP 432 is actually to make it possible to rewrite the path configuration code in a more maintainable way - my unofficial subtitle for that PEP is "getpath.c must die!" :)
msg225774 - Author: Graham Dumpleton (grahamd) Date: 2014-08-24 00:19
I only make the change to Py_SetProgramName() on UNIX and not Windows. This is because back in mod_wsgi 1.0 I actually used Py_SetProgramName(), but it didn't seem to work in a sane way on Windows, so I changed to Py_SetPythonHome(), which worked on both Windows and UNIX. The latest versions of mod_wsgi haven't been updated yet to even build on Windows, so I'm not caring about Windows right now.
msg225890 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-25 19:04
That workaround would definitely deserve being wrapped in a higher-level API invokable by embedding applications, IMHO.
msg334926 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-02-06 11:19
(Added Victor, Eric, and Steve to the nosy list here, as I'd actually forgotten about this until issue #35706 reminded me)

Core of the problem: the embedding APIs don't currently offer a Windows-compatible way of setting up "use this base Python and this venv site-packages", and the way of getting it to work on other platforms is pretty obscure.
msg334948 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-02-06 16:15
Victor may be thinking about it from time to time (or perhaps it's time to make the rest of the configuration changes plans concrete so we can all help out?), but I'd like to see this as either:

* a helper function to fill out the core config structure from a pyvenv.cfg file (rather than hiding it deeper as it currently is), or better yet,
* remove the dependency on all non-frozen imports at initialization and let embedders define Python code to do the initialization

In the latter case, the main python.exe also gets to define its behavior. So for the most part, we should be able to remove getpath[p].c and move it into the site module, then make that our Python initialization step.

This would also mean that if you are embedding Python but not allowing imports (e.g. as only a calculation engine), you don't have to do the dance of _denying_ all lookups, you simply don't initialize them.

But as far as I know, we don't have a concrete vision for "how will consumers embed Python in their apps" that can translate into work - we're still all individually pulling in slightly different directions. Sorting that out is most important - having someone willing to do the customer engagement work to define an actual set of requirements and desirables would be fantastic.
msg335015 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-02-07 12:18
Yeah, I mainly cc'ed Victor and Eric since making this easier ties into one of the original design goals for PEP 432 (even though I haven't managed to persuade either of them to become co-authors of that PEP yet).
msg335468 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-02-13 17:14
PEP 432 will give fine-grained control over the parameters used to initialize Python. Sadly, I failed to agree with Nick Coghlan and Eric Snow on the API. The current implementation (_PyCoreConfig and _PyMainInterpreterConfig) has some flaws (it doesn't clearly separate the early initialization and Unicode-ready states, the interpreter contains both the main and core configs while some options are duplicated in both, etc.).

See also bpo-35706.
msg335470 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-02-13 17:56
I just closed 35706 as a duplicate of this one (the titles are basically identical, which feels like a good hint ;) )

It seems that the disagreement about the design is fundamentally a disagreement between a "quick, painful but complete fix" and "slow, careful improvements with a transition period". Both are valid approaches, and since Victor is putting actual effort in right now he gets to "win", but I do think we can afford to move faster.

It seems the main people who will suffer from the pain here are embedders (who are already suffering pain) and the core developers (who explicitly signed up for pain!). But without knowing the end goal, we can't accelerate.

Currently PEP 432 is the best description we have, and it looks like Victor has been heading in that direction too (deliberately? I don't know :) ). But it seems like a good time to review it, replace the "here's the current state of things" with "here's an imaginary ideal state of things" and fill the rest with "here are the steps to get there without breaking the world".

By necessity, it touches a lot of people's contributions to Python, but it also has the potential to seriously improve even more people's ability to _use_ Python (for example, I know an app that you all would recognize the name of who is working on embedding Python right now and would _love_ certain parts of this side of things to be improved).

Nick - has the steering council been thinking about ways to promote collaborative development of ideas like this? I'm thinking an Etherpad style environment for the brainstorm period (in lieu of an in-person whiteboard session) that's easy for us all to add our concerns to, that can then be turned into something more formal.

Nick, Victor, Eric, (others?) - are you interested in having a virtual whiteboard session to brainstorm how the "perfect" initialization looks? And probably a follow-up to brainstorm how to get there without breaking the world? I don't think we're going to get to be in the same room anytime before the language summit, and it would be awesome to have something concrete to discuss there.
msg335479 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-02-13 23:04
> It seems that the disagreement about the design is fundamentally a disagreement between a "quick, painful but complete fix" and "slow, careful improvements with a transition period". Both are valid approaches, and since Victor is putting actual effort in right now he gets to "win", but I do think we can afford to move faster.

Technically, the API already exists and is exposed as a private API:

* "_PyCoreConfig" structure
* "_PyInitError _Py_InitializeFromConfig(const _PyCoreConfig *config)" function
* "void _Py_FatalInitError(_PyInitError err)" function (should be called on failure)

I'm not really sure of the benefit compared to the current initialization API using Py_xxx global configuration variables (ex: Py_IgnoreEnvironmentFlag) and Py_Initialize().

_PyCoreConfig basically exposes *all* input parameters used to initialize Python, much more than just the global configuration variables and the few functions that can be called before Py_Initialize():
https://docs.python.org/dev/c-api/init.html


> Currently PEP 432 is the best description we have, and it looks like Victor has been heading in that direction too (deliberately? I don't know :) ).

Well, it's a strange story. At the beginning, I had a very simple use case... it took me more or less one year to implement it :-) My use case was to add... a new -X utf8 command line option:

* parsing the command line requires decoding bytes using an encoding
* the encoding depends on the locale, environment variables and command line options
* environment variables depend on the command line (-E option)

If the utf8 mode is enabled (PEP 540), the encoding must be set to UTF-8, all configuration must be removed and the whole configuration (env vars, cmdline, etc.) must be read again from scratch :-)
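A deliberately simplified model of that restart, to show why the whole configuration has to be read again: only `-X utf8` (as two separate arguments), `PYTHONUTF8=1` and `-E` are considered, and the function names are hypothetical rather than CPython's.

```python
import codecs


def _read_once(argv_bytes, env, encoding):
    """One pass: decode argv with `encoding`, then derive options from argv and env."""
    args = [a.decode(encoding, errors="surrogateescape") for a in argv_bytes]
    ignore_env = "-E" in args  # -E means PYTHON* environment variables are ignored
    utf8_mode = any(a == "-X" and b == "utf8" for a, b in zip(args, args[1:])) or (
        not ignore_env and env.get("PYTHONUTF8") == "1"
    )
    return {"argv": args, "utf8_mode": utf8_mode, "encoding": encoding}


def read_config(argv_bytes, env, locale_encoding):
    config = _read_once(argv_bytes, env, locale_encoding)
    if config["utf8_mode"] and codecs.lookup(locale_encoding).name != "utf-8":
        # UTF-8 mode changes the encoding used to decode argv: throw the
        # first result away and read everything again from scratch.
        config = _read_once(argv_bytes, env, "utf-8")
    return config
```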

To be able to do that, I had to collect *every single* thing which has an impact on the Python initialization: all things that I moved into _PyCoreConfig.

... but I didn't want to break backward compatibility, so I had to keep support for the Py_xxx global configuration variables... and also for the few initialization functions like Py_SetPath() or Py_SetStandardStreamEncoding().

Later it became very dark, my goal became very unclear, and I looked at PEP 432 :-)

Well, I wanted to expose _PyCoreConfig somehow, so I looked at PEP 432 to see how it could be exposed.


> By necessity, it touches a lot of people's contributions to Python, but it also has the potential to seriously improve even more people's ability to _use_ Python (for example, I know an app that you all would recognize the name of who is working on embedding Python right now and would _love_ certain parts of this side of things to be improved).

_PyCoreConfig "API" makes some things way simpler. Maybe it was already possible to do them previously but it was really hard, or maybe it was just not possible.

If a _PyCoreConfig field is set, it takes priority over any other way of initializing that field: _PyCoreConfig has the highest priority.

For example, _PyCoreConfig makes it possible to completely bypass the code which computes sys.path (and related variables) by directly setting the "path configuration":

* nmodule_search_path, module_search_paths: list of sys.path paths
* executable: sys.executable
* prefix: sys.prefix
* base_prefix: sys.base_prefix
* exec_prefix: sys.exec_prefix
* base_exec_prefix: sys.base_exec_prefix
* (Windows only) dll_path: Windows DLL path
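For readers more familiar with the Python side, the fields listed above can be mirrored as a dataclass. This is a hypothetical sketch, not the C structure: nmodule_search_path is implicit in the list length, dll_path is left out, from_sys() snapshots what the running interpreter ended up with, and for_venv() assumes a POSIX layout with lib-dynload under the standard library directory.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class PathConfig:
    """Hypothetical Python mirror of the path configuration fields above."""

    module_search_paths: list = field(default_factory=list)  # -> sys.path
    executable: str = ""        # -> sys.executable
    prefix: str = ""            # -> sys.prefix
    base_prefix: str = ""       # -> sys.base_prefix
    exec_prefix: str = ""       # -> sys.exec_prefix
    base_exec_prefix: str = ""  # -> sys.base_exec_prefix

    @classmethod
    def from_sys(cls):
        """Snapshot the values the running interpreter actually ended up with."""
        return cls(list(sys.path), sys.executable, sys.prefix,
                   sys.base_prefix, sys.exec_prefix, sys.base_exec_prefix)

    @classmethod
    def for_venv(cls, base_prefix, venv_prefix, version):
        """Values an embedder might set directly: stdlib from the base
        installation, third-party packages from the venv (POSIX layout assumed)."""
        stdlib = f"{base_prefix}/lib/python{version}"
        return cls(
            module_search_paths=[stdlib, f"{stdlib}/lib-dynload",
                                 f"{venv_prefix}/lib/python{version}/site-packages"],
            executable=f"{venv_prefix}/bin/python",
            prefix=venv_prefix, base_prefix=base_prefix,
            exec_prefix=venv_prefix, base_exec_prefix=base_prefix,
        )

    @property
    def in_venv(self):
        # PEP 405: in a venv, prefix names the venv while base_prefix still
        # names the installation that holds the standard library.
        return self.prefix != self.base_prefix
```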

The code which initializes these fields is really complex. Without _PyCoreConfig, it's hard to make sure that these fields are properly initialized as an embedder would like.




> Nick, Victor, Eric, (others?) - are you interested in having a virtual whiteboard session to brainstorm how the "perfect" initialization looks? And probably a follow-up to brainstorm how to get there without breaking the world? I don't think we're going to get to be in the same room anytime before the language summit, and it would be awesome to have something concrete to discuss there.

Sorry, I'm not sure about the API / structures, but when I discussed this with Eric Snow at the latest sprint, we identified different steps in the Python initialization:

* only use bytes (no encoding), no access to the filesystem (not needed at this point)
* encoding defined, can use Unicode
* use the filesystem
* configuration converted as Python objects
* Python is fully initialized
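The ordering of those steps could be enforced with a simple phase counter. A hypothetical sketch (InitPhase and Runtime are illustrative names, not CPython APIs) in which an operation that needs, say, filesystem access fails loudly if it runs too early:

```python
from enum import IntEnum


class InitPhase(IntEnum):
    """Hypothetical ordering of the initialization steps listed above."""
    BYTES_ONLY = 1         # only bytes, no encoding, no filesystem access
    UNICODE_READY = 2      # encoding defined, Unicode can be used
    FILESYSTEM = 3         # the filesystem may be used
    CONFIG_AS_OBJECTS = 4  # configuration converted to Python objects
    READY = 5              # Python is fully initialized


class Runtime:
    def __init__(self):
        self.phase = InitPhase.BYTES_ONLY

    def advance(self):
        """Move to the next phase; phases can only be taken in order."""
        if self.phase is InitPhase.READY:
            raise RuntimeError("already fully initialized")
        self.phase = InitPhase(self.phase + 1)

    def require(self, phase):
        """Fail loudly if an operation runs before its phase is reached."""
        if self.phase < phase:
            raise RuntimeError(f"needs {phase.name}, currently {self.phase.name}")
```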

--

I once experimented with reorganizing _PyCoreConfig and _PyMainInterpreterConfig to avoid redundancy: adding a _PyPreConfig which contains only the fields needed before _PyMainInterpreterConfig. With that change, _PyMainInterpreterConfig (and _PyPreConfig) *contained* _PyCoreConfig.

But the change became very large and I wasn't sure that it was a good idea, so I abandoned it.

* https://github.com/python/cpython/pull/10575
* https://bugs.python.org/issue35266
* I have a more advanced version in this branch of my fork: https://github.com/vstinner/cpython/commits/pre_config_next

--

Ok, something else. _PyCoreConfig (and _PyMainInterpreterConfig) contain memory allocated on the heap. Problem: Python initialization changes the memory allocator. Code using _PyCoreConfig requires some "tricks" to ensure that the memory is *freed* with the same allocator used to *allocate* memory.

I created bpo-35265 "Internal C API: pass the memory allocator in a context" to pass a "context" to a lot of functions, context which contains the memory allocator but can contain more things later.

The idea of "a context" came up during the discussion about a new C API: stop relying on global variables or "shared state", and instead *explicitly* pass a context to all functions. With that, it becomes possible to imagine having two interpreters running in the same thread "at the same time".
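The allocator problem and the context idea can be modelled in a few lines. This is a hypothetical sketch, not the code from bpo-35265: each Context carries its own allocator, functions receive the context explicitly, and freeing a block through a different context is detected instead of silently corrupting memory.

```python
class Allocator:
    """Hypothetical allocator that records which blocks it handed out."""

    def __init__(self, name):
        self.name = name
        self.live = set()
        self._next = 0

    def alloc(self):
        self._next += 1
        block = (self.name, self._next)
        self.live.add(block)
        return block

    def free(self, block):
        if block not in self.live:
            raise ValueError(f"{self.name} asked to free a block it did not allocate: {block}")
        self.live.remove(block)


class Context:
    """Explicit state passed to every call instead of a process-wide global."""

    def __init__(self, allocator):
        self.allocator = allocator


def config_new(ctx):
    # Allocation goes through the caller's context, never a global allocator.
    return ctx.allocator.alloc()


def config_free(ctx, block):
    # Must be called with the same context (hence allocator) used to allocate.
    ctx.allocator.free(block)
```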

Honestly, I'm not really sure that it's fully possible to implement this idea... Python has *so much* shared state, practically *everywhere*. It's really a giant project to move all this shared state into structures and pass pointers to these structures.

So again, I abandoned my experimental change:
https://github.com/python/cpython/pull/10574

--

Memory allocator, context, different structures for configuration... it's really not an easy topic :-( There are so many constraints put into a single API!

The conservative option at this point is to keep the API private.

... Maybe we can explain how to use the private API but very explicitly warn that this API is experimental and can be broken anytime... And I plan to break it, to avoid redundancy between core and main configuration for example.

... I hope that these explanations give you a better idea of the big picture and the challenges :-)
msg335484 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-02-14 00:09
Thanks, Victor, that's great information.

> Memory allocator, context, different structures for configuration... it's really not an easy topic :-( There are so many constraints put into a single API!

This is why I'm keen to design the ideal *user* API first (that is, write the examples of how you would use it) and then figure out how we can make it fit. It's kind of the opposite approach from what you've been doing to adapt the existing code to suit particular needs.

For example, imagine instead of all the PySet*() functions followed by Py_Initialize() you could do this:

    PyObject *runtime = PyRuntime_Create();
    /* optional calls */
    PyRuntime_SetAllocators(runtime, &my_malloc, &my_realloc, &my_free);
    PyRuntime_SetHashSeed(runtime, 12345);

    /* sets this as the current runtime via a thread local */
    auto old_runtime = PyRuntime_Activate(runtime);
    assert(old_runtime == NULL);

    /* pretend triple quoting works in C for a minute ;) */
    const char *init_code = """
    import os.path
    import sys

    sys.executable = argv0
    sys.prefix = os.path.dirname(argv0)
    sys.path = [os.getcwd(), sys.prefix, os.path.join(sys.prefix, "Lib")]

    pyvenv = os.path.join(sys.prefix, "pyvenv.cfg")
    try:
        with open(pyvenv, "r", encoding="utf-8") as f:  # *only* utf-8 support at this stage
            for line in f:
                if line.startswith("home"):
                    sys.path.append(line.partition("=")[2].strip())
                    break
    except FileNotFoundError:
        pass

    if sys.platform == "win32":
        sys.stdout = open("CONOUT$", "w", encoding="utf-8")
    else:
        # no idea if this is right, but you get the idea
        sys.stdout = open("/dev/tty", "w", encoding="utf-8")
    """;

    PyObject *globals = PyDict_New();
    /* only UTF-8 support at this stage */
    PyObject *argv0 = PyUnicode_FromString(argv[0]);
    PyDict_SetItemString(globals, "argv0", argv0);  /* does not steal the reference */
    Py_DECREF(argv0);
    PyRuntime_Initialize(runtime, init_code, globals);
    Py_DECREF(globals);

    /* now we've initialised, loading codecs will succeed if we can find them or fail if not,
     * so we'd have to do cleanup to avoid depending on them without the user being able to
     * avoid it... */

    PyEval_EvalString("open('file.txt', 'w', encoding='gb18030').close()");

    /* may as well reuse DECREF for consistency */
    Py_DECREF(runtime);

Maybe it's a terrible idea? Honestly I'd be inclined to do other big changes at the same time (make PyObject opaque and interface driven, for example).

My point is that if the goal is to "move the existing internals around" then that's all we'll ever achieve. If we can say "the goal is to make this example work" then we'll be able to do much more.
msg335648 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-15 21:30
On Wed, Feb 13, 2019 at 10:56 AM Steve Dower <report@bugs.python.org> wrote:
> Nick, Victor, Eric, (others?) - are you interested in having a virtual whiteboard session to brainstorm how the "perfect" initialization looks? And probably a follow-up to brainstorm how to get there without breaking the world? I don't think we're going to get to be in the same room anytime before the language summit, and it would be awesome to have something concrete to discuss there.

Count me in.  This is a pretty important topic and doing this would
help accelerate our efforts by giving us a clearer common
understanding and objective.  FWIW, I plan on spending at least 5
minutes of my 25 minute PyCon talk on our efforts to fix up the C-API,
and this runtime initialization stuff is an important piece.
msg335650 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-15 22:11
On Wed, Feb 13, 2019 at 5:09 PM Steve Dower <report@bugs.python.org> wrote:
> This is why I'm keen to design the ideal *user* API first (that is, write the examples of how you would use it) and then figure out how we can make it fit.
> It's kind of the opposite approach from what you've been doing to adapt the existing code to suit particular needs.

That makes sense. :)

> For example, imagine instead of all the PySet*() functions followed by Py_Initialize() you could do this:
>
>     PyObject *runtime = PyRuntime_Create();

FYI, we already have a _PyRuntimeState struct (see
Include/internal/pycore_pystate.h) which is where I pulled in a lot of
the static globals last year.  Now there is one process-global
_PyRuntime (created in Python/pylifecycle.c) in place of all those
globals.  Note that _PyRuntimeState is in parallel with
PyInterpreterState, so not a PyObject.

>     /* optional calls */
>     PyRuntime_SetAllocators(runtime, &my_malloc, &my_realloc, &my_free);
>     PyRuntime_SetHashSeed(runtime, 12345);

Note that one motivation behind PEP 432 (and its config structs) is to
keep all the config together.  Having the one struct means you always
clearly see what your options are.  Another motivation is to keep the
config (dense with public fields) separate from the actual run state
(opaque).  Having a bunch of config functions (and global variables in
the status quo) means a lot more surface area to deal with when
embedding, as opposed to 2 config structs + a few initialization
functions (and a couple of helpers) like in PEP 432.

I don't know that you consciously intended to move away from the dense
config struct route, so I figured I'd be clear. :)

>     /* sets this as the current runtime via a thread local */
>     auto old_runtime = PyRuntime_Activate(runtime);
>     assert(old_runtime == NULL)

Hmm, there are two ways we could go with this: keep using TLS (or
static global in the case of _PyRuntime) or update the C-API to
require explicitly passing the context (e.g. runtime, interp, tstate,
or some wrapper) into all the functions that need it.  Of course,
changing that would definitely need some kind of compatibility shim to
avoid requiring massive changes to every extension out there, which
would mean effectively 2 C-APIs mirroring each other.  So sticking
with TLS is simpler.  Personally, I'd prefer going the explicit
argument route.

>
>     /* pretend triple quoting works in C for a minute ;) */
>     const char *init_code = """
> [snip]
>     """;
>
>     PyObject *globals = PyDict_New();
>     /* only UTF-8 support at this stage */
>     PyDict_SetItemString(globals, "argv0", PyUnicode_FromString(argv[0]));
>     PyRuntime_Initialize(runtime, init_code, globals);

Nice.  I like that this keeps the init code right by where it's used,
while also making it much more concise and easier to follow (since
it's Python code).

>     PyEval_EvalString("open('file.txt', 'w', encoding='gb18030').close()");

I definitely like the approach of directly embedding the Python code
like this. :)  Are there any downsides?

> Maybe it's a terrible idea?

Nah, we definitely want to maximize simplicity and your example offers
a good shift in that direction. :)

> Honestly I'd be inclined to do other big changes at the same time (make PyObject opaque and interface driven, for example).

Definitely!  Those aren't big blockers on cleaning up initialization
though, are they?

> My point is that if the goal is to "move the existing internals around" then that's all we'll ever achieve. If we can say "the goal is to make this example work" then we'll be able to do much more.

Yep.  I suppose part of the problem is that the embedding use cases
aren't understood (or even recognized) well enough.
msg335688 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-02-16 14:20
Steve, you're describing the goals of PEP 432 - design the desired API, then write the code to implement it. So while Victor's goal was specifically to get PEP 540 implemented, mine was just to make it so working on the startup sequence was less awful (and in particular, to make it possible to rewrite getpath.c in Python at some point).

Unfortunately, it turns out that redesigning a going-on-thirty-year-old startup sequence takes a while, as we first have to discover what all the global settings actually *are* :)

https://www.python.org/dev/peps/pep-0432/#invocation-of-phases describes an older iteration of the draft API design that was reasonably accurate at the point where Eric merged the in-development refactoring as a private API (see https://bugs.python.org/issue22257 and https://www.python.org/dev/peps/pep-0432/#implementation-strategy for details).

However, that initial change was basically just a skeleton - we didn't migrate many of the settings over to the new system at that point (although we did successfully split the import system initialization into two parts, so you can enable builtin and frozen imports without necessarily enabling external ones).
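The "builtin and frozen imports without external ones" split has a rough in-process analogue in the import machinery. `importlib.machinery.BuiltinImporter` and `FrozenImporter` locate modules without consulting `sys.path`, whereas `PathFinder` depends on it. The sketch below queries only the first two. The helper name `find_spec_core` is made up. The sketch illustrates the idea; it is not how the C-level initialization is actually split.

```python
from importlib.machinery import BuiltinImporter, FrozenImporter

# Finders that need no sys.path entries: compiled-in and frozen modules only.
CORE_FINDERS = (BuiltinImporter, FrozenImporter)

def find_spec_core(name: str):
    """Return a ModuleSpec found by the builtin/frozen finders, else None."""
    for finder in CORE_FINDERS:
        spec = finder.find_spec(name)
        if spec is not None:
            return spec
    return None

if __name__ == "__main__":
    print(find_spec_core("sys"))   # a builtin module
    print(find_spec_core("json"))  # a pure-Python stdlib module, normally not frozen
```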

The significant contribution that Victor then made was to actually start migrating settings into the new structure, adapting it as needed based on the goals of PEP 540.

Eric updated quite a few more internal APIs as he worked on improving the subinterpreter support.

Between us, we also made a number of improvements to https://docs.python.org/3/c-api/init.html based on what we learned in the process of making those changes.

So I'm completely open to changing the details of the API that PEP 432 is proposing, but the main lesson we've learned from what we've done so far is that CPython's long history of embedding support *does* constrain what we can do in practice, so it's necessary to account for feasibility of implementation when considering what we want to offer.

Ideally, the next step would be to update PEP 432 with a status report on what was learned in the development of Python 3.7 with the new configuration structures, and what the internal startup APIs actually look like now. Even though I reviewed quite a few of Victor's and Eric's PRs, even I don't have a clear overall picture of where we are now, and I suspect Victor and Eric are in a similar situation.
msg335692 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-02-16 16:21
Note also that Eric and I haven't failed to agree with Victor on an API, as Victor hasn't actually written a concrete proposal *for* a public API (neither as a PR updating PEP 432, nor as a separate PEP).

The current implementation does NOT follow the PEP as written, because _Py_CoreConfig ended up with all the settings in it, when it's supposed to be just the bare minimum needed to get the interpreter to a point where it can run Python code that only accesses builtin and frozen modules.
msg335749 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-02-17 04:57
Since I haven't really written them down anywhere else, noting some items I'm aware of from the Python 3.7 internals work that haven't made their way back into the PEP 432 public API proposal yet:

* If we only had to care about the pure embedding case, this would be a lot easier. We don't though: we also care about "CPython interpreter variants" that end up calling Py_Main, and hence respect all the CPython environment variables, command line arguments, and in-process global variables. So what Victor ended up having to implement was data structs for all three of those configuration sources, and then helper functions to write them into the consolidated config structs (as well as writing them back to the in-process global variables).

* Keeping the Py_Initialize and Py_Main APIs working means that there are several API preconfiguration functions that need a way to auto-initialize the core runtime state with sensible defaults.

* The current private implementation uses the PyCoreConfig/PyMainInterpreterConfig naming scheme. Based on some of Eric's work, the PEP currently suggests PyRuntimeConfig and PyMainInterpreterConfig, but I don't think any of us are especially in love with the latter name. Our inability to find a good name for it may also be a sign that it needs to be broken up into three distinct pieces (PySystemInterfaceConfig, PyCompilerConfig, PyMainModuleConfig).
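The "three sources, one consolidated config, write back to the globals" shape from the first bullet can be sketched in Python. This is only an illustration. `Config`, `merge` and `write_back` are hypothetical. Just three options are modelled: `-O`/`PYTHONOPTIMIZE`/`Py_OptimizeFlag`, `-v`/`PYTHONVERBOSE`/`Py_VerboseFlag`, and `-E`/`Py_IgnoreEnvironmentFlag`. The precedence rule is an assumption of the sketch, not a description of CPython's C implementation: a level can only be raised, and `-E` skips the environment.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Sequence

@dataclass
class Config:
    """Hypothetical consolidated config holding a tiny subset of options."""
    optimize: int = 0
    verbose: int = 0
    ignore_environment: bool = False

def _env_level(env: Mapping[str, str], name: str) -> int:
    """Unset/empty -> 0, integer -> that level, any other value -> 1."""
    value = env.get(name, "")
    if not value:
        return 0
    return int(value) if value.isdigit() else 1

def merge(globals_: Mapping[str, int], env: Mapping[str, str], argv: Sequence[str]) -> Config:
    """Read all three sources and produce one consolidated Config."""
    # In-process globals (the key names mirror real CPython globals).
    cfg = Config(
        optimize=globals_.get("Py_OptimizeFlag", 0),
        verbose=globals_.get("Py_VerboseFlag", 0),
        ignore_environment=bool(globals_.get("Py_IgnoreEnvironmentFlag", 0)),
    )
    # Command line, part 1: -E decides whether the environment is consulted.
    if "-E" in argv:
        cfg.ignore_environment = True
    # Environment variables, skipped when -E or the global flag is set.
    if not cfg.ignore_environment:
        cfg.optimize = max(cfg.optimize, _env_level(env, "PYTHONOPTIMIZE"))
        cfg.verbose = max(cfg.verbose, _env_level(env, "PYTHONVERBOSE"))
    # Command line, part 2: count repeated -O / -v given as separate arguments.
    # Assumed rule for this sketch: a source can raise a level, never lower it.
    cfg.optimize = max(cfg.optimize, argv.count("-O"))
    cfg.verbose = max(cfg.verbose, argv.count("-v"))
    return cfg

def write_back(cfg: Config, globals_: Dict[str, int]) -> None:
    """Mirror the consolidated values back into the legacy globals."""
    globals_["Py_OptimizeFlag"] = cfg.optimize
    globals_["Py_VerboseFlag"] = cfg.verbose
    globals_["Py_IgnoreEnvironmentFlag"] = int(cfg.ignore_environment)
```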
msg336793 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-02-28 01:53
I created bpo-36142: "Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings".
msg343636 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-27 15:04
I wrote PEP 587 "Python Initialization Configuration", which has been accepted. It allows completely overriding the "Path Configuration". I'm not sure that it fully implements what is requested here, but it should now be easier to tune the Path Configuration. See:
https://www.python.org/dev/peps/pep-0587/#multi-phase-initialization-private-provisional-api

I implemented the PEP 587 in bpo-36763.
msg352905 - (view) Author: PyScripter (pyscripter) Date: 2019-09-20 22:34
To Victor:
So how does the implementation of PEP 587 help configure embedded Python with a venv? It would be a great help to provide some minimal instructions.
msg354856 - (view) Author: PyScripter (pyscripter) Date: 2019-10-17 22:00
In case it helps anyone, I found a way to use venvs in embedded Python.

- First, initialize the Python installation that is referred to as home in pyvenv.cfg.
- Then execute the following script:

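import sys
# Point sys.executable into the venv; site.main() looks for pyvenv.cfg relative to it
sys.executable = r"Path to the python executable inside the venv"
# Remove every sys.path entry containing "site-packages" (added when site ran at startup)
path = sys.path
for i in range(len(path)-1, -1, -1):
    if path[i].find("site-packages") > 0:
        path.pop(i)
import site
# Re-run site: sets sys.prefix to the venv and re-adds site-packages per pyvenv.cfg
site.main()
del sys, path, i, site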
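Here is why this works, as far as I can tell from CPython 3.8's `site.py`. `site.main()` calls `site.venv()`, which looks for `pyvenv.cfg` next to `sys.executable` and then one directory up. It then sets `sys.prefix` to the parent of the executable's directory. The helper below reproduces just that lookup so it can be checked in isolation; `find_pyvenv_cfg` is a hypothetical name.

```python
import os
import tempfile
from typing import Optional, Tuple

def find_pyvenv_cfg(executable: str) -> Optional[Tuple[str, str]]:
    """Return (path to pyvenv.cfg, would-be sys.prefix), or None if not a venv.

    Checks <exe_dir>/pyvenv.cfg first, then <exe_dir>/../pyvenv.cfg.
    """
    exe_dir = os.path.dirname(os.path.abspath(executable))
    site_prefix = os.path.dirname(exe_dir)
    for candidate in (os.path.join(exe_dir, "pyvenv.cfg"),
                      os.path.join(site_prefix, "pyvenv.cfg")):
        if os.path.isfile(candidate):
            return candidate, site_prefix
    return None

if __name__ == "__main__":
    # Throwaway venv-like layout: <root>/pyvenv.cfg and <root>/bin/python.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "bin"))
        with open(os.path.join(root, "pyvenv.cfg"), "w", encoding="utf-8") as f:
            f.write("home = /usr/local/bin\n")
        print(find_pyvenv_cfg(os.path.join(root, "bin", "python")))
```

The executable itself does not need to exist for the lookup; only the directory layout matters.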
msg354857 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-10-17 22:39
If you just want to be able to import modules from the venv, and you know the path to it, it's simpler to just do:

    import sys
    sys.path.append(r"path to venv\Lib\site-packages")

Updating sys.executable is only necessary if you're going to use libraries that try to re-launch the interpreter, but any embedding application is going to have to do that anyway.
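A variant of the same idea uses `site.addsitedir()`. It assumes the conventional CPython venv layout: `Lib\site-packages` on Windows, `lib/pythonX.Y/site-packages` elsewhere. Like `sys.path.append()`, `site.addsitedir()` appends the directory to `sys.path`, but it also processes any `.pth` files in that directory. The helper `venv_site_packages` is a hypothetical name.

```python
import os
import site
import sys
import tempfile

def venv_site_packages(venv_dir, version=sys.version_info, windows=(os.name == "nt")):
    """Return the conventional site-packages directory of a CPython venv."""
    if windows:
        return os.path.join(venv_dir, "Lib", "site-packages")
    return os.path.join(venv_dir, "lib", f"python{version[0]}.{version[1]}", "site-packages")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as venv:
        sp = venv_site_packages(venv)
        extra = os.path.join(venv, "extra")
        os.makedirs(sp)
        os.makedirs(extra)
        # A .pth file naming another directory, as some installers create.
        with open(os.path.join(sp, "extra.pth"), "w", encoding="utf-8") as f:
            f.write(extra + "\n")
        site.addsitedir(sp)  # adds sp itself and the directory listed in extra.pth
        print(os.path.abspath(extra) in sys.path)
```

A plain `sys.path.append()` would add only the site-packages directory and skip the `.pth` entries.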
msg354858 - (view) Author: PyScripter (pyscripter) Date: 2019-10-17 23:01
To Steve:

I want the embedded venv to have the same sys.path as if you were running the venv's Python interpreter. For instance, my method takes into account the include-system-site-packages option in pyvenv.cfg. It also sets sys.prefix the same way the venv's Python interpreter does.
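The part of `pyvenv.cfg` that this method relies on is small. As far as I can tell, `site.venv()` looks at only two keys: `home` and `include-system-site-packages`, and it treats the latter as true when absent. The sketch below parses those same two keys. It also applies the check documented for the `venv` module, `sys.prefix != sys.base_prefix`, to tell whether the running interpreter considers itself inside a virtual environment. `VenvConfig`, `read_pyvenv_cfg` and `in_virtual_environment` are hypothetical names.

```python
import os
import sys
import tempfile
from typing import NamedTuple, Optional

class VenvConfig(NamedTuple):
    home: Optional[str]
    include_system_site_packages: bool

def read_pyvenv_cfg(path: str) -> VenvConfig:
    """Parse "key = value" lines; keys are case-insensitive, unknown keys ignored."""
    home = None
    system_site = "true"  # treated as true when the key is missing
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "=" in line:
                key, _, value = line.partition("=")
                key = key.strip().lower()
                value = value.strip()
                if key == "include-system-site-packages":
                    system_site = value.lower()
                elif key == "home":
                    home = value
    return VenvConfig(home, system_site == "true")

def in_virtual_environment() -> bool:
    # sys.prefix points at the venv, sys.base_prefix at the base installation.
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cfg_path = os.path.join(d, "pyvenv.cfg")
        with open(cfg_path, "w", encoding="utf-8") as f:
            f.write("home = /usr/local/bin\ninclude-system-site-packages = false\n")
        print(read_pyvenv_cfg(cfg_path), in_virtual_environment())
```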
History
Date                 User         Action  Args
2019-10-17 23:01:08  pyscripter   set     messages: + msg354858
2019-10-17 22:39:17  steve.dower  set     messages: + msg354857
2019-10-17 22:00:00  pyscripter   set     messages: + msg354856
2019-09-20 22:34:52  pyscripter   set     nosy: + pyscripter
                                          messages: + msg352905
2019-05-27 15:04:41  vstinner     set     messages: + msg343636
2019-02-28 01:53:06  vstinner     set     messages: + msg336793
2019-02-20 11:56:45  inada.naoki  set     nosy: + inada.naoki
2019-02-17 04:57:10  ncoghlan     set     messages: + msg335749
2019-02-16 16:21:00  ncoghlan     set     messages: + msg335692
2019-02-16 14:20:45  ncoghlan     set     messages: + msg335688
2019-02-15 22:11:52  eric.snow    set     messages: + msg335650
2019-02-15 21:30:34  eric.snow    set     messages: + msg335648
2019-02-14 00:09:08  steve.dower  set     messages: + msg335484
2019-02-13 23:04:32  vstinner     set     messages: + msg335479
2019-02-13 17:56:56  steve.dower  set     messages: + msg335470
2019-02-13 17:44:49  steve.dower  link    issue35706 superseder
2019-02-13 17:14:28  vstinner     set     messages: + msg335468
2019-02-07 12:18:37  ncoghlan     set     messages: + msg335015
2019-02-06 16:15:22  steve.dower  set     messages: + msg334948
                                          versions: + Python 3.8, - Python 3.5
2019-02-06 11:19:43  ncoghlan     set     nosy: + vstinner, eric.snow, steve.dower
                                          messages: + msg334926
2014-08-25 19:04:21  pitrou       set     nosy: + pitrou
                                          messages: + msg225890
                                          versions: + Python 3.5, - Python 3.4
2014-08-24 00:19:35  grahamd      set     messages: + msg225774
2014-08-24 00:04:56  ncoghlan     set     type: enhancement
                                          messages: + msg225771
                                          title: pyvenv style virtual environments unusable in an embedded system -> Make pyvenv style virtual environments easier to configure when embedding Python
2014-08-23 11:48:19  grahamd      set     messages: + msg225742
2014-08-23 10:24:48  ncoghlan     set     dependencies: + PEP 432 (PEP 587): Redesign the interpreter startup sequence
                                          messages: + msg225739
2014-08-17 12:18:40  ncoghlan     set     messages: + msg225437
2014-08-17 11:58:40  ncoghlan     set     messages: + msg225436
2014-08-17 11:49:32  ncoghlan     set     nosy: + ncoghlan
2014-08-17 11:42:41  grahamd      create