classification
Title: Make pyvenv style virtual environments easier to configure when embedding Python
Type: enhancement Stage:
Components: Versions: Python 3.5
process
Status: open Resolution:
Dependencies: 22257 Superseder:
Assigned To: Nosy List: grahamd, ncoghlan, pitrou
Priority: normal Keywords:

Created on 2014-08-17 11:42 by grahamd, last changed 2014-08-25 19:04 by pitrou.

Messages (8)
msg225434 - (view) Author: Graham Dumpleton (grahamd) Date: 2014-08-17 11:42
In am embedded system, as the 'python' executable is itself not run and the Python interpreter is initialised in process explicitly using PyInitialize(), in order to find the location of the Python installation, an elaborate sequence of checks is run as implemented in calculate_path() of Modules/getpath.c.

The primary mechanism is usually to search for a 'python' executable on PATH and use that as a starting point. From that it then back tracks up the file system from the bin directory to arrive at what would be the perceived equivalent of PYTHONHOME. The lib/pythonX.Y directory under that for the matching version X.Y of Python being initialised would then be used.

Problems can often occur with the way this search is done though.

For example, if someone is not using the system Python installation but has installed a different version of Python under /usr/local. At run time, the correct Python shared library would be getting loaded from /usr/local/lib, but because the 'python' executable is found from /usr/bin, it uses /usr as sys.prefix instead of /usr/local.

This can cause two distinct problems.

The first is that there is no Python installation at all under /usr corresponding to the Python version which was embedded, with the result of it not being able to import 'site' module and therefore failing.

The second is that there is a Python installation of the same major/minor but potentially a different patch revision, or compiled with different binary API flags or different Unicode character width. The Python interpreter in this case may well be able to start up, but the mismatch in the Python modules or extension modules and the core Python library that was actually linked can cause odd errors or crashes to occur.

Anyway, that is the background.

For an embedded system the way this problem was overcome was for it to use Py_SetPythonHome() to forcibly override what should be used for PYTHONHOME so that the correct installation was found and used at runtime.

Now this would work quite happily even for Python virtual environments constructed using 'virtualenv' allowing the embedded system to be run in that separate virtual environment distinct from the main Python installation it was created from.

Although this works for Python virtual environments created using 'virtualenv', it doesn't work if the virtual environment was created using pyvenv.

One can easily illustrate the problem without even using an embedded system.

$ which python3.4
/Library/Frameworks/Python.framework/Versions/3.4/bin/python3.4

$ pyvenv-3.4 py34-pyvenv

$ py34-pyvenv/bin/python
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.prefix
'/private/tmp/py34-pyvenv'
>>> sys.path
['', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload', '/private/tmp/py34-pyvenv/lib/python3.4/site-packages']

$ PYTHONHOME=/tmp/py34-pyvenv python3.4
Fatal Python error: Py_Initialize: unable to load the file system codec
ImportError: No module named 'encodings'
Abort trap: 6

The basic problem is that in a pyvenv virtual environment, there is no duplication of stuff in lib/pythonX.Y, with the only thing in there being the site-packages directory.

When you start up the 'python' executable direct from the pyvenv virtual environment, the startup sequence checks know this and consult the pyvenv.cfg to extract the:

home = /Library/Frameworks/Python.framework/Versions/3.4/bin

setting and from that derive where the actual run time files are.

When PYTHONHOME or Py_SetPythonHome() is used, then the getpath.c checks blindly believe that is the authoritative value:

 * Step 2. See if the $PYTHONHOME environment variable points to the
 * installed location of the Python libraries.  If $PYTHONHOME is set, then
 * it points to prefix and exec_prefix.  $PYTHONHOME can be a single
 * directory, which is used for both, or the prefix and exec_prefix
 * directories separated by a colon.

    /* If PYTHONHOME is set, we believe it unconditionally */
    if (home) {
        wchar_t *delim;
        wcsncpy(prefix, home, MAXPATHLEN);
        prefix[MAXPATHLEN] = L'\0';
        delim = wcschr(prefix, DELIM);
        if (delim)
            *delim = L'\0';
        joinpath(prefix, lib_python);
        joinpath(prefix, LANDMARK);
        return 1;
    }
Because of this, the problem above occurs as the proper runtime directories for files aren't included in sys.path. The result being that the 'encodings' module cannot even be found.

What I believe should occur is that PYTHONHOME should not be believed unconditionally. Instead there should be a check to see if that directory contains a pyvenv.cfg file and if there is one, realise it is a pyvenv style virtual environment and do the same sort of adjustments which would be made based on looking at what that pyvenv.cfg file contains.

For the record this issue is affecting Apache/mod_wsgi and right now the only workaround I have is to tell people that in addition to setting the configuration setting corresponding to PYTHONHOME, to use configuration settings to have the same effect as doing:

PYTHONPATH=/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload

so that the correct runtime files are found.

I am still trying to work out a more permanent workaround I can add to mod_wsgi code itself since can't rely on a fix for existing Python versions with pyvenv support.

Only other option is to tell people not to use pyvenv and use virtualenv instead.

Right now I can offer no actual patch as that getpath.c code is scary enough that not even sure at this point where the check should be incorporated or how.

Only thing I can surmise is that the current check for pyvenv.cfg being before the search for the prefix is meaning that it isn't consulted.

    /* Search for an environment configuration file, first in the
       executable's directory and then in the parent directory.
       If found, open it for use when searching for prefixes.
    */

    {
        wchar_t tmpbuffer[MAXPATHLEN+1];
        wchar_t *env_cfg = L"pyvenv.cfg";
        FILE * env_file = NULL;

        wcscpy(tmpbuffer, argv0_path);

        joinpath(tmpbuffer, env_cfg);
        env_file = _Py_wfopen(tmpbuffer, L"r");
        if (env_file == NULL) {
            errno = 0;
            reduce(tmpbuffer);
            reduce(tmpbuffer);
            joinpath(tmpbuffer, env_cfg);
            env_file = _Py_wfopen(tmpbuffer, L"r");
            if (env_file == NULL) {
                errno = 0;
            }
        }
        if (env_file != NULL) {
            /* Look for a 'home' variable and set argv0_path to it, if found */
            if (find_env_config_value(env_file, L"home", tmpbuffer)) {
                wcscpy(argv0_path, tmpbuffer);
            }
            fclose(env_file);
            env_file = NULL;
        }
    }

    pfound = search_for_prefix(argv0_path, home, _prefix, lib_python);
msg225436 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-17 11:58
Yeah, PEP 432 (my proposal to redesign the startup sequence) could just as well be subtitled "getpath.c hurts my brain" :P

One tricky part here is going to be figuring out how to test this - perhaps adding a new test option to _testembed and then running it both inside and outside a venv.
msg225437 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-17 12:18
Graham pointed out that setting PYTHONHOME ends up triggering the same control flow through getpath.c as calling Py_SetPythonHome, so this can be tested just with pyvenv and a suitably configured environment.

It may still be a little tricky though, since we normally run the pyvenv tests in isolated mode to avoid spurious failures due to bad environment settings...
msg225739 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-23 10:24
Some more experiments, comparing an installed vs uninstalled Python. One failure mode is that setting PYTHONHOME just plain breaks running from a source checkout (setting PYTHONHOME to the checkout directory also fails):

$ ./python -m venv --without-pip /tmp/issue22213-py35

$ /tmp/issue22213-py35/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /usr/local

$ PYTHONHOME=/usr/local /tmp/issue22213-py35/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted (core dumped)

Trying after running "make altinstall" (which I had previously done for 3.4) is a bit more enlightening:

$ python3.4 -m venv --without-pip /tmp/issue22213-py34

$ /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /usr/local

$ PYTHONHOME=/usr/local /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /usr/local

$ PYTHONHOME=/tmp/issue22213-py34 /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted (core dumped)

$ PYTHONHOME=/tmp/issue22213-py34:/usr/local /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted (core dumped)
[ncoghlan@lancre py34]$ PYTHONHOME=/usr/local:/tmp/issue22213-py34/bin /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /tmp/issue22213-py34/bin

I think what this is actually showing is that there's a fundamental conflict between mod_wsgi's expectation of being able to set PYTHONHOME to point to the virtual environment, and the way PEP 405 virtual environments actually work.

With PEP 405, all the operations in getpath.c expect to execute while pointing to the *base* environment: where the standard library lives. It is then up to site.py to later adjust the based prefix location, as can be demonstrated by the fact pyvenv.cfg isn't processed if processing the site module is disabled:

$ /tmp/issue22213-py34/bin/python -c "import sys; print(sys.prefix, sys.exec_prefix)"
/tmp/issue22213-py34 /tmp/issue22213-py34
$ /tmp/issue22213-py34/bin/python -S -c "import sys; print(sys.prefix, sys.exec_prefix)"
/usr/local /usr/local

At this point in time, there isn't an easy way for an embedding application to say "here's the standard library, here's the virtual environment with user packages" - it's necessary to just override the path calculations entirely.

Allowing that kind of more granular configuration is one of the design goals of PEP 432, so adding that as a dependency here.
msg225742 - (view) Author: Graham Dumpleton (grahamd) Date: 2014-08-23 11:48
It is actually very easy for me to work around and I released a new mod_wsgi version today which works.

When I get a Python home option, instead of calling Py_SetPythonHome() with it, I append '/bin/python' to it and call Py_SetProgramName() instead.
msg225771 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-24 00:04
Excellent! If I recall correctly, that works because we resolve the symlink when looking for the standard library, but not when looking for venv configuration file.

I also suspect this is all thoroughly broken on Windows - there are so many configuration operations and platform specific considerations that need to be accounted for in getpath.c these days that it has become close to incomprehensible :(

One of my main goals with PEP 432 is actually to make it possible to rewrite the path configuration code in a more maintainable way - my unofficial subtitle for that PEP is "getpath.c must die!" :)
msg225774 - (view) Author: Graham Dumpleton (grahamd) Date: 2014-08-24 00:19
I only make the change to Py_SetProgramName() on UNIX and not Windows. This is because back in mod_wsgi 1.0 I did actually used to use Py_SetProgramName() but it didn't seem to work in sane way on Windows so changed to Py_SetPythonHome() which worked on both Windows and UNIX. Latest versions of mod_wsgi haven't been updated yet to even build on Windows, so not caring about Windows right now.
msg225890 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-25 19:04
That workaround would definitely deserve being wrapped in a higher-level API invokable by embedding applications, IMHO.
History
Date User Action Args
2014-08-25 19:04:21pitrousetnosy: + pitrou

messages: + msg225890
versions: + Python 3.5, - Python 3.4
2014-08-24 00:19:35grahamdsetmessages: + msg225774
2014-08-24 00:04:56ncoghlansettype: enhancement
messages: + msg225771
title: pyvenv style virtual environments unusable in an embedded system -> Make pyvenv style virtual environments easier to configure when embedding Python
2014-08-23 11:48:19grahamdsetmessages: + msg225742
2014-08-23 10:24:48ncoghlansetdependencies: + PEP 432: Redesign the interpreter startup sequence
messages: + msg225739
2014-08-17 12:18:40ncoghlansetmessages: + msg225437
2014-08-17 11:58:40ncoghlansetmessages: + msg225436
2014-08-17 11:49:32ncoghlansetnosy: + ncoghlan
2014-08-17 11:42:41grahamdcreate