Message 225434 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	grahamd
Recipients	grahamd
Date	2014-08-17.11:42:38
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1408275761.36.0.915361061447.issue22213@psf.upfronthosting.co.za>
In-reply-to

Content
In am embedded system, as the 'python' executable is itself not run and the Python interpreter is initialised in process explicitly using PyInitialize(), in order to find the location of the Python installation, an elaborate sequence of checks is run as implemented in calculate_path() of Modules/getpath.c. The primary mechanism is usually to search for a 'python' executable on PATH and use that as a starting point. From that it then back tracks up the file system from the bin directory to arrive at what would be the perceived equivalent of PYTHONHOME. The lib/pythonX.Y directory under that for the matching version X.Y of Python being initialised would then be used. Problems can often occur with the way this search is done though. For example, if someone is not using the system Python installation but has installed a different version of Python under /usr/local. At run time, the correct Python shared library would be getting loaded from /usr/local/lib, but because the 'python' executable is found from /usr/bin, it uses /usr as sys.prefix instead of /usr/local. This can cause two distinct problems. The first is that there is no Python installation at all under /usr corresponding to the Python version which was embedded, with the result of it not being able to import 'site' module and therefore failing. The second is that there is a Python installation of the same major/minor but potentially a different patch revision, or compiled with different binary API flags or different Unicode character width. The Python interpreter in this case may well be able to start up, but the mismatch in the Python modules or extension modules and the core Python library that was actually linked can cause odd errors or crashes to occur. Anyway, that is the background. For an embedded system the way this problem was overcome was for it to use Py_SetPythonHome() to forcibly override what should be used for PYTHONHOME so that the correct installation was found and used at runtime. Now this would work quite happily even for Python virtual environments constructed using 'virtualenv' allowing the embedded system to be run in that separate virtual environment distinct from the main Python installation it was created from. Although this works for Python virtual environments created using 'virtualenv', it doesn't work if the virtual environment was created using pyvenv. One can easily illustrate the problem without even using an embedded system. $ which python3.4 /Library/Frameworks/Python.framework/Versions/3.4/bin/python3.4 $ pyvenv-3.4 py34-pyvenv $ py34-pyvenv/bin/python Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.prefix '/private/tmp/py34-pyvenv' >>> sys.path ['', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload', '/private/tmp/py34-pyvenv/lib/python3.4/site-packages'] $ PYTHONHOME=/tmp/py34-pyvenv python3.4 Fatal Python error: Py_Initialize: unable to load the file system codec ImportError: No module named 'encodings' Abort trap: 6 The basic problem is that in a pyvenv virtual environment, there is no duplication of stuff in lib/pythonX.Y, with the only thing in there being the site-packages directory. When you start up the 'python' executable direct from the pyvenv virtual environment, the startup sequence checks know this and consult the pyvenv.cfg to extract the: home = /Library/Frameworks/Python.framework/Versions/3.4/bin setting and from that derive where the actual run time files are. When PYTHONHOME or Py_SetPythonHome() is used, then the getpath.c checks blindly believe that is the authoritative value: * Step 2. See if the $PYTHONHOME environment variable points to the * installed location of the Python libraries. If $PYTHONHOME is set, then * it points to prefix and exec_prefix. $PYTHONHOME can be a single * directory, which is used for both, or the prefix and exec_prefix * directories separated by a colon. /* If PYTHONHOME is set, we believe it unconditionally / if (home) { wchar_t delim; wcsncpy(prefix, home, MAXPATHLEN); prefix[MAXPATHLEN] = L'\0'; delim = wcschr(prefix, DELIM); if (delim) delim = L'\0'; joinpath(prefix, lib_python); joinpath(prefix, LANDMARK); return 1; } Because of this, the problem above occurs as the proper runtime directories for files aren't included in sys.path. The result being that the 'encodings' module cannot even be found. What I believe should occur is that PYTHONHOME should not be believed unconditionally. Instead there should be a check to see if that directory contains a pyvenv.cfg file and if there is one, realise it is a pyvenv style virtual environment and do the same sort of adjustments which would be made based on looking at what that pyvenv.cfg file contains. For the record this issue is affecting Apache/mod_wsgi and right now the only workaround I have is to tell people that in addition to setting the configuration setting corresponding to PYTHONHOME, to use configuration settings to have the same effect as doing: PYTHONPATH=/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload so that the correct runtime files are found. I am still trying to work out a more permanent workaround I can add to mod_wsgi code itself since can't rely on a fix for existing Python versions with pyvenv support. Only other option is to tell people not to use pyvenv and use virtualenv instead. Right now I can offer no actual patch as that getpath.c code is scary enough that not even sure at this point where the check should be incorporated or how. Only thing I can surmise is that the current check for pyvenv.cfg being before the search for the prefix is meaning that it isn't consulted. / Search for an environment configuration file, first in the executable's directory and then in the parent directory. If found, open it for use when searching for prefixes. / { wchar_t tmpbuffer[MAXPATHLEN+1]; wchar_t env_cfg = L"pyvenv.cfg"; FILE * env_file = NULL; wcscpy(tmpbuffer, argv0_path); joinpath(tmpbuffer, env_cfg); env_file = _Py_wfopen(tmpbuffer, L"r"); if (env_file == NULL) { errno = 0; reduce(tmpbuffer); reduce(tmpbuffer); joinpath(tmpbuffer, env_cfg); env_file = _Py_wfopen(tmpbuffer, L"r"); if (env_file == NULL) { errno = 0; } } if (env_file != NULL) { /* Look for a 'home' variable and set argv0_path to it, if found */ if (find_env_config_value(env_file, L"home", tmpbuffer)) { wcscpy(argv0_path, tmpbuffer); } fclose(env_file); env_file = NULL; } } pfound = search_for_prefix(argv0_path, home, _prefix, lib_python);

In am embedded system, as the 'python' executable is itself not run and the Python interpreter is initialised in process explicitly using PyInitialize(), in order to find the location of the Python installation, an elaborate sequence of checks is run as implemented in calculate_path() of Modules/getpath.c.

The primary mechanism is usually to search for a 'python' executable on PATH and use that as a starting point. From that it then back tracks up the file system from the bin directory to arrive at what would be the perceived equivalent of PYTHONHOME. The lib/pythonX.Y directory under that for the matching version X.Y of Python being initialised would then be used.

Problems can often occur with the way this search is done though.

For example, if someone is not using the system Python installation but has installed a different version of Python under /usr/local. At run time, the correct Python shared library would be getting loaded from /usr/local/lib, but because the 'python' executable is found from /usr/bin, it uses /usr as sys.prefix instead of /usr/local.

This can cause two distinct problems.

The first is that there is no Python installation at all under /usr corresponding to the Python version which was embedded, with the result of it not being able to import 'site' module and therefore failing.

The second is that there is a Python installation of the same major/minor but potentially a different patch revision, or compiled with different binary API flags or different Unicode character width. The Python interpreter in this case may well be able to start up, but the mismatch in the Python modules or extension modules and the core Python library that was actually linked can cause odd errors or crashes to occur.

Anyway, that is the background.

For an embedded system the way this problem was overcome was for it to use Py_SetPythonHome() to forcibly override what should be used for PYTHONHOME so that the correct installation was found and used at runtime.

Now this would work quite happily even for Python virtual environments constructed using 'virtualenv' allowing the embedded system to be run in that separate virtual environment distinct from the main Python installation it was created from.

Although this works for Python virtual environments created using 'virtualenv', it doesn't work if the virtual environment was created using pyvenv.

One can easily illustrate the problem without even using an embedded system.

$ which python3.4
/Library/Frameworks/Python.framework/Versions/3.4/bin/python3.4

$ pyvenv-3.4 py34-pyvenv

$ py34-pyvenv/bin/python
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.prefix
'/private/tmp/py34-pyvenv'
>>> sys.path
['', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload', '/private/tmp/py34-pyvenv/lib/python3.4/site-packages']

$ PYTHONHOME=/tmp/py34-pyvenv python3.4
Fatal Python error: Py_Initialize: unable to load the file system codec
ImportError: No module named 'encodings'
Abort trap: 6

The basic problem is that in a pyvenv virtual environment, there is no duplication of stuff in lib/pythonX.Y, with the only thing in there being the site-packages directory.

When you start up the 'python' executable direct from the pyvenv virtual environment, the startup sequence checks know this and consult the pyvenv.cfg to extract the:

home = /Library/Frameworks/Python.framework/Versions/3.4/bin

setting and from that derive where the actual run time files are.

When PYTHONHOME or Py_SetPythonHome() is used, then the getpath.c checks blindly believe that is the authoritative value:

 * Step 2. See if the $PYTHONHOME environment variable points to the
 * installed location of the Python libraries.  If $PYTHONHOME is set, then
 * it points to prefix and exec_prefix.  $PYTHONHOME can be a single
 * directory, which is used for both, or the prefix and exec_prefix
 * directories separated by a colon.

    /* If PYTHONHOME is set, we believe it unconditionally */
    if (home) {
        wchar_t *delim;
        wcsncpy(prefix, home, MAXPATHLEN);
        prefix[MAXPATHLEN] = L'\0';
        delim = wcschr(prefix, DELIM);
        if (delim)
            *delim = L'\0';
        joinpath(prefix, lib_python);
        joinpath(prefix, LANDMARK);
        return 1;
    }
Because of this, the problem above occurs as the proper runtime directories for files aren't included in sys.path. The result being that the 'encodings' module cannot even be found.

What I believe should occur is that PYTHONHOME should not be believed unconditionally. Instead there should be a check to see if that directory contains a pyvenv.cfg file and if there is one, realise it is a pyvenv style virtual environment and do the same sort of adjustments which would be made based on looking at what that pyvenv.cfg file contains.

For the record this issue is affecting Apache/mod_wsgi and right now the only workaround I have is to tell people that in addition to setting the configuration setting corresponding to PYTHONHOME, to use configuration settings to have the same effect as doing:

PYTHONPATH=/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload

so that the correct runtime files are found.

I am still trying to work out a more permanent workaround I can add to mod_wsgi code itself since can't rely on a fix for existing Python versions with pyvenv support.

Only other option is to tell people not to use pyvenv and use virtualenv instead.

Right now I can offer no actual patch as that getpath.c code is scary enough that not even sure at this point where the check should be incorporated or how.

Only thing I can surmise is that the current check for pyvenv.cfg being before the search for the prefix is meaning that it isn't consulted.

    /* Search for an environment configuration file, first in the
       executable's directory and then in the parent directory.
       If found, open it for use when searching for prefixes.
    */

    {
        wchar_t tmpbuffer[MAXPATHLEN+1];
        wchar_t *env_cfg = L"pyvenv.cfg";
        FILE * env_file = NULL;

        wcscpy(tmpbuffer, argv0_path);

        joinpath(tmpbuffer, env_cfg);
        env_file = _Py_wfopen(tmpbuffer, L"r");
        if (env_file == NULL) {
            errno = 0;
            reduce(tmpbuffer);
            reduce(tmpbuffer);
            joinpath(tmpbuffer, env_cfg);
            env_file = _Py_wfopen(tmpbuffer, L"r");
            if (env_file == NULL) {
                errno = 0;
            }
        }
        if (env_file != NULL) {
            /* Look for a 'home' variable and set argv0_path to it, if found */
            if (find_env_config_value(env_file, L"home", tmpbuffer)) {
                wcscpy(argv0_path, tmpbuffer);
            }
            fclose(env_file);
            env_file = NULL;
        }
    }

    pfound = search_for_prefix(argv0_path, home, _prefix, lib_python);

History
Date	User	Action	Args
2014-08-17 11:42:41	grahamd	set	recipients: + grahamd
2014-08-17 11:42:41	grahamd	set	messageid: <1408275761.36.0.915361061447.issue22213@psf.upfronthosting.co.za>
2014-08-17 11:42:41	grahamd	link	issue22213 messages
2014-08-17 11:42:38	grahamd	create