Author ned.deily
Recipients doko, koobs, ned.deily, vstinner
Date 2014-08-12.03:13:07
Message-id <1407813190.95.0.317100889185.issue21166@psf.upfronthosting.co.za>
This problem has reappeared on some of the FreeBSD buildbots, for example:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%202.7/builds/507

Thanks to a lot of good work by koobs investigating and documenting the problems on IRC, we have figured out what is going on here (and it's a lot more difficult to explain than to fix!).

The root cause is a "bootstrap" issue with the pybuilddir.txt Makefile rule. This build step is the first one that uses the newly built Python executable. It creates the _sysconfigdata.py source, which contains the compiled Makefile variables. It also creates pybuilddir.txt, which contains a platform-dependent build directory name, primarily for the benefit of cross-compile builds. This support was added by the changes for Issue13150 and Issue17512. Those changes taught getpath.c to read the build directory name from pybuilddir.txt and use it to determine that the interpreter is being started from a build directory rather than from an installed instance. In the build-directory case, getpath.c is supposed to set up both sys.prefix (for pure Python modules) and sys.exec_prefix (for C extension modules), so that standard library modules are loaded from the build/source directories.
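The two lookups described above can be pictured with a much-simplified Python model. This is a hypothetical sketch, not a translation of getpath.c. The function names mirror the C functions named in this message, but two details are simplifying assumptions. The source-tree landmark (Lib/os.py) is assumed, and so are the single-step fallbacks to the configured prefix.

```python
import os
import tempfile


def search_for_prefix(source_dir, default_prefix):
    """Simplified model: a landmark in the source tree (assumed here to be
    Lib/os.py) marks a build-directory run for pure Python modules."""
    if os.path.isfile(os.path.join(source_dir, "Lib", "os.py")):
        return source_dir
    return default_prefix  # otherwise use the installed prefix


def search_for_exec_prefix(argv0_dir, default_exec_prefix):
    """Simplified model: only pybuilddir.txt next to the executable marks a
    build-directory run for C extension modules."""
    marker = os.path.join(argv0_dir, "pybuilddir.txt")
    if os.path.isfile(marker):
        with open(marker) as f:
            rel_builddir = f.read().strip()
        return os.path.join(argv0_dir, rel_builddir)
    return default_exec_prefix  # falls back to the installed location


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as build:
        # A clean build tree: the source landmark exists, pybuilddir.txt does not.
        os.makedirs(os.path.join(build, "Lib"))
        open(os.path.join(build, "Lib", "os.py"), "w").close()
        print(search_for_prefix(build, "/usr/local"))       # the build tree
        print(search_for_exec_prefix(build, "/usr/local"))  # /usr/local
```

In this model the two answers disagree for a clean build tree. That disagreement is the situation the rest of this message is about.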

However, pybuilddir.txt may not already exist when the pybuilddir.txt Makefile rule executes, as happens with a clean build directory. In that case getpath.c gets confused: search_for_prefix correctly determines that python is running from a build directory, but search_for_exec_prefix does not. As a result, the sys.path created for this initial run of the newly built skeleton python finds the correct pure Python modules in the source/build directories. But it searches the installed location (as set by --prefix, default /usr/local) for C standard library shared extension modules (.so files).

In a clean build, no shared .so files can have been built yet at this point, so the -m sysconfig --generate-posix-vars step cannot depend on any such modules. Two conditions can still combine badly. First, sys.exec_prefix is set (incorrectly) to an installed path because pybuilddir.txt does not exist yet. Second, .so files from a previous installation happen to be there. When both hold, those .so files can get imported and used. One such case in Python 2.7.x builds is cStringIO.so, which pprint uses if it is available, falling back to StringIO.py if not. And pprint happens to be used by sysconfig's _generate_posix_vars in that build step.
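One way to spot this kind of leak is to look at where each already-imported C extension module was loaded from. The following is a hypothetical diagnostic sketch, not part of the patches. It is written for Python 3, because it relies on importlib.machinery.EXTENSION_SUFFIXES and os.path.commonpath, which Python 2.7 lacks. The helper name stray_extension_modules and the idea of checking against the build tree root are illustrative assumptions.

```python
import importlib.machinery
import os
import sys


def stray_extension_modules(build_root):
    """Return {module name: real path} for already-imported C extension
    modules whose shared library lives outside build_root."""
    root = os.path.realpath(build_root)
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    strays = {}
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if not path or not path.endswith(suffixes):
            continue  # builtin, frozen, or pure Python module
        real = os.path.realpath(path)
        if os.path.commonpath([root, real]) != root:
            strays[name] = real  # e.g. picked up from an installed lib-dynload
    return strays


if __name__ == "__main__":
    # Hypothetical usage: run from the build tree; anything listed was
    # loaded from somewhere else, such as /usr/local/lib/.../lib-dynload.
    for name, path in sorted(stray_extension_modules(os.getcwd()).items()):
        print(name, path)
```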

Most of the time, it seems, the spurious import of incorrect extension modules at this point is harmless. However, there are configurations where that is not the case. One such scenario is koobs's FreeBSD buildbot. There, a version of Python 2.7 built via the FreeBSD ports system was already installed, with --enable-shared, the default prefix of /usr/local, and a wide (UCS4) Unicode build. The buildbot build is configured non-shared, with debug enabled, and defaults to a narrow (UCS2) Unicode build and the /usr/local prefix. The buildbot build is never installed. Even so, whenever pybuilddir.txt did not already exist in the build directory (after a manual clean), getpath's search_for_exec_prefix incorrectly added /usr/local/lib/pythonX.Y/lib-dynload to sys.path. That caused the installed system Python's cStringIO.so, built with a conflicting ABI, to be imported and used, and gdb traces show this to be the cause of the bus error. (With Python 3.x, a different scenario can result in an installed _heapq.so being imported, but the root cause is the same.)
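To make "conflicting ABI" more concrete, the traits that differ between the two interpreters in this scenario can be read from a running interpreter. This is an illustrative sketch, not part of the patches. build_traits and conflicting_traits are hypothetical helpers, and they rely on the assumptions noted in the comments.

```python
import sys
import sysconfig


def build_traits():
    """Describe a few ABI-relevant traits of the running interpreter."""
    return {
        # 0xFFFF on narrow (UCS2) builds, 0x10FFFF on wide (UCS4) builds;
        # Python 3.3+ always reports 0x10FFFF.
        "unicode": "ucs2" if sys.maxunicode == 0xFFFF else "ucs4",
        # sys.gettotalrefcount only exists in debug (Py_REF_DEBUG) builds.
        "debug": hasattr(sys, "gettotalrefcount"),
        # Py_ENABLE_SHARED is 1 for --enable-shared builds, else 0 or None.
        "shared": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
    }


def conflicting_traits(a, b):
    """Sorted names of traits known for both interpreters that differ."""
    return sorted(k for k in set(a) & set(b) if a[k] != b[k])


if __name__ == "__main__":
    buildbot = {"unicode": "ucs2", "debug": True, "shared": False}
    ports_install = {"unicode": "ucs4", "shared": True}  # debug setting unknown
    print(conflicting_traits(buildbot, ports_install))  # ['shared', 'unicode']
```

The installed port's debug setting is left out of the example because this message does not state it.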

After finally isolating the scenario, I tried unsuccessfully to reproduce the bus error on some other platforms (e.g. OS X), but I was able to reproduce it on a FreeBSD 10 VM. This may appear to be a rather obscure scenario. Still, at least one other open issue (Issue21412) seems to be due to the same root cause, so it is definitely worth fixing. Rather than adding to the complexity of getpath.c, I think the best place to deal with this is the Makefile. The attached patches change the pybuilddir.txt rule's recipes to unconditionally create a pybuilddir.txt containing a dummy path value before the new interpreter first runs. That is sufficient to ensure that sys.exec_prefix does not point to the installed location during this initial step. The patches also make ./configure always delete an existing pybuilddir.txt, so that it is properly recreated in case the build environment has changed.
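The ordering the patched rule relies on can be modeled in a few lines of Python. This sketches the idea only and is not the Makefile change itself. run_pybuilddir_rule and reset_for_configure are hypothetical names, as is the generate_posix_vars callable, which stands in for the `-m sysconfig --generate-posix-vars` step. The placeholder value "none" is also an assumption.

```python
import os


def run_pybuilddir_rule(build_dir, generate_posix_vars, placeholder="none"):
    """Model of the patched rule: write a dummy pybuilddir.txt first, so the
    new interpreter's exec_prefix lookup treats this as a build directory,
    then run the generation step, which overwrites the file."""
    marker = os.path.join(build_dir, "pybuilddir.txt")
    with open(marker, "w") as f:
        f.write(placeholder + "\n")  # unconditional, even in a clean tree
    generate_posix_vars(build_dir)   # hypothetical stand-in for the real step


def reset_for_configure(build_dir):
    """Model of ./configure discarding any existing pybuilddir.txt so that
    it is recreated for the current build environment."""
    marker = os.path.join(build_dir, "pybuilddir.txt")
    if os.path.exists(marker):
        os.remove(marker)
```

The point of the design is that the file exists before the first run of the new interpreter. Its content at that moment does not need to be correct.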

I'm cc'ing Matthias here to review any cross-compile implications; AFAICT, there shouldn't be any.