classification
Title: Bus error in pybuilddir.txt 'python -m sysconfig --generate-posix-vars' build step
Type:
Stage: resolved
Components: Build
Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: doko, haypo, jcea, koobs, ned.deily, python-dev
Priority: normal
Keywords: patch

Created on 2014-04-07 09:32 by haypo, last changed 2014-09-26 02:37 by jcea. This issue is now closed.

Files
File name Uploaded Description
gdb.log koobs, 2014-04-07 11:27
issue21166_27.patch ned.deily, 2014-08-12 03:13 2.7 version
issue21166_3x.patch ned.deily, 2014-08-12 03:14 3.x version
python-buildbot-broken-debugging.txt koobs, 2014-08-12 06:08 Bus Error debug & isolation
Messages (10)
msg215683 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-07 09:32
http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.4/builds/44/steps/compile/logs/stdio

./python -E -S -m sysconfig --generate-posix-vars
Bus error (core dumped)

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.4
msg215699 - (view) Author: Kubilay Kocak (koobs) Date: 2014-04-07 11:27
Uploading gdb output at Victor's request.
msg215702 - (view) Author: Kubilay Kocak (koobs) Date: 2014-04-07 11:31
Interestingly, I note the following lines from the gdb log:

#5  0x0000000801ae1e99 in PyModule_Create2 () from /usr/local/lib/libpython3.4m.so.1
#6  0x0000000801840de8 in PyInit__heapq () from /usr/local/lib/python3.4/lib-dynload/_heapq.so

I had installed Python 3.4 just prior to Victor reporting the issue.

If it's at all relevant, the installed Python 3.4 was built using clang (not gcc, which the buildbots use).

Removing Python 3.4 from the system and rebuilding makes the issue go away.

The question is, what is ./python from the buildbot build directory doing using, loading or otherwise interacting with the python installation on the system in the first place? Is a lack of isolation the root cause?
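One way to check this empirically is to ask the freshly built interpreter where it would load a given extension module from. The following is only a sketch and is not taken from the gdb log: the helper names `outside_tree` and `module_origin` and the build directory path are hypothetical, and because it relies on `importlib.util.find_spec` it applies to 3.4 and later, not to 2.7.

```python
import importlib.util
import posixpath


def outside_tree(path, root):
    """Return True if POSIX-style `path` does not live under directory `root`."""
    path = posixpath.normpath(path)
    root = posixpath.normpath(root)
    return not (path == root or path.startswith(root.rstrip("/") + "/"))


def module_origin(name):
    """Return the file a module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Hypothetical build directory; substitute the real buildbot checkout.
    build_dir = "/home/buildbot/build"
    origin = module_origin("_heapq")
    print("_heapq would load from:", origin)
    # Built-in or frozen modules have no file on disk to compare.
    if origin and origin not in ("built-in", "frozen"):
        print("outside build tree:", outside_tree(origin, build_dir))
```

If the reported origin is under /usr/local/lib/python3.4/lib-dynload while the interpreter was started from the build directory, the build is picking up the installed copy.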
msg215704 - (view) Author: Kubilay Kocak (koobs) Date: 2014-04-07 11:33
Clarification:

a) I had just installed Python 3.4 (at the system level, via ports).

b) Removing Python 3.4 from the system (and forcing a rebuild on the buildbot) makes the issue go away.
msg215782 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-09 00:16
I still don't understand the issue but... it's now fixed (I don't understand why), so I'm closing it.
msg225217 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-08-12 03:13
This problem has reappeared on some of the FreeBSD buildbots, for example:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%202.7/builds/507

Thanks to a lot of good work by koobs in investigating and documenting the problems on IRC, we have figured out what is going on here (and it's a lot more difficult to explain than to fix!).

The root cause is a "bootstrap" issue with the pybuilddir.txt Makefile rule. This build step is the first one that uses the newly built Python executable; it creates the _sysconfigdata.py source that contains the compiled Makefile variables, and it also creates pybuilddir.txt, which contains a platform-dependent build directory name, primarily for the benefit of cross-compile builds. This support was added by the changes for Issue13150 and Issue17512. They added code to getpath.c that looks for pybuilddir.txt and uses the build directory name in it to determine that the interpreter is being started from a build directory rather than from an installed instance. In the former case, getpath.c is supposed to set up both sys.prefix (for pure Python modules) and sys.exec_prefix (for C extension modules) so that standard library modules are loaded from the build/source directories.
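To see which of the two prefixes a given interpreter ended up with, something like the following can be run with the build's ./python (for example with the same -E -S flags the build step uses). This is a minimal sketch under the assumption that the extension module directory is named lib-dynload, as in the paths quoted in this issue; the helper names are hypothetical.

```python
import sys


def lib_dynload_entries(paths):
    """Return the entries of `paths` that point at a lib-dynload directory."""
    return [p for p in paths if p.rstrip("/").endswith("lib-dynload")]


def path_report():
    """Collect the values that getpath.c computed for this interpreter."""
    return {
        "prefix": sys.prefix,            # where pure Python modules are searched
        "exec_prefix": sys.exec_prefix,  # where C extension modules are searched
        "lib_dynload": lib_dynload_entries(sys.path),
    }


if __name__ == "__main__":
    for key, value in sorted(path_report().items()):
        print(key, "=", value)
```

For an interpreter started from a build directory, an exec_prefix or lib-dynload entry under the installed location (such as /usr/local) would be the symptom described here.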

However, if pybuilddir.txt does not already exist when the pybuilddir.txt Makefile rule executes (as happens with a clean build directory), getpath.c gets confused: search_for_prefix correctly determines that python is running from a build directory but search_for_exec_prefix does not. This means that the sys.path created for this initial run of the newly built skeleton python will cause it to find the right pure Python modules in the source/build directories, but it will use the installed location (as set by --prefix, default /usr/local) to search for the standard library's C extension modules (.so's). Now, at this point in a clean build, no .so's can have been built yet, and the -m sysconfig --generate-posix-vars step therefore cannot depend on any such modules. But if sys.exec_prefix does get set (incorrectly) to an installed path (because pybuilddir.txt does not exist yet) *and* there happen to be .so's there from a previous installation, those .so's can be imported and used. One such case in Python 2.7.x builds is cStringIO.so, which pprint uses if it is available, falling back to StringIO.py if not. It so happens that pprint is used by sysconfig --generate-posix-vars in that build step.
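The "use the C module if available, otherwise fall back" behaviour mentioned above is what makes the stray import silent. As an illustration of that general pattern only (this is not the code from pprint, and `import_first_available` is a hypothetical helper), a sketch that also runs on Python 3:

```python
import importlib


def import_first_available(*names):
    """Import and return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not importable on this sys.path; try the next candidate.
            continue
    raise ImportError("none of %r could be imported" % (names,))


if __name__ == "__main__":
    # On 2.7 the first candidate wins whenever a cStringIO.so is reachable
    # through sys.path, including one from an unrelated installation.
    module = import_first_available("cStringIO", "StringIO", "io")
    print("using", module.__name__)
```

Which candidate wins depends entirely on sys.path, so a wrong sys.exec_prefix changes the outcome without any error being reported at import time.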

Now it seems that most of the time, the spurious import of incorrect extension modules at this point is harmless. However, there are configurations where that is not the case. One such scenario is that of koobs's FreeBSD buildbot. In this case, there was already an installed version of Python 2.7 built via the FreeBSD ports system with --enable-shared, a default prefix of /usr/local, and a wide (ucs4) Unicode build. The buildbot is configured non-shared, with debug enabled, defaulting to a narrow (ucs2) build and a /usr/local prefix. Even though the buildbot build is never installed, whenever pybuilddir.txt did not already exist in the build directory (after a manual clean), getpath's search_for_exec_prefix ended up incorrectly adding /usr/local/lib/pythonX.Y/lib-dynload to sys.path. That caused cStringIO.so from the installed system Python, built with a conflicting ABI, to be imported and used, which the gdb traces show to be the cause of the bus error. (With Python 3.x, there is a different scenario that can result in an installed _heapq.so being imported, but the root cause is the same.)
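For reference, the narrow versus wide distinction can be checked from Python itself through sys.maxunicode, which is 0xFFFF on a narrow build and 0x10FFFF on a wide one. A small sketch (the function name `unicode_build` is hypothetical); note that from Python 3.3 on there is only one kind of build and the value is always 0x10FFFF, so the check is only informative on 2.7 and early 3.x:

```python
import sys


def unicode_build(maxunicode=None):
    """Classify an interpreter as narrow or wide from its sys.maxunicode."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "wide (ucs4)" if maxunicode > 0xFFFF else "narrow (ucs2)"


if __name__ == "__main__":
    print(sys.version.split()[0], unicode_build())
```

Running this with both the installed python and the build's ./python shows whether the two disagree, as they did on the buildbot described above.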

After finally isolating the scenario, I tried unsuccessfully to reproduce the bus error on some other platforms (e.g. OS X) but I was able to reproduce it on a FreeBSD 10 VM.  While this may appear to be a rather obscure scenario, there is at least one other open issue (Issue21412) which seems to be due to the same root cause so it is definitely worth fixing.  Rather than adding to the complexity of getpath.c, I think the best way to deal with this is in the Makefile.  The attached patches change the pybuilddir.txt rule's recipes to unconditionally create a pybuilddir.txt with a dummy path value which is sufficient to ensure that sys.exec_prefix does not point to the installed path location during this initial step.  Further, the patches also cause ./configure to always delete an existing pybuilddir.txt so that it will be properly recreated in case the build environment has changed.
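The attached patches do this in the Makefile and in configure; purely to illustrate the idea, here are the same two steps expressed as a Python sketch. The function names and the placeholder value "none" are assumptions made for this example, not a description of what the patches contain.

```python
import os
import tempfile


def write_placeholder(build_dir, value="none"):
    """Unconditionally (re)create pybuilddir.txt with a dummy value.

    This stands in for the step that runs before the new interpreter is
    first invoked. The value "none" is a hypothetical placeholder.
    """
    path = os.path.join(build_dir, "pybuilddir.txt")
    with open(path, "w") as f:
        f.write(value + "\n")
    return path


def remove_stale(build_dir):
    """Delete an existing pybuilddir.txt, as a reconfigure step would."""
    path = os.path.join(build_dir, "pybuilddir.txt")
    if os.path.exists(path):
        os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as build_dir:
        remove_stale(build_dir)  # nothing to delete in a clean tree
        print("created", write_placeholder(build_dir))
```

The ordering is the point: the file has to exist before the first run of the newly built interpreter, and the real value is written only once that run succeeds.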

I'm cc'ing Matthias here for a review for any cross-compile implications; AFAICT, there shouldn't be any.
msg225220 - (view) Author: Kubilay Kocak (koobs) Date: 2014-08-12 06:08
:DDD

This was an awesome experience working with you Ned, thanks for all the help.

Attaching my debugging & isolation steps for additional detail, posterity and reference.
msg225706 - (view) Author: Roundup Robot (python-dev) Date: 2014-08-22 20:36
New changeset edb6b282469e by Ned Deily in branch '2.7':
Issue #21166: Prevent possible segfaults and other random failures of
http://hg.python.org/cpython/rev/edb6b282469e

New changeset e52d85f2e284 by Ned Deily in branch '3.4':
Issue #21166: Prevent possible segfaults and other random failures of
http://hg.python.org/cpython/rev/e52d85f2e284

New changeset 599dc1304a70 by Ned Deily in branch 'default':
Issue #21166: merge from 3.4
http://hg.python.org/cpython/rev/599dc1304a70
msg225707 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-08-22 20:41
Committed for release in 2.7.9, 3.4.2, and 3.5.0
msg225779 - (view) Author: Roundup Robot (python-dev) Date: 2014-08-24 01:11
New changeset 5a157e3b3c47 by Ned Deily in branch '2.7':
Issue #21166: fix typo in comment
http://hg.python.org/cpython/rev/5a157e3b3c47

New changeset 9b1bd9d42cc7 by Ned Deily in branch '3.4':
Issue #21166: fix typo in comment
http://hg.python.org/cpython/rev/9b1bd9d42cc7

New changeset 5ee9c99a4ca3 by Ned Deily in branch 'default':
Issue #21166: fix typo in comment
http://hg.python.org/cpython/rev/5ee9c99a4ca3
History
Date User Action Args
2014-09-26 02:37:18 jcea set nosy: + jcea
2014-08-24 01:11:11 python-dev set messages: + msg225779
2014-08-22 20:41:06 ned.deily set status: open -> closed; resolution: fixed; messages: + msg225707; stage: patch review -> resolved
2014-08-22 20:36:58 python-dev set nosy: + python-dev; messages: + msg225706
2014-08-12 06:09:01 koobs set files: + python-buildbot-broken-debugging.txt; messages: + msg225220
2014-08-12 03:30:29 ned.deily link issue21412 superseder
2014-08-12 03:14:17 ned.deily set files: + issue21166_3x.patch
2014-08-12 03:13:10 ned.deily set status: closed -> open; files: + issue21166_27.patch; keywords: + patch; stage: patch review; title: Bus error on "AMD64 FreeBSD 9.x 3.4" buildbot -> Bus error in pybuilddir.txt 'python -m sysconfig --generate-posix-vars' build step; nosy: + ned.deily, doko; versions: + Python 2.7, Python 3.5; messages: + msg225217; components: + Build; resolution: fixed -> (no value)
2014-04-09 00:16:13 haypo set status: open -> closed; resolution: fixed; messages: + msg215782
2014-04-07 11:33:03 koobs set messages: + msg215704
2014-04-07 11:31:01 koobs set messages: + msg215702
2014-04-07 11:27:08 koobs set files: + gdb.log; nosy: + koobs; messages: + msg215699
2014-04-07 09:32:57 haypo create