classification
Title: Solaris/Oracle Studio: Fatal Python error: PyThreadState_Get when built --with-pymalloc
Type: crash Stage: resolved
Components: Build Versions: Python 3.4
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Bus error in pybuilddir.txt 'python -m sysconfigure --generate-posix-vars' build step
View: 21166
Assigned To: Nosy List: jbeck, jcea, ned.deily, swalker, vstinner
Priority: normal Keywords:

Created on 2014-05-01 20:37 by jbeck, last changed 2014-08-18 23:11 by jbeck. This issue is now closed.

Files
File name   Uploaded                  Description
where.out   jbeck, 2014-05-02 22:31   output (260 frames) of 'where' in gdb
Messages (11)
msg217723 - Author: John Beck (jbeck) Date: 2014-05-01 20:37
I am porting Python 3.4.0 to Solaris 12.  The Makefile I inherited from my predecessor had --without-pymalloc as an option to be passed to configure.  Curious why, I removed this line, only to find that after python was built, it core dumped:

LD_LIBRARY_PATH=/builds/jbeck/ul-python-3/components/python/python34/build/sparcv9 ./python -E -S -m sysconfig --generate-posix-vars
Fatal Python error: PyThreadState_Get: no current thread
make[3]: *** [pybuilddir.txt] Abort (core dumped)

But if I add the --without-pymalloc line back to my Makefile, everything works fine.
 
Note that although this example was on sparc, the exact same thing occurred on x86.

I searched for a similar bug but did not find one; please feel free to close this as a duplicate if there is one that I missed.  I also suspect I have not provided enough information, out of a desire not to trigger information overload.  But I would be happy to provide whatever specifics might be requested.
msg217724 - Author: Stefan Krah (skrah) * (Python committer) Date: 2014-05-01 20:47
On SPARC/suncc the flags in http://bugs.python.org/issue15963#msg170661
appear to work.

Also, we have several Solaris build slaves that don't core dump.
Some are offline, but you can click through to the ./configure
steps of past builds to see the build flags.

http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable
msg217733 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2014-05-01 22:49
What compiler are you using?

I compile fine on Solaris with GCC.
msg217735 - Author: John Beck (jbeck) Date: 2014-05-01 23:11
Using Oracle Studio 12.3, same as mentioned in http://bugs.python.org/issue15963#msg170661 (as Stefan pointed out).  I am using some of those flags but not all of them.  I will try the others when I have a chance, then report back.
msg217782 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-05-02 20:04
"LD_LIBRARY_PATH=/builds/jbeck/ul-python-3/components/python/python34/build/sparcv9 ./python -E -S -m sysconfig --generate-posix-vars
Fatal Python error: PyThreadState_Get: no current thread"

Could you please run this command in gdb and copy/paste the C traceback (gdb command "where") where the fatal error occurs?
msg217804 - Author: John Beck (jbeck) Date: 2014-05-02 22:31
Victor: sure; see attached.
msg218386 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-05-13 00:02
> Victor: sure; see attached.

Ok, so the error occurs when Python tries to import the _heapq dynamic module: PyModule_Create2() calls PyThreadState_Get() to retrieve the current thread, but it fails. There is a current thread because PyModule_Create2() is called indirectly by PyEval_EvalFrameExReal() (and I don't see where the GIL would be released in the call stack).

It looks like a bug in PyThreadState_Get(). This function relies on _Py_atomic_load_relaxed() which is defined in Include/pyatomic.h. This file has an implementation of atomic functions for Intel processors and contains an interesting comment:

...
#else  /* !gcc x86 */
/* Fall back to other compilers and processors by assuming that simple
   volatile accesses are atomic.  This is false, so people should port
   this. */
...

It looks like John is trying Python on SPARC, which may explain the issue.

This is just a theory. It also looks like we had SPARC buildbots running on Solaris with the system C compiler ("/opt/solarisstudio12.3/bin/cc"), and they were able to run the tests.

I don't understand the link with pymalloc.

@John: Did you try to build Python 3.3? Did it work?
msg218396 - Author: John Beck (jbeck) Date: 2014-05-13 02:35
Victor:

* This is not a SPARC-specific issue; the exact same failure occurs
  on x86.

* I had built Python 3.3 (some time ago) but only --without-pymalloc.
  But I just now rebuilt Python 3.3 --with-pymalloc, and it
  failed in the exact same way.
msg218410 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-05-13 08:09
"This is not a SPARC-specific issue; the exact same failure occurs on x86."

Ah ok, good to know. To me, it looks like a compiler issue. Did you try Stefan's advice in issue #15963?

You may try to disable compiler optimizations to see if you get the same behaviour.
msg225218 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-08-12 03:30
This appears to be another variation on the problem recently identified in Issue21166, namely that the pybuilddir.txt Makefile rule can incorrectly import a shared library module from a previously installed Python instance and, if the ABIs of the installed and being-built Pythons differ, the newly-built interpreter can fail in various ways.  From your supplied trace, one can see that _heapq.so has incorrectly been imported from the installed system Python 3.4, which was probably built with --without-pymalloc:

#7  0x00007ff2f9ee2a6d in PyInit__heapq ()
   from /usr/lib/python3.4/lib-dynload/64/_heapq.so
#8  0x00007ff2f94c7c78 in _PyImport_LoadDynamicModule ()
   from /builds/jbeck/ul-python-3/components/python/python34/build/amd64/libpython3.4m.so.1.0

The fixes for Issue21166, when applied, should prevent this problem.
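
For anyone wanting to check for this kind of mix-up, one option is to ask the just-built interpreter where it would load _heapq from, without importing it. What follows is only a minimal sketch and is not part of the Issue21166 fix: the helper names module_origin and is_under are made up for illustration, it assumes it is run from the top of the build tree, and it needs Python 3.4 or later for importlib.util.find_spec.

```python
import importlib.util
import os
import sysconfig


def module_origin(name):
    """Return where a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_under(path, directory):
    """Return True if `path` is `directory` itself or lies somewhere inside it."""
    path = os.path.abspath(path)
    directory = os.path.abspath(directory)
    # Compare against "directory/" so that /builds/a does not match /builds/ab.
    prefix = directory.rstrip(os.sep) + os.sep
    return path == directory or path.startswith(prefix)


if __name__ == "__main__":
    build_dir = os.getcwd()  # assumption: current directory is the build tree
    origin = module_origin("_heapq")
    print("ABI flags:", sysconfig.get_config_var("ABIFLAGS"))
    print("_heapq origin:", origin)
    # Built-in or frozen modules report a non-path origin, so only check real paths.
    if origin and os.path.isabs(origin) and not is_under(origin, build_dir):
        print("warning: _heapq would be loaded from outside the build tree")
```

If this is run with the freshly built ./python and the reported origin is a path such as the /usr/lib/python3.4/lib-dynload one in the trace above, that would be consistent with the installed module being picked up instead of the newly built one.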
msg225511 - Author: John Beck (jbeck) Date: 2014-08-18 23:11
Ned: yes, I can confirm that the patch from http://bugs.python.org/issue21166 does indeed fix the problem.  Thank you very much!
History
Date                 User       Action  Args
2014-08-18 23:11:28  jbeck      set     messages: + msg225511
2014-08-12 03:30:29  ned.deily  set     status: open -> closed
                                        superseder: Bus error in pybuilddir.txt 'python -m sysconfigure --generate-posix-vars' build step
                                        components: + Build, - Interpreter Core
                                        nosy: + ned.deily
                                        messages: + msg225218
                                        resolution: duplicate
                                        stage: resolved
2014-05-13 22:03:29  skrah      set     nosy: - skrah
2014-05-13 08:09:21  vstinner   set     messages: + msg218410
2014-05-13 08:05:11  vstinner   set     title: core dump in PyThreadState_Get when built --with-pymalloc -> Solaris/Oracle Studio: Fatal Python error: PyThreadState_Get when built --with-pymalloc
2014-05-13 02:35:54  jbeck      set     messages: + msg218396
2014-05-13 00:02:43  vstinner   set     messages: + msg218386
2014-05-02 22:31:37  jbeck      set     files: + where.out
                                        messages: + msg217804
2014-05-02 20:04:18  vstinner   set     messages: + msg217782
2014-05-02 20:03:18  vstinner   set     nosy: + vstinner
2014-05-01 23:11:10  jbeck      set     messages: + msg217735
2014-05-01 22:49:32  jcea       set     messages: + msg217733
2014-05-01 22:46:51  jcea       set     nosy: + jcea
2014-05-01 22:13:45  swalker    set     nosy: + swalker
2014-05-01 20:47:03  skrah      set     nosy: + skrah
                                        messages: + msg217724
2014-05-01 20:37:58  jbeck      create