classification
Title: Intermittent segfaults on PPC64 AIX 3.x
Type: crash
Stage:
Components: Extension Modules
Versions: Python 3.6
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: David.Edelsohn, skrah, vstinner
Priority: normal
Keywords:

Created on 2015-09-30 09:26 by vstinner, last changed 2016-03-25 23:31 by vstinner. This issue is now closed.

Messages (11)
msg251919 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-30 09:26
This buildbot has low free memory. Maybe some part of _decimal doesn't handle an allocation failure?

http://buildbot.python.org/all/builders/PPC64%20AIX%203.x/builds/4173/steps/test/logs/stdio

...
[307/399/10] test_decimal
Fatal Python error: Segmentation fault

Current thread 0x00000001 (most recent call first):
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_decimal.py", line 444 in eval_equation
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_decimal.py", line 321 in eval_line
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_decimal.py", line 299 in eval_file
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_decimal.py", line 5591 in <lambda>
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/case.py", line 600 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/case.py", line 648 in __call__
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/runner.py", line 176 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/support/__init__.py", line 1775 in _run_suite
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/support/__init__.py", line 1809 in run_unittest
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_decimal.py", line 5598 in test_main
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/runtest.py", line 160 in runtest_inner
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/runtest.py", line 113 in runtest
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 292 in run_tests_sequential
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 334 in run_tests
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 365 in main
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 407 in main
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 429 in main_in_temp_cwd
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/__main__.py", line 3 in <module>
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/runpy.py", line 85 in _run_code
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/runpy.py", line 170 in _run_module_as_main
make: *** [buildbottest] Segmentation fault (core dumped)
msg252039 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-10-01 16:47
Usually these segfaults are toolchain bugs (I've had at least 8,
including gcc, suncc, libc...).


Just a couple of observations:

  - The bot builds with -DCONFIG_32=1 -DANSI=1 despite being PPC64.

  - When we had an AIX snakebite machine, the xlc compile worked on
    AIX (using about 50 obscure command line arguments).

  - In the default build, libmpdec functions use a lot of stack
    memory (for optimization while avoiding alloca). But there are
    no recursive tests, so a stack overflow would seem unlikely.
msg252074 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2015-10-02 00:00
The system has 128GB of memory.  The process limits are set to unlimited for data.  AIX defaults to 32 bit, although all processors are 64 bit, so the buildbot runs as 32 bit.  What does low free memory in the buildbot mean?

I'm surprised that Python requires a huge amount of memory for the tests.  It's possible that Python needs to be built with special options to allow additional malloc space (-bmaxdata:0xN0000000).
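
For reference, the word size and the data/stack limits that the test process actually sees can be checked from Python itself. This is only a minimal sketch, assuming a POSIX platform where the standard `resource` module is available; the helper name `describe_limits` is made up for illustration.

```python
import resource
import struct


def describe_limits():
    """Return the pointer width and the soft/hard data and stack limits."""
    def fmt(value):
        # RLIM_INFINITY is how an "unlimited" setting is reported.
        return "unlimited" if value == resource.RLIM_INFINITY else value

    # The size of a C pointer tells us whether this is a 32-bit or 64-bit build.
    info = {"pointer_bits": struct.calcsize("P") * 8}
    for name in ("RLIMIT_DATA", "RLIMIT_STACK"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        info[name] = (fmt(soft), fmt(hard))
    return info


for key, value in describe_limits().items():
    print(key, value)
```

Running this with the same interpreter and environment as the buildbot slave would show whether the 32-bit process really has the limits described above.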
msg252111 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-10-02 12:28
I've checked:  test_decimal does not require abnormal amounts of
memory or stack. On Linux/x86 a stack size of 256 KiB (default
8192 KiB) is sufficient, and memory requirements aren't that high.

We assumed that there is some memory limit on the buildbot, since
in a later run test #pwmx330 failed with MemoryError.


The easiest way to debug this is to rerun the whole test suite
under gdb with the same random seed as in


http://buildbot.python.org/all/builders/PPC64%20AIX%203.x/builds/4173/steps/test/logs/stdio
msg252113 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-10-02 12:37
And the segfaults are apparently somewhat random. This is beginning
to look like an issue unrelated to decimal that was perhaps recently
introduced (in which case "hg bisect" would be the fastest
way to debug).


http://buildbot.python.org/all/builders/PPC64%20AIX%203.x/builds/4183/steps/test/logs/stdio

[129/399/3] test_email
Fatal Python error: Segmentation fault

Current thread 0x00000001 (most recent call first):
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/email/utils.py", line 57 in _has_surrogates
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/email/message.py", line 264 in get_payload
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_email/test_email.py", line 3463 in test_long_lines
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/case.py", line 600 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/case.py", line 648 in __call__
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/runner.py", line 176 in run
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/support/__init__.py", line 1775 in _run_suite
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/support/__init__.py", line 1809 in run_unittest
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/runtest.py", line 159 in test_runner
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/runtest.py", line 160 in runtest_inner
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/runtest.py", line 113 in runtest
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 289 in run_tests_sequential
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 331 in run_tests
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 362 in main
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 404 in main
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 426 in main_in_temp_cwd
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/__main__.py", line 3 in <module>
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/runpy.py", line 85 in _run_code
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/runpy.py", line 170 in _run_module_as_main
msg252114 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2015-10-02 13:42
As we have seen with similar issues on other targets, this is likely due to the random order of tests.  In another case, the timezone was not being restored properly by GLIBC.  Some other test is probably leaving the process in a state that somehow triggers this failure in test_decimal.
msg252115 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-10-02 13:44
I suggest isolating tests using -j1: see my issue #25285.

(Currently, -j1 doesn't use subprocesses.)
msg252116 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-10-02 13:52
If you have time, you could use an explicit seed (and gdb):

# test_email segfault:
./python -m test -j 1 -u all -W --randseed 5634141
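
The same rerun can be scripted so that the seed from a failing buildbot log is easy to swap in. This is a rough sketch only: the helper name `regrtest_command` and its `under_gdb` switch are hypothetical, and it assumes gdb is installed on the machine.

```python
import sys


def regrtest_command(seed, python=sys.executable, under_gdb=False):
    """Build the argv for a fixed-seed run of the test suite."""
    # Same options as the command line above, with the seed as a parameter.
    cmd = [python, "-m", "test", "-j", "1", "-u", "all", "-W",
           "--randseed", str(seed)]
    if under_gdb:
        # "gdb --args" passes everything after the program name to that program.
        cmd = ["gdb", "--args"] + cmd
    return cmd


# Print the command for the test_email seed; pass the list to subprocess.run()
# to actually start the (long) test run.
print(" ".join(regrtest_command(5634141, python="./python", under_gdb=True)))
```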
msg252210 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-10-03 13:56
> It's possible that Python needs to be built with special options to allow additional malloc space (-bmaxdata:0xN0000000).

It seems to be the case, see Misc/README.AIX.  This could explain the
MemoryErrors, but not the segfaults.


Are computed-gotos stable on gcc-AIX?  The README recommends disabling
them for xlc.


I'm also not sure how well Python supports threads on AIX. Often
these problems go away on unsupported platforms when configuring
--without-threads.
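
Whether a given build uses computed gotos or threads can be read back from the build configuration. This is a small sketch using the standard `sysconfig` module; which variables are recorded depends on the Python version and platform, so a value of None only means the variable was not found for this build. The helper name `build_features` is made up for illustration.

```python
import sysconfig


def build_features():
    """Report build-time settings relevant to the questions above."""
    names = ("USE_COMPUTED_GOTOS", "HAVE_COMPUTED_GOTOS",
             "WITH_THREAD", "CC", "CONFIG_ARGS")
    # get_config_var() returns None when a variable is not recorded.
    return {name: sysconfig.get_config_var(name) for name in names}


for name, value in build_features().items():
    print(name, "=", value)
```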
msg252211 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2015-10-03 14:06
Misc/README.AIX comments about XLC do not apply to GCC.

One can adjust the memory space at normal link time with -Wl,-bmaxdata:0xN0000000. This trades off heap for shared memory segments. One does not need the extra ldedit step, which stuffs the same value into the application header.
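
To make the 0xN0000000 notation concrete, here is the arithmetic as a short sketch. It assumes the 32-bit AIX process model in which the address space is divided into 256 MB segments; that detail is not stated in this thread, and the helper name `maxdata_segments` is hypothetical.

```python
SEGMENT_BYTES = 0x10000000  # 256 MiB, the segment size assumed here


def maxdata_segments(maxdata):
    """Split a -bmaxdata value into (number of segments, size in MiB)."""
    if maxdata % SEGMENT_BYTES:
        raise ValueError("expected a multiple of 0x10000000")
    return maxdata // SEGMENT_BYTES, maxdata // (1024 * 1024)


# N is the leading hex digit of the value, e.g. N=8 for 0x80000000.
for n in (1, 4, 8):
    value = n * SEGMENT_BYTES
    segments, mib = maxdata_segments(value)
    print(f"-Wl,-bmaxdata:{value:#010x} -> {segments} segment(s), {mib} MiB")
```

Under that assumption, each segment given to the heap is one segment no longer available for shared memory, which is the trade-off mentioned above.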
msg262464 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-25 23:31
The origin of the crash is unknown. Since I haven't seen the crash recently, I'm closing the issue.
History
Date                 User              Action  Args
2016-03-25 23:31:57  vstinner          set     status: open -> closed
                                               resolution: out of date
                                               messages: + msg262464
2015-10-03 14:06:53  David.Edelsohn    set     messages: + msg252211
2015-10-03 13:56:30  skrah             set     messages: + msg252210
2015-10-02 13:52:49  skrah             set     messages: + msg252116
2015-10-02 13:44:35  vstinner          set     messages: + msg252115
2015-10-02 13:42:46  David.Edelsohn    set     messages: + msg252114
2015-10-02 12:37:03  skrah             set     messages: + msg252113
                                               title: test_decimal sometimes crash on PPC64 AIX 3.x -> Intermittent segfaults on PPC64 AIX 3.x
2015-10-02 12:28:16  skrah             set     messages: + msg252111
2015-10-02 00:00:10  David.Edelsohn    set     messages: + msg252074
2015-10-01 16:47:41  skrah             set     nosy: + David.Edelsohn
                                               messages: + msg252039
2015-10-01 04:31:46  serhiy.storchaka  set     type: crash
                                               components: + Extension Modules
2015-09-30 22:43:49  vstinner          set     nosy: + skrah
2015-09-30 09:26:05  vstinner          create