Intermittent segfaults on PPC64 AIX 3.x #69463

Closed
vstinner opened this issue Sep 30, 2015 · 11 comments
Labels
extension-modules (C modules in the Modules dir), type-crash (A hard crash of the interpreter, possibly with a core dump)

Comments

@vstinner
Member

BPO 25276
Nosy @vstinner, @skrah

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2016-03-25.23:31:57.635>
created_at = <Date 2015-09-30.09:26:05.949>
labels = ['extension-modules', 'type-crash']
title = 'Intermittent segfaults on PPC64 AIX 3.x'
updated_at = <Date 2016-03-25.23:31:57.634>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2016-03-25.23:31:57.634>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2016-03-25.23:31:57.635>
closer = 'vstinner'
components = ['Extension Modules']
creation = <Date 2015-09-30.09:26:05.949>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 25276
keywords = []
message_count = 11.0
messages = ['251919', '252039', '252074', '252111', '252113', '252114', '252115', '252116', '252210', '252211', '262464']
nosy_count = 3.0
nosy_names = ['vstinner', 'skrah', 'David.Edelsohn']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = None
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue25276'
versions = ['Python 3.6']

@vstinner
Member Author

This buildbot has low free memory. Maybe some part of _decimal doesn't handle an allocation failure?

http://buildbot.python.org/all/builders/PPC64%20AIX%203.x/builds/4173/steps/test/logs/stdio

...
[307/399/10] test_decimal
Fatal Python error: Segmentation fault

Current thread 0x00000001 (most recent call first):
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_decimal.py", line 444 in eval_equation
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_decimal.py", line 321 in eval_line
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_decimal.py", line 299 in eval_file
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_decimal.py", line 5591 in <lambda>
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/case.py", line 600 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/case.py", line 648 in __call__
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/runner.py", line 176 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/support/__init__.py", line 1775 in _run_suite
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/support/__init__.py", line 1809 in run_unittest
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_decimal.py", line 5598 in test_main
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/runtest.py", line 160 in runtest_inner
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/runtest.py", line 113 in runtest
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 292 in run_tests_sequential
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 334 in run_tests
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 365 in main
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 407 in main
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 429 in main_in_temp_cwd
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/__main__.py", line 3 in <module>
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/runpy.py", line 85 in _run_code
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/runpy.py", line 170 in _run_module_as_main
make: *** [buildbottest] Segmentation fault (core dumped)
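
For readers unfamiliar with this log format: the "Fatal Python error: Segmentation fault" header followed by a Python-level traceback is what the standard `faulthandler` module prints when it catches SIGSEGV. The sketch below reproduces that kind of output in a throwaway child process. It assumes a POSIX system and simulates the crash with `os.kill`, which is not how the buildbot crashed; the helper name `run_crashing_child` is made up for illustration.

```python
import subprocess
import sys

# Child program: enable faulthandler, then deliver SIGSEGV to itself.
# os.kill is used only to simulate a crash (assumption: POSIX system).
CHILD = """
import faulthandler, os, signal
faulthandler.enable()

def crash():
    os.kill(os.getpid(), signal.SIGSEGV)

crash()
"""


def run_crashing_child():
    """Run CHILD in a separate interpreter and return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_crashing_child()
    print("exit status:", code)
    print(err)
```

The traceback only shows which Python frame was active when the signal arrived. It does not say which C code corrupted memory, so a crash reported inside test_decimal does not by itself implicate _decimal.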

@serhiy-storchaka added the extension-modules and type-crash labels Oct 1, 2015
@skrah
Mannequin

skrah mannequin commented Oct 1, 2015

Usually these segfaults are toolchain bugs (I've had at least 8,
including gcc, suncc, libc...).

Just a couple of observations:

  • The bot builds with -DCONFIG_32=1 -DANSI=1 despite being PPC64.

  • When we had an AIX snakebite machine, the xlc compile worked on
    AIX (using about 50 obscure command line arguments).

  • In the default build, libmpdec functions use a lot of stack
    memory (for optimization while avoiding alloca). But there are
    no recursive tests, so a stack overflow would seem unlikely.

@DavidEdelsohn
Mannequin

DavidEdelsohn mannequin commented Oct 2, 2015

The system has 128GB of memory. The process limits are set to unlimited for data. AIX defaults to 32 bit, although all processors are 64 bit, so the buildbot runs as 32 bit. What does low free memory in the buildbot mean?

I'm surprised that Python requires a huge amount of memory for the tests. It's possible that Python needs to be built with special options to allow additional malloc space (-bmaxdata:0xN0000000).

@skrah
Mannequin

skrah mannequin commented Oct 2, 2015

I've checked: test_decimal does not require abnormal amounts of
memory or stack. On Linux/x86 a stack size of 256 (default 8192)
is sufficient, and memory requirements aren't that high.

We assumed that there is some memory limit on the buildbot, since
in a later run test #pwmx330 failed with MemoryError.

The easiest way to debug this is to rerun the whole test suite
under gdb with the same random seed as in

http://buildbot.python.org/all/builders/PPC64%20AIX%203.x/builds/4173/steps/test/logs/stdio

@skrah
Mannequin

skrah mannequin commented Oct 2, 2015

And the segfaults are apparently somewhat random. This is beginning
to look like an issue unrelated to decimal that was perhaps recently
introduced (in which case "hg bisect" would be the fastest
way to debug).
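
As a rough illustration of what a bisection automates, here is a minimal sketch of a binary search over an ordered list of revisions. It is not the Mercurial implementation. The names `first_bad` and `flaky_is_bad` are hypothetical, and the sketch assumes the first revision is good, the last one is bad, and the bug stays present once introduced.

```python
def first_bad(revisions, is_bad):
    """Return the first revision for which is_bad(rev) is true.

    Assumes revisions are ordered oldest to newest, revisions[0] is good,
    revisions[-1] is bad, and the bug never disappears once introduced.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] good, revisions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


def flaky_is_bad(run_once, attempts=5):
    """Build an is_bad() check that repeats an intermittent test.

    run_once(rev) should return True when a single run crashed.
    A revision counts as bad if any of the attempts crashes.
    """
    def check(rev):
        return any(run_once(rev) for _ in range(attempts))
    return check
```

With an intermittent crash, a "good" verdict is only probabilistic: a revision that survives a few runs may still contain the bug, so the number of attempts per revision has to be chosen with the observed failure rate in mind.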

http://buildbot.python.org/all/builders/PPC64%20AIX%203.x/builds/4183/steps/test/logs/stdio

[129/399/3] test_email
Fatal Python error: Segmentation fault

Current thread 0x00000001 (most recent call first):
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/email/utils.py", line 57 in _has_surrogates
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/email/message.py", line 264 in get_payload
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_email/test_email.py", line 3463 in test_long_lines
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/case.py", line 600 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/case.py", line 648 in __call__
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 122 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/suite.py", line 84 in __call__
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/unittest/runner.py", line 176 in run
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/support/__init__.py", line 1775 in _run_suite
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/support/__init__.py", line 1809 in run_unittest
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/runtest.py", line 159 in test_runner
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/runtest.py", line 160 in runtest_inner
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/runtest.py", line 113 in runtest
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 289 in run_tests_sequential
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 331 in run_tests
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 362 in main
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 404 in main
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/libregrtest/main.py", line 426 in main_in_temp_cwd
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/__main__.py", line 3 in <module>
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/runpy.py", line 85 in _run_code
File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/runpy.py", line 170 in _run_module_as_main

@skrah mannequin changed the title from "test_decimal sometimes crash on PPC64 AIX 3.x" to "Intermittent segfaults on PPC64 AIX 3.x" Oct 2, 2015
@DavidEdelsohn
Mannequin

DavidEdelsohn mannequin commented Oct 2, 2015

As we have seen with similar issues on other targets, this is likely due to the random order of tests. In another case, the timezone was not being restored properly by GLIBC. Another test is leaving the process in a state that somehow triggers this failure in test_decimal.

@vstinner
Member Author

vstinner commented Oct 2, 2015

I suggest isolating tests with -j1: see my issue bpo-25285.

(Currently, -j1 doesn't use subprocesses.)

@skrah
Mannequin

skrah mannequin commented Oct 2, 2015

If you have time, you could use an explicit seed (and gdb):

# test_email segfault:
./python -m test -j 1 -u all -W --randseed 5634141
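
A small helper along these lines could assemble that command and optionally wrap it in gdb. This is a sketch under assumptions: `regrtest_command` is a made-up name, the regrtest arguments are copied from the command above, and it is assumed that a gdb supporting the standard `-batch`, `-ex` and `--args` options is installed on the buildbot.

```python
import shlex


def regrtest_command(seed, python="./python", under_gdb=False):
    """Build the argv list for a seeded regrtest run.

    The regrtest arguments mirror the command quoted in this thread.
    With under_gdb=True the run is wrapped in gdb, which prints a C-level
    backtrace if the process crashes (assumes gdb is available).
    """
    cmd = [python, "-m", "test", "-j", "1", "-u", "all", "-W",
           "--randseed", str(seed)]
    if under_gdb:
        cmd = ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args"] + cmd
    return cmd


if __name__ == "__main__":
    # Print a copy-pastable command line; nothing is executed here.
    argv = regrtest_command(5634141, under_gdb=True)
    print(" ".join(shlex.quote(part) for part in argv))
```

Building the argument list rather than a single string keeps it usable with `subprocess.run(argv)` without shell quoting issues.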

@skrah
Mannequin

skrah mannequin commented Oct 3, 2015

> It's possible that Python needs to be built with special options to allow additional malloc space (-bmaxdata:0xN0000000).

It seems to be the case, see Misc/README.AIX. This could explain the
MemoryErrors, but not the segfaults.

Are computed-gotos stable on gcc-AIX? The README recommends disabling
them for xlc.

I'm also not sure how well Python supports threads on AIX. Often
these problems go away on unsupported platforms when configuring
--without-threads.

@DavidEdelsohn
Mannequin

DavidEdelsohn mannequin commented Oct 3, 2015

Misc/README.AIX comments about XLC do not apply to GCC.

One can adjust the memory space at normal link time with -Wl,-bmaxdata:0xN0000000. This trades off heap for shared memory segments. One does not need the extra ldedit step, which stuffs the same value into the application header.
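
The arithmetic behind the `0xN0000000` pattern can be sketched as follows. The reading that N counts 256 MiB segments of a 32-bit AIX process, and the 1 to 8 range, are assumptions not stated in this thread and should be checked against the AIX linker documentation. `maxdata_flag` is a hypothetical helper, not part of any build system.

```python
SEGMENT_SIZE = 0x10000000  # 256 MiB; assumed size of one 32-bit AIX segment


def maxdata_flag(segments, via_gcc=True):
    """Format a -bmaxdata linker option reserving `segments` data segments.

    Assumption: valid values are 1..8, giving 0x10000000 .. 0x80000000.
    With via_gcc=True the option is prefixed with -Wl, so that the
    compiler driver forwards it to the linker.
    """
    if not 1 <= segments <= 8:
        raise ValueError("segments must be between 1 and 8")
    value = segments * SEGMENT_SIZE
    flag = "-bmaxdata:0x{:08X}".format(value)
    return "-Wl," + flag if via_gcc else flag


if __name__ == "__main__":
    for n in (1, 4, 8):
        mib = n * SEGMENT_SIZE // (1024 * 1024)
        print(maxdata_flag(n), "->", mib, "MiB of heap")
```

Under that assumption, N=8 corresponds to 2048 MiB of heap, at the cost of segments that would otherwise be available for shared memory, which matches the trade-off described above.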

@vstinner
Member Author

The origin of the crash is unknown. Since I haven't seen the crash recently, I'm closing the issue.

@ezio-melotti transferred this issue from another repository Apr 10, 2022