classification
Title: test_builtin crashes when runned in parallel mode on solaris
Type: Stage: resolved
Components: Tests Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, pablogsal, vstinner
Priority: normal Keywords: patch

Created on 2020-04-01 19:39 by BTaskaya, last changed 2020-04-03 12:09 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 19312 closed vstinner, 2020-04-02 20:08
PR 19314 merged vstinner, 2020-04-02 21:52
PR 19316 merged vstinner, 2020-04-02 22:44
PR 19318 merged vstinner, 2020-04-03 00:15
Messages (23)
msg365505 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-04-01 19:39
test_builtin works when run sequentially:

0:00:00 load avg: 2.38 Run tests sequentially
0:00:00 load avg: 2.38 [1/1] test_builtin

== Tests result: SUCCESS ==

1 test OK.

Total duration: 1.3 sec
Tests result: SUCCESS

but with more than one process, it crashes:

0:00:00 load avg: 1.71 Run tests in parallel using 2 child processes
0:00:01 load avg: 1.70 [1/1/1] test_builtin crashed (Exit code -1)
test_abs (test.test_builtin.BuiltinTest) ... ok
test_all (test.test_builtin.BuiltinTest) ... ok
test_any (test.test_builtin.BuiltinTest) ... ok
test_ascii (test.test_builtin.BuiltinTest) ... ok
test_bin (test.test_builtin.BuiltinTest) ... ok
test_bug_27936 (test.test_builtin.BuiltinTest) ... ok
test_bytearray_extend_error (test.test_builtin.BuiltinTest) ... ok
test_bytearray_translate (test.test_builtin.BuiltinTest) ... ok
test_callable (test.test_builtin.BuiltinTest) ... ok
test_chr (test.test_builtin.BuiltinTest) ... ok
test_cmp (test.test_builtin.BuiltinTest) ... ok
test_compile (test.test_builtin.BuiltinTest) ... ok
test_compile_async_generator (test.test_builtin.BuiltinTest)
With the PyCF_ALLOW_TOP_LEVEL_AWAIT flag added in 3.8, we want to ... ok
test_compile_top_level_await (test.test_builtin.BuiltinTest)
Test whether code some top level await can be compiled. ... ok
test_compile_top_level_await_invalid_cases (test.test_builtin.BuiltinTest) ... ok
test_construct_singletons (test.test_builtin.BuiltinTest) ... ok
test_delattr (test.test_builtin.BuiltinTest) ... ok
test_dir (test.test_builtin.BuiltinTest) ... ok
test_divmod (test.test_builtin.BuiltinTest) ... ok
test_eval (test.test_builtin.BuiltinTest) ... ok
test_exec (test.test_builtin.BuiltinTest) ... ok
test_exec_globals (test.test_builtin.BuiltinTest) ... ok
test_exec_redirected (test.test_builtin.BuiltinTest) ... ok
test_filter (test.test_builtin.BuiltinTest) ... ok
test_filter_pickle (test.test_builtin.BuiltinTest) ... ok
test_format (test.test_builtin.BuiltinTest) ... ok
test_general_eval (test.test_builtin.BuiltinTest) ... ok
test_getattr (test.test_builtin.BuiltinTest) ... ok
test_hasattr (test.test_builtin.BuiltinTest) ... ok
test_hash (test.test_builtin.BuiltinTest) ... ok
test_hex (test.test_builtin.BuiltinTest) ... ok
test_id (test.test_builtin.BuiltinTest) ... ok
test_import (test.test_builtin.BuiltinTest) ... ok
test_input (test.test_builtin.BuiltinTest) ... ok
test_isinstance (test.test_builtin.BuiltinTest) ... ok
test_issubclass (test.test_builtin.BuiltinTest) ... ok
test_iter (test.test_builtin.BuiltinTest) ... ok
test_len (test.test_builtin.BuiltinTest) ... ok
test_map (test.test_builtin.BuiltinTest) ... ok
test_map_pickle (test.test_builtin.BuiltinTest) ... ok
test_max (test.test_builtin.BuiltinTest) ... ok
test_min (test.test_builtin.BuiltinTest) ... ok
test_neg (test.test_builtin.BuiltinTest) ... ok
test_next (test.test_builtin.BuiltinTest) ... ok
test_oct (test.test_builtin.BuiltinTest) ... ok
test_open (test.test_builtin.BuiltinTest) ... ok
test_open_default_encoding (test.test_builtin.BuiltinTest) ... ok
test_open_non_inheritable (test.test_builtin.BuiltinTest) ... ok
test_ord (test.test_builtin.BuiltinTest) ... ok
test_pow (test.test_builtin.BuiltinTest) ... ok
test_repr (test.test_builtin.BuiltinTest) ... ok
test_round (test.test_builtin.BuiltinTest) ... ok
test_round_large (test.test_builtin.BuiltinTest) ... ok
test_setattr (test.test_builtin.BuiltinTest) ... ok
test_sum (test.test_builtin.BuiltinTest) ... ok
test_type (test.test_builtin.BuiltinTest) ... ok
test_vars (test.test_builtin.BuiltinTest) ... ok
test_warning_notimplemented (test.test_builtin.BuiltinTest) ... ok
test_zip (test.test_builtin.BuiltinTest) ... ok
test_zip_bad_iterable (test.test_builtin.BuiltinTest) ... ok
test_zip_pickle (test.test_builtin.BuiltinTest) ... ok
test_input_no_stdout_fileno (test.test_builtin.PtyTests) ...

== Tests result: FAILURE ==

1 test failed:
    test_builtin

Total duration: 1.4 sec
Tests result: FAILURE

System: SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
Tested with both GCC (5.5.0) and Solaris Studio (12).
msg365511 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-04-01 21:41
Try increasing the stack size (with ulimit -s ... or similar).
msg365513 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-01 21:48
I recently modified the test:

(1) commit 278c1e159c970da6cd6683d18c6211f5118674cc

-        os.waitpid(pid, 0)
+        support.wait_process(pid, exitcode=0)

(2) commit 16d75675d2ad2454f6dfbf333c94e6237df36018

Close the fd *after* calling support.wait_process() to avoid sending SIGHUP to the child process, which made support.wait_process(pid, exitcode=0) fail since exitcode=-1 (-SIGHUP) != 0.
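The ordering described above can be sketched outside the test suite with the standard pty and os modules. This is a simplified illustration, not the actual test_builtin code. It uses os.waitpid() instead of test.support.wait_process(), and the helper name run_child_then_close() is hypothetical.

```python
import os
import pty


def run_child_then_close() -> int:
    """Fork a child on a new pty and reap it *before* closing the master fd."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: stdin/stdout/stderr are the pty slave. Exit immediately.
        os._exit(0)
    try:
        # Parent: wait for the child first, so closing the master fd
        # cannot hang up (SIGHUP) a child that is still running.
        _, status = os.waitpid(pid, 0)
    finally:
        os.close(master_fd)
    if os.WIFSIGNALED(status):
        # Mirror the "negative exit code = killed by signal" convention.
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


print(run_child_then_close())
```

Closing the master fd in the finally clause still releases it if waitpid() raises.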

--

test_builtin.test_input_no_stdout_fileno() also hangs on AIX:
https://bugs.python.org/issue31160#msg365478
msg365514 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-04-01 21:50
I am understanding "crashing" as "segfaulting"
msg365516 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-01 21:59
> 0:00:01 load avg: 1.70 [1/1/1] test_builtin crashed (Exit code -1)

Exit code -1 looks like a process killed by SIGHUP. Which commit did you try?

Can you please check that you tested with my commit 16d75675d2ad2454f6dfbf333c94e6237df36018?
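For reference, subprocess reports a child killed by signal N with a return code of -N. The sketch below assumes that regrtest's "Exit code -1" is such a return code. describe_returncode() is a hypothetical helper, and the child deliberately restores the default SIGHUP action and then sends itself SIGHUP.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # subprocess encodes "killed by signal N" as -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Child: restore the default SIGHUP action, then send SIGHUP to itself.
child = ("import os, signal; "
         "signal.signal(signal.SIGHUP, signal.SIG_DFL); "
         "os.kill(os.getpid(), signal.SIGHUP)")
proc = subprocess.run([sys.executable, "-c", child])
print(proc.returncode, describe_returncode(proc.returncode))
```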
msg365587 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 12:46
See also bpo-40155: "AIX: test_builtin.test_input_no_stdout_fileno() hangs".
msg365589 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-04-02 13:01
ulimit already reports the stack size as unlimited ("infinity"), and this happens on current master.
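The same limit can be read from inside Python with the resource module (POSIX only), which reports an unlimited value as resource.RLIM_INFINITY. A minimal sketch; format_limit() is a hypothetical helper.

```python
import resource


def format_limit(value: int) -> str:
    # resource.RLIM_INFINITY is how an unlimited ("infinity") limit is reported.
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print(f"stack size limit: soft={format_limit(soft)}, hard={format_limit(hard)}")
```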
msg365590 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-04-02 13:02
> I am understanding "crashing" as "segfaulting"

"Crashing" as in the test result, not segfaulting:
0:00:00 load avg: 1.71 Run tests in parallel using 2 child processes
0:00:01 load avg: 1.70 [1/1/1] test_builtin crashed (Exit code -1)
msg365592 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 13:13
> "crashing" as in the test result but not segfaulting
> 0:00:01 load avg: 1.70 [1/1/1] test_builtin crashed (Exit code -1)

What is the signal 1 on Solaris? On Linux, it's SIGHUP, not SIGSEGV:

$ python3
Python 3.7.6 (default, Jan 30 2020, 09:44:41) 
>>> import signal
>>> signal.SIGSEGV
<Signals.SIGSEGV: 11>
>>> signal.SIGHUP
<Signals.SIGHUP: 1>
msg365593 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-04-02 13:34
isidentical@gcc-solaris11:~$ cpython/python
Python 3.9.0a5+ (heads/master:98ff332, Apr  2 2020, 01:20:22) 
[GCC 5.5.0] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import signal
>>> signal.SIGSEGV
<Signals.SIGSEGV: 11>
>>> signal.SIGHUP
<Signals.SIGHUP: 1>
msg365621 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 20:12
Batuhan: Can you please test whether PR 19312 fixes the issue for you on Solaris?
msg365624 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-04-02 20:34
I tested with both PR 19312 and PR 19308 and I still get the same crash:
0:00:00 load avg: 0.80 Run tests in parallel using 2 child processes
0:00:01 load avg: 0.79 [1/1/1] test_builtin crashed (Exit code -1)
msg365629 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 21:44
> I tested with both PR 19312 and PR 19308 and I still have the same crash 

Which test is causing the issue? Does it still crash if you comment out test_input_no_stdout_fileno()? Try renaming it to "Xtest_input_no_stdout_fileno" to skip it.
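The rename trick works because unittest only collects methods whose names start with TestLoader.testMethodPrefix ("test" by default). A minimal sketch with a hypothetical class follows. @unittest.skip(...) would be the more explicit way to disable a test.

```python
import unittest


class HypotheticalPtyTests(unittest.TestCase):
    def test_kept(self):
        pass

    # Renamed with an "X" prefix: the loader only collects methods whose names
    # start with "test", so this one is silently left out of the run.
    def Xtest_input_no_stdout_fileno(self):
        raise AssertionError("never collected")


print(unittest.TestLoader().getTestCaseNames(HypotheticalPtyTests))  # ['test_kept']
```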

What if you only run this test?

./python -m test test_builtin -m test_input_no_stdout_fileno -F -j10 -v

Maybe this test should register a signal handler for SIGHUP?

This bug looks like bpo-38547 which affected test_pty. I fixed it by registering a SIGHUP signal handler:

commit a1838ec2592e5082c75c77888f2a7a3eb21133e5
Author: Victor Stinner <vstinner@python.org>
Date:   Mon Dec 9 11:57:05 2019 +0100

    bpo-38547: Fix test_pty if the process is the session leader (GH-17519)
    
    Fix test_pty: if the process is the session leader, closing the
    master file descriptor raises a SIGHUP signal: simply ignore SIGHUP
    when running the tests.
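You can check whether a process is a session leader with os.getsid(0) == os.getpid(). The sketch below assumes, as the "Kill ... process group" lines in the logs suggest, that regrtest starts parallel workers in a new session while a sequential run does not. If so, that would explain why only the -jN runs are killed by SIGHUP. runs_as_session_leader() is a hypothetical helper.

```python
import subprocess
import sys

# Child program: print whether it is the leader of its own session.
CHECK = "import os; print(os.getsid(0) == os.getpid())"


def runs_as_session_leader(new_session: bool) -> bool:
    out = subprocess.run(
        [sys.executable, "-c", CHECK],
        start_new_session=new_session,  # child calls setsid() before exec
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return out == "True"


print("default child:", runs_as_session_leader(False))
print("start_new_session=True:", runs_as_session_leader(True))
```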
msg365633 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-04-02 21:55
isidentical@gcc-solaris11:~/cpython$ ./python -m test test_builtin -m test_input_no_stdout_fileno -F -j10 -v
== CPython 3.9.0a5+ (heads/master:dc4e965, Apr 2 2020, 23:53:26) [GCC 5.5.0]
== Solaris-2.11-sun4u-sparc-32bit big-endian
== cwd: /export/home/isidentical/cpython/build/test_python_24804
== CPU count: 8
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 1.56 Run tests in parallel using 10 child processes
0:00:02 load avg: 1.57 [  1/1] test_builtin crashed (Exit code -1)
test_input_no_stdout_fileno (test.test_builtin.PtyTests) ...
Kill <TestWorkerProcess #2 running test=test_builtin pid=24812 time=2.1 sec> process group
Kill <TestWorkerProcess #3 running test=test_builtin pid=24810 time=2.1 sec> process group
Kill <TestWorkerProcess #4 running test=test_builtin pid=24815 time=2.1 sec> process group
Kill <TestWorkerProcess #5 running test=test_builtin pid=24807 time=2.1 sec> process group
Kill <TestWorkerProcess #6 running test=test_builtin pid=24809 time=2.1 sec> process group
Kill <TestWorkerProcess #7 running test=test_builtin pid=24814 time=2.1 sec> process group
Kill <TestWorkerProcess #8 running test=test_builtin pid=24808 time=2.1 sec> process group
Kill <TestWorkerProcess #9 running test=test_builtin pid=24813 time=2.1 sec> process group
Kill <TestWorkerProcess #10 running test=test_builtin pid=24816 time=2.1 sec> process group

== Tests result: FAILURE ==

1 test failed:
    test_builtin

Total duration: 2.2 sec
Tests result: FAILURE
isidentical@gcc-solaris11:~/cpython$ wget https://patch-diff.githubusercontent.com/raw/python/cpython/pull/19312.patch
--2020-04-02 23:53:51--  https://patch-diff.githubusercontent.com/raw/python/cpython/pull/19312.patch
Resolving patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)... 140.82.118.4
Connecting to patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)|140.82.118.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Cookie coming from patch-diff.githubusercontent.com attempted to set domain to github.com
Length: unspecified [text/plain]
Saving to: ‘19312.patch’

19312.patch                                 [ <=>                                                                            ]   1.62K  --.-KB/s    in 0s      

2020-04-02 23:53:51 (3.38 MB/s) - ‘19312.patch’ saved [4252]
isidentical@gcc-solaris11:~/cpython$ git apply 19312.patch
isidentical@gcc-solaris11:~/cpython$ gmake -j8
...
isidentical@gcc-solaris11:~/cpython$ ./python -m test test_builtin -m test_input_no_stdout_fileno -F -j10 -v          
== CPython 3.9.0a5+ (heads/master:dc4e965, Apr 2 2020, 23:53:26) [GCC 5.5.0]
== Solaris-2.11-sun4u-sparc-32bit big-endian
== cwd: /export/home/isidentical/cpython/build/test_python_24850
== CPU count: 8
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 1.71 Run tests in parallel using 10 child processes
0:00:02 load avg: 1.78 [  1/1] test_builtin crashed (Exit code -1)
test_input_no_stdout_fileno (test.test_builtin.PtyTests) ...
Kill <TestWorkerProcess #1 running test=test_builtin pid=24855 time=2.6 sec> process group
Kill <TestWorkerProcess #3 running test=test_builtin pid=24862 time=2.6 sec> process group
Kill <TestWorkerProcess #4 running test=test_builtin pid=24856 time=2.6 sec> process group
Kill <TestWorkerProcess #5 running test=test_builtin pid=24853 time=2.6 sec> process group
Kill <TestWorkerProcess #6 running test=test_builtin pid=24854 time=2.6 sec> process group
Kill <TestWorkerProcess #7 running test=test_builtin pid=24861 time=2.5 sec> process group
Kill <TestWorkerProcess #8 running test=test_builtin pid=24860 time=2.5 sec> process group
Kill <TestWorkerProcess #9 running test=test_builtin pid=24859 time=2.5 sec> process group
Kill <TestWorkerProcess #10 running test=test_builtin pid=24858 time=2.5 sec> process group

== Tests result: FAILURE ==

1 test failed:
    test_builtin

Total duration: 2.7 sec
Tests result: FAILURE
msg365634 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 21:57
Batuhan: OK, now please test PR 19314, which registers a signal handler for SIGHUP. It should fix the issue on Solaris. It also includes the fix for AIX (bpo-40155).
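Here is a minimal sketch of the "ignore SIGHUP while the test runs" pattern, in the spirit of the bpo-38547 test_pty fix quoted above. It may differ from the code merged in GH-19314, and SighupIgnoringTests is a hypothetical name.

```python
import signal
import unittest


class SighupIgnoringTests(unittest.TestCase):
    def setUp(self):
        # If this process is the session leader, closing the pty master fd
        # can send it SIGHUP: ignore the signal for the duration of each test.
        old_handler = signal.signal(signal.SIGHUP, signal.SIG_IGN)
        # Restore the previous handler afterwards, even if the test fails.
        self.addCleanup(signal.signal, signal.SIGHUP, old_handler)

    def test_sighup_is_ignored(self):
        self.assertEqual(signal.getsignal(signal.SIGHUP), signal.SIG_IGN)
```

addCleanup() restores the previous handler even when the test fails, so later tests in the same worker process keep the original SIGHUP behaviour. Run it with python -m unittest.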
msg365637 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-04-02 22:14
Victor, PR 19314 works perfectly.
msg365641 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 22:40
New changeset 7a51a7e19f0143f75f8fc9ff68f93ed40937aec6 by Victor Stinner in branch 'master':
bpo-40140: test_builtin.PtyTests registers SIGHUP handler (GH-19314)
https://github.com/python/cpython/commit/7a51a7e19f0143f75f8fc9ff68f93ed40937aec6
msg365643 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 22:49
> Victor, PR 19314 works perfectly.

Thanks for testing Batuhan ;-)

By the way, Solaris is no longer officially supported by Python:
https://pythondev.readthedocs.io/platforms.html#best-effort-and-unofficial-platforms

There is no Solaris buildbot anymore. Solaris 11.4 will likely be the last release: Oracle no longer supports Solaris.

We may accept minor changes, but no invasive changes.
msg365648 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-03 00:11
New changeset 745bd91bab8e57c52d63a2d541465551d7551f78 by Victor Stinner in branch '3.8':
bpo-40140: test_builtin.PtyTests registers SIGHUP handler (GH-19314) (GH-19316)
https://github.com/python/cpython/commit/745bd91bab8e57c52d63a2d541465551d7551f78
msg365650 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-03 00:16
I'm closing the issue: it's now fixed in 3.8 and master (and I'm working on a 3.7 backport: PR 19318). Thanks Batuhan for the bug report.
msg365667 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-04-03 01:28
> There is no Solaris buildbot anymore. Solaris 11.4 will likely be the last release: Oracle no longer supports Solaris.

Well, if needed I can create one, but it looks like it is going to be an obsolete OS soon :/
msg365692 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-03 12:09
New changeset 0961dbdea2a449fc5b7d77610d6d10e6036fbdf3 by Victor Stinner in branch '3.7':
bpo-40140: test_builtin.PtyTests registers SIGHUP handler (GH-19314) (GH-19316) (GH-19318)
https://github.com/python/cpython/commit/0961dbdea2a449fc5b7d77610d6d10e6036fbdf3
msg365693 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-03 12:09
> Well, if needed I can create one, but it looks like it is going to be an obsolete OS soon :/

I dislike the idea of adding code to Python to support Solaris, since its vendor no longer supports it... Moreover, it's closed source, and most core devs (me included) don't have access to Solaris.
History
Date                 User       Action  Args
2020-04-03 12:09:59  vstinner   set     messages: + msg365693
2020-04-03 12:09:14  vstinner   set     messages: + msg365692
2020-04-03 01:28:39  BTaskaya   set     messages: + msg365667
2020-04-03 00:16:08  vstinner   set     status: open -> closed
                                        resolution: fixed
                                        messages: + msg365650
                                        stage: patch review -> resolved
2020-04-03 00:15:04  vstinner   set     pull_requests: + pull_request18683
2020-04-03 00:11:58  vstinner   set     messages: + msg365648
2020-04-02 22:49:28  vstinner   set     messages: + msg365643
2020-04-02 22:44:16  vstinner   set     pull_requests: + pull_request18681
2020-04-02 22:40:32  vstinner   set     messages: + msg365641
2020-04-02 22:14:10  BTaskaya   set     messages: + msg365637
2020-04-02 21:57:58  vstinner   set     messages: + msg365634
2020-04-02 21:55:58  BTaskaya   set     messages: + msg365633
2020-04-02 21:52:17  vstinner   set     pull_requests: + pull_request18679
2020-04-02 21:44:24  vstinner   set     messages: + msg365629
2020-04-02 20:34:22  BTaskaya   set     messages: + msg365624
2020-04-02 20:12:05  vstinner   set     messages: + msg365621
2020-04-02 20:08:03  vstinner   set     keywords: + patch
                                        stage: patch review
                                        pull_requests: + pull_request18676
2020-04-02 13:34:31  BTaskaya   set     messages: + msg365593
2020-04-02 13:13:54  vstinner   set     messages: + msg365592
2020-04-02 13:02:35  BTaskaya   set     messages: + msg365590
2020-04-02 13:01:52  BTaskaya   set     messages: + msg365589
2020-04-02 12:46:58  vstinner   set     messages: + msg365587
2020-04-01 21:59:33  vstinner   set     messages: + msg365516
2020-04-01 21:50:39  pablogsal  set     messages: + msg365514
2020-04-01 21:48:12  vstinner   set     nosy: + vstinner
                                        messages: + msg365513
2020-04-01 21:41:47  pablogsal  set     nosy: + pablogsal
                                        messages: + msg365511
2020-04-01 19:39:48  BTaskaya   create