test_builtin crashes when runned in parallel mode on solaris #84321

Closed
isidentical opened this issue Apr 1, 2020 · 23 comments
Labels
3.9 only security fixes tests Tests in the Lib/test dir


@isidentical
Sponsor Member

BPO 40140
Nosy @vstinner, @pablogsal, @isidentical
PRs
  • bpo-31160: Fix test_builtin.test_input_no_stdout_fileno() #19312
  • bpo-40140: test_builtin.PtyTests registers SIGHUP handler #19314
  • [3.8] bpo-40140: test_builtin.PtyTests registers SIGHUP handler (GH-19314) #19316
  • [3.7] bpo-40140: test_builtin.PtyTests registers SIGHUP handler (GH-19314) (GH-19316) #19318
    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-04-03.00:16:08.307>
    created_at = <Date 2020-04-01.19:39:48.633>
    labels = ['tests', '3.9']
    title = 'test_builtin crashes when runned in parallel mode on solaris'
    updated_at = <Date 2020-04-03.12:09:59.677>
    user = 'https://github.com/isidentical'

    bugs.python.org fields:

    activity = <Date 2020-04-03.12:09:59.677>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-04-03.00:16:08.307>
    closer = 'vstinner'
    components = ['Tests']
    creation = <Date 2020-04-01.19:39:48.633>
    creator = 'BTaskaya'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40140
    keywords = ['patch']
    message_count = 23.0
    messages = ['365505', '365511', '365513', '365514', '365516', '365587', '365589', '365590', '365592', '365593', '365621', '365624', '365629', '365633', '365634', '365637', '365641', '365643', '365648', '365650', '365667', '365692', '365693']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'pablogsal', 'BTaskaya']
    pr_nums = ['19312', '19314', '19316', '19318']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue40140'
    versions = ['Python 3.9']

    @isidentical
    Sponsor Member Author

    test_builtin works in a serial run:

    0:00:00 load avg: 2.38 Run tests sequentially
    0:00:00 load avg: 2.38 [1/1] test_builtin

    == Tests result: SUCCESS ==

    1 test OK.

    Total duration: 1.3 sec
    Tests result: SUCCESS

    but with more than one process, it crashes:

    0:00:00 load avg: 1.71 Run tests in parallel using 2 child processes
    0:00:01 load avg: 1.70 [1/1/1] test_builtin crashed (Exit code -1)
    test_abs (test.test_builtin.BuiltinTest) ... ok
    test_all (test.test_builtin.BuiltinTest) ... ok
    test_any (test.test_builtin.BuiltinTest) ... ok
    test_ascii (test.test_builtin.BuiltinTest) ... ok
    test_bin (test.test_builtin.BuiltinTest) ... ok
    test_bug_27936 (test.test_builtin.BuiltinTest) ... ok
    test_bytearray_extend_error (test.test_builtin.BuiltinTest) ... ok
    test_bytearray_translate (test.test_builtin.BuiltinTest) ... ok
    test_callable (test.test_builtin.BuiltinTest) ... ok
    test_chr (test.test_builtin.BuiltinTest) ... ok
    test_cmp (test.test_builtin.BuiltinTest) ... ok
    test_compile (test.test_builtin.BuiltinTest) ... ok
    test_compile_async_generator (test.test_builtin.BuiltinTest)
    With the PyCF_ALLOW_TOP_LEVEL_AWAIT flag added in 3.8, we want to ... ok
    test_compile_top_level_await (test.test_builtin.BuiltinTest)
    Test whether code some top level await can be compiled. ... ok
    test_compile_top_level_await_invalid_cases (test.test_builtin.BuiltinTest) ... ok
    test_construct_singletons (test.test_builtin.BuiltinTest) ... ok
    test_delattr (test.test_builtin.BuiltinTest) ... ok
    test_dir (test.test_builtin.BuiltinTest) ... ok
    test_divmod (test.test_builtin.BuiltinTest) ... ok
    test_eval (test.test_builtin.BuiltinTest) ... ok
    test_exec (test.test_builtin.BuiltinTest) ... ok
    test_exec_globals (test.test_builtin.BuiltinTest) ... ok
    test_exec_redirected (test.test_builtin.BuiltinTest) ... ok
    test_filter (test.test_builtin.BuiltinTest) ... ok
    test_filter_pickle (test.test_builtin.BuiltinTest) ... ok
    test_format (test.test_builtin.BuiltinTest) ... ok
    test_general_eval (test.test_builtin.BuiltinTest) ... ok
    test_getattr (test.test_builtin.BuiltinTest) ... ok
    test_hasattr (test.test_builtin.BuiltinTest) ... ok
    test_hash (test.test_builtin.BuiltinTest) ... ok
    test_hex (test.test_builtin.BuiltinTest) ... ok
    test_id (test.test_builtin.BuiltinTest) ... ok
    test_import (test.test_builtin.BuiltinTest) ... ok
    test_input (test.test_builtin.BuiltinTest) ... ok
    test_isinstance (test.test_builtin.BuiltinTest) ... ok
    test_issubclass (test.test_builtin.BuiltinTest) ... ok
    test_iter (test.test_builtin.BuiltinTest) ... ok
    test_len (test.test_builtin.BuiltinTest) ... ok
    test_map (test.test_builtin.BuiltinTest) ... ok
    test_map_pickle (test.test_builtin.BuiltinTest) ... ok
    test_max (test.test_builtin.BuiltinTest) ... ok
    test_min (test.test_builtin.BuiltinTest) ... ok
    test_neg (test.test_builtin.BuiltinTest) ... ok
    test_next (test.test_builtin.BuiltinTest) ... ok
    test_oct (test.test_builtin.BuiltinTest) ... ok
    test_open (test.test_builtin.BuiltinTest) ... ok
    test_open_default_encoding (test.test_builtin.BuiltinTest) ... ok
    test_open_non_inheritable (test.test_builtin.BuiltinTest) ... ok
    test_ord (test.test_builtin.BuiltinTest) ... ok
    test_pow (test.test_builtin.BuiltinTest) ... ok
    test_repr (test.test_builtin.BuiltinTest) ... ok
    test_round (test.test_builtin.BuiltinTest) ... ok
    test_round_large (test.test_builtin.BuiltinTest) ... ok
    test_setattr (test.test_builtin.BuiltinTest) ... ok
    test_sum (test.test_builtin.BuiltinTest) ... ok
    test_type (test.test_builtin.BuiltinTest) ... ok
    test_vars (test.test_builtin.BuiltinTest) ... ok
    test_warning_notimplemented (test.test_builtin.BuiltinTest) ... ok
    test_zip (test.test_builtin.BuiltinTest) ... ok
    test_zip_bad_iterable (test.test_builtin.BuiltinTest) ... ok
    test_zip_pickle (test.test_builtin.BuiltinTest) ... ok
    test_input_no_stdout_fileno (test.test_builtin.PtyTests) ...

    == Tests result: FAILURE ==

    1 test failed:
    test_builtin

    Total duration: 1.4 sec
    Tests result: FAILURE

    System: SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
    Tested with both GCC (5.5.0) and Solaris Studio (12).

    @isidentical isidentical added 3.9 only security fixes tests Tests in the Lib/test dir labels Apr 1, 2020
    @pablogsal
    Member

    Try increasing the stack size (with ulimit -s ... or similar).

    @vstinner
    Member

    vstinner commented Apr 1, 2020

    I recently modified the test:

    (1) commit 278c1e1

    -    os.waitpid(pid, 0)
    +    support.wait_process(pid, exitcode=0)

    (2) commit 16d7567

    Close the fd *after* calling support.wait_process() to prevent sending SIGHUP to the child process, which made support.wait_process(pid, exitcode=0) fail, since exitcode=-1 (-SIGHUP) != 0.
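    A minimal sketch of the ordering described in commit 16d7567, using the standard pty module: reap the child first, then close the master side of the pseudo-terminal. run_child_in_pty() is a hypothetical helper, not the actual PtyTests code, and it assumes a POSIX system with pty support and Python 3.9+ (for os.waitstatus_to_exitcode()).

```python
import os
import pty


def run_child_in_pty(exit_code: int = 0) -> int:
    """Hypothetical helper: fork a child attached to a new pseudo-terminal
    and return its exit code (POSIX only, Python 3.9+)."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: its controlling terminal is now the pty slave.
        os._exit(exit_code)
    try:
        # Parent: reap the child *before* closing the master side, so the
        # child has already exited when its terminal is hung up.
        _, status = os.waitpid(pid, 0)
    finally:
        # Closing the master while the child is still running can send it
        # SIGHUP, which is the failure the commit avoids.
        os.close(master_fd)
    return os.waitstatus_to_exitcode(status)


print(run_child_in_pty(0))  # prints 0
```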

    --

    test_builtin.test_input_no_stdout_fileno() also hangs on AIX:
    https://bugs.python.org/issue31160#msg365478

    @pablogsal
    Member

    I am understanding "crashing" as "segfaulting"

    @vstinner
    Member

    vstinner commented Apr 1, 2020

    0:00:01 load avg: 1.70 [1/1/1] test_builtin crashed (Exit code -1)

    Exit code -1 looks like a process killed by SIGHUP. Which commit did you try?

    Can you please check that you tested with my commit 16d7567?
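    For reference, a child process killed by signal N is reported with exit code -N (this is how subprocess sets returncode, and how os.waitstatus_to_exitcode() decodes a wait status), so "Exit code -1" points at signal 1. The sketch below uses a hypothetical helper, exit_code_of_child_killed_by(), that forks a child which sends itself SIGHUP, and shows the parent observing -1. It assumes POSIX and Python 3.9+.

```python
import os
import signal


def exit_code_of_child_killed_by(signum: int) -> int:
    """Fork a child that delivers `signum` to itself and return the exit code
    the parent observes (hypothetical helper, POSIX only, Python 3.9+)."""
    pid = os.fork()
    if pid == 0:
        # Child: restore the default action (terminate) and make sure the
        # signal is not blocked, then send it to ourselves.
        signal.signal(signum, signal.SIG_DFL)
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signum})
        os.kill(os.getpid(), signum)
        os._exit(0)  # not reached: the default action kills the child
    _, status = os.waitpid(pid, 0)
    # A process killed by signal N is reported as exit code -N.
    return os.waitstatus_to_exitcode(status)


code = exit_code_of_child_killed_by(signal.SIGHUP)
print(code, signal.Signals(-code).name)  # -1 SIGHUP (where SIGHUP is 1)
```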

    @vstinner
    Member

    vstinner commented Apr 2, 2020

    See also bpo-40155: "AIX: test_builtin.test_input_no_stdout_fileno() hangs".

    @isidentical
    Sponsor Member Author

    ulimit reports the stack size as unlimited, and this happens on the current master.
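    The same check can be done from Python with the resource module: "unlimited" from ulimit -s corresponds to resource.RLIM_INFINITY for RLIMIT_STACK. stack_limit_is_unlimited() is a hypothetical helper for illustration (POSIX only).

```python
import resource


def stack_limit_is_unlimited() -> bool:
    """Return True if the soft stack-size limit is unlimited, which is what
    `ulimit -s` reports as "unlimited" (hypothetical helper, POSIX only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return soft == resource.RLIM_INFINITY


print(stack_limit_is_unlimited())
```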

    @isidentical
    Sponsor Member Author

    I am understanding "crashing" as "segfaulting"

    "crashing" as in the test result but not segfaulting
    0:00:00 load avg: 1.71 Run tests in parallel using 2 child processes
    0:00:01 load avg: 1.70 [1/1/1] test_builtin crashed (Exit code -1)

    @vstinner
    Member

    vstinner commented Apr 2, 2020

    "crashing" as in the test result but not segfaulting
    0:00:01 load avg: 1.70 [1/1/1] test_builtin crashed (Exit code -1)

    What is signal 1 on Solaris? On Linux, it's SIGHUP, not SIGSEGV:

    $ python3
    Python 3.7.6 (default, Jan 30 2020, 09:44:41) 
    >>> import signal
    >>> signal.SIGSEGV
    <Signals.SIGSEGV: 11>
    >>> signal.SIGHUP
    <Signals.SIGHUP: 1>

    @isidentical
    Sponsor Member Author

    isidentical@gcc-solaris11:~$ cpython/python
    Python 3.9.0a5+ (heads/master:98ff332, Apr  2 2020, 01:20:22) 
    [GCC 5.5.0] on sunos5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import signal
    >>> signal.SIGSEGV
    <Signals.SIGSEGV: 11>
    >>> signal.SIGHUP
    <Signals.SIGHUP: 1>

    @vstinner
    Member

    vstinner commented Apr 2, 2020

    Batuhan: Can you please test if PR 19312 fixes the issue for you on Solaris?

    @isidentical
    Sponsor Member Author

    I tested with both PR 19312 and PR 19308 and I still have the same crash
    0:00:00 load avg: 0.80 Run tests in parallel using 2 child processes
    0:00:01 load avg: 0.79 [1/1/1] test_builtin crashed (Exit code -1)

    @vstinner
    Member

    vstinner commented Apr 2, 2020

    I tested with both PR 19312 and PR 19308 and I still have the same crash

    Which test is causing the issue? Does it still crash if you comment out test_input_no_stdout_fileno()? Try renaming it to "Xtest_input_no_stdout_fileno" to skip it.
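    Renaming works because unittest's default TestLoader only collects methods whose names start with its testMethodPrefix, which is "test". A small sketch with a hypothetical ExampleTests class:

```python
import unittest


class ExampleTests(unittest.TestCase):
    def test_kept(self):
        self.assertTrue(True)

    # The "X" prefix hides this method from the default loader, which only
    # collects callables whose names start with "test".
    def Xtest_input_no_stdout_fileno(self):
        self.fail("never collected")


names = unittest.TestLoader().getTestCaseNames(ExampleTests)
print(names)  # ['test_kept']
```

    The @unittest.skip decorator is a more explicit alternative: the test is still collected but reported as skipped.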

    What if you only run this test?

    ./python -m test test_builtin -m test_input_no_stdout_fileno -F -j10 -v

    Maybe this test should register a signal handler for SIGHUP?

    This bug looks like bpo-38547 which affected test_pty. I fixed it by registering a SIGHUP signal handler:

    commit a1838ec
    Author: Victor Stinner <vstinner@python.org>
    Date: Mon Dec 9 11:57:05 2019 +0100

    bpo-38547: Fix test_pty if the process is the session leader (GH-17519)
    
    Fix test_pty: if the process is the session leader, closing the
    master file descriptor raises a SIGHUP signal: simply ignore SIGHUP
    when running the tests.
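    The bpo-38547 approach can be sketched as a context manager that installs a no-op SIGHUP handler while the pty work runs and restores the previous handler afterwards. sighup_ignored() is a hypothetical name; the actual patches (GH-17519, GH-19314) may be structured differently. signal.signal() must be called from the main thread.

```python
import contextlib
import os
import signal


@contextlib.contextmanager
def sighup_ignored():
    """Temporarily swallow SIGHUP, then restore the previous handler.

    Hypothetical helper: if the test process is the session leader, closing
    the pty master can send it SIGHUP, whose default action kills it.
    """
    def handler(signum, frame):
        pass  # ignore the hangup instead of terminating

    old_handler = signal.signal(signal.SIGHUP, handler)
    try:
        yield
    finally:
        # getsignal()/signal() return None if the old handler was not set
        # from Python; fall back to the default action in that case.
        signal.signal(signal.SIGHUP,
                      old_handler if old_handler is not None else signal.SIG_DFL)


with sighup_ignored():
    # Simulate the hangup: without the handler this would kill the process.
    os.kill(os.getpid(), signal.SIGHUP)
print("still alive")
```

    A no-op handler is used here rather than signal.SIG_IGN because an ignored signal stays ignored across exec() in child processes, whereas a caught signal is reset to its default action on exec().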
    

    @isidentical
    Sponsor Member Author

    isidentical@gcc-solaris11:~/cpython$ ./python -m test test_builtin -m test_input_no_stdout_fileno -F -j10 -v
    == CPython 3.9.0a5+ (heads/master:dc4e965, Apr 2 2020, 23:53:26) [GCC 5.5.0]
    == Solaris-2.11-sun4u-sparc-32bit big-endian
    == cwd: /export/home/isidentical/cpython/build/test_python_24804
    == CPU count: 8
    == encodings: locale=UTF-8, FS=utf-8
    0:00:00 load avg: 1.56 Run tests in parallel using 10 child processes
    0:00:02 load avg: 1.57 [ 1/1] test_builtin crashed (Exit code -1)
    test_input_no_stdout_fileno (test.test_builtin.PtyTests) ...
    Kill <TestWorkerProcess #2 running test=test_builtin pid=24812 time=2.1 sec> process group
    Kill <TestWorkerProcess #3 running test=test_builtin pid=24810 time=2.1 sec> process group
    Kill <TestWorkerProcess #4 running test=test_builtin pid=24815 time=2.1 sec> process group
    Kill <TestWorkerProcess #5 running test=test_builtin pid=24807 time=2.1 sec> process group
    Kill <TestWorkerProcess #6 running test=test_builtin pid=24809 time=2.1 sec> process group
    Kill <TestWorkerProcess #7 running test=test_builtin pid=24814 time=2.1 sec> process group
    Kill <TestWorkerProcess #8 running test=test_builtin pid=24808 time=2.1 sec> process group
    Kill <TestWorkerProcess #9 running test=test_builtin pid=24813 time=2.1 sec> process group
    Kill <TestWorkerProcess #10 running test=test_builtin pid=24816 time=2.1 sec> process group

    == Tests result: FAILURE ==

    1 test failed:
    test_builtin

    Total duration: 2.2 sec
    Tests result: FAILURE
    isidentical@gcc-solaris11:~/cpython$ wget https://patch-diff.githubusercontent.com/raw/python/cpython/pull/19312.patch
    --2020-04-02 23:53:51-- https://patch-diff.githubusercontent.com/raw/python/cpython/pull/19312.patch
    Resolving patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)... 140.82.118.4
    Connecting to patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)|140.82.118.4|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Cookie coming from patch-diff.githubusercontent.com attempted to set domain to github.com
    Length: unspecified [text/plain]
    Saving to: ‘19312.patch’

    19312.patch [ <=> ] 1.62K --.-KB/s in 0s

    2020-04-02 23:53:51 (3.38 MB/s) - ‘19312.patch’ saved [4252]
    isidentical@gcc-solaris11:~/cpython$ git apply 19312.patch
    isidentical@gcc-solaris11:~/cpython$ gmake -j8
    ...
    isidentical@gcc-solaris11:~/cpython$ ./python -m test test_builtin -m test_input_no_stdout_fileno -F -j10 -v
    == CPython 3.9.0a5+ (heads/master:dc4e965, Apr 2 2020, 23:53:26) [GCC 5.5.0]
    == Solaris-2.11-sun4u-sparc-32bit big-endian
    == cwd: /export/home/isidentical/cpython/build/test_python_24850
    == CPU count: 8
    == encodings: locale=UTF-8, FS=utf-8
    0:00:00 load avg: 1.71 Run tests in parallel using 10 child processes
    0:00:02 load avg: 1.78 [ 1/1] test_builtin crashed (Exit code -1)
    test_input_no_stdout_fileno (test.test_builtin.PtyTests) ...
    Kill <TestWorkerProcess #1 running test=test_builtin pid=24855 time=2.6 sec> process group
    Kill <TestWorkerProcess #3 running test=test_builtin pid=24862 time=2.6 sec> process group
    Kill <TestWorkerProcess #4 running test=test_builtin pid=24856 time=2.6 sec> process group
    Kill <TestWorkerProcess #5 running test=test_builtin pid=24853 time=2.6 sec> process group
    Kill <TestWorkerProcess #6 running test=test_builtin pid=24854 time=2.6 sec> process group
    Kill <TestWorkerProcess #7 running test=test_builtin pid=24861 time=2.5 sec> process group
    Kill <TestWorkerProcess #8 running test=test_builtin pid=24860 time=2.5 sec> process group
    Kill <TestWorkerProcess #9 running test=test_builtin pid=24859 time=2.5 sec> process group
    Kill <TestWorkerProcess #10 running test=test_builtin pid=24858 time=2.5 sec> process group

    == Tests result: FAILURE ==

    1 test failed:
    test_builtin

    Total duration: 2.7 sec
    Tests result: FAILURE

    @vstinner
    Member

    vstinner commented Apr 2, 2020

    Batuhan: OK, now please test PR 19314, which registers a signal handler for SIGHUP. It should fix the issue on Solaris. Moreover, it also includes the fix for AIX (bpo-40155).

    @isidentical
    Sponsor Member Author

    Victor, PR 19314 works perfectly.

    @vstinner
    Member

    vstinner commented Apr 2, 2020

    New changeset 7a51a7e by Victor Stinner in branch 'master':
    bpo-40140: test_builtin.PtyTests registers SIGHUP handler (GH-19314)

    @vstinner
    Member

    vstinner commented Apr 2, 2020

    Victor, PR 19314 works perfectly.

    Thanks for testing Batuhan ;-)

    By the way, Solaris is no longer officially supported by Python:
    https://pythondev.readthedocs.io/platforms.html#best-effort-and-unofficial-platforms

    There is no more Solaris buildbot. Solaris 11.4 will be likely the last release: Oracle no longer supports Solaris.

    We may accept minor changes, but no invasive changes.

    @vstinner
    Member

    vstinner commented Apr 3, 2020

    New changeset 745bd91 by Victor Stinner in branch '3.8':
    bpo-40140: test_builtin.PtyTests registers SIGHUP handler (GH-19314) (GH-19316)

    @vstinner
    Member

    vstinner commented Apr 3, 2020

    I'm closing the issue: it's now fixed in 3.8 and master (and I'm working on a 3.7 backport: PR 19318). Thanks Batuhan for the bug report.

    @isidentical
    Sponsor Member Author

    There is no more Solaris buildbot. Solaris 11.4 will be likely the last release: Oracle no longer supports Solaris.

    Well, if needed I can create one but looks like it is going be an obsoleted OS soon :/

    @vstinner
    Member

    vstinner commented Apr 3, 2020

    New changeset 0961dbd by Victor Stinner in branch '3.7':
    bpo-40140: test_builtin.PtyTests registers SIGHUP handler (GH-19314) (GH-19316) (GH-19318)

    @vstinner
    Member

    vstinner commented Apr 3, 2020

    Well, if needed I can create one but looks like it is going be an obsoleted OS soon :/

    I dislike the idea of changing Python code to support Solaris, since the Solaris vendor doesn't support it anymore... Moreover, it's closed source, and most core devs (including me) don't have access to Solaris.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022