Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_asyncio: test_subprocess_send_signal() hangs on buildbot "AMD64 Snow Leop 3.x" #63851

Closed
vstinner opened this issue Nov 19, 2013 · 9 comments

Comments

@vstinner
Copy link
Member

BPO 19652
Nosy @gvanrossum, @db3l, @ncoghlan, @pitrou, @vstinner, @larryhastings, @skrah, @koobs
Files
  • asyncio_subprocess_watchdog.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-02-06.14:51:21.191>
    created_at = <Date 2013-11-19.11:20:25.729>
    labels = []
    title = 'test_asyncio: test_subprocess_send_signal() hangs on buildbot "AMD64 Snow Leop 3.x"'
    updated_at = <Date 2014-02-06.14:51:21.190>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2014-02-06.14:51:21.190>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-02-06.14:51:21.191>
    closer = 'vstinner'
    components = []
    creation = <Date 2013-11-19.11:20:25.729>
    creator = 'vstinner'
    dependencies = []
    files = ['32750']
    hgrepos = []
    issue_num = 19652
    keywords = ['patch']
    message_count = 9.0
    messages = ['203358', '203392', '203393', '203404', '203455', '203458', '203602', '203636', '210394']
    nosy_count = 12.0
    nosy_names = ['gvanrossum', 'db3l', 'ncoghlan', 'pitrou', 'vstinner', 'larry', 'skrah', 'neologix', 'python-dev', 'sbt', 'koobs', 'David.Edelsohn']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue19652'
    versions = ['Python 3.4']

    @vstinner
    Copy link
    Member Author

    I didn't understand the purpose of the issue bpo-19334: it doesn't mention the OS nor the test. I prefer to create a more specific issue, so here you have one specific hang.

    http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/517/steps/test/logs/stdio

    [385/385] test_asyncio
    Timeout (1:00:00)!
    Thread 0x00007fff71296cc0 (most recent call first):
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/selectors.py", line 291 in select
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/asyncio/base_events.py", line 614 in _run_once
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/asyncio/base_events.py", line 153 in run_forever
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/asyncio/base_events.py", line 172 in run_until_complete
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/test_asyncio/test_events.py", line 1155 in test_subprocess_send_signal
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/case.py", line 571 in run
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/case.py", line 610 in __call__
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 117 in run
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 79 in __call__
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 117 in run
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 79 in __call__
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 117 in run
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 79 in __call__
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/runner.py", line 168 in run
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/support/__init__.py", line 1685 in _run_suite
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/support/__init__.py", line 1719 in run_unittest
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/test_asyncio/__init__.py", line 31 in test_main
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 1276 in runtest_inner
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 965 in runtest
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 532 in main
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 1560 in main_in_temp_cwd
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 1585 in <module>
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/runpy.py", line 73 in _run_code
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/runpy.py", line 160 in _run_module_as_main
    Traceback (most recent call last):
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/runpy.py", line 160, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/runpy.py", line 73, in _run_code
        exec(code, run_globals)
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/__main__.py", line 3, in <module>
        regrtest.main_in_temp_cwd()
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 1560, in main_in_temp_cwd
        main()
      File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 736, in main
        raise Exception("Child error on {}: {}".format(test, result[1]))
    Exception: Child error on test_asyncio: Exit code 1
    make: *** [buildbottest] Error 1

    @gvanrossum
    Copy link
    Member

    The traceback is pretty useless. So is the title of the other bug report "hangs for 1 hour" (which just means it hangs forever but the test runner kills it after one hour).

    We would need to run the tests with -v so at least we can pinpoint which of the 600 or so tests in test_asyncio is failing.

    PS. Off-topic: "raise Exception(...)" is poor coding style:
    File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 736, in main
    raise Exception("Child error on {}: {}".format(test, result[1]))

    @vstinner
    Copy link
    Member Author

    Ah, it looks like the test uses a child process. The following issue has a similar problem, we don't know that status of the child process:
    http://bugs.python.org/issue19564

    I proposed a patch in this issue for multiprocessing, to enable also faulthandler in the child process.

    We can probably do something similar in asyncio with child process (enable faulthandler in unit tests with a short timeout).

    @gvanrossum
    Copy link
    Member

    I cannot help you unless you tell me which specific test is failing. But once you have a proposed fix I will review it. Thanks for advocating for the minority platforms!

    @vstinner
    Copy link
    Member Author

    I cannot help you unless you tell me which specific test is failing.

    Oh, I didn't notice that test_subprocess_send_signal() is part of a mixin: SubprocessTestsMixin. Configure output:

    checking sys/devpoll.h usability... no
    checking sys/devpoll.h presence... no
    checking for sys/devpoll.h... no
    checking sys/epoll.h usability... no
    checking sys/epoll.h presence... no
    checking for sys/epoll.h... no
    checking poll.h usability... yes
    checking poll.h presence... yes
    checking for broken poll()... no
    checking for kqueue... yes

    I don't know which selector was used: select, poll or maybe kqueue?

    @gvanrossum
    Copy link
    Member

    Oh, sorry, I didn't realize the name of the failing test was in the issue title. But even that's no excuse, because it's also in the log. :-(

    Fortunately the line where we're hanging is also in the log: line 291 in selectors.py is in PollSelector. So that's settled. But we don't know if that's the first of the three tests that runs or not.

    What's also strange is that several other tests (test_subprocess_kill() and test_subprocess_terminate()) also send signals, so they should work exactly the same way. But again we don't know if any of them are being run before the signal test.

    Do you have any other traces from the same bot? Do they all fail in the same test? Is the test order randomized?

    @vstinner
    Copy link
    Member Author

    asyncio_subprocess_watchdog.patch:

    • add test_asyncio.watchdog.setup_watchdog(): use faulthandler.dump_traceback_later() to kill the process if it hangs more than 'timeout' seconds (do nothing if faulthandler is not present, eg. on other Python VMs than CPython)
    • add SubprocessTestsMixin.debug_info(): return returncode, stdout, stderr
    • use debug_info() on functions using subprocess may may hang

    The patch should give more information on the child process if it hangs. I didn't see the test_asyncio hang on "AMD64 Snow Leop 3.x" recently. The hang is probably random :-/

    I chose a default timeout of 5 minutes, it should be enough for the short tests of test_asyncio. You may set the timeout to 15 minutes if a buildbot is too slow. Or set the timeout to 1 minute if you prefer to not hang a buildbot too long.

    I tested asyncio_subprocess_watchdog.patch manually with a shorter timeout and by adding time.sleep(60) in echo.py, echo2.py and echo3.py.

    @gvanrossum
    Copy link
    Member

    Hmm... When did you last see the hang? We've had a variety of checkins that might have affected this code.

    If you've seen at least one traceback later than the commit below, feel free to commit. Otherwise, let's wait.

    changeset: 87088:eb42adc53923
    user: Guido van Rossum <guido@dropbox.com>
    date: Wed Nov 13 15:50:08 2013 -0800
    summary: asyncio: Fix from Anthony Baire for CPython bpo-19566 (replaces earlier fix).

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 6, 2014

    I didn't see this bug recently, so I close the issue.

    @vstinner vstinner closed this as completed Feb 6, 2014
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants