classification
Title: test_asyncio: test_subprocess_send_signal() hangs on buildbot "AMD64 Snow Leop 3.x"
Type: Stage:
Components: Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: David.Edelsohn, db3l, gvanrossum, koobs, larry, ncoghlan, neologix, pitrou, python-dev, sbt, skrah, vstinner
Priority: normal Keywords: patch

Created on 2013-11-19 11:20 by vstinner, last changed 2014-02-06 14:51 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
asyncio_subprocess_watchdog.patch vstinner, 2013-11-21 10:17 review
Messages (9)
msg203358 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-19 11:20
I didn't understand the purpose of the issue #19334: it doesn't mention the OS nor the test. I prefer to create a more specific issue, so here you have one specific hang.

http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/517/steps/test/logs/stdio

[385/385] test_asyncio
Timeout (1:00:00)!
Thread 0x00007fff71296cc0 (most recent call first):
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/selectors.py", line 291 in select
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/asyncio/base_events.py", line 614 in _run_once
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/asyncio/base_events.py", line 153 in run_forever
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/asyncio/base_events.py", line 172 in run_until_complete
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/test_asyncio/test_events.py", line 1155 in test_subprocess_send_signal
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/case.py", line 571 in run
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/case.py", line 610 in __call__
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 117 in run
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 79 in __call__
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 117 in run
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 79 in __call__
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 117 in run
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/suite.py", line 79 in __call__
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/unittest/runner.py", line 168 in run
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/support/__init__.py", line 1685 in _run_suite
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/support/__init__.py", line 1719 in run_unittest
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/test_asyncio/__init__.py", line 31 in test_main
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 1276 in runtest_inner
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 965 in runtest
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 532 in main
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 1560 in main_in_temp_cwd
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 1585 in <module>
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/runpy.py", line 73 in _run_code
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/runpy.py", line 160 in _run_module_as_main
Traceback (most recent call last):
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/__main__.py", line 3, in <module>
    regrtest.main_in_temp_cwd()
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 1560, in main_in_temp_cwd
    main()
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 736, in main
    raise Exception("Child error on {}: {}".format(test, result[1]))
Exception: Child error on test_asyncio: Exit code 1
make: *** [buildbottest] Error 1
msg203392 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-11-19 15:43
The traceback is pretty useless. So is the title of the other bug report "hangs for 1 hour" (which just means it hangs forever but the test runner kills it after one hour).

We would need to run the tests with -v so at least we can pinpoint which of the 600 or so tests in test_asyncio is failing.

PS. Off-topic: "raise Exception(...)" is poor coding style:
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/regrtest.py", line 736, in main
    raise Exception("Child error on {}: {}".format(test, result[1]))
msg203393 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-19 15:51
Ah, it looks like the test uses a child process. The following issue has a similar problem, we don't know that status of the child process:
http://bugs.python.org/issue19564

I proposed a patch in this issue for multiprocessing, to enable also faulthandler in the child process.

We can probably do something similar in asyncio with child process (enable faulthandler in unit tests with a short timeout).
msg203404 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-11-19 17:43
I cannot help you unless you tell me which specific test is failing. But once you have a proposed fix I will review it. Thanks for advocating for the minority platforms!
msg203455 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-20 01:23
> I cannot help you unless you tell me which specific test is failing.

Oh, I didn't notice that test_subprocess_send_signal() is part of a mixin: SubprocessTestsMixin. Configure output:

checking sys/devpoll.h usability... no
checking sys/devpoll.h presence... no
checking for sys/devpoll.h... no
checking sys/epoll.h usability... no
checking sys/epoll.h presence... no
checking for sys/epoll.h... no
checking poll.h usability... yes
checking poll.h presence... yes
checking for broken poll()... no
checking for kqueue... yes

I don't know which selector was used: select, poll or maybe kqueue?
msg203458 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-11-20 03:06
Oh, sorry, I didn't realize the name of the failing test was in the issue title. But even that's no excuse, because it's also in the log. :-(

Fortunately the line where we're hanging is also in the log: line 291 in selectors.py is in PollSelector. So that's settled. But we don't know if that's the first of the three tests that runs or not.

What's also strange is that several other tests (test_subprocess_kill() and test_subprocess_terminate()) also send signals, so they should work exactly the same way. But again we don't know if any of them are being run before the signal test.

Do you have any other traces from the same bot? Do they all fail in the same test? Is the test order randomized?
msg203602 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-21 10:17
asyncio_subprocess_watchdog.patch:

* add test_asyncio.watchdog.setup_watchdog(): use faulthandler.dump_traceback_later() to kill the process if it hangs more than 'timeout' seconds (do nothing if faulthandler is not present, eg. on other Python VMs than CPython)
* add SubprocessTestsMixin.debug_info(): return returncode, stdout, stderr
* use debug_info() on functions using subprocess may may hang

The patch should give more information on the child process if it hangs. I didn't see the test_asyncio hang on "AMD64 Snow Leop 3.x" recently. The hang is probably random :-/

I chose a default timeout of 5 minutes, it should be enough for the short tests of test_asyncio. You may set the timeout to 15 minutes if a buildbot is too slow. Or set the timeout to 1 minute if you prefer to not hang a buildbot too long.

I tested asyncio_subprocess_watchdog.patch manually with a shorter timeout and by adding time.sleep(60) in echo.py, echo2.py and echo3.py.
msg203636 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-11-21 15:16
Hmm... When did you last see the hang?  We've had a variety of checkins that might have affected this code.

If you've seen at least one traceback later than the commit below, feel free to commit.  Otherwise, let's wait.

changeset:   87088:eb42adc53923
user:        Guido van Rossum <guido@dropbox.com>
date:        Wed Nov 13 15:50:08 2013 -0800
summary:     asyncio: Fix from Anthony Baire for CPython issue 19566 (replaces earlier fix).
msg210394 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-06 14:51
I didn't see this bug recently, so I close the issue.
History
Date User Action Args
2014-02-06 14:51:21vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg210394
2013-11-21 15:16:05gvanrossumsetmessages: + msg203636
2013-11-21 10:17:15vstinnersetfiles: + asyncio_subprocess_watchdog.patch
keywords: + patch
messages: + msg203602
2013-11-20 03:06:10gvanrossumsetmessages: + msg203458
2013-11-20 01:23:09vstinnersetmessages: + msg203455
2013-11-19 17:43:34gvanrossumsetmessages: + msg203404
2013-11-19 15:51:43vstinnersetmessages: + msg203393
2013-11-19 15:43:44gvanrossumsetmessages: + msg203392
2013-11-19 11:20:25vstinnercreate