classification
Title: test suite: enable faulthandler timeout in assert_python
Type: enhancement Stage: resolved
Components: Tests Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: neologix, vstinner
Priority: normal Keywords:

Created on 2013-09-08 09:52 by neologix, last changed 2020-01-07 12:27 by vstinner. This issue is now closed.

Messages (3)
msg197239 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-09-08 09:52
Currently, the test suite, as well as processes spawned by the script_helper.assert_python family, are run with faulthandler enabled.
That's great for debugging crashes, but it would be even better if those processes were started with faulthandler's timeout enabled:

1) Most deadlock-prone tests are run in child processes, so in case of deadlock, you don't get any trace of the child:

http://buildbot.python.org/all/builders/AMD64 FreeBSD 10.0 3.x/builds/353/steps/test/logs/stdio
"""
[269/380] test_threading
Timeout (1:00:00)!
Thread 0x0000000801c06400:
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/subprocess.py", line 1615 in _communicate_with_poll
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/subprocess.py", line 1535 in _communicate
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/subprocess.py", line 945 in communicate
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/script_helper.py", line 36 in _assert_python
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/script_helper.py", line 55 in assert_python_ok
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/test_threading.py", line 617 in assertScriptHasOutput
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/test_threading.py", line 692 in test_4_joining_across_fork_in_worker_thread
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/unittest/case.py", line 496 in run
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/unittest/case.py", line 535 in __call__
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/unittest/suite.py", line 117 in run
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/unittest/suite.py", line 79 in __call__
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/unittest/suite.py", line 117 in run
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/unittest/suite.py", line 79 in __call__
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/unittest/suite.py", line 117 in run
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/unittest/suite.py", line 79 in __call__
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/unittest/runner.py", line 168 in run
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/support/__init__.py", line 1649 in _run_suite
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/support/__init__.py", line 1683 in run_unittest
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/regrtest.py", line 1275 in <lambda>
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/regrtest.py", line 1276 in runtest_inner
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/regrtest.py", line 965 in runtest
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/regrtest.py", line 761 in main
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/regrtest.py", line 1560 in main_in_temp_cwd
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/__main__.py", line 3 in <module>
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/runpy.py", line 73 in _run_code
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/runpy.py", line 160 in _run_module_as_main
*** Error code 1
"""

Here, we just see that the main process is waiting for its child to complete, but we don't know anything about the child process stack.

2) As an added benefit, this would prevent dangling child processes: when the parent is killed, they're reparented to init and can keep running arbitrarily long, consuming memory, CPU, and process table entries (well, maybe the buildbot scripts kill the whole process group, I don't know).
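A minimal sketch of what enabling the timeout in a child could look like, assuming the helper simply prepends a faulthandler.dump_traceback_later() call to the code passed to `python -c`. The name run_python_with_watchdog is hypothetical, not the actual script_helper API. If the child is still running when the watchdog fires, faulthandler writes the stacks of all threads to stderr and exits with status 1, so the parent sees where the child is stuck instead of only its own communicate() frame.

```python
import subprocess
import sys


def run_python_with_watchdog(code, watchdog_timeout, outer_timeout=None):
    """Run `code` in a child interpreter with a faulthandler watchdog.

    Hypothetical helper: if the child is still running after
    `watchdog_timeout` seconds, faulthandler dumps every thread's stack
    to stderr and calls _exit(1). `outer_timeout` is a parent-side
    backstop (subprocess.TimeoutExpired is raised if it expires).
    """
    prelude = (
        "import faulthandler\n"
        # exit=True: terminate the child right after dumping the stacks
        f"faulthandler.dump_traceback_later({watchdog_timeout!r}, exit=True)\n"
    )
    return subprocess.run(
        [sys.executable, "-c", prelude + code],
        capture_output=True,
        text=True,
        timeout=outer_timeout,
    )
```

The parent-side outer_timeout should be longer than watchdog_timeout, so that the child gets a chance to dump its stacks before the parent gives up on it.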
msg197242 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-09-08 09:58
I see two options:

* faulthandler calls killpg(SIGABRT) on timeout to kill child processes (but it should temporarily ignore the signal so as not to kill itself)
* use a timeout for child processes that is shorter than the global timeout
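The kill step of the first option can be sketched in pure Python as below. abort_process_group is a hypothetical helper, not a faulthandler API (faulthandler itself is implemented in C). The sketch assumes a POSIX system, that it is called from the main thread (a requirement of signal.signal()), that SIGABRT is not blocked in the caller, and that the child processes share the caller's process group.

```python
import os
import signal


def abort_process_group():
    """Send SIGABRT to the caller's whole process group, sparing the caller.

    Hypothetical helper; POSIX only, main thread only.
    """
    # Ignore SIGABRT in this process for the duration of the kill.
    previous = signal.signal(signal.SIGABRT, signal.SIG_IGN)
    try:
        # An unblocked signal generated while its disposition is SIG_IGN is
        # discarded, so the caller survives; the other members of the group
        # receive it (by default, SIGABRT terminates them).
        os.killpg(os.getpgrp(), signal.SIGABRT)
    finally:
        # signal.signal() may report None when the old handler was not
        # installed from Python; fall back to the default disposition then.
        # C-level handlers (such as faulthandler's) are not restored here.
        signal.signal(
            signal.SIGABRT,
            previous if previous is not None else signal.SIG_DFL,
        )
```

Restoring the previous handler right after killpg() is safe in this sketch, because the copy of the signal generated for the caller was already discarded while it was ignored.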

Not all tests use script_helper. But this is probably a different issue ;-)
msg359505 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-07 12:27
I modified regrtest to use process groups in bpo-38502. It doesn't exactly solve this issue, but it does fix the overall problem of leaking running processes when a test fails for various reasons. For example, when regrtest runs in multiprocessing mode (-jN) and a test times out, the child processes of that test are now killed.
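A minimal sketch of the process-group technique, assuming POSIX; it does not reproduce regrtest's actual code, and run_in_process_group is a hypothetical name. The worker is started in a new session, so it leads its own process group, and on timeout the whole group is killed with os.killpg(), which also reaches any grandchildren the worker spawned.

```python
import os
import signal
import subprocess


def run_in_process_group(args, timeout):
    """Run `args` in a new process group; on timeout, kill the whole group.

    Hypothetical helper. Returns (returncode, timed_out).
    """
    # start_new_session=True makes the child call setsid(), so its process
    # group ID equals its PID, and processes it spawns inherit that group.
    proc = subprocess.Popen(args, start_new_session=True)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        try:
            # Kill the worker and every descendant still in its group.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group is already empty
        return proc.wait(), True
```

Killing the group rather than only proc.pid is what keeps grandchildren from being reparented to init and outliving the test run; this sketch only does so on timeout.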
History
Date                 User      Action  Args
2020-01-07 12:27:20  vstinner  set     status: open -> closed
                                       resolution: fixed
                                       messages: + msg359505
                                       stage: resolved
2013-09-08 09:58:36  vstinner  set     messages: + msg197242
2013-09-08 09:52:48  neologix  create