classification
Title: asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)
Type: Stage:
Components: asyncio, Tests Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: aeros, asvetlov, vstinner, yselivanov
Priority: normal Keywords:

Created on 2019-09-30 12:57 by vstinner, last changed 2019-10-25 12:30 by vstinner.

Messages (7)
msg353568 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-30 12:57
test_asyncio fails once on AMD64 RHEL7 Refleaks 3.x, and then test_close_kill_running() was killed after 3h 15 min. I guess that it hangs, but I'm not 100% sure. When test_asyncio was re-run, it seems like test_asyncio was run 3x successful but it hanged at the 4rd run. Refleaks runs each test 6 times.

https://buildbot.python.org/all/#/builders/264/builds/10

0:17:04 load avg: 3.01 [416/419/1] test_asyncio failed (12 min 21 sec) -- running: test_multiprocessing_fork (6 min 34 sec), test_concurrent_futures (10 min 52 sec), test_multiprocessing_spawn (13 min 53 sec)
beginning 6 repetitions
123456
.Unknown child process pid 25032, will report returncode 255
Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 25032 is closed
.Unknown child process pid 19349, will report returncode 255
Child watcher got an unexpected pid: 19349
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/asyncio/unix_events.py", line 1213, in _do_waitpid
    loop, callback, args = self._callbacks.pop(pid)
KeyError: 19349
test test_asyncio failed -- Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/test_asyncio/test_subprocess.py", line 156, in test_shell
    self.assertEqual(exitcode, 7)
AssertionError: 255 != 7

(...)
0:21:59 load avg: 0.52 [419/419/1] test_concurrent_futures passed (15 min 46 sec)
(...)
== Tests result: FAILURE ==

404 tests OK.

10 slowest tests:
- test_concurrent_futures: 15 min 46 sec
- test_multiprocessing_spawn: 14 min 7 sec
- test_asyncio: 12 min 21 sec
- test_multiprocessing_forkserver: 8 min 59 sec
- test_multiprocessing_fork: 7 min 13 sec
- test_io: 3 min 47 sec
- test_subprocess: 3 min 30 sec
- test_pickle: 2 min 51 sec
- test_capi: 2 min 47 sec
- test_zipfile: 2 min 41 sec

1 test failed:
    test_asyncio

14 tests skipped:
    test_devpoll test_gdb test_ioctl test_kqueue test_msilib
    test_ossaudiodev test_startfile test_tix test_tk test_ttk_guionly
    test_winconsoleio test_winreg test_winsound test_zipfile64

Re-running failed tests in verbose mode

Re-running test_asyncio in verbose mode
beginning 6 repetitions
123456
(...)

test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessSafeWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests) ... ok
...


test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessSafeWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests) ... ok
...


test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessSafeWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests) ... ok
...


test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessSafeWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests) ... ok
...
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_communicate (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_communicate_ignore_broken_pipe (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_create_subprocess_exec_text_mode_fails (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_create_subprocess_exec_with_path (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_create_subprocess_shell_text_mode_fails (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_devnull_error (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_devnull_input (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_devnull_output (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_empty_input (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_exec_loop_deprecated (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_kill (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_pause_reading (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_popen_error (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_popen_error_with_stdin_pipe (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_process_create_warning (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_read_stdout_after_process_exit (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_send_signal (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_shell (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_shell_loop_deprecated (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_start_new_session (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_stdin_broken_pipe (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_stdin_not_inheritable (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_stdin_stdout (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_subprocess_protocol_create_warning (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_terminate (test.test_asyncio.test_subprocess.SubprocessFastWatcherTests) ... ok
test_cancel_make_subprocess_transport_exec (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok
test_cancel_post_init (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok
test_cancel_process_wait (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok
test_close_dont_kill_finished (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok
...Timeout (3:15:00)!
Thread 0x00007f666a7a1740 (most recent call first):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/selectors.py", line 468 in select
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/asyncio/base_events.py", line 1837 in _run_once
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/asyncio/base_events.py", line 589 in run_forever
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/asyncio/base_events.py", line 621 in run_until_complete
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/test_asyncio/test_subprocess.py", line 484 in test_close_kill_running
msg354079 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-07 11:04
Fail on x86 Gentoo Refleaks 3.x:
https://buildbot.python.org/all/#/builders/1/builds/737

(...)
Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 8654 is closed
.Timeout (3:15:00)!
Thread 0xb7b92700 (most recent call first):
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/selectors.py", line 468 in select
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/asyncio/base_events.py", line 1837 in _run_once
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/asyncio/base_events.py", line 589 in run_forever
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/asyncio/base_events.py", line 621 in run_until_complete
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/test_asyncio/test_subprocess.py", line 484 in test_close_kill_running
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 616 in _callTestMethod
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 659 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 719 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/support/testresult.py", line 162 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 2032 in _run_suite
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 2128 in run_unittest
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest.py", line 209 in _test_module
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/refleak.py", line 87 in dash_R
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest.py", line 232 in _runtest_inner2
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest.py", line 270 in _runtest_inner
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest.py", line 153 in _runtest
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest.py", line 193 in runtest
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest_mp.py", line 67 in run_tests_worker
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/main.py", line 654 in _main
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/main.py", line 634 in main
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/main.py", line 712 in main
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 43 in _main
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 47 in <module>
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/runpy.py", line 85 in _run_code
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/runpy.py", line 192 in _run_module_as_main

== Tests result: FAILURE ==

410 tests OK.

10 slowest tests:
- test_concurrent_futures: 23 min 2 sec
- test_multiprocessing_spawn: 22 min 6 sec
- test_multiprocessing_forkserver: 15 min 51 sec
- test_multiprocessing_fork: 10 min 15 sec
- test_regrtest: 10 min 10 sec
- test_gdb: 8 min 26 sec
- test_lib2to3: 7 min 31 sec
- test_pickle: 7 min
- test_io: 6 min 50 sec
- test_subprocess: 6 min 37 sec

1 test failed:
    test_asyncio

8 tests skipped:
    test_devpoll test_kqueue test_msilib test_startfile
    test_winconsoleio test_winreg test_winsound test_zipfile64
4:25:59 load avg: 1.55
4:25:59 load avg: 1.55 Re-running failed tests in verbose mode
4:25:59 load avg: 1.55 Re-running test_asyncio in verbose mode
(...)
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... Timeout (3:15:00)!
Thread 0xb7c51700 (most recent call first):
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/selectors.py", line 468 in select
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/asyncio/base_events.py", line 1837 in _run_once
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/asyncio/base_events.py", line 589 in run_forever
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/asyncio/base_events.py", line 621 in run_until_complete
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/test_asyncio/test_subprocess.py", line 484 in test_close_kill_running
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 616 in _callTestMethod
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 659 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 719 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/unittest/runner.py", line 176 in run
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 2032 in _run_suite
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 2128 in run_unittest
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest.py", line 209 in _test_module
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/refleak.py", line 87 in dash_R
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest.py", line 232 in _runtest_inner2
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest.py", line 270 in _runtest_inner
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest.py", line 153 in _runtest
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/runtest.py", line 193 in runtest
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/main.py", line 318 in rerun_failed_tests
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/main.py", line 691 in _main
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/main.py", line 634 in main
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/libregrtest/main.py", line 712 in main
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/__main__.py", line 2 in <module>
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/runpy.py", line 85 in _run_code
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/runpy.py", line 192 in _run_module_as_main
make: *** [Makefile:1163: buildbottest] Error 1
msg355150 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-22 20:24
Also on x86 Gentoo Refleaks 3.x:

https://buildbot.python.org/all/#/builders/1/builds/754

6:11:43 load avg: 1.44 [419/419/2] test_asyncio crashed (Exit code 1)
beginning 6 repetitions
123456
Timeout (3:15:00)!
Thread 0xb7c4d700 (most recent call first):
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/selectors.py", line 468 in select
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/asyncio/base_events.py", line 1837 in _run_once
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/asyncio/base_events.py", line 589 in run_forever
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/asyncio/base_events.py", line 621 in run_until_complete
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.refleak/build/Lib/test/test_asyncio/test_subprocess.py", line 484 in test_close_kill_running
msg355157 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-22 21:38
I'm able to reproduce the issue locally using the command:

./python -m test test_asyncio --match=test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests.test_close_kill_running -v -F -j20 --timeout=30.0

Example:

...
0:00:47 load avg: 14.35 [287] test_asyncio passed -- running: test_asyncio (31.1 sec)
0:00:47 load avg: 14.35 [288] test_asyncio passed -- running: test_asyncio (31.3 sec)
0:00:47 load avg: 14.35 [289/1] test_asyncio crashed (Exit code 1)
Timeout (0:00:30)!
Thread 0x00007f3bbd098740 (most recent call first):
  File "/home/vstinner/python/master/Lib/selectors.py", line 468 in select
  File "/home/vstinner/python/master/Lib/asyncio/base_events.py", line 1837 in _run_once
  File "/home/vstinner/python/master/Lib/asyncio/base_events.py", line 589 in run_forever
  File "/home/vstinner/python/master/Lib/asyncio/base_events.py", line 621 in run_until_complete
  File "/home/vstinner/python/master/Lib/test/test_asyncio/test_subprocess.py", line 484 in test_close_kill_running
...


I added some debug traces.

Logs when it works:
---
0:01:39 load avg: 21.21 [512] test_asyncio passed -- running: test_asyncio (31.0 sec)
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... MultiLoopChildWatcher: attach_loop <_UnixSelectorEventLoop running=False closed=False debug=False>
MultiLoopChildWatcher: SIGCHLD signal registered
MultiLoopChildWatcher: attach_loop None
MultiLoopChildWatcher: attach_loop None
MultiLoopChildWatcher: SIGCHLD signal registered
ok

----------------------------------------------------------------------

Ran 1 test in 0.054s

OK
MultiLoopChildWatcher: _do_waitpid 27887
MultiLoopChildWatcher: _do_waitpid (0, 0)
kill 27887 9
MultiLoopChildWatcher: _sig_chld
MultiLoopChildWatcher: _do_waitpid_all
MultiLoopChildWatcher: _do_waitpid 27887
MultiLoopChildWatcher: _do_waitpid (27887, 9)
MultiLoopChildWatcher: _do_waitpid: call (<bound method _UnixSelectorEventLoop._child_watcher_callback of <_UnixSelectorEventLoop running=True closed=False debug=False>>, <_UnixSelectorEventLoop running=True closed=False debug=False>)
---

Logs when it hangs:
---
0:01:40 load avg: 21.21 [514/1] test_asyncio crashed (Exit code 1)
test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... MultiLoopChildWatcher: attach_loop <_UnixSelectorEventLoop running=False closed=False debug=False>
MultiLoopChildWatcher: SIGCHLD signal registered
MultiLoopChildWatcher: _do_waitpid 26010
MultiLoopChildWatcher: _do_waitpid (0, 0)
kill 26010 9
Timeout (0:00:30)!
Thread 0x00007f4b73f35740 (most recent call first):
  File "/home/vstinner/python/master/Lib/selectors.py", line 468 in select
  File "/home/vstinner/python/master/Lib/asyncio/base_events.py", line 1837 in _run_once
  File "/home/vstinner/python/master/Lib/asyncio/base_events.py", line 589 in run_forever
  File "/home/vstinner/python/master/Lib/asyncio/base_events.py", line 621 in run_until_complete
  File "/home/vstinner/python/master/Lib/test/test_asyncio/test_subprocess.py", line 484 in test_close_kill_running
...
---

It seems like sometimes, MultiLoopChildWatcher._sig_chld() is not called.
msg355159 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-22 21:58
It seems like MultiLoopChildWatcher doesn't respect the PEP 475: siginterrupt(False) must not be used. But the following change isn't enough to fix this issue (test_close_kill_running hang).

diff --git a/Lib/asyncio/unix_events.py b/Lib/asyncio/unix_events.py
index d8f653045a..887f837bad 100644
--- a/Lib/asyncio/unix_events.py
+++ b/Lib/asyncio/unix_events.py
@@ -1192,9 +1192,6 @@ class MultiLoopChildWatcher(AbstractChildWatcher):
                                "restore to default handler on watcher close.")
                 self._saved_sighandler = signal.SIG_DFL
 
-            # Set SA_RESTART to limit EINTR occurrences.
-            signal.siginterrupt(signal.SIGCHLD, False)
-
     def _do_waitpid_all(self):
         for pid in list(self._callbacks):
             self._do_waitpid(pid)
msg355362 - (view) Author: Kyle Stanley (aeros) * (Python triager) Date: 2019-10-25 12:27
> I'm able to reproduce the issue locally using the command:
./python -m test test_asyncio --match=test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests.test_close_kill_running -v -F -j20 --timeout=30.0

I was unable to reproduce this locally on OS: Arch Linux 5.3.7 x86_64 and CPU: Intel i5-4460 from the latest commit (heads/master:96b06aefe2) within a reasonable number of tests (~10000). I also tried differing number of jobs, up to 100. What OS were you able to reproduce it locally on using that configuration?

This is of course still an issue since it's still occurring on multiple buildbots intermittently and Victor was able to reproduce it locally, but I was not able to do so on my local setup. This will be significantly harder to debug without a reliable means of reproducing the failure.
msg355363 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-25 12:30
I did my tests on Fedora 30 on my laptop which has 4 cores (8 logical CPUs).
History
Date User Action Args
2019-10-25 12:30:39vstinnersetmessages: + msg355363
2019-10-25 12:27:24aerossetmessages: + msg355362
2019-10-24 09:28:51aerossetnosy: + aeros
2019-10-22 21:58:23vstinnersetmessages: + msg355159
2019-10-22 21:43:36vstinnersettitle: test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x -> asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)
2019-10-22 21:38:49vstinnersetmessages: + msg355157
2019-10-22 20:24:39vstinnersettitle: test_ayncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x -> test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x
2019-10-22 20:24:32vstinnersetmessages: + msg355150
2019-10-07 11:04:35vstinnersetmessages: + msg354079
2019-09-30 12:57:35vstinnercreate