Issue39995
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2020-03-17 17:01 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 19739 | merged | vstinner, 2020-04-27 19:06 | |
PR 19751 | closed | aeros, 2020-04-28 08:44 | |
PR 19758 | closed | vstinner, 2020-04-28 15:58 | |
PR 19760 | closed | vstinner, 2020-04-28 16:37 | |
PR 19788 | open | tomMoral, 2020-04-29 09:23 |
Messages (34) | |||
---|---|---|---|
msg364451 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-17 17:01 | |
AMD64 Ubuntu Shared 3.x: https://buildbot.python.org/all/#/builders/101/builds/532 test_crash (test.test_concurrent_futures.ProcessPoolSpawnExecutorDeadlockTest) ... Stdout: 15.51s Stderr: Warning -- threading_cleanup() failed to cleanup 0 threads (count: 0, dangling: 3) Dangling thread: <Thread(QueueFeederThread, stopped daemon 140540322510592)> Dangling thread: <_MainThread(MainThread, started 140540525950528)> Dangling thread: <_ExecutorManagerThread(Thread-111, stopped daemon 140540401628928)> (...) ====================================================================== ERROR: test_crash (test.test_concurrent_futures.ProcessPoolSpawnExecutorDeadlockTest) [exit at task unpickle] ---------------------------------------------------------------------- Traceback (most recent call last): File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_concurrent_futures.py", line 1119, in test_crash executor.shutdown(wait=True) File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/concurrent/futures/process.py", line 721, in shutdown self._executor_manager_thread_wakeup.wakeup() File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/concurrent/futures/process.py", line 93, in wakeup self._writer.send_bytes(b"") File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/multiprocessing/connection.py", line 205, in send_bytes self._send_bytes(m[offset:offset + size]) File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/multiprocessing/connection.py", line 416, in _send_bytes self._send(header + buf) File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/multiprocessing/connection.py", line 373, in _send n = write(self._handle, buf) OSError: [Errno 9] Bad file descriptor Stdout: 15.51s Stderr: Warning -- threading_cleanup() failed to cleanup 0 threads (count: 0, dangling: 3) Dangling thread: <Thread(QueueFeederThread, stopped daemon 140540322510592)> Dangling thread: <_MainThread(MainThread, started 140540525950528)> Dangling thread: <_ExecutorManagerThread(Thread-111, stopped daemon 140540401628928)> -- On the same build, test_concurrent_futures timed out after 15 min, while running test_ressources_gced_in_workers(): 0:29:08 load avg: 1.46 Re-running test_concurrent_futures in verbose mode (...) test_map_exception (test.test_concurrent_futures.ProcessPoolForkProcessPoolExecutorTest) ... ok test_map_timeout (test.test_concurrent_futures.ProcessPoolForkProcessPoolExecutorTest) ... ok test_max_workers_negative (test.test_concurrent_futures.ProcessPoolForkProcessPoolExecutorTest) ... ok test_max_workers_too_large (test.test_concurrent_futures.ProcessPoolForkProcessPoolExecutorTest) ... skipped 'Windows-only process limit' test_no_stale_references (test.test_concurrent_futures.ProcessPoolForkProcessPoolExecutorTest) ... ok Timeout (0:15:00)! Thread 0x00007f38bf766700 (most recent call first): File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/threading.py", line 303 in wait File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/multiprocessing/queues.py", line 227 in _feed File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/threading.py", line 882 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/threading.py", line 944 in _bootstrap_inner File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/threading.py", line 902 in _bootstrap Thread 0x00007f38bff67700 (most recent call first): File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/selectors.py", line 415 in select File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/multiprocessing/connection.py", line 936 in wait File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/concurrent/futures/process.py", line 372 in wait_result_broken_or_wakeup File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/concurrent/futures/process.py", line 319 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/threading.py", line 944 in _bootstrap_inner File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/threading.py", line 902 in _bootstrap Thread 0x00007f38c7128640 (most recent call first): File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/threading.py", line 303 in wait File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/concurrent/futures/_base.py", line 434 in result File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_concurrent_futures.py", line 955 in test_ressources_gced_in_workers File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 616 in _callTestMethod File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 659 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 719 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 122 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 84 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 122 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 84 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 122 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 84 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/runner.py", line 176 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support/__init__.py", line 2079 in _run_suite File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support/__init__.py", line 2201 in run_unittest File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/libregrtest/runtest.py", line 209 in _test_module File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/libregrtest/runtest.py", line 234 in _runtest_inner2 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/libregrtest/runtest.py", line 270 in _runtest_inner File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/libregrtest/runtest.py", line 153 in _runtest File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/libregrtest/runtest.py", line 193 in runtest File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/libregrtest/main.py", line 318 in rerun_failed_tests File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/libregrtest/main.py", line 691 in _main File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/libregrtest/main.py", line 634 in main File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/libregrtest/main.py", line 712 in main File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/__main__.py", line 2 in <module> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/runpy.py", line 87 in _run_code File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/runpy.py", line 194 in _run_module_as_main test_ressources_gced_in_workers (test.test_concurrent_futures.ProcessPoolForkProcessPoolExecutorTest) ... Makefile:1171: recipe for target 'buildbottest' failed make: *** [buildbottest] Error 1 command timed out: 1200 seconds without output running [b'make', b'buildbottest', b'TESTOPTS=-j2 --junit-xml test-results.xml ${BUILDBOT_TESTOPTS}', b'TESTPYTHONOPTS=', b'TESTTIMEOUT=900'], attempting to kill program finished with exit code 2 elapsedTime=3849.667593 |
|||
msg364546 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-18 18:24 | |
Same bug on AMD64 FreeBSD Non-Debug 3.x: https://buildbot.python.org/all/#/builders/214/builds/472 ====================================================================== ERROR: test_crash (test.test_concurrent_futures.ProcessPoolSpawnExecutorDeadlockTest) [crash during func execution on worker] ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/buildbot/python/3.x.koobs-freebsd-9e36.nondebug/build/Lib/test/test_concurrent_futures.py", line 1119, in test_crash executor.shutdown(wait=True) File "/usr/home/buildbot/python/3.x.koobs-freebsd-9e36.nondebug/build/Lib/concurrent/futures/process.py", line 721, in shutdown self._executor_manager_thread_wakeup.wakeup() File "/usr/home/buildbot/python/3.x.koobs-freebsd-9e36.nondebug/build/Lib/concurrent/futures/process.py", line 93, in wakeup self._writer.send_bytes(b"") File "/usr/home/buildbot/python/3.x.koobs-freebsd-9e36.nondebug/build/Lib/multiprocessing/connection.py", line 205, in send_bytes self._send_bytes(m[offset:offset + size]) File "/usr/home/buildbot/python/3.x.koobs-freebsd-9e36.nondebug/build/Lib/multiprocessing/connection.py", line 416, in _send_bytes self._send(header + buf) File "/usr/home/buildbot/python/3.x.koobs-freebsd-9e36.nondebug/build/Lib/multiprocessing/connection.py", line 373, in _send n = write(self._handle, buf) OSError: [Errno 9] Bad file descriptor Stdout: 8.35s Stderr: Warning -- threading_cleanup() failed to cleanup 0 threads (count: 0, dangling: 3) Dangling thread: <_MainThread(MainThread, started 34369908736)> Dangling thread: <_ExecutorManagerThread(Thread-114, stopped daemon 34396434432)> Dangling thread: <Thread(QueueFeederThread, stopped daemon 34377508352)> ---------------------------------------------------------------------- |
|||
msg364552 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-18 18:31 | |
> Same bug on AMD64 FreeBSD Non-Debug 3.x: > https://buildbot.python.org/all/#/builders/214/builds/472 Oh, test_crash failed twice, but not on the same test case: * ERROR: test_crash (test.test_concurrent_futures.ProcessPoolSpawnExecutorDeadlockTest) [crash during func execution on worker] * ERROR: test_crash (test.test_concurrent_futures.ProcessPoolForkExecutorDeadlockTest) [crash during func execution on worker] The second failure was when test_concurrent_futures was re-run sequentially. |
|||
msg367434 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 16:02 | |
See also bpo-30966 "Add multiprocessing.SimpleQueue.close()". |
|||
msg367449 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 19:36 | |
New changeset 5d1f32d33ba24d0aa87235ae40207bb57778388b by Victor Stinner in branch 'master': bpo-39995: Split test_concurrent_futures.test_crash() into sub-tests (GH-19739) https://github.com/python/cpython/commit/5d1f32d33ba24d0aa87235ae40207bb57778388b |
|||
msg367450 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 19:46 | |
AMD64 Ubuntu Shared 3.x: https://buildbot.python.org/all/#/builders/101/builds/809 ====================================================================== ERROR: test_shutdown_no_wait (test.test_concurrent_futures.ProcessPoolForkserverProcessPoolShutdownTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_concurrent_futures.py", line 542, in test_shutdown_no_wait executor.shutdown(wait=False) File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/concurrent/futures/process.py", line 724, in shutdown self._executor_manager_thread_wakeup.wakeup() File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/concurrent/futures/process.py", line 80, in wakeup self._writer.send_bytes(b"") File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/multiprocessing/connection.py", line 188, in send_bytes self._check_closed() File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/multiprocessing/connection.py", line 141, in _check_closed raise OSError("handle is closed") OSError: handle is closed (...) 0:32:37 load avg: 1.64 Re-running test_concurrent_futures in verbose mode (...) ====================================================================== ERROR: test_shutdown_no_wait (test.test_concurrent_futures.ProcessPoolForkProcessPoolShutdownTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_concurrent_futures.py", line 542, in test_shutdown_no_wait executor.shutdown(wait=False) File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/concurrent/futures/process.py", line 724, in shutdown self._executor_manager_thread_wakeup.wakeup() File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/concurrent/futures/process.py", line 80, in wakeup self._writer.send_bytes(b"") File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/multiprocessing/connection.py", line 205, in send_bytes self._send_bytes(m[offset:offset + size]) File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/multiprocessing/connection.py", line 416, in _send_bytes self._send(header + buf) File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/multiprocessing/connection.py", line 373, in _send n = write(self._handle, buf) OSError: [Errno 9] Bad file descriptor |
|||
msg367451 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 19:54 | |
> See also bpo-30966 "Add multiprocessing.SimpleQueue.close()". I pushed a commit 1a275013d1ecc2e3778d64fda86174b2f13d6969: "Process.shutdown(wait=True) of concurrent.futures now closes explicitly the result queue." test_shutdown_deadlock_pickle() still rely on the queue to be closed implicitly. Queue created at: (...) File "/home/vstinner/python/master/Lib/test/test_concurrent_futures.py", lineno 1196 with self.executor_type(max_workers=2, File "/home/vstinner/python/master/Lib/concurrent/futures/process.py", lineno 637 self._result_queue = mp_context.SimpleQueue() File "/home/vstinner/python/master/Lib/multiprocessing/context.py", lineno 113 return SimpleQueue(ctx=self.get_context()) File "/home/vstinner/python/master/Lib/multiprocessing/queues.py", lineno 341 self._reader, self._writer = connection.Pipe(duplex=False) File "/home/vstinner/python/master/Lib/multiprocessing/connection.py", lineno 539 c2 = Connection(fd2, readable=False) |
|||
msg367453 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 20:09 | |
AMD64 Fedora Stable Clang Installed 3.x: https://buildbot.python.org/all/#/builders/127/builds/679 0:04:21 load avg: 1.29 [423/423/1] test_concurrent_futures failed (2 min 39 sec) Warning -- threading_cleanup() failed to cleanup -1 threads (count: 0, dangling: 3) Warning -- Dangling thread: <_MainThread(MainThread, started 139673296918336)> Warning -- Dangling thread: <Thread(QueueFeederThread, stopped daemon 139673045362432)> Warning -- Dangling thread: <_ExecutorManagerThread(Thread-145, stopped 139673053914880)> Warning -- threading_cleanup() failed to cleanup 0 threads (count: 0, dangling: 3) Warning -- Dangling thread: <_MainThread(MainThread, started 139673296918336)> Warning -- Dangling thread: <Thread(QueueFeederThread, stopped daemon 139673045362432)> Warning -- Dangling thread: <_ExecutorManagerThread(Thread-145, stopped 139673053914880)> /home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 5 leaked semaphore objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d ' Warning -- multiprocessing.process._dangling was modified by test_concurrent_futures Warning -- threading._dangling was modified by test_concurrent_futures test_cancel (test.test_concurrent_futures.FutureTests) ... ok test_cancelled (test.test_concurrent_futures.FutureTests) ... ok test_done (test.test_concurrent_futures.FutureTests) ... ok (...) test_first_exception_some_already_complete (test.test_concurrent_futures.ThreadPoolWaitTests) ... 1.60s ok test_pending_calls_race (test.test_concurrent_futures.ThreadPoolWaitTests) ... 0.11s ok test_timeout (test.test_concurrent_futures.ThreadPoolWaitTests) ... 6.11s ok Traceback (most recent call last): File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/util.py", line 300, in _run_finalizers finalizer() File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/util.py", line 224, in __call__ res = self._callback(*self._args, **self._kwargs) File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/synchronize.py", line 87, in _cleanup sem_unlink(name) FileNotFoundError: [Errno 2] No such file or directory Traceback (most recent call last): File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/util.py", line 300, in _run_finalizers finalizer() File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/util.py", line 224, in __call__ res = self._callback(*self._args, **self._kwargs) File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/synchronize.py", line 87, in _cleanup sem_unlink(name) FileNotFoundError: [Errno 2] No such file or directory Traceback (most recent call last): File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/util.py", line 300, in _run_finalizers finalizer() File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/util.py", line 224, in __call__ res = self._callback(*self._args, **self._kwargs) File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/synchronize.py", line 87, in _cleanup sem_unlink(name) FileNotFoundError: [Errno 2] No such file or directory Traceback (most recent call last): File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/util.py", line 300, in _run_finalizers finalizer() File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/util.py", line 224, in __call__ res = self._callback(*self._args, **self._kwargs) File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/synchronize.py", line 87, in _cleanup sem_unlink(name) FileNotFoundError: [Errno 2] No such file or directory Traceback (most recent call last): File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/util.py", line 300, in _run_finalizers finalizer() File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/util.py", line 224, in __call__ res = self._callback(*self._args, **self._kwargs) File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/synchronize.py", line 87, in _cleanup sem_unlink(name) FileNotFoundError: [Errno 2] No such file or directory ====================================================================== ERROR: test_killed_child (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/test/test_concurrent_futures.py", line 130, in tearDown self.executor.shutdown(wait=True) File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/concurrent/futures/process.py", line 724, in shutdown self._executor_manager_thread_wakeup.wakeup() File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/concurrent/futures/process.py", line 80, in wakeup self._writer.send_bytes(b"") File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes self._send_bytes(m[offset:offset + size]) File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes self._send(header + buf) File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/connection.py", line 373, in _send n = write(self._handle, buf) OSError: [Errno 9] Bad file descriptor ---------------------------------------------------------------------- Ran 193 tests in 159.805s FAILED (errors=1, skipped=6) Before: set() After: {<weakref at 0x7f0828b17bd0; to 'SpawnProcess' at 0x7f082ace7400>, <weakref at 0x7f0828b17b80; to 'SpawnProcess' at 0x7f082ace7880>, <weakref at 0x7f0828b17b30; to 'SpawnProcess' at 0x7f0828bbc0a0>, <weakref at 0x7f0828b17720; to 'SpawnProcess' at 0x7f082ace72e0>, <weakref at 0x7f0828b17a90; to 'SpawnProcess' at 0x7f082ace7910>} Before: {<weakref at 0x7f082b5ad400; to '_MainThread' at 0x7f082bcd8850>} After: {<weakref at 0x7f0828b4d900; to 'Thread' at 0x7f0828bbc4c0>, <weakref at 0x7f0828b4d0e0; to '_MainThread' at 0x7f082bcd8850>, <weakref at 0x7f0828b17a40; to '_ExecutorManagerThread' at 0x7f0828bbc8e0>} test test_concurrent_futures failed == Tests result: FAILURE == |
|||
msg367455 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 20:34 | |
x86 Gentoo Installed with X 3.x: https://buildbot.python.org/all/#/builders/128/builds/726 test_del_shutdown (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... Warning -- Unraisable exception Exception ignored in: <function _ExecutorManagerThread.__init__.<locals>.weakref_cb at 0xb5067898> Traceback (most recent call last): File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.9/concurrent/futures/process.py", line 281, in weakref_cb thread_wakeup.wakeup() File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.9/concurrent/futures/process.py", line 80, in wakeup self._writer.send_bytes(b"") File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes self._send_bytes(m[offset:offset + size]) File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes self._send(header + buf) File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.9/multiprocessing/connection.py", line 373, in _send n = write(self._handle, buf) OSError: [Errno 9] Bad file descriptor 0.04s ok |
|||
msg367458 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 21:15 | |
ERROR: test_killed_child (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) (...) File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.9/multiprocessing/connection.py", line 373, in _send n = write(self._handle, buf) OSError: [Errno 9] Bad file descriptor It seems like Connection.close() was called while Connection._send() was called. I added debug logs: * self._handle was equal to 4 at the function entry * self._handle was equal to None when write() was called |
|||
msg367460 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 21:26 | |
> It seems like Connection.close() was called while Connection._send() was called. I added debug logs: The connection was closed by terminate_broken() called by _ExecutorManagerThread.run() thread: test_killed_child (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... close handle 4 File "/home/vstinner/python/master/Lib/threading.py", line 908, in _bootstrap self._bootstrap_inner() File "/home/vstinner/python/master/Lib/threading.py", line 950, in _bootstrap_inner self.run() File "/home/vstinner/python/master/Lib/concurrent/futures/process.py", line 313, in run self.terminate_broken(cause) File "/home/vstinner/python/master/Lib/concurrent/futures/process.py", line 456, in terminate_broken self.join_executor_internals() File "/home/vstinner/python/master/Lib/concurrent/futures/process.py", line 503, in join_executor_internals self.thread_wakeup.close() File "/home/vstinner/python/master/Lib/concurrent/futures/process.py", line 75, in close self._writer.close() |
|||
msg367462 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 21:46 | |
terminate_broken() method was added by: commit 0e89076247580ba0e570c4816f0e5628a7e36e83 Author: Thomas Moreau <thomas.moreau.2010@gmail.com> Date: Sun Mar 1 21:49:14 2020 +0100 bpo-39678: refactor queue manager thread (GH-18551) |
|||
msg367463 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 21:48 | |
> ERROR: test_killed_child (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) The patch below makes this test failure more likely: diff --git a/Lib/multiprocessing/connection.py b/Lib/multiprocessing/connection.py index 510e4b5aba..63518e55d9 100644 --- a/Lib/multiprocessing/connection.py +++ b/Lib/multiprocessing/connection.py @@ -370,6 +370,7 @@ class Connection(_ConnectionBase): def _send(self, buf, write=_write): remaining = len(buf) while True: + time.sleep(0.050) n = write(self._handle, buf) remaining -= n if remaining == 0: |
|||
msg367465 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-27 22:05 | |
It seems like test_killed_child() race condition was introduced by: commit a5cbab552d294d99fde864306632d7e511a75d3c (refs/bisect/bad) Author: Thomas Moreau <thomas.moreau.2010@gmail.com> Date: Sun Feb 16 19:09:26 2020 +0100 bpo-39104: Fix hanging ProcessPoolExecutor on shutdown nowait with pickling failure (GH-17670) |
|||
msg367486 - (view) | Author: Kyle Stanley (aeros) * | Date: 2020-04-28 03:09 | |
I can't be certain for the other failures, but I'm currently exploring a potential solution for addressing the `test_killed_child` failure. To me, it seems to be a race condition with attempting to call _ThreadWakeup.close() while there are still bytes being sent. IMO, we should wait until closing the pipe's reader and writer until it's finished sending or receiving bytes. Here's one way to implement that w/ threading.Event: diff --git a/Lib/concurrent/futures/process.py b/Lib/concurrent/futures/process.py index 8e9b69a8f0..9bf073fc34 100644 --- a/Lib/concurrent/futures/process.py +++ b/Lib/concurrent/futures/process.py @@ -68,21 +68,30 @@ class _ThreadWakeup: def __init__(self): self._closed = False self._reader, self._writer = mp.Pipe(duplex=False) + # Used to ensure pipe is not closed while sending or receiving bytes + self._not_running = threading.Event() + # Initialize event as True + self._not_running.set() def close(self): if not self._closed: + self._not_running.wait() self._closed = True self._writer.close() self._reader.close() def wakeup(self): if not self._closed: + self._not_running.clear() self._writer.send_bytes(b"") + self._not_running.set() def clear(self): if not self._closed: + self._not_running.clear() while self._reader.poll(): self._reader.recv_bytes() + self._not_running.set() From using Victor's method of replicating the failure with inserting a `time.sleep(0.050)` in multiprocessing.Connection._send(), it appears to fix the failure in test_killed_child. I believe it would also fix the other failures since they appear to be caused by the same core issue, but I've been unable to replicate them locally so I'm not 100% certain. |
|||
msg367487 - (view) | Author: Kyle Stanley (aeros) * | Date: 2020-04-28 03:16 | |
After writing the above out and a bit of further consideration, I think it might make more sense to wait for the event after setting `self._closed = True` so that it prevents future wakeup() and clear() calls from reading/writing to the pipe, while still allowing ones that are currently ongoing to finish. Thoughts? |
|||
msg367503 - (view) | Author: Thomas Moreau (tomMoral) * | Date: 2020-04-28 07:05 | |
Sorry I just saw this. It seems that I introduced this regression. One of the goal of having a `ThreadWakeup` and not a `SimpleQueue` is to avoid using locks that can hinder the performance of the executor. I don't remember the exact details but I think I did some benchmark and it was giving lower performances for large set of small tasks (not so sure anymore). If a fully synchronous method is chosen, maybe it is safer to rely on a `SimpleQueue` as it will be lower maintenance. If the solution proposed by aeros is chosen, the event can probably be replaced by a lock no? It would be easier to understand I guess. From the failures, it seems to be a race condition between `shutdown` and `terminate_broken`. The race condition should not be possible in `submit` as in principle, the `join_executor_internals` is only called when no new task can be submitted. One solution would be to use the `self._shutdown_lock` from the executor to protect the call to `close` in `terminate_broken` and the call to `self._thread_wakeup.wakeup` in `shutdown`. That way, the lock is only acquired at critical points without being used all the time. This could also be done by adding `lock=True/False` to only lock the potentially dangerous calls. |
|||
msg367508 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2020-04-28 08:08 | |
How about the following (untested): diff --git a/Lib/concurrent/futures/process.py b/Lib/concurrent/futures/process.py index 8e9b69a8f0..c0c2eb3032 100644 --- a/Lib/concurrent/futures/process.py +++ b/Lib/concurrent/futures/process.py @@ -66,23 +66,29 @@ _global_shutdown = False class _ThreadWakeup: def __init__(self): - self._closed = False self._reader, self._writer = mp.Pipe(duplex=False) def close(self): - if not self._closed: - self._closed = True - self._writer.close() - self._reader.close() + r, w = self._reader, self._writer + self._reader = self._writer = None + if r is not None: + r.close() + w.close() def wakeup(self): - if not self._closed: + try: self._writer.send_bytes(b"") + except AttributeError: + # Closed + pass def clear(self): - if not self._closed: + try: while self._reader.poll(): self._reader.recv_bytes() + except AttributeError: + # Closed + pass def _python_exit(): |
|||
msg367512 - (view) | Author: Kyle Stanley (aeros) * | Date: 2020-04-28 09:10 | |
Oops, it seems that I opened PR-19751 a bit preemptively. When I get the chance, I'll see if Antoine's implementation can address the failures and do some comparisons. |
|||
msg367513 - (view) | Author: Kyle Stanley (aeros) * | Date: 2020-04-28 09:21 | |
I decided to close PR-19751. Both because it does not correctly address the race condition (due to an oversight on my part) and it would add substantial overhead to _ThreadWakeup. Instead, I agree that we should explore a non-locking solution. |
|||
msg367539 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-28 16:06 | |
Antoine Pitrou: "How about the following (untested): (...)" Using Antoine's patch, test_killed_child() still fails (I used my msg367463 patch to make the failure more likely). |
|||
msg367540 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2020-04-28 16:09 | |
With the same traceback and error message? |
|||
msg367542 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-28 16:38 | |
With my msg367463 patch (add sleep), test_cancel_futures() fails. Example: ====================================================================== FAIL: test_cancel_futures (test.test_concurrent_futures.ProcessPoolForkProcessPoolShutdownTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/vstinner/python/master/Lib/test/test_concurrent_futures.py", line 353, in test_cancel_futures self.assertTrue(len(cancelled) >= 35, msg=f"{len(cancelled)=}") AssertionError: False is not true : len(cancelled)=0 |
|||
msg367543 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-28 16:43 | |
Thomas Moreau: "One solution would be to use the `self._shutdown_lock` from the executor to protect the call to `close` in `terminate_broken` and the call to `self._thread_wakeup.wakeup` in `shutdown`. That way, the lock is only acquired at critical points without being used all the time. This could also be done by adding `lock=True/False` to only lock the potentially dangerous calls." I wrote a conservative PR 19760 which always lock ProcessPoolExecutor._shutdown_lock while accessing _ThreadWakeup. PR 19760 fix test_killed_child(): it doesn't fail anymore, even with my msg367463 patch (add sleep). |
|||
msg367544 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-28 17:00 | |
> With my msg367463 patch (add sleep), test_cancel_futures() fails. The test uses sleep() as a synchronization primitive: executor.submit(time.sleep, .1). That's bad, but it doesn't *have to* be fixed now. My msg367463 patch adds an artifical sleep: the test looks fine in practice. I prefer to wait until it fails on a buildbot worker before spending time to make the test more reliable. |
|||
msg367605 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-29 01:32 | |
New changeset a4dfe8ede5a37576e17035dccfe109ba7752237e by Victor Stinner in branch 'master': bpo-39995: Fix concurrent.futures _ThreadWakeup (GH-19760) https://github.com/python/cpython/commit/a4dfe8ede5a37576e17035dccfe109ba7752237e |
|||
msg367608 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-04-29 01:34 | |
I'm still getting more and more buildbot emails about test_concurrent_futures, so I merged my PR 19760 to fix buildbots. Please revert or modify my PR 19760 if you have a better approach, but please check that test_killed_child() and ProcessPoolForkExecutorDeadlockTest tests don't fail with my msg367463 patch. I would still appreciated a post-commit review of my change, since I don't know well concurrent.futures code : bpo-39995: Fix concurrent.futures _ThreadWakeup (GH-19760) https://github.com/python/cpython/commit/a4dfe8ede5a37576e17035dccfe109ba7752237e |
|||
msg367616 - (view) | Author: Kyle Stanley (aeros) * | Date: 2020-04-29 03:00 | |
I am a bit concerned about the performance implication of accessing the thread wakeup through a lock in the call queue, but for now I think it's reasonable to address the buildbot failure with a locking solution while we try to find a better one. I'm not certain if we'll be able to find one that correctly addresses the failures with zero additional locking, but I think we may be able to reduce the number of times we use the lock compared to the implementation in GH-19760. I'll spend some time exploring that as I find the time to, and report back with any significant findings. |
|||
msg367617 - (view) | Author: Kyle Stanley (aeros) * | Date: 2020-04-29 03:05 | |
Also, thanks to Victor for coming up with an interim solution, and for the feedback from Antoine and Thomas. GH-19760 is a significant improvement from my initial proposal in GH-19751. |
|||
msg367631 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2020-04-29 08:02 | |
I looked at the change and it seemed ok to me. Perhaps Thomas can give it a look too. |
|||
msg367632 - (view) | Author: Thomas Moreau (tomMoral) * | Date: 2020-04-29 08:20 | |
I think this is a reasonable way to move on.Some of the locks can probably be removed but this needs careful investigation and in the mean time, it hinders everyone. Thanks victor for the fast fix up! To me, an interesting observation is that the failure seems to only happen when the executor is in broken state. If we can find a way to adapt the behavior to be more conservative on these states (which are not impacting perf) that would be nice. I will try to look at it today and see if I can remove some of the locks while still not failing with Victor's patch. We can probably remove the lock on `clear` safely. For the others, it might be more complex. |
|||
msg367635 - (view) | Author: Thomas Moreau (tomMoral) * | Date: 2020-04-29 09:27 | |
I did GH 19788 with a few modifications. There is only one lock that seems to mater for the perf, and I actually added one other (the one in _python_exit, which necessitate another bug fix for fork context). I did not benchmark to see if it was worth it in term of perf. |
|||
msg394494 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-05-26 22:49 | |
"test_concurrent_futures: ProcessPoolSpawnExecutorDeadlockTest.test_crash() fails with OSError: [Errno 9] Bad file descriptor" I didn't see this failure recently, I close the issue. Since changes were pushed, I mark the issue as fixed. If someone has ideas to enhance the code, I suggest to open a new more specific issue. I consider the initial issue (buildot failure) as fixed. |
|||
msg394707 - (view) | Author: Kyle Stanley (aeros) * | Date: 2021-05-29 02:25 | |
Thanks for closing up the issue, Victor :) |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:28 | admin | set | github: 84176 |
2021-05-29 02:25:33 | aeros | set | messages: + msg394707 |
2021-05-26 22:49:56 | vstinner | set | status: open -> closed resolution: fixed messages: + msg394494 stage: patch review -> resolved |
2020-04-29 09:27:53 | tomMoral | set | messages: + msg367635 |
2020-04-29 09:23:39 | tomMoral | set | pull_requests: + pull_request19111 |
2020-04-29 08:20:15 | tomMoral | set | messages: + msg367632 |
2020-04-29 08:02:04 | pitrou | set | assignee: aeros -> vstinner messages: + msg367631 |
2020-04-29 03:05:55 | aeros | set | messages: + msg367617 |
2020-04-29 03:00:37 | aeros | set | messages: + msg367616 |
2020-04-29 02:39:57 | corona10 | set | nosy:
+ corona10 |
2020-04-29 01:34:52 | vstinner | set | messages: + msg367608 |
2020-04-29 01:32:14 | vstinner | set | messages: + msg367605 |
2020-04-28 17:00:11 | vstinner | set | messages: + msg367544 |
2020-04-28 16:43:01 | vstinner | set | messages: + msg367543 |
2020-04-28 16:38:42 | vstinner | set | messages: + msg367542 |
2020-04-28 16:37:48 | vstinner | set | pull_requests: + pull_request19082 |
2020-04-28 16:09:22 | pitrou | set | messages: + msg367540 |
2020-04-28 16:06:39 | vstinner | set | messages: + msg367539 |
2020-04-28 15:58:40 | vstinner | set | pull_requests: + pull_request19080 |
2020-04-28 09:21:17 | aeros | set | messages: + msg367513 |
2020-04-28 09:10:32 | aeros | set | messages: + msg367512 |
2020-04-28 08:44:37 | aeros | set | pull_requests: + pull_request19072 |
2020-04-28 08:08:36 | pitrou | set | messages: + msg367508 |
2020-04-28 07:05:56 | tomMoral | set | nosy:
+ tomMoral messages: + msg367503 |
2020-04-28 03:16:43 | aeros | set | messages: + msg367487 |
2020-04-28 03:09:33 | aeros | set | assignee: aeros messages: + msg367486 |
2020-04-28 01:08:28 | aeros | set | nosy:
+ pitrou |
2020-04-28 00:52:59 | aeros | set | nosy:
+ aeros |
2020-04-27 22:05:35 | vstinner | set | messages: + msg367465 |
2020-04-27 21:48:28 | vstinner | set | messages: + msg367463 |
2020-04-27 21:46:19 | vstinner | set | messages: + msg367462 |
2020-04-27 21:26:50 | vstinner | set | messages: + msg367460 |
2020-04-27 21:15:57 | vstinner | set | messages: + msg367458 |
2020-04-27 20:34:21 | vstinner | set | messages: + msg367455 |
2020-04-27 20:09:04 | vstinner | set | messages: + msg367453 |
2020-04-27 19:54:18 | vstinner | set | messages: + msg367451 |
2020-04-27 19:46:17 | vstinner | set | messages: + msg367450 |
2020-04-27 19:36:58 | vstinner | set | messages: + msg367449 |
2020-04-27 19:06:22 | vstinner | set | keywords:
+ patch stage: patch review pull_requests: + pull_request19061 |
2020-04-27 16:02:06 | vstinner | set | messages: + msg367434 |
2020-03-18 18:31:26 | vstinner | set | messages: + msg364552 |
2020-03-18 18:24:03 | vstinner | set | messages: + msg364546 |
2020-03-17 17:02:14 | vstinner | set | title: test_concurrent_futures: ProcessPoolSpawnExecutorDeadlockTest.test_crash() fails with -> test_concurrent_futures: ProcessPoolSpawnExecutorDeadlockTest.test_crash() fails with OSError: [Errno 9] Bad file descriptor |
2020-03-17 17:01:54 | vstinner | create |