Issue12310
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011-06-10 09:11 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (13) | |||
---|---|---|---|
msg138060 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-10 09:11 | |
test_multiprocessing segfaults in a loop. The crash occurs in _Condition.release() on waiter.release(), called from Queue._finalize_close(). Possible related changes: - a5c8b6ebe895: new sigwait() test using thread (issue #8407) - 6d6099f7fe89: add sentinels to multiprocessing (issue #9205) Example of a crash: [333/356/1] test_multiprocessing Fatal Python error: Segmentation fault Thread 0x01cde800: File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 237 in wait File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/queues.py", line 252 in _feed File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 690 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 737 in _bootstrap_inner File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 710 in _bootstrap Current thread 0xa000d000: File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 298 in notify File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/queues.py", line 227 in _finalize_close File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/util.py", line 202 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/process.py", line 270 in _bootstrap File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/forking.py", line 133 in __init__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/process.py", line 134 in start File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/pool.py", line 220 in _repopulate_pool File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/pool.py", line 157 in __init__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/__init__.py", line 231 in Pool File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_multiprocessing.py", line 2186 in test_main File "./Lib/test/regrtest.py", line 1043 in runtest_inner File "./Lib/test/regrtest.py", line 841 in runtest File "./Lib/test/regrtest.py", line 668 in main File "./Lib/test/regrtest.py", line 1618 in <module> There are approximatively 698 crashes in last tests on "x86 Tiger 3.x"! http://www.python.org/dev/buildbot/all/builders/x86%20Tiger%203.x/builds/2722/steps/test/logs/stdio Most occured on Queue._finalize_close() -> _Condition.release() -> waiter.release(). |
|||
msg138064 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-10 09:26 | |
> a5c8b6ebe895: new sigwait() test using thread (issue #8407) > 6d6099f7fe89: add sentinels to multiprocessing (issue #9205) The first segfaults occured in build #2719 (cedceeb45030): http://www.python.org/dev/buildbot/all/builders/x86%20Tiger%203.x/builds/2719 In the timeline, 6d6099f7fe89 < cedceeb45030 < a5c8b6ebe895, so the change is more likely coming from 6d6099f7fe89. |
|||
msg138067 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-06-10 09:40 | |
Victor, how can there be hundreds of crashes? Isn't the process supposed to terminate when a crash occurs? There are several crashes in test_signal, so it's not only test_multiprocessing: Thread 0xa000d000: File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 172 in test_main File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main File "./Lib/test/regrtest.py", line 1043 in runtest_inner File "./Lib/test/regrtest.py", line 841 in runtest File "./Lib/test/regrtest.py", line 668 in main File "./Lib/test/regrtest.py", line 1618 in <module> Fatal Python error: Segmentation fault Thread 0xa000d000: File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 248 in test_wakeup_fd_early File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main File "./Lib/test/regrtest.py", line 1043 in runtest_inner File "./Lib/test/regrtest.py", line 841 in runtest File "./Lib/test/regrtest.py", line 668 in main File "./Lib/test/regrtest.py", line 1618 in <module> Fatal Python error: Segmentation fault Thread 0xa000d000: File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 372 in readpipe_interrupted File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 394 in test_siginterrupt_on File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main File "./Lib/test/regrtest.py", line 1043 in runtest_inner File "./Lib/test/regrtest.py", line 841 in runtest File "./Lib/test/regrtest.py", line 668 in main File "./Lib/test/regrtest.py", line 1618 in <module> Fatal Python error: Segmentation fault Thread 0xa000d000: File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 495 in test_itimer_prof File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main File "./Lib/test/regrtest.py", line 1043 in runtest_inner File "./Lib/test/regrtest.py", line 841 in runtest File "./Lib/test/regrtest.py", line 668 in main File "./Lib/test/regrtest.py", line 1618 in <module> Fatal Python error: Segmentation fault Thread 0xa000d000: File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 472 in test_itimer_virtual File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main File "./Lib/test/regrtest.py", line 1043 in runtest_inner File "./Lib/test/regrtest.py", line 841 in runtest File "./Lib/test/regrtest.py", line 668 in main File "./Lib/test/regrtest.py", line 1618 in <module> Fatal Python error: Segmentation fault |
|||
msg138069 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-10 09:42 | |
> Victor, how can there be hundreds of crashes? > Isn't the process supposed to terminate when a crash occurs? Yes, a process does terminate on SIGSEGV, but multiprocessing creates subprocesses: I suppose that crashes occur in child processes. For test_signal, I have to investigate this. |
|||
msg138074 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-10 11:02 | |
Le vendredi 10 juin 2011 à 09:40 +0000, Antoine Pitrou a écrit : > There are several crashes in test_signal Commit a17710e27ea2 should fix some (all?) test_signal crashes. |
|||
msg138132 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-10 21:10 | |
It looks like the sentinel doesn't handle fatal death of the child process: test test_multiprocessing crashed -- Traceback (most recent call last): File "./Lib/test/regrtest.py", line 1043, in runtest_inner File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_multiprocessing.py", line 2189, in test_main ManagerMixin.manager.start() File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/managers.py", line 531, in start self._address = reader.recv() File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 273, in recv buf = self._recv_bytes(sentinels=sentinels) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 430, in _recv_bytes buf = self._recv(4, sentinels) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 413, in _recv raise EOFError EOFError |
|||
msg138154 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-06-11 09:59 | |
I think it might be related to Issue #6721. Using a mutex/condition variable after fork (from the child process) is unsafe: it can lead to deadlocks, and on OS-X, it seems like it can lead to segfaults. Normally, Queue's synchronization primitives are reinitialized after fork, see Queue._after_fork() method. But here, what happens is the following (well, that's an hypothesis): Lib/multiprocessing/process.py", line 270 in _bootstrap : _current_process = self util._finalizer_registry.clear() util._run_after_forkers() - the old _current_process' reference count drops to zero - it's deallocated, and since it holds the last reference to a Queue, the queue is finalized - as a consequence, the Queue._finalize_close() callback (registered through a Finalize object) is called - Queue._finalize_close() tries to acquire, signal and release the _notempty Condition, but the Condition hasn't been reinitialized yet, because util._run_after_forkers() is called 2 lines below - segfault It's probably been triggered by Antoine's patches, but I'm pretty sure this bug has always been there. I think that moving util._run_after_forkers() up 2 lines should fix the segfaults, but with that change test_number_of_objects fails (I didn't investigate why). |
|||
msg138163 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-06-11 16:12 | |
Less disruptive approach: old_process = _current_process _current_process = self try: util._finalizer_registry.clear() util._run_after_forkers() finally: del old_process This will delay finalization of the old process object until after _run_after_forkers() is executed, without (hopefully) messing with semantics. |
|||
msg138168 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-06-11 16:44 | |
Yes, I also tried this. This fixed the test_multiprocessing failure, but I think it triggered a failure in test_concurrent_futures (didin't look in more detail). Would it be possible to try this on the buildbot to see if it fixes the segfaults? |
|||
msg138190 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-11 23:34 | |
> Would it be possible to try this on the buildbot to see if it fixes > the segfaults? You can fork cpython, modify the code, and run a custom buildbot on your clone. |
|||
msg138494 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-06-17 10:38 | |
New changeset e6e7e42efdc2 by Victor Stinner in branch '3.2': Issue #12310: finalize the old process after _run_after_forkers() http://hg.python.org/cpython/rev/e6e7e42efdc2 New changeset a73e5c1f57d7 by Victor Stinner in branch 'default': (Merge 3.2) Issue #12310: finalize the old process after _run_after_forkers() http://hg.python.org/cpython/rev/a73e5c1f57d7 |
|||
msg138495 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-17 10:39 | |
Let's try on "real" buildbots. If the commit fixes the issue on 3.x, I will port the fix to Python 2.7. |
|||
msg138511 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-06-17 14:10 | |
test_multiprocessing pass with success on PPC Tiger 3.x (and x86 Tiger 3.x, but the segfaults only occurred on PPC), but this issue is a sporadic issue. I close the issue because I hope that it is closed, but reopen it if you still see segfaults on PPC Tiger 3.x. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:18 | admin | set | github: 56519 |
2011-06-17 14:10:27 | vstinner | set | status: open -> closed resolution: fixed messages: + msg138511 |
2011-06-17 10:39:25 | vstinner | set | messages: + msg138495 |
2011-06-17 10:38:34 | python-dev | set | nosy:
+ python-dev messages: + msg138494 |
2011-06-11 23:34:21 | vstinner | set | messages: + msg138190 |
2011-06-11 16:44:45 | neologix | set | messages: + msg138168 |
2011-06-11 16:12:38 | pitrou | set | messages: + msg138163 |
2011-06-11 09:59:03 | neologix | set | nosy:
+ neologix messages: + msg138154 |
2011-06-10 21:10:43 | vstinner | set | messages: + msg138132 |
2011-06-10 11:02:09 | vstinner | set | messages: + msg138074 |
2011-06-10 09:42:49 | vstinner | set | messages: + msg138069 |
2011-06-10 09:40:29 | pitrou | set | messages: + msg138067 |
2011-06-10 09:26:48 | vstinner | set | messages: + msg138064 |
2011-06-10 09:11:53 | vstinner | create |