Segfault in test_multiprocessing #56519
test_multiprocessing segfaults in a loop. The crash occurs in _Condition.release() on waiter.release(), called from Queue._finalize_close(). Possible related changes:
Example of a crash: [333/356/1] test_multiprocessing Thread 0x01cde800: Current thread 0xa000d000: There are approximately 698 crashes in the latest tests on "x86 Tiger 3.x"! Most occurred in Queue._finalize_close() -> _Condition.release() -> waiter.release(). |
The first segfaults occurred in build bpo-2719 (cedceeb45030): In the timeline, 6d6099f7fe89 < cedceeb45030 < a5c8b6ebe895, so the change is more likely coming from 6d6099f7fe89. |
Victor, how can there be hundreds of crashes? Isn't the process supposed to terminate when a crash occurs? There are several crashes in test_signal, so it's not only test_multiprocessing: Thread 0xa000d000: Thread 0xa000d000: Thread 0xa000d000: Thread 0xa000d000: Thread 0xa000d000: |
Yes, a process does terminate on SIGSEGV, but multiprocessing creates subprocesses: I suppose that crashes occur in child processes. For test_signal, I have to investigate this. |
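A minimal sketch of why a crash in a child does not stop the test run, assuming a POSIX system where the "fork" start method is available. The helper names `crash` and `run_crashing_child` are hypothetical and are not part of the test suite; the child simulates a segfault by sending itself SIGSEGV.

```python
import multiprocessing
import os
import signal


def crash():
    """Simulate a segfault in the child by sending SIGSEGV to ourselves."""
    # Restore the default action in case a handler (e.g. faulthandler) was inherited.
    signal.signal(signal.SIGSEGV, signal.SIG_DFL)
    os.kill(os.getpid(), signal.SIGSEGV)


def run_crashing_child():
    """Start a child that dies from SIGSEGV and return its exit code."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
    child = ctx.Process(target=crash)
    child.start()
    child.join()
    # A negative exit code -N means the child was killed by signal N.
    return child.exitcode


if __name__ == "__main__":
    code = run_crashing_child()
    print("parent is still alive, child exit code:", code)
```

The parent only observes a negative `exitcode`, so a single test run can accumulate many child crashes while the main process keeps going.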
On Friday, 10 June 2011 at 09:40 +0000, Antoine Pitrou wrote:
Commit a17710e27ea2 should fix some (all?) test_signal crashes. |
It looks like the sentinel doesn't handle fatal death of the child process: test test_multiprocessing crashed -- Traceback (most recent call last):
File "./Lib/test/regrtest.py", line 1043, in runtest_inner
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_multiprocessing.py", line 2189, in test_main
ManagerMixin.manager.start()
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/managers.py", line 531, in start
self._address = reader.recv()
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 273, in recv
buf = self._recv_bytes(sentinels=sentinels)
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 430, in _recv_bytes
buf = self._recv(4, sentinels)
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 413, in _recv
raise EOFError
EOFError |
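The EOFError at the bottom of this traceback is what a Connection raises when the other end goes away before sending anything. A small sketch of that behaviour using a one-way `multiprocessing.Pipe`; closing the writer here merely stands in for the manager child dying, and `recv_after_peer_closed` is a hypothetical helper name.

```python
import multiprocessing


def recv_after_peer_closed():
    """Return what happens when recv() is called after the peer has closed."""
    reader, writer = multiprocessing.Pipe(duplex=False)
    writer.close()  # stand-in for the child process dying before it sends data
    try:
        reader.recv()
    except EOFError:
        return "EOFError"
    finally:
        reader.close()
    return "received data"


if __name__ == "__main__":
    print(recv_after_peer_closed())
```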
I think it might be related to Issue bpo-6721. Using a mutex/condition variable after fork (from the child process) is unsafe: it can lead to deadlocks, and on OS X, it seems like it can lead to segfaults. Normally, Queue's synchronization primitives are reinitialized after fork, see the Queue._after_fork() method. But here, what happens is the following (well, that's a hypothesis): Lib/multiprocessing/process.py", line 270 in _bootstrap
It's probably been triggered by Antoine's patches, but I'm pretty sure this bug has always been there. I think that moving util._run_after_forkers() up 2 lines should fix the segfaults, but with that change test_number_of_objects fails (I didn't investigate why). |
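To illustrate the general "reinitialize synchronization primitives after fork" pattern discussed here, the following sketch uses the public `os.register_at_fork()` hook (Python 3.7+) rather than multiprocessing's internal `util._run_after_forkers()` machinery. `ForkAwareCounter` and `fork_while_lock_is_held` are hypothetical names, and the example assumes a POSIX system with `os.fork()`.

```python
import os
import threading


class ForkAwareCounter:
    """Hypothetical object whose lock is replaced in every forked child."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0
        # Note: the registered hook keeps this object alive for the life of the process.
        os.register_at_fork(after_in_child=self._reinit_lock)

    def _reinit_lock(self):
        # The inherited lock may be held by a thread that does not exist in the child.
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1
            return self.value


def fork_while_lock_is_held():
    """Fork while the lock is held; return the child's exit status (0 on success)."""
    counter = ForkAwareCounter()
    counter._lock.acquire()  # the parent holds the lock at fork time
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            counter.increment()  # works only because the lock was reinitialized
            status = 0
        finally:
            os._exit(status)
    counter._lock.release()
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else -1


if __name__ == "__main__":
    print("child exit status:", fork_while_lock_is_held())
```

Without the hook, the child would inherit a lock that is still marked as held and would block on it. The ordering concern raised above is the same idea: anything that touches such primitives in the child has to run after they have been reinitialized.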
Less disruptive approach:

    old_process = _current_process
    _current_process = self
    try:
        util._finalizer_registry.clear()
        util._run_after_forkers()
    finally:
        del old_process

This will delay finalization of the old process object until after _run_after_forkers() is executed, without (hopefully) messing with semantics. |
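The effect of holding the extra reference can be sketched outside of multiprocessing. This is only an illustration under the assumption of CPython's reference counting; `FakeProcess` and `swap_current` are hypothetical stand-ins, and `weakref.finalize` plays the role of the old process object's finalization.

```python
import weakref


class FakeProcess:
    """Hypothetical stand-in for a process object that owns resources."""


def swap_current(keep_old_alive):
    """Replace the 'current' object and record the order of events."""
    events = []
    state = {"current": FakeProcess()}
    weakref.finalize(state["current"], events.append, "old finalized")

    if keep_old_alive:
        old = state["current"]  # extra reference delays finalization
    state["current"] = FakeProcess()
    try:
        events.append("after-fork hooks ran")
    finally:
        if keep_old_alive:
            del old  # the old object is finalized only here
    return events


if __name__ == "__main__":
    print(swap_current(keep_old_alive=False))
    print(swap_current(keep_old_alive=True))
```

With `keep_old_alive=False` the old object is finalized as soon as it is replaced, before the hooks run; with `keep_old_alive=True` the hooks run first.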
Yes, I also tried this. |
You can fork cpython, modify the code, and run a custom buildbot on your |
New changeset e6e7e42efdc2 by Victor Stinner in branch '3.2': New changeset a73e5c1f57d7 by Victor Stinner in branch 'default': |
Let's try on "real" buildbots. If the commit fixes the issue on 3.x, I will port the fix to Python 2.7. |
test_multiprocessing passes successfully on PPC Tiger 3.x (and x86 Tiger 3.x, but the segfaults only occurred on PPC), but this is a sporadic issue. I am closing the issue because I hope that it is fixed; reopen it if you still see segfaults on PPC Tiger 3.x. |