Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault in test_multiprocessing #56519

Closed
vstinner opened this issue Jun 10, 2011 · 13 comments
Closed

Segfault in test_multiprocessing #56519

vstinner opened this issue Jun 10, 2011 · 13 comments

Comments

@vstinner
Copy link
Member

BPO 12310
Nosy @pitrou, @vstinner

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2011-06-17.14:10:27.128>
created_at = <Date 2011-06-10.09:11:53.378>
labels = []
title = 'Segfault in test_multiprocessing'
updated_at = <Date 2011-06-17.14:10:27.127>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2011-06-17.14:10:27.127>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2011-06-17.14:10:27.128>
closer = 'vstinner'
components = []
creation = <Date 2011-06-10.09:11:53.378>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 12310
keywords = []
message_count = 13.0
messages = ['138060', '138064', '138067', '138069', '138074', '138132', '138154', '138163', '138168', '138190', '138494', '138495', '138511']
nosy_count = 4.0
nosy_names = ['pitrou', 'vstinner', 'neologix', 'python-dev']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue12310'
versions = []

@vstinner
Copy link
Member Author

test_multiprocessing segfaults in a loop. The crash occurs in _Condition.release() on waiter.release(), called from Queue._finalize_close(). Possible related changes:

  • a5c8b6ebe895: new sigwait() test using thread (issue bpo-8407)
  • 6d6099f7fe89: add sentinels to multiprocessing (issue bpo-9205)

Example of a crash:

[333/356/1] test_multiprocessing
Fatal Python error: Segmentation fault

Thread 0x01cde800:
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 237 in wait
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/queues.py", line 252 in _feed
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 690 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 737 in _bootstrap_inner
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 710 in _bootstrap

Current thread 0xa000d000:
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 298 in notify
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/queues.py", line 227 in _finalize_close
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/util.py", line 202 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/process.py", line 270 in _bootstrap
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/forking.py", line 133 in __init__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/process.py", line 134 in start
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/pool.py", line 220 in _repopulate_pool
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/pool.py", line 157 in __init__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/init.py", line 231 in Pool
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_multiprocessing.py", line 2186 in test_main
File "./Lib/test/regrtest.py", line 1043 in runtest_inner
File "./Lib/test/regrtest.py", line 841 in runtest
File "./Lib/test/regrtest.py", line 668 in main
File "./Lib/test/regrtest.py", line 1618 in <module>

There are approximatively 698 crashes in last tests on "x86 Tiger 3.x"!
http://www.python.org/dev/buildbot/all/builders/x86%20Tiger%203.x/builds/2722/steps/test/logs/stdio

Most occured on Queue._finalize_close() -> _Condition.release() -> waiter.release().

@vstinner
Copy link
Member Author

a5c8b6ebe895: new sigwait() test using thread (issue bpo-8407)
6d6099f7fe89: add sentinels to multiprocessing (issue bpo-9205)

The first segfaults occured in build bpo-2719 (cedceeb45030):
http://www.python.org/dev/buildbot/all/builders/x86%20Tiger%203.x/builds/2719

In the timeline, 6d6099f7fe89 < cedceeb45030 < a5c8b6ebe895, so the change is more likely coming from 6d6099f7fe89.

@pitrou
Copy link
Member

pitrou commented Jun 10, 2011

Victor, how can there be hundreds of crashes? Isn't the process supposed to terminate when a crash occurs?

There are several crashes in test_signal, so it's not only test_multiprocessing:

Thread 0xa000d000:
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 172 in test_main
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main
File "./Lib/test/regrtest.py", line 1043 in runtest_inner
File "./Lib/test/regrtest.py", line 841 in runtest
File "./Lib/test/regrtest.py", line 668 in main
File "./Lib/test/regrtest.py", line 1618 in <module>
Fatal Python error: Segmentation fault

Thread 0xa000d000:
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 248 in test_wakeup_fd_early
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main
File "./Lib/test/regrtest.py", line 1043 in runtest_inner
File "./Lib/test/regrtest.py", line 841 in runtest
File "./Lib/test/regrtest.py", line 668 in main
File "./Lib/test/regrtest.py", line 1618 in <module>
Fatal Python error: Segmentation fault

Thread 0xa000d000:
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 372 in readpipe_interrupted
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 394 in test_siginterrupt_on
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main
File "./Lib/test/regrtest.py", line 1043 in runtest_inner
File "./Lib/test/regrtest.py", line 841 in runtest
File "./Lib/test/regrtest.py", line 668 in main
File "./Lib/test/regrtest.py", line 1618 in <module>
Fatal Python error: Segmentation fault

Thread 0xa000d000:
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 495 in test_itimer_prof
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main
File "./Lib/test/regrtest.py", line 1043 in runtest_inner
File "./Lib/test/regrtest.py", line 841 in runtest
File "./Lib/test/regrtest.py", line 668 in main
File "./Lib/test/regrtest.py", line 1618 in <module>
Fatal Python error: Segmentation fault

Thread 0xa000d000:
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 472 in test_itimer_virtual
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest
File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main
File "./Lib/test/regrtest.py", line 1043 in runtest_inner
File "./Lib/test/regrtest.py", line 841 in runtest
File "./Lib/test/regrtest.py", line 668 in main
File "./Lib/test/regrtest.py", line 1618 in <module>
Fatal Python error: Segmentation fault

@vstinner
Copy link
Member Author

Victor, how can there be hundreds of crashes?
Isn't the process supposed to terminate when a crash occurs?

Yes, a process does terminate on SIGSEGV, but multiprocessing creates subprocesses: I suppose that crashes occur in child processes.

For test_signal, I have to investigate this.

@vstinner
Copy link
Member Author

Le vendredi 10 juin 2011 à 09:40 +0000, Antoine Pitrou a écrit :

There are several crashes in test_signal

Commit a17710e27ea2 should fix some (all?) test_signal crashes.

@vstinner
Copy link
Member Author

It looks like the sentinel doesn't handle fatal death of the child process:

test test_multiprocessing crashed -- Traceback (most recent call last):
  File "./Lib/test/regrtest.py", line 1043, in runtest_inner
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_multiprocessing.py", line 2189, in test_main
    ManagerMixin.manager.start()
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/managers.py", line 531, in start
    self._address = reader.recv()
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 273, in recv
    buf = self._recv_bytes(sentinels=sentinels)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 430, in _recv_bytes
    buf = self._recv(4, sentinels)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 413, in _recv
    raise EOFError
EOFError

@neologix
Copy link
Mannequin

neologix mannequin commented Jun 11, 2011

I think it might be related to Issue bpo-6721.

Using a mutex/condition variable after fork (from the child process) is unsafe: it can lead to deadlocks, and on OS-X, it seems like it can lead to segfaults.

Normally, Queue's synchronization primitives are reinitialized after fork, see Queue._after_fork() method.

But here, what happens is the following (well, that's an hypothesis):

Lib/multiprocessing/process.py", line 270 in _bootstrap
:
_current_process = self
util._finalizer_registry.clear()
util._run_after_forkers()

  • the old _current_process' reference count drops to zero
  • it's deallocated, and since it holds the last reference to a Queue, the queue is finalized
  • as a consequence, the Queue._finalize_close() callback (registered through a Finalize object) is called
  • Queue._finalize_close() tries to acquire, signal and release the _notempty Condition, but the Condition hasn't been reinitialized yet, because util._run_after_forkers() is called 2 lines below
  • segfault

It's probably been triggered by Antoine's patches, but I'm pretty sure this bug has always been there.

I think that moving util._run_after_forkers() up 2 lines should fix the segfaults, but with that change test_number_of_objects fails (I didn't investigate why).

@pitrou
Copy link
Member

pitrou commented Jun 11, 2011

Less disruptive approach:

        old_process = _current_process
        _current_process = self
        try:
            util._finalizer_registry.clear()
            util._run_after_forkers()
        finally:
            del old_process

This will delay finalization of the old process object until after _run_after_forkers() is executed, without (hopefully) messing with semantics.

@neologix
Copy link
Mannequin

neologix mannequin commented Jun 11, 2011

Yes, I also tried this.
This fixed the test_multiprocessing failure, but I think it triggered a failure in test_concurrent_futures (didin't look in more detail).
Would it be possible to try this on the buildbot to see if it fixes the segfaults?

@vstinner
Copy link
Member Author

Would it be possible to try this on the buildbot to see if it fixes
the segfaults?

You can fork cpython, modify the code, and run a custom buildbot on your
clone.

@python-dev
Copy link
Mannequin

python-dev mannequin commented Jun 17, 2011

New changeset e6e7e42efdc2 by Victor Stinner in branch '3.2':
Issue bpo-12310: finalize the old process after _run_after_forkers()
http://hg.python.org/cpython/rev/e6e7e42efdc2

New changeset a73e5c1f57d7 by Victor Stinner in branch 'default':
(Merge 3.2) Issue bpo-12310: finalize the old process after _run_after_forkers()
http://hg.python.org/cpython/rev/a73e5c1f57d7

@vstinner
Copy link
Member Author

Let's try on "real" buildbots. If the commit fixes the issue on 3.x, I will port the fix to Python 2.7.

@vstinner
Copy link
Member Author

test_multiprocessing pass with success on PPC Tiger 3.x (and x86 Tiger 3.x, but the segfaults only occurred on PPC), but this issue is a sporadic issue. I close the issue because I hope that it is closed, but reopen it if you still see segfaults on PPC Tiger 3.x.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants