classification
Title: Deadlock in test_concurrent_futures
Type: behavior Stage: resolved
Components: Tests Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rosslagerwall Nosy List: haypo, neologix, pitrou, python-dev, rosslagerwall
Priority: normal Keywords: patch

Created on 2011-06-19 17:25 by haypo, last changed 2012-01-08 09:35 by rosslagerwall. This issue is now closed.

Files
File name Uploaded Description Edit
issue.patch rosslagerwall, 2012-01-03 06:12 patch review
itest.py rosslagerwall, 2012-01-03 06:12 test program
Messages (9)
msg138648 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-19 17:25
[271/356/1] test_concurrent_futures
Traceback (most recent call last):
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/queues.py", line 268, in _feed
    send(obj)
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 229, in send
    self._send_bytes(memoryview(buf))
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 423, in _send_bytes
    self._send(struct.pack("=i", len(buf)))
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 392, in _send
    n = write(self._handle, buf)
OSError: [Errno 32] Broken pipe
Timeout (1:00:00)!
Thread 0x00000954:
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 237 in wait
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/queues.py", line 252 in _feed
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 690 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 737 in _bootstrap_inner
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 710 in _bootstrap

Thread 0x00000953:
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/forking.py", line 146 in poll
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/forking.py", line 166 in wait
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/process.py", line 150 in join
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 208 in shutdown_worker
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 264 in _queue_management_worker
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 690 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 737 in _bootstrap_inner
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 710 in _bootstrap

Thread 0x00000001:
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 237 in wait
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 851 in join
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 395 in shutdown
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_concurrent_futures.py", line 67 in tearDown
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 407 in _executeTestPart
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 463 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 514 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1166 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1254 in _run_suite
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1280 in run_unittest
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_concurrent_futures.py", line 628 in test_main
  File "./Lib/test/regrtest.py", line 1043 in runtest_inner
  File "./Lib/test/regrtest.py", line 841 in runtest
  File "./Lib/test/regrtest.py", line 668 in main
  File "./Lib/test/regrtest.py", line 1618 in <module>
*** Error code 1
make: Fatal error: Command failed for target `buildbottest'
program finished with exit code 1

See commit e6e7e42efdc2 of the issue #12310.
msg138765 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-20 23:27
Message on a stackoverflow thread:

"I have suffered from the same problem, even if connecting on localhost in python 2.7.1. After a day of debugging i found the cause and a workaround:

Cause: BaseProxy class has thread local storage which caches the connection, which is reused for future connections causing "broken pipe" errors even on creating a new Manager

Workaround: Delete the cached connection before reconnecting

if address in BaseProxy._address_to_local:
    del BaseProxy._address_to_local[self.address][0].connection"

http://stackoverflow.com/questions/3649458/broken-pipe-when-using-python-multiprocessing-managers-basemanager-syncmanager/5884967#5884967

---

See also maybe the (closed) issue #11663: multiprocessing doesn't detect killed processes
msg138766 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-20 23:37
Connection._send_bytes() has a comment about broken pipes:

    def _send_bytes(self, buf):
        # For wire compatibility with 3.2 and lower
        n = len(buf)
        self._send(struct.pack("=i", len(buf)))
        # The condition is necessary to avoid "broken pipe" errors
        # when sending a 0-length buffer if the other end closed the pipe.
        if n > 0:
            self._send(buf)

But the OSError(32, "Broken pipe") occurs on sending the buffer size (a chunk of 4 bytes: self._send(struct.pack("=i", len(buf)))), not on sending the buffer content.

See also maybe the (closed) issue #9205: Parent process hanging in multiprocessing if children terminate unexpectedly
msg138767 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-21 00:42
Ah, submit a new task after the manager shutdown fails with OSError(32, 'Broken pipe'). Example:
---------------
from multiprocessing.managers import BaseManager

class MathsClass(object):
    def foo(self):
        return 42

class MyManager(BaseManager):
    pass

MyManager.register('Maths', MathsClass)

if __name__ == '__main__':
    manager = MyManager()
    manager.start()
    maths = manager.Maths()
    maths.foo()
    manager.shutdown()
    try:
        maths.foo()
    finally:
        manager.shutdown()
---------------
This example doesn't hang, but this issue is about concurrent.futures, not multiprocessing.
msg138768 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-21 01:05
Oh, I think that I found a deadlock (or something like that):
----------------------------
import concurrent.futures
import faulthandler
import os
import signal
import time

def work(n):
    time.sleep(0.1)

def main():
    faulthandler.register(signal.SIGUSR1)
    print("pid: %s" % os.getpid())
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in executor.map(work, range(100)):
            print("shutdown")
            executor.shutdown()
            print("shutdown--")

if __name__ == '__main__':
    main()
----------------------------

Trace:
----------------------------
Thread 0x00007fbfc83bd700:
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 237 in wait
  File "/home/haypo/prog/HG/cpython/Lib/multiprocessing/queues.py", line 252 in _feed
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 690 in run
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 737 in _bootstrap_inner
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 710 in _bootstrap

Thread 0x00007fbfc8bbe700:
  File "/home/haypo/prog/HG/cpython/Lib/multiprocessing/queues.py", line 101 in put
  File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/process.py", line 268 in _queue_management_worker
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 690 in run
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 737 in _bootstrap_inner
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 710 in _bootstrap

Current thread 0x00007fbfcc2e3700:
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 237 in wait
  File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 851 in join
  File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/process.py", line 395 in shutdown
  File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/_base.py", line 570 in __exit__
  File "y.py", line 17 in main
  File "y.py", line 20 in <module>
----------------------------
There are two child processes, but both are zombies (displayed as "<defunct>" by ps). Send SIGUSR1 signal to the frozen process to display the traceback (thanks to faulthandler).
msg150497 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2012-01-03 06:12
Retrieving the result of a future after the executor has been shut down can cause a hang.

It seems like this regression was introduced in a76257a99636. This regression exists only for ProcessPoolExecutor.

The problem is that even if there are pending work items, the processes are still signaled to shut down leaving the pending work items permanently unfinished. The patch simply removes the call to shut down the processes when there are pending work items.

Attached is a patch.
msg150498 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-03 14:33
Well I was sure I had added this code for a reason, but the tests seem to run without...
Just a comment: the test isn't ProcessPoolExecutor-specific, so it should really be in the generic tests.
msg150851 - (view) Author: Roundup Robot (python-dev) Date: 2012-01-08 06:43
New changeset 26389e9efa9c by Ross Lagerwall in branch '3.2':
Issue #12364: Fix a hang in concurrent.futures.ProcessPoolExecutor.
http://hg.python.org/cpython/rev/26389e9efa9c

New changeset 25f879011102 by Ross Lagerwall in branch 'default':
Merge with 3.2 for #12364.
http://hg.python.org/cpython/rev/25f879011102
msg150853 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2012-01-08 09:35
Thanks!
History
Date User Action Args
2012-01-08 09:35:49rosslagerwallsetstatus: open -> closed
type: behavior
messages: + msg150853

assignee: rosslagerwall
resolution: fixed
stage: resolved
2012-01-08 06:43:54python-devsetnosy: + python-dev
messages: + msg150851
2012-01-03 14:33:51pitrousetmessages: + msg150498
2012-01-03 06:12:54rosslagerwallsetfiles: + itest.py
2012-01-03 06:12:19rosslagerwallsetfiles: + issue.patch

nosy: + rosslagerwall
messages: + msg150497

keywords: + patch
2011-07-05 12:22:26hayposettitle: Timeout (1 hour) in test_concurrent_futures.tearDown() on sparc solaris10 gcc 3.x -> Deadlock in test_concurrent_futures
2011-06-21 01:05:34hayposetmessages: + msg138768
2011-06-21 00:42:34hayposetmessages: + msg138767
2011-06-20 23:37:03hayposetmessages: + msg138766
2011-06-20 23:27:39hayposetmessages: + msg138765
2011-06-19 17:25:15haypocreate