classification
Title: socket.setdefaulttimeout affecting multiprocessing Manager
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.3, Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jnoller Nosy List: Jim.Jewett, asksol, jnoller, python-dev, ryles, sbt, underrun
Priority: normal Keywords: patch

Created on 2009-05-18 17:48 by ryles, last changed 2012-07-29 12:28 by sbt. This issue is now closed.

Files
File name Uploaded Description
mpsock.patch underrun, 2011-06-16 06:06 make python2.7 mp sockets blocking
mp6056fix.patch underrun, 2012-07-24 13:49 updated patch against 3.3b1(ish)
Messages (12)
msg88040 - Author: Ryan Leslie (ryles) Date: 2009-05-18 17:48
Terminal 1:
Python 2.6.1 (r261:67515, Apr  2 2009, 18:25:55)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing.managers import SyncManager
>>> manager = SyncManager(authkey="mykey")
>>> manager.start()
>>> queue = manager.Queue()
>>> import pickle
>>> pickle.dump(queue, open("myqueue.pkl", "w"))
>>>

Terminal 2:
Python 2.6.1 (r261:67515, Apr  2 2009, 18:25:55)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.setdefaulttimeout(30)
>>> import multiprocessing, pickle
>>> multiprocessing.current_process().authkey = "mykey"
>>> queue = pickle.load(open("myqueue.pkl"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "python2.6/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "python2.6/pickle.py", line 858, in load
    dispatch[key](self)
  File "python2.6/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "python2.6/multiprocessing/managers.py", line 845, in RebuildProxy
    return func(token, serializer, incref=incref, **kwds)
  File "python2.6/multiprocessing/managers.py", line 894, in AutoProxy
    incref=incref)
  File "python2.6/multiprocessing/managers.py", line 700, in __init__
    self._incref()
  File "python2.6/multiprocessing/managers.py", line 749, in _incref
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "python2.6/multiprocessing/connection.py", line 140, in Client
    answer_challenge(c, authkey)
  File "python2.6/multiprocessing/connection.py", line 376, in answer_challenge
    response = connection.recv_bytes(256)        # reject large message
IOError: [Errno 11] Resource temporarily unavailable
>>> 

This works as expected without socket.setdefaulttimeout(). However, the
timeout is useful since if the listening process on terminal 1 goes to
sleep, e.g. ^Z, it would avoid blocking.

I suspect the cause is similar to http://bugs.python.org/issue976613
msg88045 - Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-05-18 19:18
Pickling the queue and then unpickling it in a new process is something I 
never thought of. That's interesting in and of itself ;)
msg88064 - Author: Ryan Leslie (ryles) Date: 2009-05-19 05:19
Yeah, storing pickled queues in the file system makes for some easy IPC
:) It wasn't a very original idea; I took the pickling comments in the
documentation at face value:
http://docs.python.org/library/multiprocessing.html#proxy-objects

So, from what I can tell this issue is related to the mixing of standard
python socket I/O with multiprocessing socket I/O, with state not being
carried from the former to the latter.

In multiprocessing/connection.py, SocketClient() creates a familiar
python socket object which, when a default timeout has been set in the
module, will be made non-blocking. In addition, the timeout is
remembered in the socket object, and when calling socket.recv(), the
function internal_select() will use this to perform the expected poll()
or select().

However, after a connection is established, SocketClient() will not use
python's socket implementation any further, and instead pass its
low-level socket descriptor to a multiprocessing Connection object. This
object has its own specialized socket I/O implementation, which is not
at all aware of the timeout previously associated with the socket. As a
result no select/poll occurs and, due to the socket's non-blocking
status, recv() may return EAGAIN immediately. I suspect this is what's
happening.
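
As a rough illustration of the mechanism described above (not code from this report), the sketch below checks what a default timeout does to a freshly created socket at the descriptor level. It assumes a Unix platform, since it relies on fcntl, and describe_new_socket is a name made up for this example. The idea is that a socket created while a default timeout is set reports that timeout and also has O_NONBLOCK set on its file descriptor, which is what any code reading the raw descriptor would see.

```python
import fcntl
import os
import socket


def describe_new_socket(default_timeout):
    """Create a socket under the given default timeout and report its state.

    Returns (timeout reported by the socket object,
             whether O_NONBLOCK is set on the underlying descriptor).
    Hypothetical helper for illustration; Unix only because of fcntl.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(default_timeout)
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            flags = fcntl.fcntl(s.fileno(), fcntl.F_GETFL)
            return s.gettimeout(), bool(flags & os.O_NONBLOCK)
        finally:
            s.close()
    finally:
        # Restore the process-wide default so other code is unaffected.
        socket.setdefaulttimeout(previous)
```

Code that takes over the descriptor and calls recv() on it directly, without a select/poll first, would then get EAGAIN whenever no data is ready.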

There might be a number of ways to make SocketClient() more timeout
friendly, but one possibility could be to simply check if the python socket
has a timeout associated, and if so, use connection.poll() in addition
to connection.recv(). There may be other places in the code where
similar changes would be needed.
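
A minimal sketch of the poll-then-recv idea, written against the public Connection API rather than the internals of SocketClient(). The helper name recv_with_timeout and the choice to raise TimeoutError are assumptions made for this example, not part of multiprocessing or of any patch on this issue.

```python
from multiprocessing import Pipe


def recv_with_timeout(conn, timeout):
    """Wait up to `timeout` seconds for data, then receive one object.

    Hypothetical helper: Connection.poll() does the waiting, so recv()
    is only called once data is known to be available.
    """
    if not conn.poll(timeout):
        raise TimeoutError("no data within %s seconds" % timeout)
    return conn.recv()


if __name__ == "__main__":
    reader, writer = Pipe()
    writer.send("hello")
    print(recv_with_timeout(reader, 1.0))
```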

You obviously have more experience with this code base so I'll be
curious to see the outcome.
msg88078 - Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-05-19 13:10
Well; I'm pretty tapped out right now - I think your idea of checking to 
see if a timeout has been set elsewhere makes sense. If you have the time 
to put together a patch (with a unit test or three :)) I can review it. 
Might take me a bit of time to get to this.
msg137830 - Author: Derek Wilson (underrun) Date: 2011-06-07 14:52
This should be higher priority, as one of the major benefits of the multiprocessing module is remote process management in a completely transparent manner. Socket timeouts are very important in this context, as blocking forever waiting for a connection is not always an option.

The problem of not being able to use a default socket timeout for other purposes in combination with multiprocessing managers is definitely an issue, but making multiprocessing actually use the timeout itself if set would be a huge advantage.

This might not be the place to ask for it, but it would make sense for manager objects to gain a timeout attribute to be used as a timeout for local or remote communications. At the very least, the manager.connect() method should accept a timeout argument.
msg137860 - Author: Jesse Noller (jnoller) * (Python committer) Date: 2011-06-07 17:14
I agree, Derek. I think that would be a fine addition; however, we lack a patch and I don't currently have the bandwidth to add it.
msg138415 - Author: Derek Wilson (underrun) Date: 2011-06-16 06:06
While having multiprocessing use a timeout would be great, I didn't really have the time to fiddle with the C code.

Instead of using the socket timeout, I'm modifying all the sockets created by the socket module to have no timeout (and thus to be blocking) which makes the multiprocessing module 'immune' to the socket module's default timeout.

For testing, I just run the test suite twice, once with the initial default socket timeout and once with a 60 second timeout. Nothing there failed with this issue.

It is worth noting, however, that when using a default socket timeout, for some reason processes that have put data into a queue no longer block at exit waiting for the data to be consumed. I'm not sure if there is some additional cleanup code that uses sockets and might need to block? Or maybe whatever the issue was with blocking sockets is not an issue with non-blocking sockets?
msg138489 - Author: Derek Wilson (underrun) Date: 2011-06-17 04:39
I was wrong about exit behavior of a process that has put to a queue -- it seems to behave as expected. I had been playing with a proxy to a queue rather than a queue (to which, if you put, the process can exit right away because the actual put happens in the process that owns the queue).

I think this works as intended, but lmk. Also, I haven't really played with the tests that much, so that bit could use some review. It hasn't broken anything in any of my real world tests though.

Also, have I mentioned that the multiprocessing module is amazing? Cause it is. I sort of feel like antigravity == multiprocessing ...
msg151968 - Author: Derek Wilson (underrun) Date: 2012-01-25 19:24
Any chance this patch will be accepted (or at least reviewed) soon?
msg151982 - Author: Jim Jewett (Jim.Jewett) Date: 2012-01-25 22:55
The wording in msg138415 suggested this patch was changing socket to not support timeouts -- which would be unacceptable.

But the actual patch only seems to touch multiprocessing/connection.py -- a far more reasonable change.

Unfortunately, the patch no longer applies to the development tip.  I *think* the places you wanted to change are still there, and just moved.

(1)  Is it sufficiently clear that this is not-a-feature to justify a backport?

(2)  Are the problems already fixed by some of the other changes?  (It doesn't look like it, but I'm not sure.)

(3)  Can you produce an updated patch?  (The current tip is http://hg.python.org/cpython/file/fec45282dc28/Lib/multiprocessing/connection.py  )

(4)  If I understand the intent, then s.setblocking(True) would be slightly more clear than s.settimeout(None), though that change obviously isn't essential.
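
To illustrate point (4), here is a small sketch (not taken from either patch) showing that both calls leave a socket with no timeout even when a default timeout is in effect. The helper timeout_after and the 30-second default are assumptions chosen for this example.

```python
import socket


def timeout_after(configure, default_timeout=30):
    """Create a socket under a default timeout, apply `configure` to it,
    and return its timeout before and after.

    Hypothetical helper for illustration only.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(default_timeout)
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            before = s.gettimeout()
            configure(s)
            return before, s.gettimeout()
        finally:
            s.close()
    finally:
        # Restore the process-wide default.
        socket.setdefaulttimeout(previous)


# Both spellings clear the inherited timeout; setblocking(True) states the intent.
via_setblocking = timeout_after(lambda s: s.setblocking(True))
via_settimeout = timeout_after(lambda s: s.settimeout(None))
```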
msg166292 - Author: Derek Wilson (underrun) Date: 2012-07-24 13:49
Thanks, Jim, here is an updated patch. 

1) I feel like it is clearly not-a-feature. Currently 2.7 crashes if remote managers are used and socket.setdefaulttimeout is anything other than None. Crashing seems bad, and all this does is keep multiprocessing connection sockets blocking even if a default timeout is specified (so it maintains current behavior rather than crashing).

2) This problem is still evident on 2.7, 3.2 and 3.3 beta 1. This patch is against the current dev tip as of a few days ago.

3) here it is!

4) I agree that setblocking is more clear. I made the change.

My test modifications cover the entire suite twice, once without a default timeout and once with. This may be excessive? I'm not sure where non-blocking sockets might pop up as an issue since there is C code that relies on blocking sockets and I haven't dug that deep.
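
The "run everything twice" approach can be sketched on a much smaller scale with unittest. This is not the test code from the patch; the class names and the 60-second value are assumptions for illustration, and the descriptor check assumes a Unix platform (fcntl). The same tests run once with no default timeout and once with one set, using a duplex Pipe() as the socket-backed connection under test.

```python
import fcntl
import os
import socket
import unittest
from multiprocessing import Pipe


class PipeRoundTripTests(unittest.TestCase):
    # Subclasses override this to rerun the same tests under a default timeout.
    default_timeout = None

    def setUp(self):
        self._previous = socket.getdefaulttimeout()
        socket.setdefaulttimeout(self.default_timeout)

    def tearDown(self):
        socket.setdefaulttimeout(self._previous)

    def test_round_trip(self):
        left, right = Pipe()
        try:
            left.send({"answer": 42})
            self.assertEqual(right.recv(), {"answer": 42})
        finally:
            left.close()
            right.close()

    def test_descriptor_is_blocking(self):
        left, right = Pipe()
        try:
            flags = fcntl.fcntl(left.fileno(), fcntl.F_GETFL)
            self.assertFalse(flags & os.O_NONBLOCK)
        finally:
            left.close()
            right.close()


class PipeRoundTripWithTimeoutTests(PipeRoundTripTests):
    default_timeout = 60
```

Saved as a module, these can be run with python -m unittest; on an interpreter that includes the fix, both classes should pass.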
msg166567 - Author: Roundup Robot (python-dev) Date: 2012-07-27 13:57
New changeset 4e85e4743757 by Richard Oudkerk in branch '2.7':
Issue #6056: Make multiprocessing use setblocking(True) on the sockets it uses.
http://hg.python.org/cpython/rev/4e85e4743757

New changeset 290f04722be3 by Richard Oudkerk in branch '3.2':
Issue #6056: Make multiprocessing use setblocking(True) on the sockets it uses.
http://hg.python.org/cpython/rev/290f04722be3

New changeset f03839401420 by Richard Oudkerk in branch 'default':
Issue #6056: Make multiprocessing use setblocking(True) on the sockets it uses.
http://hg.python.org/cpython/rev/f03839401420
History
Date User Action Args
2012-07-29 12:28:11 sbt set status: open -> closed; type: crash -> behavior; resolution: fixed; stage: needs patch -> resolved
2012-07-27 13:57:31 python-dev set nosy: + python-dev; messages: + msg166567
2012-07-24 14:37:45 sbt set nosy: + sbt
2012-07-24 13:49:22 underrun set files: + mp6056fix.patch; messages: + msg166292
2012-01-25 22:55:28 Jim.Jewett set nosy: + Jim.Jewett; messages: + msg151982
2012-01-25 19:24:10 underrun set type: behavior -> crash; messages: + msg151968; versions: + Python 3.3, Python 3.4
2011-11-01 14:27:50 neologix link issue9897 superseder
2011-06-17 04:39:59 underrun set messages: + msg138489
2011-06-16 06:06:03 underrun set files: + mpsock.patch; keywords: + patch; messages: + msg138415
2011-06-07 17:14:04 jnoller set messages: + msg137860
2011-06-07 14:52:09 underrun set nosy: + underrun; messages: + msg137830
2010-08-27 15:28:21 BreamoreBoy set stage: needs patch; versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2010-08-27 13:51:47 asksol set nosy: + asksol
2009-05-19 13:10:36 jnoller set messages: + msg88078
2009-05-19 05:20:18 ryles set type: behavior; messages: + msg88064
2009-05-18 19:18:19 jnoller set messages: + msg88045
2009-05-18 18:46:59 benjamin.peterson set assignee: jnoller; nosy: + jnoller
2009-05-18 17:48:59 ryles create