classification
Title: multiprocessing Manager.connect() aggressively retries refused connections
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: bgilbert, neologix, pitrou, python-dev
Priority: normal Keywords: needs review, patch

Created on 2011-10-18 19:16 by bgilbert, last changed 2011-11-19 14:58 by neologix. This issue is now closed.

Files
File name Uploaded Description Edit
connection_retry.diff neologix, 2011-11-17 21:16 review
Messages (9)
msg145857 - (view) Author: Benjamin Gilbert (bgilbert) Date: 2011-10-18 19:16
multiprocessing.managers.BaseManager.connect() takes 20 seconds to return on failure, even if the server refuses the connection.  This is because the function that creates the connection, multiprocessing.connection.SocketClient(), handles ECONNREFUSED by retrying the connection attempt every 10 ms for 20 seconds.

While a 20 second timeout may make sense for *unresponsive* servers, ECONNREFUSED probably indicates that the server is not listening on this port, so hammering it with 1,999 more connection attempts isn't going to help.  In this case, SocketClient() should fail immediately, or -- at most -- retry a small number of times with exponential backoff.
msg146267 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-23 23:05
I'm not sure, but I think that would be for the case where you are spawning the server yourself and the child process takes time to start up.
msg146310 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-24 17:16
>  While a 20 second timeout may make sense for *unresponsive* servers,
> ECONNREFUSED probably indicates that the server is not listening on this port, so
> hammering it with 1,999 more connection attempts isn't going to help.

That's funny, I noticed this a couple days ago, and it also puzzled me...

> I'm not sure, but I think that would be for the case where you are spawning the
> server yourself and the child process takes time to start up.

That's also what I think.
But that's strange, since:
- this holds for every client/server communication (e.g. why not do
that for smtplib, telnetlib, etc. ?)
- it's against the classical connect() semantics
- some code may prefer failing immediately (instead of "hammering" the
remote host) if the remote server is down, or the address is
incorrect: it can still handle the ECONNREFUSED if it wants to retry,
with a custom retry timeout
I removed the retry code and run test_multiprocessing and
test_concurrent_futures in loop, and didn't see any failure (on
Linux), so I'd say we could probably remove that.
OTOH, I would feel bad if this broke someone's code (even though code
relying on the automatic retries is probably broken).
So I'm +1 on removing the retry logic altogether, unless of course
someone comes up with a good reason to keep it (I digged a little
through the logs to see when this was introduced, but apparently it
was there in the original import).
If we don't remove it, I agree we should at least reduce the timeout
and increase the period (an exponential backoff may be a bit
overkill).
msg147703 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-15 20:40
Here's a patch removing the automatic retry on ECONNREFUSED: I tested it on Linux and other Unices, and it seems to work just fine without this hammering.
Note that there's a similar mechanism for Windows (ERROR_PIPE_BUSY), but it seems to be necessary there (test_multiprocessing blows up without this). Note that I'd rather only apply this to default (not 2.7 and 3.2), since this is more of a behavior change than a bug fix.
msg147782 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-17 00:32
Apparently you forgot to upload the patch...
msg147822 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-17 21:16
> Apparently you forgot to upload the patch...

Told you I couldn't think straight :-)
msg147823 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-17 21:43
Ok for me. Nice to see that piece of ugliness removed.
msg147930 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-19 09:00
New changeset 34fcc0d5c3c5 by Charles-François Natali in branch 'default':
Issue #13215: multiprocessing.Connection: don't hammer the remote end with
http://hg.python.org/cpython/rev/34fcc0d5c3c5
msg147947 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-19 14:58
Benjamin, thanks for the report.
History
Date User Action Args
2011-11-19 14:58:11neologixsetstatus: open -> closed
resolution: fixed
messages: + msg147947

stage: patch review -> resolved
2011-11-19 09:00:27python-devsetnosy: + python-dev
messages: + msg147930
2011-11-17 21:43:02pitrousetmessages: + msg147823
2011-11-17 21:16:58neologixsetfiles: + connection_retry.diff

messages: + msg147822
2011-11-17 00:32:55pitrousetmessages: + msg147782
2011-11-15 20:40:07neologixsetkeywords: + patch, needs review

stage: patch review
messages: + msg147703
versions: + Python 3.3, - Python 2.7, Python 3.2
2011-10-24 17:16:04neologixsetmessages: + msg146310
2011-10-23 23:05:01pitrousetnosy: + neologix, pitrou
messages: + msg146267
2011-10-18 19:16:57bgilbertcreate