classification
Title: Explain the default timeout in http-client-related libraries
Type: behavior Stage:
Components: Documentation, Library (Lib) Versions: Python 3.1, Python 2.7, Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: asandvig, docs@python, eric.smith, oddthinking, orsenthil, pitrou
Priority: normal Keywords:

Created on 2010-05-02 02:54 by oddthinking, last changed 2010-08-19 13:16 by eric.smith.

Messages (7)
msg104763 - (view) Author: Julian (oddthinking) Date: 2010-05-02 02:54
Since Python 2.6, httplib has offered a timeout parameter for fetches. As the documentation explains, if this parameter is not provided, it uses the global default.

What the documentation doesn't explain is that httplib builds on top of the socket library, which has a default timeout of None (i.e. wait forever). This may be an appropriate default for general sockets, but it is a poor default for httplib; typical HTTP clients would use a timeout in the 2-10 second range.

This problem is propagated up to urllib2, which sits on httplib and further obscures the fact that the default might be unsuitable.

From an inspection of the manuals, Python 3.0.1 suffers from the same problem, except that the names have changed: urllib.request sits on http.client.

I, for one, made the brutal mistake of assuming that the "global default" would be some reasonable default for fetching web pages; I didn't have any specific timeout in mind, and was happy for the library to take care of it. Several million successful HTTP downloads later, my server application thread froze, waiting forever while talking to a recalcitrant web server. I imagine others have fallen into the same trap.

While an ideal solution would be for httplib and http.client to use a more generally acceptable default, I can see it might be far too late to make such a change without breaking existing applications. Failing that, I would recommend that the documentation for httplib, urllib, urllib2, http.client and urllib.request (+ any other similar libraries sitting on socket? FTP, SMTP?) be changed to highlight that the default global timeout, sans deliberate override, is to wait a surprisingly long time.
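
To make the pitfall concrete, here is a minimal sketch using the Python 3 names (http.client in place of httplib). The host name and the 10-second value are illustrative assumptions, not recommendations taken from the documentation.

```python
import socket
import http.client

# Process-wide default for new sockets. None means "block forever",
# unless something in the process has called socket.setdefaulttimeout().
default_timeout = socket.getdefaulttimeout()

# An explicit timeout applies to this connection only. No network
# traffic happens until a request is made or connect() is called.
conn = http.client.HTTPConnection("example.com", timeout=10)
explicit_timeout = conn.timeout

print("global default:", default_timeout)
print("this connection:", explicit_timeout)
```

Omitting the timeout argument leaves the connection at the mercy of whatever the global socket default happens to be.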
msg104765 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-05-02 03:23
I am not sure there can be a default timeout value for client libraries like httplib and urllib2.

Socket connections do have a timeout and, as you may have figured out already, the option in the httplib and urllib methods is to set/override socket._GLOBAL_DEFAULT_TIMEOUT, which is None by default (wait indefinitely).

Since the client libraries are using a global, setting it in one place (say, in httplib) makes the same timeout apply to other modules within the same process.

I see docs can highlight it more or perhaps link to sockets timeout information.
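
A short sketch of the process-wide behaviour described here, using only the public socket.setdefaulttimeout() and socket.getdefaulttimeout() functions. The 5-second value is an arbitrary assumption for illustration.

```python
import socket

previous = socket.getdefaulttimeout()   # remember the current global default
socket.setdefaulttimeout(5.0)           # affects every socket created from now on
try:
    s = socket.socket()
    inherited = s.gettimeout()          # new sockets pick up the global default
    s.close()
finally:
    socket.setdefaulttimeout(previous)  # restore, so other code is unaffected

print("new socket inherited:", inherited)
```

Any module in the same process that creates sockets while the global default is changed sees the same value, which is why a global is a blunt tool for HTTP-specific needs.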
msg104767 - (view) Author: Julian (oddthinking) Date: 2010-05-02 03:45
@orsenthil:

Consider the definition of httplib.HTTPConnection.__init__(), in Python 2.6.

   def __init__(self, host, port=None, strict=None,
                timeout=socket._GLOBAL_DEFAULT_TIMEOUT):


This could be replaced with:

   def __init__(self, host, port=None, strict=None,
                timeout=10):

or, perhaps better, 

   def __init__(self, host, port=None, strict=None,
                timeout=httplib._HTTP_DEFAULT_TIMEOUT):

This timeout value is passed to socket.create_connection, so I believe that if it is overridden, it only applies to the relevant sockets and not to all sockets globally.

Note: I am not arguing here that this SHOULD be done - it would break existing applications, especially those that were written before Python 2.6 - merely that it COULD be done.
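
The claim about socket.create_connection can be checked with a small sketch. It connects to a throwaway listener on the loopback interface (an assumption made so that no external network is needed) and compares the per-socket timeout with the global default.

```python
import socket

# Local listener so the sketch needs no external network access.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
server.listen(1)
address = server.getsockname()

before = socket.getdefaulttimeout()
client = socket.create_connection(address, timeout=10)
per_socket_timeout = client.gettimeout()   # timeout of this socket only
after = socket.getdefaulttimeout()         # global default is untouched

client.close()
server.close()

print("per-socket:", per_socket_timeout, "global before/after:", before, after)
```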
msg104769 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-05-02 04:04
On Sun, May 02, 2010 at 03:45:09AM +0000, Julian wrote:
> Note: I am not arguing here that this SHOULD be done - it would
> break existing applications, especially those that were written
> before Python 2.6 - merely that it COULD be done.

I get your point, Julian. What I was worried about is whether it is
the "correct thing" to do. I am not sure, and I believe it is not, as
httplib and urllib are not clients themselves but libraries for
building clients. httplib and urllib can be considered convenient
interfaces over the underlying sockets.

Breaking of existing apps is the next question, which should
definitely be avoided. And an explanation in the docs certainly seems
to be the way to go.
msg109052 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-07-01 14:13
I think you could preserve backward compatibility by doing something like the following (in httplib):

_sentinel = object()
_HTTP_DEFAULT_TIMEOUT = _sentinel

In httplib.HTTPConnection.__init__(), in Python 2.6.

   def __init__(self, host, port=None, strict=None,
                timeout=None):
      if timeout is None:
         if _HTTP_DEFAULT_TIMEOUT is _sentinel:
            timeout = socket._GLOBAL_DEFAULT_TIMEOUT
         else:
            timeout = _HTTP_DEFAULT_TIMEOUT

That way, if _HTTP_DEFAULT_TIMEOUT is never set, it will use the socket timeout. Admittedly I'd rather see all uses of module globals go away, but I think this would be a good compromise.
msg114361 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-19 13:14
> That way, if _HTTP_DEFAULT_TIMEOUT is never set, it will use the
> socket timeout. Admittedly I'd rather see all uses of module globals go 
> away, but I think this would be a good compromise.

Why not provide {httplib,urllib}.{set,get}defaulttimeout() instead?
msg114362 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-08-19 13:16
On 8/19/2010 9:14 AM, Antoine Pitrou wrote:
> Why not provide {httplib,urllib}.{set,get}defaulttimeout() instead?

Yes, I'm assuming that's how _HTTP_DEFAULT_TIMEOUT would be set and queried.
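
A rough, self-contained sketch of how the two suggestions might fit together. Every name here (setdefaulttimeout, getdefaulttimeout, resolve_timeout, _UNSET) is hypothetical and not an existing httplib or http.client API. Unlike the snippet in msg109052, it uses a separate sentinel for "argument not supplied", on the assumption that callers should still be able to pass timeout=None to mean "block forever".

```python
import socket

_UNSET = object()                # marks "no value supplied"
_HTTP_DEFAULT_TIMEOUT = _UNSET   # hypothetical module-level HTTP default


def setdefaulttimeout(timeout):
    """Set the HTTP-specific default; None would mean block forever."""
    global _HTTP_DEFAULT_TIMEOUT
    _HTTP_DEFAULT_TIMEOUT = timeout


def getdefaulttimeout():
    """Return the HTTP default, falling back to the socket default."""
    if _HTTP_DEFAULT_TIMEOUT is _UNSET:
        return socket.getdefaulttimeout()
    return _HTTP_DEFAULT_TIMEOUT


def resolve_timeout(timeout=_UNSET):
    """Pick the timeout a connection constructor would pass to the socket."""
    if timeout is _UNSET:
        return getdefaulttimeout()
    return timeout
```

If setdefaulttimeout() is never called, resolve_timeout() returns whatever socket.getdefaulttimeout() reports, so existing applications would keep their current behaviour.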
History
Date                 User         Action  Args
2010-08-19 13:16:45  eric.smith   set     messages: + msg114362
2010-08-19 13:14:05  pitrou       set     nosy: + pitrou; messages: + msg114361
2010-08-19 10:55:12  asandvig     set     nosy: + asandvig
2010-07-01 14:13:55  eric.smith   set     nosy: + eric.smith; messages: + msg109052
2010-05-02 04:04:59  orsenthil    set     messages: + msg104769
2010-05-02 03:45:06  oddthinking  set     messages: + msg104767
2010-05-02 03:23:04  orsenthil    set     nosy: + orsenthil; messages: + msg104765; title: Unexpected default timeout in http-client-related libraries -> Explain the default timeout in http-client-related libraries
2010-05-02 02:54:42  oddthinking  create