classification
Title: httplib persistent connections violate MUST in RFC2616 sec 8.1.4.
Type: Stage:
Components: Library (Lib) Versions: Python 3.3, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Yuri.Bochkarev, agriffis, alanjds, amak, cananian, demian.brecht, gregory.p.smith, icordasc, jcea, jhylton, mhammond, orsenthil, vadmium
Priority: normal Keywords:

Created on 2008-08-16 15:41 by cananian, last changed 2014-11-20 04:37 by vadmium.

Messages (7)
msg71221 - (view) Author: C. Scott Ananian (cananian) Date: 2008-08-16 15:41
Although HTTP/1.1 says that servers SHOULD send a Connection: close
header if they intend to close a persistent connection (sec 8.1.2.1),
clients (like httplib) "MUST be able to recover from asynchronous close
events. Client software SHOULD reopen the transport connection and
retransmit the aborted sequence of requests without user interaction so
long as the request sequence is idempotent" (sec 8.1.4) since servers
MAY close a persistent connection after a request due to a timeout or
other reason.  Further, "Clients and servers SHOULD both constantly
watch for the other side of the transport close, and respond to it as
appropriate." (sec 8.1.4).

httplib currently does not detect when the server closes its side of the
connection, until the following bit of HTTPResponse in httplib.py
(python2.5.2):

    def _read_status(self):
        # Initialize with Simple-Response defaults
        line = self.fp.readline()
        ...
        if not line:
            # Presumably, the server closed the connection before
            # sending a valid response.
            raise BadStatusLine(line)
        ...

So we end up raising a BadStatusLine exception for a completely
permissible case: the server closed a persistent connection.  This
causes persistent connections to fail for users of httplib in mysterious
ways, especially if they are behind proxies: Squid, for example, seems
to limit persistent connections to a maximum of 3 requests, and then
closes the connection, causing future requests to raise the BadStatusLine.

There appears to be code attempting to fix this problem in
HTTPConnection.request(), but it doesn't always work.  RFC793 says, "If
an unsolicited FIN arrives from the network, the receiving TCP can ACK
it and tell the user that the connection is closing.  The user will
respond with a CLOSE, upon which the TCP can send a FIN to the other TCP
after sending any remaining data." (sec 3.5 case 2)  Key phrase here is
"after sending any remaining data": python is usually allowed to put the
request on the network without raising a socket.error; the close is not
signaled to python until HTTPResponse.begin() invokes
HTTPResponse._read_status.

It would be best to extend the retry logic in request() to cover this
case as well (store away the previous parameters to request() and if
_read_status() fails invoke HTTPConnection.close(),
HTTPConnection.connect(), HTTPConnection.request(stored_params), and
retry the HTTPConnection.getresponse().  But at the very least python
should document and raise a EAGAIN exception of some kind so that the
caller can distinguish this case from an actual bad status line and
retry the request.
msg71806 - (view) Author: C. Scott Ananian (cananian) Date: 2008-08-23 13:44
FWIW, the following code works around the issue for me:

class HTTPConnection(httplib.HTTPConnection):
    def request(self, method, url, body=None, headers={}):
        # patch upstream to recv() after we send, so that a closed
        # connection is discovered here rather than later in
        # HTTPResponse._read_status()
        try:
            self._send_request(method, url, body, headers)
            b = self.sock.recv(1, socket.MSG_PEEK)
            if b == '':
                self.close()
                raise socket.error(32) # sender closed connection.
        except socket.error, v:
            # trap 'Broken pipe' if we're allowed to automatically reconnect
            if v[0] != 32 or not self.auto_open:
                raise
            # try one more time
            self._send_request(method, url, body, headers)

In general, attempting to read a byte of the response just to trigger
the broken pipe error is suboptimal; it doesn't accurately distinguish
the "closed before received request" case from the "closed just after
received request" case.  For idempotent methods like GET and HEAD, this
doesn't matter, but it does for POST, etc.

There ought to be a better way to force the underlying OS to give you
the current 'closed' status on a socket which doesn't require your
trying to send/receive bytes.  The fundamental issue here is that the
TCP spec, and Linux, allow the request send to succeed even if TCP FIN
has been received from the other side -- the receipt of FIN after the
response is received is always going to be a race to some degree.

All of which is why the HTTP pipelining specs encourage you *not* to use
pipelining for non-idempotent requests.  The above patch could be
tweaked to only perform the MSG_PEEK and retry if this is the second or
later pipelined request.  Then, following the HTTP recommendations and
always calling HttpConnection.close()/HttpConnection.connect() before
invoking "unsafe" requests like POST, would make the whole thing safe.
msg91650 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-08-16 22:24
btw, when using async io (poll, select, etc) I -think- your socket will
see a read event when the server closes the connection (sends you a FIN
or even a RST) at which point your sock.recv() when you've been told
data was ready will return 0 bytes indicating that the server has closed
the connection.

I like your MSG_PEEK hack here.  I'll figure out if this or something
else should go into (the giant mess that is) httplib.
msg91681 - (view) Author: Jeremy Hylton (jhylton) Date: 2009-08-18 11:32
Not that we've removed the try one more time branch of the code,
because it was causing other problems.

Jeremy

On Sun, Aug 16, 2009 at 6:24 PM, Gregory P. Smith<report@bugs.python.org> wrote:
>
> Gregory P. Smith <greg@krypto.org> added the comment:
>
> btw, when using async io (poll, select, etc) I -think- your socket will
> see a read event when the server closes the connection (sends you a FIN
> or even a RST) at which point your sock.recv() when you've been told
> data was ready will return 0 bytes indicating that the server has closed
> the connection.
>
> I like your MSG_PEEK hack here.  I'll figure out if this or something
> else should go into (the giant mess that is) httplib.
>
> ----------
> priority:  -> normal
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue3566>
> _______________________________________
> _______________________________________________
> Python-bugs-list mailing list
> Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/jeremy%40alum.mit.edu
>
>
msg124366 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-12-19 22:05
unassigning, i don't had time to look at this one.
msg206661 - (view) Author: Alan Justino (alanjds) Date: 2013-12-20 02:38
Seems to affect 2.7 too.
msg231413 - (view) Author: Martin Panter (vadmium) Date: 2014-11-20 04:37
I don’t think the peek hack needs to go into to the standard library. It is just a workaround for code using the current library, to differentiate an empty status line due to EOF from any other kind of “bad” status line. This is what Issue 8450, “False BadStatusLine” is about.
History
Date User Action Args
2014-11-20 04:37:42vadmiumsetmessages: + msg231413
2014-07-25 14:54:59icordascsetnosy: + icordasc
2014-07-24 00:20:22demian.brechtsetnosy: + demian.brecht
2013-12-20 02:38:03alanjdssetnosy: + alanjds

messages: + msg206661
versions: + Python 2.7
2013-11-01 03:00:48vadmiumsetnosy: + vadmium
2013-09-11 14:20:41Yuri.Bochkarevsetnosy: + Yuri.Bochkarev
2012-02-01 02:27:14jceasetnosy: + jcea

versions: + Python 3.3, - Python 3.1, Python 2.7, Python 3.2
2011-06-06 12:27:22agriffissetnosy: + agriffis
2011-05-09 15:40:05amaksetnosy: + amak
2011-03-19 22:35:08orsenthilsetassignee: orsenthil
2010-12-20 00:04:11orsenthilsetnosy: + orsenthil
2010-12-19 22:05:27gregory.p.smithsetassignee: gregory.p.smith -> (no value)
messages: + msg124366
2010-08-03 21:37:33terry.reedysetversions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.5
2009-08-18 11:32:23jhyltonsetnosy: + jhylton
messages: + msg91681
2009-08-16 22:24:57gregory.p.smithsetpriority: normal

messages: + msg91650
2008-09-22 01:23:24gregory.p.smithsetassignee: gregory.p.smith
nosy: + gregory.p.smith
2008-08-23 13:44:57cananiansetmessages: + msg71806
2008-08-23 01:55:35mhammondsetnosy: + mhammond
2008-08-16 15:41:19cananiancreate