classification
Title: HTTPResponse may drop buffer holding next response
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: demian.brecht, martin.panter, pitrou
Priority: normal Keywords: needs review, patch

Created on 2015-02-02 12:56 by martin.panter, last changed 2015-06-22 12:44 by martin.panter.

Files
File name Uploaded Description Edit
http-buffer.patch martin.panter, 2015-02-02 12:56 review
http-buffer.v2.patch martin.panter, 2015-04-09 09:24 review
http-buffer.v3.patch martin.panter, 2015-05-23 09:03 review
http-buffer.v4.patch martin.panter, 2015-06-22 12:44 review
Messages (13)
msg235251 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-02 12:56
This is the same issue raised at <https://bugs.python.org/issue4879#msg91597>. Currently, every time a new response is to be received, HTTPConnection passes its raw socket object to HTTPResponse, which calls sock.makefile("rb") and creates a BufferedReader. The BufferedReader is used to parse the header section and read the response body. The problem is that the BufferedReader is closed at the end of reading the response, potentially losing buffered data read from a subsequent response.

Normally no data is lost, because most users would read the full response before triggering a new request, and the server would wait for a request before sending a response. But if a user pipelined a second request without reading all of the first response, and the server happened to send the end of the first response and the start of the second response in the same packet, it could trigger the problem. I have added a test called test_httplib.ServerTest.testDoubleResponse() which emulates this scenario. The problem also makes it hard to detect misbehaving servers, or use HTTPConnection to test that a server is behaving correctly.

I am adding a patch which creates the BufferedReader once for each connection. This involves changing the API of the HTTPResponse constructor. I think this should be okay because even though it is documented, it says “Not instantiated directly by user”. It did require changing the tests that call the HTTPResponse constructor though. If absolutely necessary, it may be possible to maintain backwards compatibility if we added a new constructor parameter, and carefully juggled how the close() calls work.
msg240315 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-04-09 09:24
http-buffer.v2.patch:

* Merged with recent changes
* Made the changes to the test suite slightly less intrusive. Unfortunately there are still a lot of changes left where mock sockets were being sent into the HTTPResponse constructor.
msg240501 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-11 21:57
I think it's ok to slightly break a non-public API since it's required to fix an obvious bug.
By the way, I guess we don't support HTTP pipelining, right?
msg240527 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-04-12 03:02
HTTP pipelining is not supported in general. According to the module doc string, you can actually pipeline a second request if you have read the first response’s headers but not its body:

>>> conn = HTTPSConnection("bugs.python.org")
>>> conn.request("GET", "/")
>>> response1 = conn.getresponse()  # Read header section only
>>> conn.request("GET", "/issue23377")
>>> response1.read()[:100]  # Read first body after pipelining second request
b'<!--\n This is the default body that is displayed when people visit the\n tracker. The tag below lists'
>>> response2 = conn.getresponse()
>>> response2.read()[:100]
b'\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xht'

But I suspect this might deadlock with some servers (that don’t properly support pipelining themselves) if the second request was a large POST or something and filled up the buffers that this mode relies on.
msg240839 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-04-14 04:42
> I think it's ok to slightly break a non-public API since it's required to fix an obvious bug.

To play devil's advocate here, /is/ this really non-public API? According to documented Python naming conventions, HTTPResponse is part of the public API. From code I've written myself, this would likely create a number of breakages. Granted, it would largely be in lower level testing code, but it would still create a headache. Is something that's lightly documented as being non-public take precedence over well known and used naming conventions? I'd argue no, it shouldn't.

That said, I'm not overly enthused about the alternative Martin describes, so at first thought, I'm pretty much square on the fence about this.

> By the way, I guess we don't support HTTP pipelining, right?

As Martin pointed out, sadly no. There's currently an internal state machine that seems to possibly be remnants of an initial implementation (just a guess, I haven't done any research into it to prove that though) that prevents pipelining. It's on my list of things to tackle as it seems to be a natural progression to HTTP/2.0 multiplexing, but unfortunately my time is a little shorter than usual at the moment (taking a new job has cut into my spare time a little).

@Martin:

I haven't had a chance to review the change in detail yet, but I think the idea's sound. I'll try to get some time over the next few days to dig into it further.
msg243516 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-18 18:50
Is this pending a review?
msg243550 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-19 00:20
Yes I would like someone to review my code changes for this; I don’t think anyone has done so yet. But I don’t consider this a very important bug, so there is no hurry.
msg243600 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-05-19 15:40
If nobody else gets to it first, I'll try to get to review this later today or tomorrow. Apologies for the delay on the review Martin, a new baby (coming next month), moving /and/ taking on a new job all at the same time seems to limit free time a little ;)
msg243708 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-05-21 00:03
Added comments to Rietveld.
msg243901 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-23 09:03
Thanks for the reviewing. Here is http-buffer.v3.patch:

* Merged with current code
* Better HTTPResponse(sock) compatibility suggested by Demian
* New HTTPConnection.request(close=True) feature also suggested by Demian. This sends “Connection: close” in the request, and arranges for the response to be detached from the HTTPConnection instance, making some hacks in urllib.request redundant.
msg244509 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-31 01:12
Just realized that the makefile() call in _tunnel() still lets this bug occur when a HTTPSConnection is being used with a proxy using the CONNECT method. So while my patch shouldn’t make things any worse than they already are, it needs more work to fix this instance. Maybe we would have to use the new SSL memory BIO support.
msg244976 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-06-07 21:40
Actually had a few free minutes so stuck a couple minor comments in Rietveld. Will have another go as soon as possible.
msg245625 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-06-22 12:44
Minor update based on Demian’s review
History
Date User Action Args
2015-06-22 12:44:07martin.pantersetfiles: + http-buffer.v4.patch

messages: + msg245625
2015-06-07 21:40:28demian.brechtsetmessages: + msg244976
2015-05-31 01:12:17martin.pantersetmessages: + msg244509
2015-05-23 09:03:47martin.pantersetfiles: + http-buffer.v3.patch

messages: + msg243901
2015-05-21 00:03:00demian.brechtsetmessages: + msg243708
2015-05-19 15:40:01demian.brechtsetmessages: + msg243600
2015-05-19 00:20:11martin.pantersetkeywords: + needs review

messages: + msg243550
2015-05-18 18:50:30pitrousetmessages: + msg243516
2015-04-14 04:42:29demian.brechtsetmessages: + msg240839
2015-04-12 03:02:20martin.pantersetmessages: + msg240527
2015-04-11 21:57:01pitrousetmessages: + msg240501
stage: patch review
2015-04-11 20:39:29pitrousetnosy: + pitrou, demian.brecht
2015-04-09 09:24:30martin.pantersetfiles: + http-buffer.v2.patch

messages: + msg240315
2015-02-02 12:56:16martin.pantercreate