HTTPResponse may drop buffer holding next response #67566
This is the same issue raised at <https://bugs.python.org/issue4879#msg91597>. Currently, every time a new response is to be received, HTTPConnection passes its raw socket object to HTTPResponse, which calls sock.makefile("rb") and creates a BufferedReader. The BufferedReader is used to parse the header section and read the response body. The problem is that the BufferedReader is closed at the end of reading the response, potentially losing buffered data read from a subsequent response.

Normally no data is lost, because most users read the full response before triggering a new request, and the server waits for a request before sending a response. But if a user pipelined a second request without reading all of the first response, and the server happened to send the end of the first response and the start of the second response in the same packet, the problem could be triggered. I have added a test called test_httplib.ServerTest.testDoubleResponse() which emulates this scenario. The problem also makes it hard to detect misbehaving servers, or to use HTTPConnection to test that a server is behaving correctly.

I am adding a patch which creates the BufferedReader once per connection. This involves changing the API of the HTTPResponse constructor. I think this should be okay because, even though the constructor is documented, the documentation says it is “Not instantiated directly by user”. It did require changing the tests that call the HTTPResponse constructor, though. If absolutely necessary, it may be possible to maintain backwards compatibility by adding a new constructor parameter and carefully juggling how the close() calls work.
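The data-loss scenario described above can be sketched with a socket pair (a hypothetical stand-alone demonstration, not the stdlib's code): once readline() pulls both lines into the BufferedReader's buffer, closing the reader silently discards the second line, and the raw socket has nothing left to give.

```python
import socket

# Sketch (assumed setup, not http.client's code): emulate how closing
# a per-response BufferedReader can swallow bytes belonging to the
# next response when both arrive in one packet.
a, b = socket.socketpair()

# "Server" sends the tail of response 1 and the start of response 2
# in a single packet.
b.sendall(b"end-of-response-1\nstart-of-response-2\n")

reader = a.makefile("rb")   # BufferedReader over the raw socket
first = reader.readline()   # consumes response 1's last line...
# ...but the reader's buffer now also holds b"start-of-response-2\n"
reader.close()              # buffered bytes are silently discarded

a.setblocking(False)
try:
    leftover = a.recv(4096)  # nothing is left on the raw socket
except BlockingIOError:
    leftover = b""

print(first)     # b'end-of-response-1\n'
print(leftover)  # b'' - the second response's bytes are gone
```

Note that closing the makefile() object does not close the underlying socket; the bytes are lost from the abandoned buffer, not from the connection itself.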
http-buffer.v2.patch
I think it's ok to slightly break a non-public API, since it's required to fix an obvious bug.
HTTP pipelining is not supported in general. According to the module docstring, you can actually pipeline a second request if you have read the first response’s headers but not its body:

```pycon
>>> conn = HTTPSConnection("bugs.python.org")
>>> conn.request("GET", "/")
>>> response1 = conn.getresponse()  # Read header section only
>>> conn.request("GET", "/issue23377")
>>> response1.read()[:100]  # Read first body after pipelining second request
b'<!--\n This is the default body that is displayed when people visit the\n tracker. The tag below lists'
>>> response2 = conn.getresponse()
>>> response2.read()[:100]
b'\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xht'
```

But I suspect this might deadlock with some servers (that don’t properly support pipelining themselves) if the second request were a large POST or something and filled up the buffers that this mode relies on.
To play devil's advocate here, *is* this really a non-public API? According to documented Python naming conventions, HTTPResponse is part of the public API. Judging from code I've written myself, this change would likely cause a number of breakages. Granted, they would largely be in lower-level testing code, but it would still create a headache. Does something that is only lightly documented as non-public take precedence over well-known and widely used naming conventions? I'd argue no, it shouldn't. That said, I'm not overly enthused about the alternative Martin describes, so at first thought I'm pretty much square on the fence about this.
As Martin pointed out, sadly no. There's currently an internal state machine, possibly a remnant of the initial implementation (just a guess; I haven't done any research to prove it), that prevents pipelining. It's on my list of things to tackle, as it seems a natural progression towards HTTP/2.0 multiplexing, but unfortunately my time is a little shorter than usual at the moment (taking a new job has cut into my spare time a little). I haven't had a chance to review the change in detail yet, but I think the idea is sound. I'll try to get some time over the next few days to dig into it further.
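For context, the kind of state machine being described can be illustrated with a hypothetical miniature (the names and shape here are purely illustrative, not http.client's actual code): a second request is rejected until the previous response has been retrieved, which is what rules out pipelining.

```python
# Illustrative miniature (NOT the stdlib's real state machine) of a
# connection that refuses to send a new request while a response is
# still outstanding.
IDLE, REQUEST_SENT = "Idle", "Request-sent"

class MiniConnection:
    def __init__(self):
        self.state = IDLE

    def request(self, method, path):
        if self.state != IDLE:
            raise RuntimeError("CannotSendRequest: " + self.state)
        self.state = REQUEST_SENT

    def getresponse(self):
        if self.state != REQUEST_SENT:
            raise RuntimeError("ResponseNotReady: " + self.state)
        self.state = IDLE
        return "response"

conn = MiniConnection()
conn.request("GET", "/")
try:
    conn.request("GET", "/again")   # attempted pipelining is rejected
except RuntimeError as exc:
    error = str(exc)

print(error)  # CannotSendRequest: Request-sent
```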
Is this pending a review?
Yes, I would like someone to review my code changes for this; I don’t think anyone has done so yet. But I don’t consider this a very important bug, so there is no hurry.
If nobody else gets to it first, I'll try to review this later today or tomorrow. Apologies for the delay on the review, Martin: a new baby (coming next month), moving, *and* taking on a new job all at the same time seems to limit free time a little ;)
Added comments to Rietveld. |
Thanks for the review. Here is http-buffer.v3.patch.
Just realized that the makefile() call in _tunnel() still lets this bug occur when an HTTPSConnection is used with a proxy via the CONNECT method. So while my patch shouldn’t make things any worse than they already are, it needs more work to fix this case. Maybe we would have to use the new SSL memory BIO support.
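For anyone unfamiliar with the memory-BIO idea mentioned above, here is a minimal sketch of the API (hypothetical usage with a made-up hostname, not the eventual fix): TLS state runs over in-memory buffers, so ciphertext could stay in Python-level buffers rather than being read through a throwaway makefile() object.

```python
import ssl

# Sketch of ssl's memory-BIO support: the TLS engine reads and writes
# ciphertext via MemoryBIO objects instead of a socket.
ctx = ssl.create_default_context()
incoming = ssl.MemoryBIO()   # ciphertext received from the proxy socket
outgoing = ssl.MemoryBIO()   # ciphertext to be sent to the proxy socket
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.com")

try:
    tls.do_handshake()       # no peer data yet, so this cannot finish
except ssl.SSLWantReadError:
    pass                     # the ClientHello is now queued in `outgoing`

print(outgoing.pending > 0)  # handshake bytes are ready to forward
```

The application (here, hypothetically, _tunnel()) would shuttle bytes between the BIOs and the real socket, keeping full control over what has been buffered.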
Actually, I had a few free minutes, so I stuck a couple of minor comments in Rietveld. Will have another go as soon as possible.
Minor update based on Demian’s review |