Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTPResponse may drop buffer holding next response #67566

Open
vadmium opened this issue Feb 2, 2015 · 13 comments
Open

HTTPResponse may drop buffer holding next response #67566

vadmium opened this issue Feb 2, 2015 · 13 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@vadmium
Copy link
Member

vadmium commented Feb 2, 2015

BPO 23377
Nosy @pitrou, @vadmium, @demianbrecht
Files
  • http-buffer.patch
  • http-buffer.v2.patch
  • http-buffer.v3.patch
  • http-buffer.v4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-02-02.12:56:16.093>
    labels = ['type-bug', 'library']
    title = 'HTTPResponse may drop buffer holding next response'
    updated_at = <Date 2015-06-22.12:44:07.896>
    user = 'https://github.com/vadmium'

    bugs.python.org fields:

    activity = <Date 2015-06-22.12:44:07.896>
    actor = 'martin.panter'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2015-02-02.12:56:16.093>
    creator = 'martin.panter'
    dependencies = []
    files = ['37977', '38875', '39472', '39769']
    hgrepos = []
    issue_num = 23377
    keywords = ['patch', 'needs review']
    message_count = 13.0
    messages = ['235251', '240315', '240501', '240527', '240839', '243516', '243550', '243600', '243708', '243901', '244509', '244976', '245625']
    nosy_count = 3.0
    nosy_names = ['pitrou', 'martin.panter', 'demian.brecht']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue23377'
    versions = ['Python 3.5']

    @vadmium
    Copy link
    Member Author

    vadmium commented Feb 2, 2015

    This is the same issue raised at <https://bugs.python.org/issue4879#msg91597\>. Currently, every time a new response is to be received, HTTPConnection passes its raw socket object to HTTPResponse, which calls sock.makefile("rb") and creates a BufferedReader. The BufferedReader is used to parse the header section and read the response body. The problem is that the BufferedReader is closed at the end of reading the response, potentially losing buffered data read from a subsequent response.

    Normally no data is lost, because most users would read the full response before triggering a new request, and the server would wait for a request before sending a response. But if a user pipelined a second request without reading all of the first response, and the server happened to send the end of the first response and the start of the second response in the same packet, it could trigger the problem. I have added a test called test_httplib.ServerTest.testDoubleResponse() which emulates this scenario. The problem also makes it hard to detect misbehaving servers, or use HTTPConnection to test that a server is behaving correctly.

    I am adding a patch which creates the BufferedReader once for each connection. This involves changing the API of the HTTPResponse constructor. I think this should be okay because even though it is documented, it says “Not instantiated directly by user”. It did require changing the tests that call the HTTPResponse constructor though. If absolutely necessary, it may be possible to maintain backwards compatibility if we added a new constructor parameter, and carefully juggled how the close() calls work.

    @vadmium vadmium added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Feb 2, 2015
    @vadmium
    Copy link
    Member Author

    vadmium commented Apr 9, 2015

    http-buffer.v2.patch:

    • Merged with recent changes
    • Made the changes to the test suite slightly less intrusive. Unfortunately there are still a lot of changes left where mock sockets were being sent into the HTTPResponse constructor.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 11, 2015

    I think it's ok to slightly break a non-public API since it's required to fix an obvious bug.
    By the way, I guess we don't support HTTP pipelining, right?

    @vadmium
    Copy link
    Member Author

    vadmium commented Apr 12, 2015

    HTTP pipelining is not supported in general. According to the module doc string, you can actually pipeline a second request if you have read the first response’s headers but not its body:

    >>> conn = HTTPSConnection("bugs.python.org")
    >>> conn.request("GET", "/")
    >>> response1 = conn.getresponse()  # Read header section only
    >>> conn.request("GET", "/issue23377")
    >>> response1.read()[:100]  # Read first body after pipelining second request
    b'<!--\n This is the default body that is displayed when people visit the\n tracker. The tag below lists'
    >>> response2 = conn.getresponse()
    >>> response2.read()[:100]
    b'\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xht'

    But I suspect this might deadlock with some servers (that don’t properly support pipelining themselves) if the second request was a large POST or something and filled up the buffers that this mode relies on.

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Apr 14, 2015

    I think it's ok to slightly break a non-public API since it's required to fix an obvious bug.

    To play devil's advocate here, /is/ this really non-public API? According to documented Python naming conventions, HTTPResponse is part of the public API. From code I've written myself, this would likely create a number of breakages. Granted, it would largely be in lower level testing code, but it would still create a headache. Is something that's lightly documented as being non-public take precedence over well known and used naming conventions? I'd argue no, it shouldn't.

    That said, I'm not overly enthused about the alternative Martin describes, so at first thought, I'm pretty much square on the fence about this.

    By the way, I guess we don't support HTTP pipelining, right?

    As Martin pointed out, sadly no. There's currently an internal state machine that seems to possibly be remnants of an initial implementation (just a guess, I haven't done any research into it to prove that though) that prevents pipelining. It's on my list of things to tackle as it seems to be a natural progression to HTTP/2.0 multiplexing, but unfortunately my time is a little shorter than usual at the moment (taking a new job has cut into my spare time a little).

    @martin:

    I haven't had a chance to review the change in detail yet, but I think the idea's sound. I'll try to get some time over the next few days to dig into it further.

    @pitrou
    Copy link
    Member

    pitrou commented May 18, 2015

    Is this pending a review?

    @vadmium
    Copy link
    Member Author

    vadmium commented May 19, 2015

    Yes I would like someone to review my code changes for this; I don’t think anyone has done so yet. But I don’t consider this a very important bug, so there is no hurry.

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented May 19, 2015

    If nobody else gets to it first, I'll try to get to review this later today or tomorrow. Apologies for the delay on the review Martin, a new baby (coming next month), moving /and/ taking on a new job all at the same time seems to limit free time a little ;)

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented May 21, 2015

    Added comments to Rietveld.

    @vadmium
    Copy link
    Member Author

    vadmium commented May 23, 2015

    Thanks for the reviewing. Here is http-buffer.v3.patch:

    • Merged with current code
    • Better HTTPResponse(sock) compatibility suggested by Demian
    • New HTTPConnection.request(close=True) feature also suggested by Demian. This sends “Connection: close” in the request, and arranges for the response to be detached from the HTTPConnection instance, making some hacks in urllib.request redundant.

    @vadmium
    Copy link
    Member Author

    vadmium commented May 31, 2015

    Just realized that the makefile() call in _tunnel() still lets this bug occur when a HTTPSConnection is being used with a proxy using the CONNECT method. So while my patch shouldn’t make things any worse than they already are, it needs more work to fix this instance. Maybe we would have to use the new SSL memory BIO support.

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Jun 7, 2015

    Actually had a few free minutes so stuck a couple minor comments in Rietveld. Will have another go as soon as possible.

    @vadmium
    Copy link
    Member Author

    vadmium commented Jun 22, 2015

    Minor update based on Demian’s review

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants