Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

httplib read status memory usage #51040

Closed
msucajtys mannequin opened this issue Aug 28, 2009 · 18 comments
Closed

httplib read status memory usage #51040

msucajtys mannequin opened this issue Aug 28, 2009 · 18 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@msucajtys
Copy link
Mannequin

msucajtys mannequin commented Aug 28, 2009

BPO 6791
Nosy @orsenthil, @pitrou, @vstinner
Files
  • httplib.patch: patch
  • i6791_unittest.patch: py3k unit test for the issue
  • i6791_py3k.patch: py3k patch
  • httplinelength.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/orsenthil'
    closed_at = <Date 2010-12-18.18:21:23.370>
    created_at = <Date 2009-08-28.08:32:35.306>
    labels = ['library', 'performance']
    title = 'httplib read status memory usage'
    updated_at = <Date 2010-12-18.18:21:23.368>
    user = 'https://bugs.python.org/msucajtys'

    bugs.python.org fields:

    activity = <Date 2010-12-18.18:21:23.368>
    actor = 'pitrou'
    assignee = 'orsenthil'
    closed = True
    closed_date = <Date 2010-12-18.18:21:23.370>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2009-08-28.08:32:35.306>
    creator = 'm.sucajtys'
    dependencies = []
    files = ['14795', '20048', '20049', '20100']
    hgrepos = []
    issue_num = 6791
    keywords = ['patch']
    message_count = 18.0
    messages = ['92027', '92030', '110923', '124010', '124011', '124021', '124029', '124030', '124031', '124111', '124126', '124130', '124132', '124138', '124247', '124253', '124295', '124302']
    nosy_count = 5.0
    nosy_names = ['orsenthil', 'pitrou', 'vstinner', 'm.sucajtys', 'rosslagerwall']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue6791'
    versions = ['Python 3.1', 'Python 2.7', 'Python 3.2']

    @msucajtys
    Copy link
    Mannequin Author

    msucajtys mannequin commented Aug 28, 2009

    During writing some code I discovered some behaviour of httplib. When we
    connect to host, which doesn’t respond with status line, but it just
    sending data, httplib may consume more and more memory, becouce when we
    execute
    h = httplib.HTTPConnection(‘host’)
    h.conect()
    h.request(‘GET’, ‘/’)
    r = h.getresponse()
    httplib tries to read one line from host. If host doesn’t send new line
    character (‘\n’), httplib reads more and more data. On my tests httplib
    could consume all of 4GB of memory and the python process was killed by
    oom_killer.
    The resolution is to limit maximum amount of data read on getting
    response. I have performed some test:
    I received 3438293 from hosts located in the network. The longest valid
    response line is
    HTTP/1.1 500 ( The specified Secure Sockets Layer (SSL) port is not
    allowed. ISA Server is not configured to allow SSL requests from this
    port. Most Web browsers use port 443 for SSL requests. )\r\n
    and it has 197 characters.
    In RFC2616 in section 6.1 we have:
    “The first line of a Response message is the Status-Line, consisting of
    the protocol version followed by a numeric status code and its
    associated textual phrase, with each element separated by SP characters.
    No CR or LF is allowed except in the final CRLF sequence.
    Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
    (..)The Reason-Phrase is intended to give a short textual description of
    the Status-Code.”
    So limiting maximum status line length to 256 characters is a solution
    of this problem. It doesn’t break compatibility withc RFC 2616.

    My patch was written originally on python2.4, but I’ve tested it on
    python2.6:
    [ms@host python2.6]$ patch --dry-run -i /home/ms/httplib.patch
    patching file httplib.py
    Hunk #1 succeeded at 209 (offset 54 lines).

    @msucajtys msucajtys mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Aug 28, 2009
    @msucajtys
    Copy link
    Mannequin Author

    msucajtys mannequin commented Aug 28, 2009

    I've also check patch against code in svn tree:
    wget http://svn.python.org/projects/python/trunk/Lib/httplib.py
    patch -p0 -i httplib.patch --dry-run
    patching file httplib.py
    Hunk #1 succeeded at 209 (offset 54 lines).
    Hunk #2 succeeded at 303 (offset 10 lines).

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 20, 2010

    Sumar, to get this moved forward could you please provide a unit test.

    @orsenthil orsenthil self-assigned this Jul 20, 2010
    @rosslagerwall
    Copy link
    Mannequin

    rosslagerwall mannequin commented Dec 15, 2010

    Attached is a unit test which tests the issue.
    Unfortunately, since it uses the resource module to limit memory to a workable size, it will only work on Unix.

    The given patch appears to fix the issue well.

    I think this should be taken as a security issue (even if a rather odd one) since a malicious http server could be set up in place of the normal one and crash any http python clients that connect to it.

    Eg:
    Run: dd if=/dev/zero bs=10M count=1000 | nc -l 8888
    And then:

    import httplib
    h = httplib.HTTPConnection('localhost', 8888)
    h.connect()
    h.request('GET', '/')
    r = h.getresponse()

    This should cause python to use up all the memory available.

    @rosslagerwall
    Copy link
    Mannequin

    rosslagerwall mannequin commented Dec 15, 2010

    A py3k patch against revision 87228.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 15, 2010

    First, I don't think the resource module needs to be used here. Second, I don't see why getcode() would return 200. If no valid response was received then some kind of error should certainly be raised, shouldn't it?

    @pitrou
    Copy link
    Member

    pitrou commented Dec 15, 2010

    By the way, looking at the code, readline() without any parameter is used all over http.client, so fixing only this one use case doesn't really make sense.

    @rosslagerwall
    Copy link
    Mannequin

    rosslagerwall mannequin commented Dec 15, 2010

    That's true. Near the bottom of the code, it says:

    # The status-line parsing code calls readline(), which normally
    # get the HTTP status line. For a 0.9 response, however, this is
    # actually the first line of the body!

    Limiting the length of the status line would break 0.9 responses so maybe this issue should be closed?

    @pitrou
    Copy link
    Member

    pitrou commented Dec 15, 2010

    That's true. Near the bottom of the code, it says:

    The status-line parsing code calls readline(), which normally

    get the HTTP status line. For a 0.9 response, however, this is

    actually the first line of the body!

    Limiting the length of the status line would break 0.9 responses so maybe this issue should be closed?

    Well, the HTTP 1.0 RFC was filed in 1996 and HTTP 1.1 is most commonly
    used today. I don't think we need to support 0.9 anymore. I'll open a
    separate issue for ripping off 0.9 support, though.

    @orsenthil
    Copy link
    Member

    I just read the whole discussion and it seems that code was in place so that client can tolerant of a BAD HTTP 0.9 Server response.
    http://www.w3.org/Protocols/HTTP/OldServers.html

    Given that bpo-10711 talks about removing HTTP/0.9 support (+1 to that), this issue will become obsolete. I too support removing HTTP/0.9. There are hardly any advantages in keeping it around.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 16, 2010

    Well, removing 0.9 support doesn't make this obsolete, does it?

    @orsenthil
    Copy link
    Member

    On Thu, Dec 16, 2010 at 01:18:30PM +0000, Antoine Pitrou wrote:

    Well, removing 0.9 support doesn't make this obsolete, does it?

    It does. Doesn't it? Because I saw in your patch that you fall back on
    HTTP 1.0 behaviour when the server does not return a status line and
    in which case a Exception will be raise and this issue won't be
    observed.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 16, 2010

    It does. Doesn't it? Because I saw in your patch that you fall back on
    HTTP 1.0 behaviour when the server does not return a status line and
    in which case a Exception will be raise and this issue won't be
    observed.

    I don't think you understood the issue here. Calling readline() without
    a maximum length means the process memory potentially explodes, if the
    server sends gigabytes of data without a single "\n".

    @orsenthil
    Copy link
    Member

    On Thu, Dec 16, 2010 at 02:02:10PM +0000, Antoine Pitrou wrote:

    I don't think you understood the issue here. Calling readline() without
    a maximum length means the process memory potentially explodes, if the
    server sends gigabytes of data without a single "\n".

    Yeah, I seem to have misunderstood the issue. Even if the response wa
    s an *invalid* one but it was huge data without \n, the readline call
    would just explode.

    • reading chunked response is doing a readline call too.

    Both this need to be addressed by having a limit on reading.

    I thought readline() is being called only when parsing headers which
    should almost always have CRLF (or at least LF) and thought valid
    responses always start with headers.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 17, 2010

    Now that 0.9 client support has been removed, this can proceed (at least for 3.2).

    @pitrou
    Copy link
    Member

    pitrou commented Dec 17, 2010

    Here is a patch limiting line length everywhere in http.client, + tests (it also affects http.server since the header parsing routine is shared).

    @orsenthil
    Copy link
    Member

    In the morning, I had a comment on the patch wondering why read _MAXLENGH + 1 and then check for len of header > _MAXLENGH. Instead of just reading _MAXLENGH (and if the length matched rejecting). ( Looks like it did not go through).

    I think that either way is okay. I am taking the privilege of committing the patch. Fixed for py3k in 87373. So it is be available in the next beta.

    Shall merge the changes to other codelines.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 18, 2010

    Partially backported in r87382 (3.1) and r87383 (2.7). Not everything could be merged in because of HTTP 0.9 support and (in 2.7) a slightly different architecture. Thank you.

    @pitrou pitrou closed this as completed Dec 18, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants