httplib infinite read on invalid header #64195

Lukasa · 2013-12-16T11:42:31Z

BPO	19996
Nosy	@warsaw, @tiran, @ezio-melotti, @bitdancer, @Lukasa
Files	hdrs.patch: Fix for current tip hdrs_27.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2015-01-26.04:35:11.011>
created_at = <Date 2013-12-16.11:42:30.767>
labels = ['type-bug', 'library', 'expert-email']
title = 'httplib infinite read on invalid header'
updated_at = <Date 2015-01-26.04:35:11.009>
user = 'https://github.com/Lukasa'

bugs.python.org fields:

activity = <Date 2015-01-26.04:35:11.009>
actor = 'python-dev'
assignee = 'none'
closed = True
closed_date = <Date 2015-01-26.04:35:11.011>
closer = 'python-dev'
components = ['Library (Lib)', 'email']
creation = <Date 2013-12-16.11:42:30.767>
creator = 'Lukasa'
dependencies = []
files = ['33169', '33198']
hgrepos = []
issue_num = 19996
keywords = ['patch']
message_count = 14.0
messages = ['206294', '206306', '206315', '206318', '206320', '206350', '206353', '206360', '206395', '206551', '207930', '221682', '224545', '234714']
nosy_count = 9.0
nosy_names = ['barry', 'christian.heimes', 'ezio.melotti', 'Arfrever', 'r.david.murray', 'python-dev', 'Lukasa', 'brianyardy', 'jaywink']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue19996'
versions = ['Python 2.7', 'Python 3.3', 'Python 3.4', 'Python 3.5']

Lukasa · 2013-12-16T11:55:45Z

Initially spotted on Requests GitHub bugtracker: https://github.com/kennethreitz/requests/issues/1804

On receiving an HTTP response with an invalid header, httplib stops parsing the headers and attempts to receive the rest of the message as body content. Normally that would be fine, but problems occur if later on in the headers "Transfer-Encoding: chunked" is declared. This leads to a hang while reading the body content until the remote end forcibly closes the connection.

This bug certainly affects versions 2.7 through 3.3.

To reproduce (note that we need to request gzip to get the server to send the bad header):

    import http.client
    h = http.client.HTTPConnection('www.sainsburysbank.co.uk')
    h.request('GET', '/', headers={'Accept-Encoding': 'gzip'})
    r = h.getresponse()
    hdrs = r.getheaders()
    body = r.read()  # Hang here.

cURL configured equivalently doesn't exhibit this problem, that is the following works fine:

curl --compressed http://www.sainsburysbank.co.uk/

It's not clear to me that this behaviour is wrong. The server is definitely violating RFC 2616 which expressly forbids empty header names. I'm open to consultation about what the correct fix should be here (which may be nothing at all).

bitdancer · 2013-12-16T13:31:55Z

Well, having it hang forever is a potential DOS attack, so something needs to be fixed, I think.

Lukasa · 2013-12-16T14:12:44Z

The easiest way to 'fix' the DoS problem is to throw an exception if an invalid header is parsed. That's a backwards-compatibility problem though: things that previously 'worked' now won't. That presumably limits the ability to back-apply this fix to 2.7.7.

An alternative option is to speculatively attempt to parse the next line for headers or the end of the header block. I'm not sure this is a great idea: at this stage all we know is that the header block is malformed, so it's not clear that 'doing our best' is a good idea either, especially since that attitude got us here to begin with.

The best 'middle of the road' option is to abort message parsing at this stage without throwing an exception. This leads to truncated headers and no body, where previously we'd have got truncated headers and a body that potentially included the missing headers. We could also potentially add a warning about the problem.

Are there any preferences for a fix here, or a better solution than the above (none of which I'm wild about)?

bitdancer · 2013-12-16T14:35:38Z

I haven't looked at the code, but could we preserve the existing behavior but apply a timeout to mitigate the DOS?

On the other hand, the fact that curl manages to return something indicates there is probably an error recovery strategy that would work. I'm not sure if we have an error reporting mechanism in httplib if we do error recovery. We do in the email module, and httplib uses the email code for headers, I think, so there might be a way to leverage that if there is no existing mechanism. But of course even deciding to do that requires some discussion :)

Lukasa · 2013-12-16T15:16:49Z

Maybe. If we do it we have to apply that timeout to all the socket actions on that HTTP connection. This would have the effect of changing the default value of the timeout parameter on the HTTPConnection object from socket._GLOBAL_DEFAULT_TIMEOUT to whatever value was chosen. We could do this for reads only, and avoid applying the timeout to connect() calls, but that's kind of weird.

We hit the same problem though: by default, HTTPConnections block indefinitely on all socket calls: we'd be changing that default to some finite timeout instead. Does that sound like a good way to go?

As for curl's error recovery strategy, I'm pretty sure it just keeps parsing the header block. That can definitely be done here. We do have an error reporting mechanism as well (sort of): we set the HTTPMessage.status field to some error string. We could do that, and continue to parse the header block: that's probably the least destructive way to fix this.

Lukasa · 2013-12-16T19:13:00Z

An update: in Python 2.7 through 3.3, fixing this should only affect http.client/httplib, because they do most of their header parsing themselves. Fixing this in later versions of Python is more interesting, as http.client got rewritten to use email.parser (which uses email.feedparser). This means any change to fix this problem in HTTP will also affect anything else that uses this module. Not sure how problematic that is yet.

Lukasa · 2013-12-16T19:18:11Z

Actually, that might be OK. I don't know the email package at all, but I suspect being able to handle empty header keys (by ignoring them) is a reasonable thing to do in the email case as well. Thoughts?

Lukasa · 2013-12-16T20:14:00Z

Alright, here's a patch for the current tip. I'll need to prepare a different patch for earlier versions of Python, which will take me a little while longer to do (maybe not today). I've also signed a contributor agreement, but it doesn't look like that's propagated here yet.

bitdancer · 2013-12-17T02:17:49Z

Heh. A missing header *name* was something I never considered in the email package tests. So yeah, that's worth handing one way or another. I'll put reviewing this on my TODO list, since I'm the maintainer of the email package.

I'm updating the versions per our usual practice: the versions in which we want to fix it.

Lukasa · 2013-12-18T21:17:52Z

Ok, here's a patch for 2.7 as well.

I decided to allow the empty header names in rfc822.py as well, if only because I wanted the changed parsing code to match. If anyone thinks that's an excessive change, I'll happily remove it.

Lukasa · 2014-01-12T07:40:51Z

Is there anything I can do to help move this forward? I appreciate you're all busy so I'm happy for this to take as long as it takes, I just wanted to make sure it's not blocked behind me.

brianyardy · 2014-06-27T12:09:16Z

import http.client
    h = http.client.HTTPConnection('http://www.einstantloan.co.uk/')
    h.request('GET', '/', headers={'Accept-Encoding': 'gzip'})
    r = h.getresponse()
    hdrs = r.getheaders()
    body = r.read()  # Hang here.

curl --compressed http://www.einstantloan.co.uk/

jaywink · 2014-08-02T09:27:20Z

I took the patches and verified that;

running the new tests without the changed code in Lib/email/feedparser.py (head) and Lib/httplib.py, Lib/rfc822.py (2.7) fails both the new tests.
running the new tests with the changed code passes the tests (on both head and 2.7).

Sainsburys Bank site seems to have been fixed thus verification in the first comment does not work - but the test I think emulates that well enough.

python-dev · 2015-01-26T04:35:11Z

New changeset 25ecf3d0ea03 by Benjamin Peterson in branch '3.4':
handle headers with no key (closes bpo-19996)
https://hg.python.org/cpython/rev/25ecf3d0ea03

New changeset 29923a9987be by Benjamin Peterson in branch 'default':
merge 3.4 (bpo-19996)
https://hg.python.org/cpython/rev/29923a9987be

New changeset 9a4882b12218 by Benjamin Peterson in branch '2.7':
simply ignore headers with no name (bpo-19996)
https://hg.python.org/cpython/rev/9a4882b12218

Lukasa mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 16, 2013

bitdancer added the topic-email label Dec 17, 2013

python-dev mannequin closed this as completed Jan 26, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

httplib infinite read on invalid header #64195

httplib infinite read on invalid header #64195

Lukasa mannequin commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

bitdancer commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

bitdancer commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

bitdancer commented Dec 17, 2013

Lukasa mannequin commented Dec 18, 2013

Lukasa mannequin commented Jan 12, 2014

brianyardy mannequin commented Jun 27, 2014

jaywink mannequin commented Aug 2, 2014

python-dev mannequin commented Jan 26, 2015

httplib infinite read on invalid header #64195

httplib infinite read on invalid header #64195

Comments

Lukasa mannequin commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

bitdancer commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

bitdancer commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

Lukasa mannequin commented Dec 16, 2013

bitdancer commented Dec 17, 2013

Lukasa mannequin commented Dec 18, 2013

Lukasa mannequin commented Jan 12, 2014

brianyardy mannequin commented Jun 27, 2014

jaywink mannequin commented Aug 2, 2014

python-dev mannequin commented Jan 26, 2015