Issue 22233: http.client splits headers on non-\r\n characters

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/66429

classification

Title:	http.client splits headers on non-\r\n characters
Type:	behavior	Stage:	resolved
Components:	email, Library (Lib)	Versions:	Python 3.6, Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	Clay Gerrard, barry, demian.brecht, doughellmann, ezio.melotti, gregory.p.smith, martin.panter, python-dev, r.david.murray, scharron, serhiy.storchaka
Priority:	normal	Keywords:	patch

Created on 2014-08-20 10:02 by scharron, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
crlf-headers.patch	martin.panter, 2015-11-28 01:34		review
crlf-headers2.patch	r.david.murray, 2016-09-07 17:26		review

Messages (13)
msg225562 - (view)	Author: Samuel Charron (scharron)	Date: 2014-08-20 10:02
In some cases, the headers from http.client (that uses email.feedparser) splits headers at wrong separators. The bug is from the use of str.splitlines (in email.feedparser) that splits on other characters than \r\n as it should. (See bug http://bugs.python.org/issue22232) To reproduce the bug : import http.client c = http.client.HTTPSConnection("graph.facebook.com") c.request("GET", "/%C4%85", None, {"test": "\x85"}) r = c.getresponse() print(r.headers) print(r.headers.keys()) print(r.headers.get("WWW-Authenticate")) As you can see, the WWW-Authenticate is wrongly parsed (it misses its final "), and therefore the rest of the headers are ignored.
msg225565 - (view)	Author: Samuel Charron (scharron)	Date: 2014-08-20 12:23
A consequence of this bug is that r.read() blocks until a timeout occurs since the content-length header is not interpreted (I think this is related to the HTTPResponse.__init__ comment)
msg246569 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2015-07-10 16:59
The obvious fix seems to be to not use splitlines but explicitly split on the allowed characters for ASCII based protocols and formats that only want \r and \n to be considered. I don't think we can rightfully change the unicode splitlines behavior.
msg255440 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-11-26 23:19
For the record, this is a simplified version of the original scenario, showing the low-level HTTP protocol: >>> request = ( ... b"GET /%C4%85 HTTP/1.1\r\n" ... b"Host: graph.facebook.com\r\n" ... b"\r\n" ... ) >>> s = create_connection(("graph.facebook.com", HTTPS_PORT)) >>> with ssl.wrap_socket(s) as s: ... s.sendall(request) ... response = s.recv(3000) ... 50 >>> pprint(response.splitlines(keepends=True)) [b'HTTP/1.1 404 Not Found\r\n', b'WWW-Authenticate: OAuth "Facebook Platform" "not_found" "(#803) Some of the ' b'aliases you requested do not exist: \xc4\x85"\r\n', b'Access-Control-Allow-Origin: *\r\n', b'Content-Type: text/javascript; charset=UTF-8\r\n', b'X-FB-Trace-ID: H9yxnVcQFuA\r\n', b'X-FB-Rev: 2063232\r\n', b'Pragma: no-cache\r\n', b'Cache-Control: no-store\r\n', b'Facebook-API-Version: v2.0\r\n', b'Expires: Sat, 01 Jan 2000 00:00:00 GMT\r\n', b'X-FB-Debug: 07ouxMl1Z439Ke/YzHSjXx3o9PcpGeZBFS7yrGwTzaaudrZWe5Ef8Z96oSo2dINp' b'3GR4q78+1oHDX2pUF2ky1A==\r\n', b'Date: Thu, 26 Nov 2015 23:03:47 GMT\r\n', b'Connection: keep-alive\r\n', b'Content-Length: 147\r\n', b'\r\n', b'{"error":{"message":"(#803) Some of the aliases you requested do not exist: ' b'\\u0105","type":"OAuthException","code":803,"fbtrace_id":"H9yxnVcQFuA"}}'] In my mind, the simplest way forward would be to change the “email” module to only parse lines using the “universal newlines” algorithm. The /Lib/email/feedparser.py module should use StringIO(s, newline="").readlines() rather than s.splitlines(keepends=True). That would mean all email parsing behaviour would change; for instance, given the following message: >>> m = email.message_from_string( ... "WWW-Authenticate: abc\x85<body or header?>\r\n" ... "\r\n" ... ) instead of the current behaviour: >>> m.items() [('WWW-Authenticate', 'abc\x85')] >>> m.get_payload() '<body or header?>\r\n\r\n' it would change to: >>> m.items() [('WWW-Authenticate', 'abc\x85<body or header?>')] >>> m.get_payload() ''
msg255460 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-11-27 14:04
I agree. Can you update the email issue with this suggestion and/or a patch? The problem with this, of course is backward compatibility, but since it is more correct per the RFCs, our past policy has been to fix it anyway.
msg255518 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-11-28 01:34
David: what is the email issue you mentioned? In the mean time, I am uploading a patch to this issue. It seems using StringIO is a bit slower than str.splitlines(). I found a way to optimize building long lines, which compensated a lot of the loss, but this optimization would apply even without using StringIO. My patch makes test.test_email 0.3% slower (the optimization alone would make it 4.4% faster), and test_email.TestFeedParsers.test_long_lines() is 3% slower (optimization 12% faster). I also tried two other alternatives to str.splitlines(), but they were both slower than the StringIO technique: * _partial is a list of UTF-8 bytes; join and use bytes.splitlines() * _partial is a UTF-8 bytearray; use bytearray.splitlines()
msg273983 - (view)	Author: Clay Gerrard (Clay Gerrard)	Date: 2016-08-31 00:34
BUMP. ;) This issue was recently raised as one blocker to OpenStack Object Storage (Swift) finishing our port to python3 (we're hoping to finish adding support >=3.5 by Spring '17 - /me crosses fingers). I wonder if someone might confirm or deny the attached patch is likely to be included in the 3.6 timeframe (circa 12/16?) and/or back-ported to the 3.5 series? FWIW, I would echo other's sentiment that I would much prefer the implementation to be correct even if there was some worry we might have to choose between further optimization and getting a fix ASAP :D Warm Regards, -Clay
msg273988 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-08-31 01:45
If someone reviews my patch and thinks it is fine, I might commit it. Maybe I can just re-review it myself, now that I have forgotten all the details :) If messing with the email package is a problem (performance, or compatibility), another option is to keep the changes to the HTTP module (which I would be more confident in changing on my own). I have another patch for review at Issue 24363 which apparently also fixes this splitlines() bug.
msg274013 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2016-08-31 10:38
I'm hoping to take a look at all of these at the core sprint next week.
msg274840 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2016-09-07 17:09
This looks good to me. However, although it is by no means obvious, the tests in test_parser are supposed to be for the new policies. When I changed the test to test them another place that needed to fixed was revealed. I've updated the patch accordingly.
msg274899 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-09-07 21:47
New changeset 69900c5992c5 by R David Murray in branch '3.5': #22233: Only split headers on \r and/or \n, per email RFCs. https://hg.python.org/cpython/rev/69900c5992c5 New changeset 4d2369b901be by R David Murray in branch 'default': Merge: #22233: Only split headers on \r and/or \n, per email RFCs. https://hg.python.org/cpython/rev/4d2369b901be
msg274900 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2016-09-07 21:49
I want to stack another patch on top of this, so I committed it. If you see anything I screwed up, Martin, please let me know.
msg274903 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-09-07 22:44
Your modifications look sensible David; thanks for handling this.

History
Date	User	Action	Args
2022-04-11 14:58:07	admin	set	github: 66429
2016-09-07 22:44:26	martin.panter	set	messages: + msg274903
2016-09-07 21:49:00	r.david.murray	set	status: open -> closed resolution: fixed messages: + msg274900 stage: commit review -> resolved
2016-09-07 21:47:50	python-dev	set	nosy: + python-dev messages: + msg274899
2016-09-07 17:27:24	r.david.murray	set	files: - crlf-headers2.patch
2016-09-07 17:26:59	r.david.murray	set	files: + crlf-headers2.patch
2016-09-07 17:26:36	r.david.murray	set	stage: patch review -> commit review
2016-09-07 17:09:34	r.david.murray	set	files: + crlf-headers2.patch messages: + msg274840
2016-08-31 10:38:21	r.david.murray	set	messages: + msg274013
2016-08-31 02:18:34	doughellmann	set	nosy: + doughellmann
2016-08-31 01:45:36	martin.panter	set	messages: + msg273988 versions: - Python 3.4
2016-08-31 00:34:55	Clay Gerrard	set	nosy: + Clay Gerrard messages: + msg273983
2016-08-10 02:27:38	martin.panter	link	issue27716 superseder
2015-11-28 01:34:32	martin.panter	set	files: + crlf-headers.patch versions: + Python 3.6 messages: + msg255518 keywords: + patch stage: needs patch -> patch review
2015-11-27 14:04:41	r.david.murray	set	messages: + msg255460
2015-11-26 23:19:03	martin.panter	set	messages: + msg255440
2015-07-10 16:59:23	gregory.p.smith	set	nosy: + gregory.p.smith messages: + msg246569 title: http.client splits headers on none-\r\n characters -> http.client splits headers on non-\r\n characters
2015-03-17 06:31:13	martin.panter	set	nosy: + martin.panter
2014-10-28 09:41:35	ezio.melotti	set	nosy: + ezio.melotti stage: needs patch
2014-08-23 16:20:48	serhiy.storchaka	set	nosy: + serhiy.storchaka
2014-08-23 13:45:05	r.david.murray	set	nosy: + barry components: + email
2014-08-20 21:52:16	demian.brecht	set	nosy: + demian.brecht
2014-08-20 12:23:25	scharron	set	messages: + msg225565
2014-08-20 12:00:44	r.david.murray	set	nosy: + r.david.murray
2014-08-20 10:02:33	scharron	create