Message 244721 - Python tracker


Author	martin.panter
Recipients	Lukasa, barry, demian.brecht, icordasc, martin.panter, mgdelmonte, r.david.murray
Date	2015-06-03.00:58:03
Message-id	<1433293085.36.0.1992152346.issue24363@psf.upfronthosting.co.za>

Content

Regarding the suggested fix for Python 2, make sure it does not prematurely end the parsing on empty folded lines (having only tabs and spaces in them). E.g. according to RFC 7230 this should be a single header field:

b"Header: obsolete but\r\n"
b"    \r\n"
b"    still valid\r\n"
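A minimal sketch of the unfolding rule described above, assuming the header block arrives as a list of byte lines. The function name unfold() is hypothetical and is not taken from httplib or http.client; it only illustrates that a whitespace-only line is a continuation and must not terminate parsing.

```python
def unfold(lines):
    """Join obsolete folded (continuation) lines onto the preceding field."""
    fields = []
    for line in lines:
        if line in (b"\r\n", b"\n", b""):
            break  # only a truly empty line ends the header block
        if line[:1] in (b" ", b"\t") and fields:
            # Continuation line, even when it holds nothing but whitespace.
            continuation = line.strip(b" \t\r\n")
            if continuation:
                fields[-1] += b" " + continuation
        else:
            fields.append(line.rstrip(b"\r\n"))
    return fields


example = [
    b"Header: obsolete but\r\n",
    b"    \r\n",
    b"    still valid\r\n",
    b"\r\n",
]
result = unfold(example)
print(result)
```

Each fold is replaced with a single space here, which is one of the options RFC 7230 allows for a recipient.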

I suspect the RFC doesn’t say anything specifically about this case. In general the guidance seems to be things like:

* User agents should be tolerant of errors received in the protocol
* Proxies should fix up errors when forwarding messages upstream
* Servers should often reject errors in requests with 400 Bad Request (presumably to avoid the possibility of a downstream proxy being tricked by the protocol error and not triggering some security filter)

In the case of the bank web site, the last lines of the header are:

X-Frame-Options: SAMEORIGIN\r\n
Set-Cookie: mb-CookieP=; HttpOnly; \r\n
Secure\r\n
Set-Cookie: mb-CookieP=; HttpOnly; Secure\r\n
\r\n

It is obvious that this case could be treated as a folded (continuation) line: the stray “Secure” line was evidently meant to continue the preceding Set-Cookie field. But in general I think it would be better to ignore the erroneous line, or to record it as a defect so that the server module or another caller can check for it.
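A rough sketch of the "ignore it, but record a defect" option, using the bank's header lines quoted above. The helper parse_fields() and its return shape are hypothetical and only illustrate the idea; this is not how httplib or the email package behaves.

```python
def parse_fields(lines):
    """Return (fields, defects); a line with no colon is skipped and recorded."""
    fields, defects = [], []
    for line in lines:
        text = line.rstrip("\r\n")
        if not text:
            break  # blank line: end of the header block
        if text[0] in " \t":
            if fields:
                # Folded line: append to the previous field's value.
                name, value = fields[-1]
                fields[-1] = (name, (value + " " + text.strip()).strip())
            else:
                defects.append(text)  # continuation with nothing to continue
            continue
        name, sep, value = text.partition(":")
        if not sep:
            defects.append(text)  # erroneous line: keep parsing afterwards
            continue
        fields.append((name, value.strip()))
    return fields, defects


bank_lines = [
    "X-Frame-Options: SAMEORIGIN\r\n",
    "Set-Cookie: mb-CookieP=; HttpOnly; \r\n",
    "Secure\r\n",
    "Set-Cookie: mb-CookieP=; HttpOnly; Secure\r\n",
    "\r\n",
]
fields, defects = parse_fields(bank_lines)
print(fields)
print(defects)
```

With this approach both Set-Cookie fields survive and the caller can inspect the recorded defects to decide whether to reject the message.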

Looking at the Python 3 code, both the client and server call http.client.parse_headers(), which sensibly reads each line until it sees a blank line (Lib/http/client.py:197). But the collected lines are then handed to the email package, which parses them a second time and stops at the first line that fails the regular expression at Lib/email/feedparser.py:37. The remaining lines become the “payload” of the HTTP header:

>>> from urllib.request import urlopen
>>> with urlopen("http://www.merrickbank.com/") as response:
...     response.info().get_payload()
... 
'Secure\r\nSet-Cookie: mb-CookieP=; HttpOnly; Secure\r\n\r\n'
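The same behaviour can probably be reproduced without contacting the site by feeding the header lines quoted earlier to the module-level function http.client.parse_headers(). This is only a sketch; the RAW bytes are copied from this message rather than captured from the server, and the result may differ between Python versions.

```python
import io
import http.client

# Header lines as quoted in this message, including the terminating blank line.
RAW = (
    b"X-Frame-Options: SAMEORIGIN\r\n"
    b"Set-Cookie: mb-CookieP=; HttpOnly; \r\n"
    b"Secure\r\n"
    b"Set-Cookie: mb-CookieP=; HttpOnly; Secure\r\n"
    b"\r\n"
)

message = http.client.parse_headers(io.BytesIO(RAW))
cookies = message.get_all("Set-Cookie")
payload = message.get_payload()

print(len(cookies))   # number of Set-Cookie fields the parser kept
print(repr(payload))  # everything from the bad line onwards
print([type(defect).__name__ for defect in message.defects])
```

The second Set-Cookie field ends up in the payload rather than in the parsed header fields, which is the loss described above.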

What might be nice is a way to reuse the email header field parsing code, without worrying about the “From” line stuff, or the payload stuff.

History
Date	User	Action	Args
2015-06-03 00:58:05	martin.panter	set	recipients: + martin.panter, barry, r.david.murray, icordasc, demian.brecht, Lukasa, mgdelmonte
2015-06-03 00:58:05	martin.panter	set	messageid: <1433293085.36.0.1992152346.issue24363@psf.upfronthosting.co.za>
2015-06-03 00:58:05	martin.panter	link	issue24363 messages
2015-06-03 00:58:03	martin.panter	create