Message 255440 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	martin.panter
Recipients	barry, demian.brecht, ezio.melotti, gregory.p.smith, martin.panter, r.david.murray, scharron, serhiy.storchaka
Date	2015-11-26.23:19:02
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1448579943.5.0.317912844783.issue22233@psf.upfronthosting.co.za>
In-reply-to

Content

For the record, this is a simplified version of the original scenario, showing the low-level HTTP protocol:

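>>> import ssl
>>> from http.client import HTTPS_PORT
>>> from pprint import pprint
>>> from socket import create_connection
>>> request = (
...     b"GET /%C4%85 HTTP/1.1\r\n"
...     b"Host: graph.facebook.com\r\n"
...     b"\r\n"
... )
>>> context = ssl.create_default_context()
>>> s = create_connection(("graph.facebook.com", HTTPS_PORT))
>>> with context.wrap_socket(s, server_hostname="graph.facebook.com") as s:
...     s.sendall(request)
...     response = s.recv(3000)
... 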
50
>>> pprint(response.splitlines(keepends=True))
[b'HTTP/1.1 404 Not Found\r\n',
 b'WWW-Authenticate: OAuth "Facebook Platform" "not_found" "(#803) Some of the '
 b'aliases you requested do not exist: \xc4\x85"\r\n',
 b'Access-Control-Allow-Origin: *\r\n',
 b'Content-Type: text/javascript; charset=UTF-8\r\n',
 b'X-FB-Trace-ID: H9yxnVcQFuA\r\n',
 b'X-FB-Rev: 2063232\r\n',
 b'Pragma: no-cache\r\n',
 b'Cache-Control: no-store\r\n',
 b'Facebook-API-Version: v2.0\r\n',
 b'Expires: Sat, 01 Jan 2000 00:00:00 GMT\r\n',
 b'X-FB-Debug: 07ouxMl1Z439Ke/YzHSjXx3o9PcpGeZBFS7yrGwTzaaudrZWe5Ef8Z96oSo2dINp'
 b'3GR4q78+1oHDX2pUF2ky1A==\r\n',
 b'Date: Thu, 26 Nov 2015 23:03:47 GMT\r\n',
 b'Connection: keep-alive\r\n',
 b'Content-Length: 147\r\n',
 b'\r\n',
 b'{"error":{"message":"(#803) Some of the aliases you requested do not exist: '
 b'\\u0105","type":"OAuthException","code":803,"fbtrace_id":"H9yxnVcQFuA"}}']
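The connection between this response and the “email” module can be reproduced offline. CPython's http.client decodes the raw header bytes as ISO-8859-1 and passes the resulting string to the email parser, so the UTF-8 bytes \xc4\x85 (the alias "ą" echoed back by the server) become 'Ä\x85', and str.splitlines() treats U+0085 (NEL) as a line boundary. The sketch below only mimics that decode-and-split step on the header line shown above; it does not call http.client itself.

```python
from urllib.parse import quote

# The request path above is the percent-encoded UTF-8 form of "ą" (U+0105).
assert quote("\u0105") == "%C4%85"

# The server echoes the raw UTF-8 bytes of that alias in a header line.
raw_header = (
    b'WWW-Authenticate: OAuth "Facebook Platform" "not_found" "(#803) Some of the '
    b'aliases you requested do not exist: \xc4\x85"\r\n'
)

# Decoding as ISO-8859-1 (Latin-1) maps each byte to the code point with the
# same value, so b"\xc4\x85" becomes "\u00c4\u0085" ("Ä" followed by NEL).
text = raw_header.decode("iso-8859-1")

# str.splitlines() treats U+0085 as a line boundary, so this single header
# line is cut in two just before the closing quote.
pieces = text.splitlines(keepends=True)
for piece in pieces:
    print(repr(piece))
```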

In my mind, the simplest way forward would be to change the “email” module to parse lines using only the “universal newlines” algorithm, which recognises just \n, \r and \r\n as line endings. The Lib/email/feedparser.py module should use StringIO(s, newline="").readlines() rather than s.splitlines(keepends=True). That would change email parsing behaviour in general; for instance, given the following message:

>>> import email
>>> m = email.message_from_string(
...     "WWW-Authenticate: abc\x85<body or header?>\r\n"
...     "\r\n"
... )

instead of the current behaviour:

>>> m.items()
[('WWW-Authenticate', 'abc\x85')]
>>> m.get_payload()
'<body or header?>\r\n\r\n'

it would change to:

>>> m.items()
[('WWW-Authenticate', 'abc\x85<body or header?>')]
>>> m.get_payload()
''
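The two splitting strategies can also be compared directly, without going through the parser. This is a sketch that applies s.splitlines(keepends=True) and StringIO(s, newline="").readlines() to the example message; the helper names current_split and proposed_split, and the extra form-feed sample string, are illustrative additions rather than anything from feedparser.py.

```python
from io import StringIO

message = "WWW-Authenticate: abc\x85<body or header?>\r\n\r\n"

def current_split(s):
    # str.splitlines() breaks on \n, \r, \r\n and also on \v, \f,
    # \x1c-\x1e, \x85 (NEL), \u2028 and \u2029.
    return s.splitlines(keepends=True)

def proposed_split(s):
    # Universal newlines with newline="": only \n, \r and \r\n end a line,
    # and the line endings are returned untranslated.
    return StringIO(s, newline="").readlines()

print(current_split(message))
print(proposed_split(message))

# Illustrative sample with a form feed: splitlines() breaks on it,
# readlines() does not, while a lone \r still ends a line in both.
sample = "a\x0cb\rc\nd"
print(current_split(sample))
print(proposed_split(sample))
```

With the proposed splitting, the \x85 stays inside the header line, which matches the “would change to” behaviour described above.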

History
Date	User	Action	Args
2015-11-26 23:19:03	martin.panter	set	recipients: + martin.panter, barry, gregory.p.smith, ezio.melotti, r.david.murray, serhiy.storchaka, demian.brecht, scharron
2015-11-26 23:19:03	martin.panter	set	messageid: <1448579943.5.0.317912844783.issue22233@psf.upfronthosting.co.za>
2015-11-26 23:19:03	martin.panter	link	issue22233 messages
2015-11-26 23:19:02	martin.panter	create