Author martin.panter
Recipients barry, demian.brecht, ezio.melotti, gregory.p.smith, martin.panter, r.david.murray, scharron, serhiy.storchaka
Date 2015-11-26.23:19:02
Message-id <1448579943.5.0.317912844783.issue22233@psf.upfronthosting.co.za>
In-reply-to
Content
For the record, this is a simplified version of the original scenario, showing the low-level HTTP protocol:

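>>> import ssl
>>> from http.client import HTTPS_PORT
>>> from pprint import pprint
>>> from socket import create_connection
>>> request = (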
...     b"GET /%C4%85 HTTP/1.1\r\n"
...     b"Host: graph.facebook.com\r\n"
...     b"\r\n"
... )
>>> s = create_connection(("graph.facebook.com", HTTPS_PORT))
>>> with ssl.wrap_socket(s) as s:
...     s.sendall(request)
...     response = s.recv(3000)
... 
50
>>> pprint(response.splitlines(keepends=True))
[b'HTTP/1.1 404 Not Found\r\n',
 b'WWW-Authenticate: OAuth "Facebook Platform" "not_found" "(#803) Some of the '
 b'aliases you requested do not exist: \xc4\x85"\r\n',
 b'Access-Control-Allow-Origin: *\r\n',
 b'Content-Type: text/javascript; charset=UTF-8\r\n',
 b'X-FB-Trace-ID: H9yxnVcQFuA\r\n',
 b'X-FB-Rev: 2063232\r\n',
 b'Pragma: no-cache\r\n',
 b'Cache-Control: no-store\r\n',
 b'Facebook-API-Version: v2.0\r\n',
 b'Expires: Sat, 01 Jan 2000 00:00:00 GMT\r\n',
 b'X-FB-Debug: 07ouxMl1Z439Ke/YzHSjXx3o9PcpGeZBFS7yrGwTzaaudrZWe5Ef8Z96oSo2dINp'
 b'3GR4q78+1oHDX2pUF2ky1A==\r\n',
 b'Date: Thu, 26 Nov 2015 23:03:47 GMT\r\n',
 b'Connection: keep-alive\r\n',
 b'Content-Length: 147\r\n',
 b'\r\n',
 b'{"error":{"message":"(#803) Some of the aliases you requested do not exist: '
 b'\\u0105","type":"OAuthException","code":803,"fbtrace_id":"H9yxnVcQFuA"}}']
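
A minimal sketch of why this response is troublesome, assuming the header block is decoded as ISO-8859-1 and then split with str.splitlines(), as the email parser currently does. The variable names are illustrative only:

```python
# The WWW-Authenticate header from the response above, as raw bytes.
raw = (
    b'WWW-Authenticate: OAuth "Facebook Platform" "not_found" "(#803) Some of the '
    b'aliases you requested do not exist: \xc4\x85"\r\n'
)

# Assumption: headers are decoded as ISO-8859-1 before being parsed,
# so the byte 0x85 becomes the character U+0085 (NEL).
text = raw.decode("iso-8859-1")

# str.splitlines() treats U+0085 as a line boundary, so the single
# header line is broken in two just before the closing quote.
lines = text.splitlines(keepends=True)
for line in lines:
    print(repr(line))
```

The second element is the stray remainder of the header value, which a parser would then have to interpret as something other than part of the header.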

To my mind, the simplest way forward would be to change the “email” module to split lines using only the “universal newlines” algorithm, which recognises just \r, \n and \r\n as line endings. str.splitlines() also splits on several other characters, including \x85, the character at issue here. Concretely, the /Lib/email/feedparser.py module should use StringIO(s, newline="").readlines() rather than s.splitlines(keepends=True). This would change the behaviour of all email parsing; for instance, given the following message:

>>> import email
>>> m = email.message_from_string(
...     "WWW-Authenticate: abc\x85<body or header?>\r\n"
...     "\r\n"
... )

instead of the current behaviour:

>>> m.items()
[('WWW-Authenticate', 'abc\x85')]
>>> m.get_payload()
'<body or header?>\r\n\r\n'

it would change to:

>>> m.items()
[('WWW-Authenticate', 'abc\x85<body or header?>')]
>>> m.get_payload()
''
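
The difference between the two line-splitting approaches can be seen in isolation. This is only a sketch of the splitting step, not of the feedparser change itself, and the variable names are made up for illustration:

```python
from io import StringIO

s = "WWW-Authenticate: abc\x85<body or header?>\r\n\r\n"

# Current approach: str.splitlines() splits on \x85 (and on a few
# other Unicode line boundaries) in addition to \r, \n and \r\n.
unicode_lines = s.splitlines(keepends=True)

# Proposed approach: with newline="", StringIO recognises only
# \r, \n and \r\n, and returns the line endings untranslated.
universal_lines = StringIO(s, newline="").readlines()

print(unicode_lines)
print(universal_lines)
```

With the first approach the header line is cut at \x85; with the second it stays intact up to the \r\n.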
History
Date User Action Args
2015-11-26 23:19:03  martin.panter  set  recipients: + martin.panter, barry, gregory.p.smith, ezio.melotti, r.david.murray, serhiy.storchaka, demian.brecht, scharron
2015-11-26 23:19:03  martin.panter  set  messageid: <1448579943.5.0.317912844783.issue22233@psf.upfronthosting.co.za>
2015-11-26 23:19:03  martin.panter  link  issue22233 messages
2015-11-26 23:19:02  martin.panter  create