
Author r.david.murray
Recipients Lukasa, r.david.murray
Date 2016-08-09.14:11:38
Message-id <1470751898.39.0.263124391467.issue27716@psf.upfronthosting.co.za>
Content
Well, the email package will happily parse bytes and treat the non-ASCII data as opaque (though it does record errors in an internal data structure). But the Python 3 http API expects the parsed headers to be strings when you access them, so you'd just hit the decoding problem at that point rather than earlier.
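A minimal sketch of that behaviour, assuming the default `compat32` policy that `email.message_from_bytes` uses: the non-ASCII header bytes survive a parse/serialize round trip untouched, but reading the header back yields an `email.header.Header` object rather than a plain `str`. The `X-Name` header and its value are made up for illustration.

```python
import email
import email.header

# Hypothetical response headers; the X-Name value is the UTF-8 bytes for "café".
RAW = b"Content-Type: text/plain\r\nX-Name: caf\xc3\xa9\r\n\r\n"

# With the default compat32 policy, the bytes are decoded as ASCII with
# surrogateescape, so the non-ASCII bytes are carried along opaquely.
msg = email.message_from_bytes(RAW)

# Serializing back to bytes reproduces the original header bytes.
round_tripped = msg.as_bytes()

# Accessing the header does not give a plain str: compat32 wraps values that
# contain undecodable bytes in a Header object using the unknown-8bit charset.
value = msg["X-Name"]
print(type(value))
```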

This is a hard problem. Since headers *can* be latin-1 (I'd forgotten that), SMTPUTF8 won't work. We are up against the fact that Python makes a careful distinction between bytes and strings, but HTTP does not.
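To make the latin-1 point concrete, here is a small sketch with byte strings chosen for illustration (`decodes_as` is a hypothetical helper). latin-1 decodes any byte sequence and round-trips it exactly, while a UTF-8-only reading, which is what an SMTPUTF8-style policy assumes, rejects a latin-1-encoded value. UTF-8 bytes also "decode" as latin-1, just to the wrong text, so the bytes alone do not say which reading was intended.

```python
def decodes_as(raw: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if raw decodes without error in encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


LATIN1_VALUE = b"caf\xe9"      # "café" as latin-1 bytes
UTF8_VALUE = b"caf\xc3\xa9"    # "café" as UTF-8 bytes

# latin-1 maps each byte 0x00-0xFF to exactly one code point,
# so decoding and re-encoding is lossless for any input.
ALL_BYTES = bytes(range(256))
assert ALL_BYTES.decode("latin-1").encode("latin-1") == ALL_BYTES

# A UTF-8-only reading fails on the latin-1 value...
latin1_as_utf8_ok = decodes_as(LATIN1_VALUE, "utf-8")   # False

# ...while the UTF-8 value "succeeds" as latin-1 but yields mojibake.
mojibake = UTF8_VALUE.decode("latin-1")
```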

In theory, we could pass bytes to email and then provide a new API for getting at the "raw" (bytes) header, so you can decode it however you want. That runs into backward-compatibility problems, though, since we currently do decode from latin-1 and many programs are probably relying on that.
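For comparison, this is roughly what the current latin-1 decoding looks like from the caller's side, assuming `http.client.parse_headers` (the helper `http.client` uses to read header blocks): UTF-8 bytes come back as mojibake, and a program that knows the real encoding has to re-encode to latin-1 to get the original bytes back. `raw_header_bytes` is a hypothetical helper standing in for a "raw" accessor; it is not an existing stdlib function.

```python
import io
import http.client

RAW = b"Content-Type: text/plain\r\nX-Name: caf\xc3\xa9\r\n\r\n"

# http.client decodes the whole header block from latin-1 before parsing,
# so every byte maps to exactly one code point and nothing is lost.
headers = http.client.parse_headers(io.BytesIO(RAW))
value = headers["X-Name"]  # the UTF-8 bytes read as latin-1 (mojibake)


def raw_header_bytes(message, name):
    """Hypothetical helper: recover the on-the-wire bytes of a header value."""
    text = message[name]
    return None if text is None else text.encode("latin-1")


# A caller that knows the server sent UTF-8 can then decode it correctly.
decoded = raw_header_bytes(headers, "X-Name").decode("utf-8")
```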

Throwing out an idea here: maybe the http policy could decode the parsed bytes headers from latin-1 when they are accessed through the normal API, which would preserve backward compatibility. I'm not too worried about back-compat in the http policy, since it is provisional until 3.6 comes out and I doubt anyone is currently using it.
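A rough sketch of that idea, under stated assumptions: it subclasses the documented `email.policy.Compat32` and overrides its `header_fetch_parse` hook rather than modifying the real `http` policy (which is built on `EmailPolicy`), and the class name `Latin1HeaderPolicy` is hypothetical. Bytes are parsed as-is; only when a header is read through the normal mapping API are the escaped bytes turned back into bytes and decoded as latin-1, which matches what `http.client` hands back today.

```python
import email
from email.policy import Compat32


class Latin1HeaderPolicy(Compat32):
    """Hypothetical policy: decode raw header bytes as latin-1 on access."""

    def header_fetch_parse(self, name, value):
        if isinstance(value, str):
            try:
                # BytesParser stores non-ASCII bytes as lone surrogates
                # (ASCII + surrogateescape); undo that to get the wire bytes.
                raw = value.encode("ascii", "surrogateescape")
            except UnicodeEncodeError:
                # The value was set from a real non-ASCII str, not parsed bytes.
                return super().header_fetch_parse(name, value)
            return raw.decode("latin-1")
        return super().header_fetch_parse(name, value)


LATIN1_POLICY = Latin1HeaderPolicy()

RAW = b"Content-Type: text/plain\r\nX-Name: caf\xc3\xa9\r\n\r\n"
msg = email.message_from_bytes(RAW, policy=LATIN1_POLICY)
value = msg["X-Name"]  # a plain str, decoded the way http.client does it
```

Because the stored value is left untouched, serializing the message with `msg.as_bytes()` still emits the original header bytes; only the read path changes.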