Author jason.coombs
Recipients barry, jason.coombs, jayvdb, r.david.murray, tanzer@swing.co.at
Date 2018-12-05.20:36:43
Message-id <1544042203.44.0.788709270274.issue25545@psf.upfronthosting.co.za>
Content
I don't think this ticket should be implemented as described.

Consider the use case in importlib_metadata, which loads metadata from a package; that metadata is known to be in a specified encoding. It already knows the encoding, has decoded the full message to text, and now wants to parse it. Parsing already-decoded content seems very much within the remit of something like email.parser.

Yes, the RFCs describe how to decode bytes content, but that shouldn't preclude the email module from supporting parsing from Unicode text.

And in fact, the library does seem able to parse non-ASCII Unicode text, especially on Python 3. Consider 'parse-text.py', attached. It illustrates that the parser currently mostly meets my expectation: on Python 2.7 and 3.7, e-mail messages are parsed from Unicode text without any indication of an encoding, and the results come back as Unicode text on both Python 2 and Python 3.
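
For readers without the attachment, here is a minimal sketch of the kind of check described above. It is not the attached 'parse-text.py'; the message text and field values are made up for illustration, and it assumes Python 3 with the default (compat32) policy.

```python
import email

# Hypothetical metadata-style message, already decoded to text.
# No charset or Content-Type is declared anywhere in it.
TEXT = (
    "Name: example-package\n"
    "Author: José Álvarez\n"
    "\n"
    "Descripción: café\n"
)

# Parse directly from a str; no bytes or encoding are involved.
msg = email.message_from_string(TEXT)

name = msg["Name"]
author = msg["Author"]        # header value comes back as str
body = msg.get_payload()      # body comes back as str

print(name)
print(author)
print(body)
```

The point is only that both the header value and the payload round-trip as text, without the parser being told an encoding.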

Python 2 is deficient in that message_from_string raises a UnicodeEncodeError when constructing a bytes-oriented StringIO from the input, which is easily worked around by using the text-oriented io.StringIO.
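
A rough sketch of that workaround, assuming the caller wraps the text in io.StringIO and hands it to email.parser.Parser.parse instead of calling message_from_string. The sample message is hypothetical. The snippet is written to run on Python 3; the u'' prefixes are there only so the same source stays syntactically valid on Python 2.7.

```python
import io
from email.parser import Parser

# Hypothetical message text containing non-ASCII characters.
text = u"Subject: na\u00efve caf\u00e9\n\nBody with a snowman: \u2603\n"

# io.StringIO is text-oriented, so the input is never encoded to bytes
# on the way into the parser.
msg = Parser().parse(io.StringIO(text))

subject = msg["Subject"]
payload = msg.get_payload()

print(subject)
print(payload)
```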

Still, I would argue the current behavior is desirable and shouldn't be deprecated.