Message 331159 - Python tracker


Author	jaraco
Recipients	barry, jaraco, jayvdb, r.david.murray, tanzer@swing.co.at
Date	2018-12-05.20:36:43
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1544042203.44.0.788709270274.issue25545@psf.upfronthosting.co.za>
In-reply-to

Content
I don't think this ticket should be implemented as described.

Consider the use case in importlib_metadata, which loads metadata from a package, and that metadata is known to be in a specified encoding. It already knows the encoding, has decoded the full message to text, and now wants to parse it. It seems very much in the remit of something like email.parser to parse already-decoded content.
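
To make that use case concrete, here is a minimal sketch for Python 3. The metadata bytes, the field values, and the choice of UTF-8 are assumptions made up for illustration; this is not the actual importlib_metadata code.

```python
import email

# Hypothetical package metadata as it might sit on disk (assumed UTF-8).
raw = (
    "Metadata-Version: 2.1\n"
    "Name: example-pkg\n"
    "Author: Jörg Example\n"
    "\n"
    "Description with café.\n"
).encode("utf-8")

# The caller already knows the encoding, so it decodes first...
text = raw.decode("utf-8")

# ...and then hands already-decoded text to the email package.
msg = email.message_from_string(text)

print(msg["Name"])
print(msg["Author"])
```

The point of the sketch is only that the decoding step happens outside the parser, which then works on text.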

Yes, the RFCs describe how to decode bytes content, but that shouldn't preclude the email module from supporting parsing from Unicode text.

And in fact, the library does seem able to parse non-ASCII Unicode text, especially on Python 3. Consider 'parse-text.py', attached. It illustrates that the parser currently mostly meets my expectation: on Python 2.7 and 3.7, e-mail messages are parsed from Unicode text without any indication of an encoding, and the results are returned as Unicode text on both Python 2 and Python 3.
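
The following is a rough Python 3 sketch of that behavior, not the attached 'parse-text.py'. The message text is invented for illustration, and the default (compat32) policy is assumed.

```python
from email.parser import Parser

# Invented message containing non-ASCII characters in a header and the body.
text = (
    "Subject: Grüße\n"
    "From: sender@example.com\n"
    "\n"
    "Smörgåsbord, naïve, 日本語\n"
)

# No encoding is declared anywhere; the parser just consumes text.
msg = Parser().parsestr(text)

subject = msg["Subject"]
body = msg.get_payload()

# Both the header value and the payload are expected to come back as str.
print(type(subject).__name__, subject)
print(type(body).__name__, body)
```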

Python 2 is deficient in that message_from_string raises a UnicodeEncodeError when constructing a bytes-oriented StringIO from the input, which is easily worked around by using the text-oriented io.StringIO.
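
A sketch of that workaround, written for Python 3: wrap the text in io.StringIO and pass the file object to Parser.parse instead of calling message_from_string. The sample message is made up, and on Python 2 the input would presumably need to be a unicode object for io.StringIO to accept it.

```python
import io
from email.parser import Parser

# Made-up message text with a non-ASCII header value.
text = (
    "Subject: Café menu\n"
    "\n"
    "Crème brûlée\n"
)

# io.StringIO is text-oriented, so no implicit encode to bytes takes place.
fp = io.StringIO(text)
msg = Parser().parse(fp)

print(msg["Subject"])
print(msg.get_payload())
```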

Still, I would argue the current behavior is desirable and shouldn't be deprecated.

History
Date	User	Action	Args
2018-12-05 20:36:43	jaraco	set	recipients: + jaraco, barry, r.david.murray, jayvdb, tanzer@swing.co.at
2018-12-05 20:36:43	jaraco	set	messageid: <1544042203.44.0.788709270274.issue25545@psf.upfronthosting.co.za>
2018-12-05 20:36:43	jaraco	link	issue25545 messages
2018-12-05 20:36:43	jaraco	create