Message 254131 - Python tracker

Author	r.david.murray
Recipients	r.david.murray, tanzer@swing.co.at
Date	2015-11-05.17:35:37
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1446744937.56.0.0775324107287.issue25545@psf.upfronthosting.co.za>
In-reply-to

Content
I agree that the situation is not the best, but it is the one we have. I can't delete those methods now: they've existed in Python 3 for too long, and initially they were the only thing that worked (albeit with ASCII-only strings).

If you can suggest ways of improving the string support without breaking existing Python 3 code that may be using it (most likely wrongly, but working for them), then I will happily review them.

As for "that sounds strange" about non-ASCII bodies being non-trivial, remember that the context is the byte-string serialization protocol defined in RFC 5322. This is the *evolution* of a protocol that started out ASCII-only, learned something about 8-bit data, then learned something about using bytes for handling other languages. It is an evolutionary mess that has lots of pitfalls. You can't simply serialize a message to Unicode, preserving the RFC 5322/MIME markup, and have a valid email, unless you make it a 7-bit clean (ASCII-only) representation. And that is what the email package does. So, conversely, email can only *parse* (as a string) a 7-bit, ASCII-only representation.
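
To illustrate the point about string serialization, here is a minimal sketch using the Python 3 `EmailMessage` API. The addresses and the body text are made up for the example, and the exact transfer encoding the package picks is not something the sketch relies on; it only checks that the string form is ASCII-only and that the body survives a round trip.

```python
from email import message_from_bytes, message_from_string, policy
from email.message import EmailMessage

# Hypothetical message; the addresses and body are illustrative only.
msg = EmailMessage()  # uses email.policy.default
msg["Subject"] = "Greetings"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg.set_content("Grüße aus Wien")  # non-ASCII body

# String serialization: the non-ASCII body is carried in an
# ASCII-safe content transfer encoding chosen by the package.
as_text = msg.as_string()
as_text.encode("ascii")  # would raise UnicodeEncodeError otherwise

# Parsing the 7-bit string form recovers the original text.
from_text = message_from_string(as_text, policy=policy.default)

# The bytes form is the actual wire format and round-trips too.
from_bytes = message_from_bytes(msg.as_bytes(), policy=policy.default)

print(from_text.get_content())
```

Note that `set_content` terminates the body with a newline, so the recovered text ends in one.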

What you appear to want, representing non-ASCII content as the equivalent Unicode, *cannot work*, because email messages may contain binary data which *cannot* be represented in printable Unicode.
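
A small sketch of the binary-data case, again with the Python 3 `EmailMessage` API. The payload (`blob`) and the filename are invented for the example: the bytes are not valid UTF-8, so they have no direct Unicode form, yet the string serialization still works because the attachment is carried in an ASCII transfer encoding.

```python
from email import message_from_string, policy
from email.message import EmailMessage

# Hypothetical binary payload: every byte value once. Not valid UTF-8.
blob = bytes(range(256))
try:
    blob.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

msg = EmailMessage()
msg["Subject"] = "Binary attachment"
msg.set_content("See the attached file.")
# The filename is illustrative only.
msg.add_attachment(
    blob,
    maintype="application",
    subtype="octet-stream",
    filename="blob.bin",
)

# The string form stays 7-bit clean; the attachment is transfer-encoded.
wire_text = msg.as_string()

# Parsing the string form gives back the exact original bytes.
parsed = message_from_string(wire_text, policy=policy.default)
attachment = next(parsed.iter_attachments())
recovered = attachment.get_content()

print(decodable, recovered == blob)
```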

So, it is *unfortunate* that a non-ASCII body is non-trivial in email, but there's no getting around the fact that it is. The new API in Python 3 aims to make it as simple as possible, but of course that doesn't help Python 2 users. But making Unicode easier is one big reason Python 3 exists (the biggest one, in practice).

History
Date	User	Action	Args
2015-11-05 17:35:37	r.david.murray	set	recipients: + r.david.murray, tanzer@swing.co.at
2015-11-05 17:35:37	r.david.murray	set	messageid: <1446744937.56.0.0775324107287.issue25545@psf.upfronthosting.co.za>
2015-11-05 17:35:37	r.david.murray	link	issue25545 messages
2015-11-05 17:35:37	r.david.murray	create