Message 100479 - Python tracker


Author	r.david.murray
Recipients	dongying, r.david.murray
Date	2010-03-05.14:14:04
Message-id	<1267798448.27.0.454729164431.issue8054@psf.upfronthosting.co.za>
In-reply-to

Content
We don't fully support setting defaultencoding to anything other than ASCII.  The test suite doesn't fully pass, for example, if defaultencoding is set to 'utf-8' in site.py.

But that aside, the documentation for MIMEText says: "No guessing or encoding is performed on the text data."  In your first example you are passing it unicode, which is unencoded.  It might be helpful if it raised a ValueError when passed unicode, but it isn't technically a bug that it doesn't, since it does raise an error for non-ASCII text if you haven't changed defaultencoding.  The behavior also can't be changed, since existing code may depend on being able to pass ASCII-only unicode strings in and having them auto-coerced to ASCII.
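For comparison, here is a minimal sketch of handing MIMEText the charset explicitly so that no guessing is needed. It is written for Python 3, where MIMEText encodes a str itself once it is told the charset; that is an assumption about the reader's environment and not the Python 2 behavior discussed in this issue. The sample text is made up for illustration.

```python
from email.mime.text import MIMEText

# Hypothetical non-ASCII input, long enough to span several encoded lines.
text = "\u4e2d\u6587 " * 60

# Naming the charset tells the email package how to turn the text into bytes
# before the transfer encoding is applied.
msg = MIMEText(text, "plain", "utf-8")

print(msg["Content-Transfer-Encoding"])
print(max(len(line) for line in msg.get_payload().splitlines()))
```

Because the text is converted to bytes before it is split into lines, every encoded line stays within the usual 76-character limit.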

Note that the cause of the problem is that the email transport encoder assumes its input is binary data and breaks it up into appropriately sized lines by counting bytes.  You've fed it a unicode string, so it winds up breaking the input up by *unicode* character count instead.  It then passes each line to binascii.b2a_base64, which, given the non-standard defaultencoding, coerces it to utf-8.  The resulting byte count differs from the original character count, so when those bytes are encoded in base64 you get the uneven length lines in the final output.
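The effect described above can be reproduced with a small simulation. This is a sketch in Python 3, not the actual email package code: the explicit `.encode("utf-8")` stands in for the implicit coercion Python 2 performs under a utf-8 defaultencoding, the function names are hypothetical, and the 57-byte chunk size is an assumption chosen because 57 bytes encode to one 76-character base64 line.

```python
import binascii

CHUNK = 57  # assumed chunk size: 57 input bytes -> 76 base64 characters


def encode_chunking_characters(text, chunk=CHUNK):
    """Mimic the problem: split by character count, then convert to bytes."""
    lines = []
    for i in range(0, len(text), chunk):
        piece = text[i:i + chunk]
        # Stand-in for Python 2's implicit unicode -> utf-8 coercion.
        raw = piece.encode("utf-8")
        lines.append(binascii.b2a_base64(raw).decode("ascii").rstrip("\n"))
    return lines


def encode_chunking_bytes(text, chunk=CHUNK):
    """Convert to bytes first, then split by byte count."""
    data = text.encode("utf-8")
    return [
        binascii.b2a_base64(data[i:i + chunk]).decode("ascii").rstrip("\n")
        for i in range(0, len(data), chunk)
    ]


# 80 CJK characters (3 bytes each in utf-8) followed by 60 ASCII characters.
text = "\u4e2d\u6587" * 40 + "abc" * 20

print([len(line) for line in encode_chunking_characters(text)])  # [228, 140, 36]
print([len(line) for line in encode_chunking_bytes(text)])       # [76, 76, 76, 76, 76, 20]
```

Splitting by characters yields lines whose lengths depend on how many multi-byte characters each piece happens to contain, while splitting the already-encoded bytes gives uniform 76-character lines.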

In Python 3 this isn't a problem, since you can't accidentally mix up unicode and bytes there.
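A quick illustration of that point, as a sketch for Python 3 with a made-up sample string: binascii.b2a_base64 refuses a str outright, so the text has to be encoded explicitly before it can be base64-encoded.

```python
import binascii

text = "\u4e2d\u6587"  # hypothetical sample text

# Passing str where bytes are required fails immediately in Python 3.
try:
    binascii.b2a_base64(text)
except TypeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)

# The conversion to bytes has to be spelled out.
encoded = binascii.b2a_base64(text.encode("utf-8"))
print(encoded)
```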

History
Date	User	Action	Args
2010-03-05 14:14:08	r.david.murray	set	recipients: + r.david.murray, dongying
2010-03-05 14:14:08	r.david.murray	set	messageid: <1267798448.27.0.454729164431.issue8054@psf.upfronthosting.co.za>
2010-03-05 14:14:05	r.david.murray	link	issue8054 messages
2010-03-05 14:14:05	r.david.murray	create