Author r.david.murray
Recipients barry, l0nwlf, r.david.murray, ynkdir
Date 2010-05-05.01:11:00
Message-id <1273021864.16.0.310348110403.issue7472@psf.upfronthosting.co.za>
Content
Comments on patch:

We prefer patches to be generated from the top level directory of the checkout, so that they can be applied by doing 'patch -p0 <xxx.patch' from the top level directory without having to look in the patch file to see what directory it was generated in.

The test is correct in general outline, but what needs to be tested is this: when a byte string is in a character encoding that is eight bit, but whose output encoding (the encoding email will use when writing the message) is 7bit, then 7bit is chosen for the transfer encoding.  If you look in Lib/email/charset.py, there are only two such character sets, euc-jp and shift_jis.
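
For reference, here is a small sketch that lists those character sets by reading the table in email.charset. It assumes a current Python 3, where CHARSETS maps each input charset name to a (header encoding, body encoding, output charset) tuple; the helper name charsets_with_converted_output is made up for illustration.

```python
from email.charset import CHARSETS


def charsets_with_converted_output():
    """Return {input_name: output_name} for charsets written out in a different charset.

    Hypothetical helper, not part of the email package.
    """
    converted = {}
    for name, (header_enc, body_enc, output) in CHARSETS.items():
        # Skip entries with no output conversion, and entries such as utf-8
        # whose output charset is the same as the input charset.
        if output is not None and output != name:
            converted[name] = output
    return converted


result = charsets_with_converted_output()
for name, output in sorted(result.items()):
    print(name, "->", output)
```

Both entries should report iso-2022-jp as the output charset, which is the 7bit encoding the test needs to exercise.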

We discussed this in IRC, and you found a euc-jp character, but then said that the test passed even with the fix removed.  The byte string you are using in the bytes test you posted does not appear to be encodable in the output character set (iso-2022-jp). I'm guessing you used this, and _charset='iso-2022-jp', because otherwise the test passes without the fix.

That is, if you put in a character such as 文 ('\xca\xb8' as an euc-jp encoded byte stream) and pass _charset='euc-jp', the test passes without the fix.  So, you were exactly right in what you said in IRC, and should have posted that version of the unit test :)
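
A minimal sketch of that version of the check, written for a current Python 3 (where the text is passed as a str rather than as a euc-jp byte string, so this is an adaptation and not the patch under review):

```python
from email.mime.text import MIMEText

# 文 is the two bytes CA B8 in euc-jp, i.e. 8-bit input.
text = "文\n"
euc_jp_bytes = text.encode("euc-jp")

# Input charset euc-jp; email converts the body to its output charset.
msg = MIMEText(text, _charset="euc-jp")

print(msg["Content-Transfer-Encoding"])
print(msg.get_content_charset())
```

The interesting assertions are that the transfer encoding is 7bit and that the declared charset is the output charset, not euc-jp.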

Looking at the code even more carefully than I did last time, it turns out that as soon as the 'charset' is set, the payload gets translated from the input character set to the output character set, and *then* encode_7or8bit is called.  As far as I have been able to figure out, there is no way for encode_7or8bit to get called with the payload encoded in the input character set (not even if it is called directly, since it is passed a message instance and so set_charset must already have been called on the message instance it is passed).
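
This can be observed from the outside. The sketch below assumes a current Python 3, where set_payload with a charset argument calls set_charset and the stored payload is a str; it only inspects the result and does not show the internal call order.

```python
from email.message import Message

msg = Message()
msg.set_payload("文", "euc-jp")   # set_payload calls set_charset for us

charset = msg.get_charset()
payload = msg.get_payload()       # stored payload, no transfer decoding

# Raises UnicodeEncodeError if any 8-bit data were still in the payload.
raw = payload.encode("ascii")

print(charset.input_charset, "->", charset.output_charset)
print(repr(raw))
```

By the time the transfer encoding is decided, the payload already holds the iso-2022-jp escape sequences, which are plain ASCII, so 7bit is the natural result.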

So it turns out that the if test is not needed after all.
History
Date                 User            Action  Args
2010-05-05 01:11:04  r.david.murray  set     recipients: + r.david.murray, barry, ynkdir, l0nwlf
2010-05-05 01:11:04  r.david.murray  set     messageid: <1273021864.16.0.310348110403.issue7472@psf.upfronthosting.co.za>
2010-05-05 01:11:01  r.david.murray  link    issue7472 messages
2010-05-05 01:11:00  r.david.murray  create