Message 163791 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	r.david.murray
Recipients	barry, mitya57, r.david.murray, v+python
Date	2012-06-24.14:48:03
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1340549287.11.0.173590036197.issue15016@psf.upfronthosting.co.za>
In-reply-to

Content

Well, the original change to using utf-8 by default was considered a bug fix.  But I suppose you are right that this goes beyond that into enhancement territory.  In that case we could wait for an enhancement to the C API to base it on, which would need a new issue of its own.

On the other hand, the email package already uses the "encode to see if we have ascii" trick elsewhere (though on smaller strings), and the ascii codec is the fastest codec, with latin-1 only slightly slower.
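For readers unfamiliar with the trick, here is a minimal sketch of "encode to see if it fits": try the cheapest codec first and fall back. The function name `pick_charset` and the us-ascii, then latin-1, then utf-8 ordering are illustrative assumptions, not the email package's actual code.

```python
def pick_charset(text):
    """Hypothetical helper: return the first charset that can encode text.

    Tries us-ascii (fastest codec), then latin-1, and falls back to utf-8,
    which can encode any ordinary str.
    """
    for charset in ('us-ascii', 'latin-1'):
        try:
            text.encode(charset)  # result is discarded: this is only a test
        except UnicodeEncodeError:
            continue
        return charset
    return 'utf-8'
```

Note that the encoded bytes are thrown away here, so the caller still has to encode the text again to build the body.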

The critical difference here, though, is that we end up doing two encoding passes: once to test the string and a second time to actually create the message body.  The same is true of the ascii case.  It should be possible to fix this by using the already-encoded string to generate the _payload, short-circuiting the set_payload mechanism.  That's a somewhat ugly hack, necessitated by the incomplete conversion of email to a unicode-centric design.  I'm working on that :)
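A rough sketch of the single-pass idea: keep the bytes produced by the successful test encode instead of discarding them, so no second encoding pass is needed. The helper `encode_once` and the charset ordering are hypothetical; how the bytes would actually be placed into `_payload` is an internal detail of the email package that this sketch does not attempt to model.

```python
def encode_once(text):
    """Hypothetical helper: return (charset, encoded_bytes) from a single pass.

    The first codec that succeeds also produces the body bytes, so the
    caller can reuse them rather than encoding the text a second time.
    """
    for charset in ('us-ascii', 'latin-1'):
        try:
            return charset, text.encode(charset)
        except UnicodeEncodeError:
            continue
    # utf-8 covers everything else (barring lone surrogates, which raise here)
    return 'utf-8', text.encode('utf-8')
```

For example, `encode_once('café')` yields the latin-1 label together with the body bytes, and the ascii case likewise costs only one encode.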

So, again, we may be waiting on other enhancements, in this case in the email package, to do this fix "right".  But it would be worth figuring out *how* to do it, so that we know what kind of (internal?) API enhancements we want in order to serve this kind of use case.

History
Date	User	Action	Args
2012-06-24 14:48:07	r.david.murray	set	recipients: + r.david.murray, barry, v+python, mitya57
2012-06-24 14:48:07	r.david.murray	set	messageid: <1340549287.11.0.173590036197.issue15016@psf.upfronthosting.co.za>
2012-06-24 14:48:06	r.david.murray	link	issue15016 messages
2012-06-24 14:48:03	r.david.murray	create