Author r.david.murray
Recipients jeffknupp, mitya57, r.david.murray
Date 2012-03-22.18:57:04
Message-id <1332442626.06.0.68801101853.issue14380@psf.upfronthosting.co.za>
In-reply-to
Content
Pretty close.  I'd do the check for us_ascii first, and only do the encode test/switch to utf-8 if that's the charset.  The reason is that if a charset has been specified, we don't waste time doing an unnecessary encoding (and the ascii codec is very fast, which you can't say about all the codecs).
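
A minimal sketch of that ordering, not the actual patch: the helper name `choose_charset` and its signature are hypothetical, and the default is spelled 'us-ascii' here on the assumption that it matches the default charset string. The text is only test-encoded when the charset is still the default, so an explicitly specified charset costs nothing extra.

```python
def choose_charset(text, charset="us-ascii"):
    """Pick the charset for `text` (hypothetical helper, for illustration)."""
    # Check the charset first: if the caller specified one, trust it and
    # skip the trial encoding entirely.
    if charset != "us-ascii":
        return charset
    try:
        # The ascii codec is fast, so this probe is cheap.
        text.encode("us-ascii")
    except UnicodeEncodeError:
        # Non-ASCII text with the default charset: switch to utf-8.
        return "utf-8"
    return "us-ascii"
```

With this ordering, `choose_charset("héllo", "iso-8859-1")` returns the caller's charset without ever encoding the text.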

Now, what would be *really* nice is to also try latin-1 before falling back to utf-8, but I wouldn't want to make that the default behavior for performance reasons.  I'm planning to add support for that at some point, but I haven't decided exactly how (policy setting? New optional setting in the alias structure?)
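
One way the opt-in latin-1 step could look, purely as a sketch: the function name and the `try_latin1` flag are hypothetical stand-ins for whatever mechanism (policy setting or alias-structure option) is eventually chosen. The extra probe is off by default, so the default path pays only for the ascii check.

```python
def choose_fallback_charset(text, try_latin1=False):
    """Return the narrowest charset that can encode `text` (hypothetical)."""
    candidates = ["us-ascii"]
    if try_latin1:
        # Opt-in only: the additional trial encoding has a cost.
        candidates.append("iso-8859-1")
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text; try the next
        return name
    # Nothing narrower worked, so fall back to utf-8.
    return "utf-8"
```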

There seem to be unrelated changes to torture_test in your patch?
History
Date User Action Args
2012-03-22 18:57:06 r.david.murray set recipients: + r.david.murray, mitya57, jeffknupp
2012-03-22 18:57:06 r.david.murray set messageid: <1332442626.06.0.68801101853.issue14380@psf.upfronthosting.co.za>
2012-03-22 18:57:05 r.david.murray link issue14380 messages
2012-03-22 18:57:05 r.david.murray create