Message 106964 - Python tracker

Author	r.david.murray
Recipients	barry, bgamari, l0nwlf, maxua, r.david.murray, tony_nelson
Date	2010-06-03.16:31:20
SpamBayes Score	2.0485908e-05
Marked as misclassified	No
Message-id	<1275582683.23.0.384513249134.issue4487@psf.upfronthosting.co.za>
In-reply-to

Content

For various reasons the email module has a table of character sets. What might be most effective would be for the email module to look a character set name up in the codecs module and find out the canonical name of the character set, and then look that up in its table (i.e., remove the aliases table from email completely, and instead depend on codecs to resolve the canonical name). Unfortunately the codecs module does not recognize all of the aliases used by email, nor is there necessarily any guarantee that the two modules will agree on the proper canonical name.
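
To illustrate the canonical-name lookup described above, here is a minimal sketch using `codecs.lookup`. The helper name `codec_canonical_name` is made up for this example and is not part of either module. Note that the name codecs reports for Latin-1 is spelled `iso8859-1`, which may not match the spelling another table expects.

```python
import codecs

def codec_canonical_name(name):
    """Return the codecs module's canonical name for `name`, or None if unknown.

    Hypothetical helper for illustration only.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        # codecs does not know this charset name or alias.
        return None

# Several spellings resolve to the same codec name.
for alias in ("utf8", "utf_8", "UTF-8", "latin-1", "no-such-charset"):
    print(alias, "->", codec_canonical_name(alias))
```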

The attached patch instead uses the codecs module as a fallback if the charset name does not appear in the email package's ALIASES or CHARSETS tables. It therefore makes both utf8 and utf_8 work, as well as all the other variants the codecs module accepts. The unit test just tests 'utf8', since if that one works all the others should too.
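
The fallback order could be sketched roughly as follows. This is not the attached patch: the `ALIASES` and `CHARSETS` values here are small stand-ins rather than the email package's real tables, and `resolve_charset` is a hypothetical name chosen for the example.

```python
import codecs

# Stand-in tables for illustration; the real email package tables are larger
# and structured differently.
ALIASES = {"latin_1": "iso-8859-1", "latin-1": "iso-8859-1", "ascii": "us-ascii"}
CHARSETS = {"iso-8859-1", "us-ascii", "utf-8"}

def resolve_charset(name):
    """Resolve a charset name: own tables first, codecs as a fallback."""
    name = name.lower()
    if name in ALIASES:
        return ALIASES[name]
    if name in CHARSETS:
        return name
    try:
        # Not in either table: ask codecs for its canonical name.
        return codecs.lookup(name).name
    except LookupError:
        # Unknown everywhere: leave the name unchanged.
        return name

print(resolve_charset("utf8"))     # resolved through the codecs fallback
print(resolve_charset("latin-1"))  # resolved through the stand-in ALIASES table
```

Consulting the local tables first means their spelling wins wherever the two sources disagree, and codecs is only asked about names the tables do not cover.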

I'm tentatively reclassifying this as a bug rather than a feature request, since I think it is a reasonable expectation that email would support at least the same set of encoding names that the rest of Python does.

History
Date	User	Action	Args
2010-06-03 16:31:23	r.david.murray	set	recipients: + r.david.murray, barry, tony_nelson, maxua, bgamari, l0nwlf
2010-06-03 16:31:23	r.david.murray	set	messageid: <1275582683.23.0.384513249134.issue4487@psf.upfronthosting.co.za>
2010-06-03 16:31:21	r.david.murray	link	issue4487 messages
2010-06-03 16:31:20	r.david.murray	create