
Author lemburg
Recipients eric.araujo, ezio.melotti, l0nwlf, lemburg, maker, r.david.murray
Date 2011-05-26.17:57:43
SpamBayes Score 3.7156833e-11
Marked as misclassified No
Message-id <4DDE948A.7020208@egenix.com>
In-reply-to <1306429301.21.0.604014200117.issue8898@psf.upfronthosting.co.za>
Content
R. David Murray wrote:
> 
> R. David Murray <rdmurray@bitdance.com> added the comment:
> 
> What is not-a-charset?
>
> I apparently misunderstood what normalize_encoding does.  It isn't doing a lookup in the codecs registry and returning the canonical name for the codec.  Does that mean we actually have to fetch the codec in order to get the canonical name?  I suspect so, and that is probably OK, since in most cases the codec is eventually going to get called while processing the email that triggered the ALIASES lookup.
> 
> I also notice that there is a table of aliases in the codec module documentation, so that will need to be updated as well.

As far as the aliases.py part of the patch goes, I'm fine with that
since it corrects a few real bugs and adds the missing Latin-N
codec names.

Regarding using this table in the email package, I'm not really
clear on what you want to achieve.

If you are looking for a way to determine whether Python has a codec
installed for a certain charset name, then codecs.lookup() will
tell you this (and it also applies all the aliasing and normalization
needed).
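A minimal sketch of that check. The helper name has_codec is only illustrative; codecs.lookup() and the LookupError it raises for unknown names are the standard library behaviour:

```python
import codecs

def has_codec(charset):
    """Return True if Python can find a codec for *charset*."""
    try:
        # lookup() applies the registry's aliasing/normalization and
        # caches the CodecInfo it finds.
        codecs.lookup(charset)
    except LookupError:
        # Raised when no registered search function knows the name.
        return False
    return True
```

The CodecInfo returned by codecs.lookup() also carries Python's own canonical codec name in its .name attribute (for example 'iso8859-1' for 'latin-1'), which is not necessarily the IANA MIME charset name.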

If you want to avoid importing the actual codec module (codecs.lookup()
imports it), you can mimic the logic used by the lookup function
of the encodings package. I'm not sure whether that's worth it, though,
since you are rather likely to use the codec you've just looked up
soon after the test, and codecs.lookup() caches the codecs it finds.
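A rough sketch of that idea, assuming a current Python 3 where encodings.normalize_encoding(), the encodings.aliases.aliases dict and importlib.util.find_spec() are available. The helper name has_codec_module is hypothetical, and the function only approximates the encodings search function: it ignores codecs registered through other search functions, and a module that exists but fails on import (e.g. mbcs outside Windows) still counts as found.

```python
import importlib.util
from encodings import normalize_encoding
from encodings.aliases import aliases

def has_codec_module(name):
    """Approximate the encodings search function without importing the codec."""
    # normalize_encoding() collapses punctuation to '_' but does not
    # lowercase, so lowercase first (codecs.lookup() does this too).
    norm = normalize_encoding(name.lower())
    # Same alias resolution as the encodings package: try the normalized
    # name, then the variant with dots replaced by underscores.
    aliased = aliases.get(norm) or aliases.get(norm.replace('.', '_'))
    candidates = [aliased, norm] if aliased else [norm]
    for modname in candidates:
        if not modname or '.' in modname:
            continue
        # find_spec() locates encodings.<modname> without executing it.
        if importlib.util.find_spec('encodings.' + modname) is not None:
            return True
    return False
```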

If you want to convert an arbitrary encoding name to a registered
standard IANA MIME charset name, then the aliases.py module is not
going to be of much help, since we are using our own canonical
names which do not necessarily map to the MIME charset names.

You'd have to add a new mime_alias map to the email package
for that. I'd suggest using the same approach as for the
aliases.py module: first normalize the encoding name using
normalize_encoding() and then run the result through
the mime_alias map.
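A sketch of that approach. The table name MIME_ALIASES and the helper mime_charset are hypothetical, and the table is deliberately tiny; a real one for the email package would need many more entries:

```python
from encodings import normalize_encoding

# Hypothetical mime_alias map: normalized encoding names -> registered
# IANA MIME charset names. Illustrative subset only.
MIME_ALIASES = {
    'latin_1': 'iso-8859-1',
    'latin1': 'iso-8859-1',
    'l1': 'iso-8859-1',
    'iso_8859_1': 'iso-8859-1',
    'iso8859_1': 'iso-8859-1',
    'latin_9': 'iso-8859-15',
    'latin9': 'iso-8859-15',
    'iso_8859_15': 'iso-8859-15',
    'iso8859_15': 'iso-8859-15',
    'utf_8': 'utf-8',
    'utf8': 'utf-8',
    'cp1252': 'windows-1252',
    'windows_1252': 'windows-1252',
    'ascii': 'us-ascii',
    'us_ascii': 'us-ascii',
}

def mime_charset(name):
    """Return the IANA MIME charset name for *name*, or None if unknown."""
    # Normalize exactly as for aliases.py, lowercasing first because
    # normalize_encoding() itself preserves case.
    key = normalize_encoding(name.lower())
    return MIME_ALIASES.get(key)
```

Unknown names fall through to None, so the caller can decide whether to pass the original charset through unchanged or reject it.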

Hope that helps.
History
Date                 User     Action  Args
2011-05-26 17:57:44  lemburg  set     recipients: + lemburg, ezio.melotti, eric.araujo, r.david.murray, l0nwlf, maker
2011-05-26 17:57:43  lemburg  link    issue8898 messages
2011-05-26 17:57:43  lemburg  create