Message 136989 - Python tracker


Author	lemburg
Recipients	eric.araujo, ezio.melotti, l0nwlf, lemburg, maker, r.david.murray
Date	2011-05-26.17:57:43
Message-id	<4DDE948A.7020208@egenix.com>
In-reply-to	<1306429301.21.0.604014200117.issue8898@psf.upfronthosting.co.za>

Content
R. David Murray wrote:
> 
> R. David Murray <rdmurray@bitdance.com> added the comment:
> 
> What is not-a-charset?
>
> I apparently misunderstood what normalize_encodings does.  It isn't doing a lookup in the codecs registry and returning the canonical name for the codec.  Does that mean we actually have to fetch the codec in order to get the canonical name?  I suspect so, and that is probably OK, since in most cases the codec is eventually going to get called while processing the email that triggered the ALIASES lookup.
> 
> I also notice that there is a table of aliases in the codec module documentation, so that will need to be updated as well.

As far as the aliases.py part of the patch goes, I'm fine with that
since it corrects a few real bugs and adds the missing Latin-N
codec names.

Regarding using this table in the email package, I'm not really
clear on what you want to achieve.

If you are looking for a way to determine whether Python has a codec
installed for a certain charset name, then codecs.lookup() will
tell you this (and it also applies all the aliasing and normalization
needed).
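
A minimal sketch of that check, assuming nothing beyond the standard
codecs module. The helper names has_codec and canonical_codec_name are
made up for illustration and are not part of the patch or of the email
package:

```python
import codecs


def has_codec(charset):
    """Return True if codecs.lookup() can find a codec for `charset`.

    codecs.lookup() applies Python's own aliasing and normalization,
    and raises LookupError for names it cannot resolve.
    """
    try:
        codecs.lookup(charset)
    except LookupError:
        return False
    return True


def canonical_codec_name(charset):
    """Return Python's canonical codec name (not a MIME charset name)."""
    return codecs.lookup(charset).name


if __name__ == "__main__":
    for name in ("latin-1", "UTF8", "not-a-charset"):
        print(name, has_codec(name))
```

Note that the name returned by canonical_codec_name() is Python's own
canonical name, which is not guaranteed to be the MIME charset name.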

If you want to avoid the actual codec module import (codecs.lookup()
imports the module), you can mimic the logic used by the lookup function
of the encodings package. Not sure whether that's worth it, though,
since it is rather likely that you're going to use the codec you've
just looked up soon after the test and codecs.lookup() caches the
found codecs.
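
As a rough illustration only, the import-free variant could look like
the sketch below. It is an approximation of what the encodings package
does, not a copy of it: the function names are hypothetical, the
lowercasing step is done by hand here, and the importlib.util.find_spec()
test only checks that a module with that name can be located under
encodings, not that it really is a codec module.

```python
import importlib.util
from encodings import normalize_encoding
from encodings.aliases import aliases


def resolve_module_name(charset):
    """Map a charset name to the encodings module name, without importing it.

    Hypothetical helper: normalize the name, lowercase it, then apply
    the alias table from encodings.aliases.
    """
    norm = normalize_encoding(charset).lower()
    return aliases.get(norm, norm)


def looks_installed(charset):
    """Heuristic: is there a module of that name in the encodings package?

    This does not import the codec module itself, so it cannot confirm
    that the module actually provides a codec.
    """
    modname = resolve_module_name(charset)
    if not modname or "." in modname:
        # Dotted or empty names cannot name an encodings submodule.
        return False
    return importlib.util.find_spec("encodings." + modname) is not None


if __name__ == "__main__":
    for name in ("ISO-8859-1", "utf8", "not-a-charset"):
        print(name, resolve_module_name(name), looks_installed(name))
```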

If you want to convert an arbitrary encoding name to a registered
standard IANA MIME charset name, then the aliases.py module is not
going to be of much help, since we are using our own canonical
names which do not necessarily map to the MIME charset names.

You'd have to add a new mime_alias map to the email package
for that. I'd suggest using the same approach as for the
aliases.py module: first normalize the encoding name using
normalize_encoding() and then run that through the
mime_alias map.
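
A sketch of how such a map might be used. Everything here is an
assumption made for illustration: the MIME_ALIAS table is a tiny
hand-picked sample, not the real contents of any module, and the
mime_charset() helper and its lowercasing step are hypothetical.

```python
from encodings import normalize_encoding

# Hypothetical sample map: normalized encoding name -> MIME charset name.
# A real map would need to cover far more names than this.
MIME_ALIAS = {
    "latin_1": "iso-8859-1",
    "latin1": "iso-8859-1",
    "iso_8859_1": "iso-8859-1",
    "iso8859_1": "iso-8859-1",
    "utf8": "utf-8",
    "utf_8": "utf-8",
    "ascii": "us-ascii",
    "us_ascii": "us-ascii",
}


def mime_charset(name, default=None):
    """Normalize `name`, then look it up in the sample MIME_ALIAS map."""
    # normalize_encoding() collapses punctuation to underscores; the
    # lowercasing is added here so the map keys can stay lowercase.
    key = normalize_encoding(name).lower()
    return MIME_ALIAS.get(key, default)


if __name__ == "__main__":
    for name in ("Latin-1", "UTF8", "US-ASCII", "not-a-charset"):
        print(name, mime_charset(name))
```

Returning a default for unknown names leaves it to the caller to decide
whether to pass the original name through unchanged or to reject it.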

Hope that helps.

History
Date	User	Action	Args
2011-05-26 17:57:44	lemburg	set	recipients: + lemburg, ezio.melotti, eric.araujo, r.david.murray, l0nwlf, maker
2011-05-26 17:57:43	lemburg	link	issue8898 messages
2011-05-26 17:57:43	lemburg	create