Message 136999 - Python tracker



Author	lemburg
Recipients	eric.araujo, ezio.melotti, l0nwlf, lemburg, maker, r.david.murray
Date	2011-05-26.19:44:11
Message-id	<4DDEAD7E.7060500@egenix.com>
In-reply-to	<1306437690.58.0.560040074084.issue8898@psf.upfronthosting.co.za>

Content
R. David Murray wrote:
> 
> R. David Murray <rdmurray@bitdance.com> added the comment:
> 
> Well, my thought was to avoid having multiple charset alias lists in the stdlib, and reusing the one in codecs, which is larger than the one in email, seemed to make sense.  This came up because a bug was reported where email (silently) failed to encode a string because the charset alias, while present in codecs, wasn't present in the email ALIASES table.
> 
> I suppose that as an alternative I could add full support for the IANA aliases list to email.  Email is the most likely place to run into variant charset aliases anyway.
> 
> If that's the way we go, then this issue should be changed over to covering just updating codecs with the missing aliases, and a new issue opened for adding full IANA alias support to email.

I think it would be useful to have a mapping from the Python
canonical name (the one the encodings package uses) to the
"preferred MIME name" as referenced in the IANA list:

http://www.iana.org/assignments/character-sets

This mapping could also be added to the encodings package
together with a function that translates a given encoding
name to its canonical Python name (codec_module_name())
and another one to translate it to the "preferred MIME name"
according to the above list (encoding_mime_name()).
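
A rough sketch of what these two functions might look like follows.
Neither codec_module_name() nor encoding_mime_name() exists in the
stdlib; the names are the ones proposed above, and the _MIME_NAMES
table is a hypothetical three-entry stand-in for a full mapping built
from the IANA list. The sketch assumes that the existing alias table
in encodings.aliases is enough to resolve a name to its codec module.

```python
import codecs
from encodings import normalize_encoding
from encodings.aliases import aliases

# Hypothetical, deliberately tiny table: Python codec module name ->
# IANA "preferred MIME name". A real table would cover every codec.
_MIME_NAMES = {
    "ascii": "US-ASCII",
    "latin_1": "ISO-8859-1",
    "utf_8": "UTF-8",
}


def codec_module_name(name):
    """Return the encodings module name for any supported spelling."""
    # Raises LookupError if Python has no codec for this name.
    codecs.lookup(name)
    # normalize_encoding() turns punctuation into underscores but does
    # not change case, so lowercase explicitly before the alias lookup.
    norm = normalize_encoding(name).lower()
    # Names that are already module names are not keys in the alias
    # table, so fall back to the normalized name itself.
    return aliases.get(norm, norm)


def encoding_mime_name(name):
    """Return the preferred MIME name, or None if the table lacks it."""
    return _MIME_NAMES.get(codec_module_name(name))


print(codec_module_name("ISO-8859-1"))  # latin_1
print(encoding_mime_name("latin1"))     # ISO-8859-1
print(encoding_mime_name("utf8"))       # UTF-8
```

Returning None for a codec without an entry keeps the "not in the
IANA list" case distinct from the "unknown to Python" case, which
still surfaces as LookupError.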

Note that we don't support all the aliases mentioned in the IANA
list because many of them are outdated and some have proved to be
wrong (the aliased encodings are actually different in a few
places). There are also a few encodings in the list which we
don't support at all.

Since we only rarely get requests for supporting new aliases or
encodings, I think it's safe to say that the existing set
is fairly complete from a practical point of view.

History
Date	User	Action	Args
2011-05-26 19:44:12	lemburg	set	recipients: + lemburg, ezio.melotti, eric.araujo, r.david.murray, l0nwlf, maker
2011-05-26 19:44:12	lemburg	link	issue8898 messages
2011-05-26 19:44:11	lemburg	create