classification
Title: The email package should defer to the codecs module for all aliases
Type: enhancement Stage: test needed
Components: email Versions: Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, eric.araujo, ezio.melotti, l0nwlf, lemburg, maker, r.david.murray
Priority: normal Keywords: easy, patch

Created on 2010-06-04 18:53 by r.david.murray, last changed 2012-05-24 03:15 by r.david.murray.

Files
File name Uploaded
issue8898.patch maker, 2011-05-21 22:19
fail_tactis.txt maker, 2011-05-22 06:48
issue8898_withtests.patch maker, 2011-05-22 06:48
fail_mcbs.txt maker, 2011-05-22 06:49
issue8898_skip.patch maker, 2011-05-22 12:44
issue8898_normalize.patch maker, 2011-05-24 16:29
issue8898_2.patch maker, 2011-05-27 12:47
issue8898_3.patch maker, 2011-05-27 15:59
Messages (38)
msg107087 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-04 18:53
Currently the email module maintains a set of "charset" aliases that it maps to codec names before looking up the codec in the codecs module.  Ideally it should instead be able to just look up any 'charset' name and, if it is a valid alias for a codec, have the codecs module return the codec with the canonical name.  It is possible (I haven't checked yet) that the email module needs a different canonical 'charset' name for certain codecs, but if so it can do that mapping after getting the canonical codec name from codecs.

To implement this we need to make two simple changes:

1) add any aliases the email module recognizes but the codecs module doesn't to the codecs module.

2) rewrite email.charset so that it does not have an ALIASES table (but may have a smaller 'canonical charset map' table instead).
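
A minimal sketch of the lookup this proposal relies on, using only the standard codecs.lookup() call. The helper name canonical_codec_name is hypothetical and is not part of email or codecs; it only illustrates that any recognized alias resolves to one canonical codec name, while an unknown name raises LookupError.

```python
import codecs


def canonical_codec_name(charset):
    """Return Python's canonical codec name for charset, or None if unknown.

    Hypothetical helper for illustration only.
    """
    try:
        # codecs.lookup() applies alias resolution and name normalization.
        return codecs.lookup(charset).name
    except LookupError:
        return None


for name in ("latin1", "US-ASCII", "no-such-charset"):
    print(name, "->", canonical_codec_name(name))
```

With this in place, email.charset would only need a table for the cases where the canonical codec name differs from the name it wants to emit.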
msg107093 - (view) Author: Shashwat Anand (l0nwlf) Date: 2010-06-04 20:13
Most of the aliases in email.charset.ALIASES are not recognized by the codecs module.


>>> import codecs
>>> import email.charset
>>> for i in email.charset.ALIASES.keys():
...     try:
...         codecs.lookup(i)
...     except LookupError:
...         print("Not recognized by codecs : alias {} mapped to {}".format(i, email.charset.ALIASES[i]))
...
Not recognized by codecs : alias latin-8 mapped to iso-8859-14
Not recognized by codecs : alias latin-9 mapped to iso-8859-15
Not recognized by codecs : alias latin-2 mapped to iso-8859-2
Not recognized by codecs : alias latin-3 mapped to iso-8859-3
<codecs.CodecInfo object for encoding iso8859-1 at 0x10160af58>
Not recognized by codecs : alias latin-6 mapped to iso-8859-10
Not recognized by codecs : alias latin-7 mapped to iso-8859-13
Not recognized by codecs : alias latin-4 mapped to iso-8859-4
Not recognized by codecs : alias latin-5 mapped to iso-8859-9
<codecs.CodecInfo object for encoding euc_jp at 0x1016260b8>
Not recognized by codecs : alias latin-10 mapped to iso-8859-16
<codecs.CodecInfo object for encoding ascii at 0x101626120>
Not recognized by codecs : alias latin_10 mapped to iso-8859-16
<codecs.CodecInfo object for encoding iso8859-1 at 0x10160aae0>
Not recognized by codecs : alias latin_2 mapped to iso-8859-2
Not recognized by codecs : alias latin_3 mapped to iso-8859-3
Not recognized by codecs : alias latin_4 mapped to iso-8859-4
Not recognized by codecs : alias latin_5 mapped to iso-8859-9
Not recognized by codecs : alias latin_6 mapped to iso-8859-10
Not recognized by codecs : alias latin_7 mapped to iso-8859-13
Not recognized by codecs : alias latin_8 mapped to iso-8859-14
Not recognized by codecs : alias latin_9 mapped to iso-8859-15
<codecs.CodecInfo object for encoding cp949 at 0x101626390>
<codecs.CodecInfo object for encoding euc_kr at 0x101626530>


So basically, apart from latin-1 and latin_1, none of the latin-N and latin_N aliases are recognized by codecs.
msg107098 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-04 20:43
We need to add aliases for those codecs. The current aliases
list only supports the format "latinN" for N in 1-10.
msg107100 - (view) Author: Shashwat Anand (l0nwlf) Date: 2010-06-04 20:53
>We need to add aliases for those codecs. The current aliases
>list only supports the format "latinN" for N in 1-10.

latinN means latin1 to latin10 ? 
But latin_1 is a recognized alias.

>>> codecs.lookup('latin_1')
<codecs.CodecInfo object for encoding iso8859-1 at 0x10160aae0>
msg107102 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-04 21:14
Shashwat Anand wrote:
> 
> Shashwat Anand <anand.shashwat@gmail.com> added the comment:
> 
>> We need to add aliases for those codecs. The current aliases
>> list only supports the format "latinN" for N in 1-10.
> 
> latinN means latin1 to latin10 ? 

Yes. We should add aliases for the format "latin_N" as well.

> But latin_1 is a recognized alias.
> 
>>>> codecs.lookup('latin_1')
> <codecs.CodecInfo object for encoding iso8859-1 at 0x10160aae0>

Yes, since that's the native name of the dedicated Python codec
for ISO-8859-1.
msg124713 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-27 16:54
Too late for 3.2, will implement for 3.3.
msg136443 - (view) Author: Michele Orrù (maker) * Date: 2011-05-21 14:27
The attached patch adds aliases for latin_N in encodings.aliases, and fixes email.charset behaviour according to codecs.lookup, as requested.
Tested on (Arch) Linux.

Should I add any unit tests? I'm unsure where they should be placed (in encodings or email?).
msg136488 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-05-21 22:51
The patch looks ok to me.
AFAIU the lookup will take care to normalize the name and return latin_N.  This also implies that other names (like 'latin-N', 'LaTiN~~N' and so on) will be normalized to latin_N and then accepted.

Regarding the tests, I don't see tests for the aliases anywhere, so something like:
for alias, codec_name in encodings.aliases.items():
    self.assertEqual(codecs.lookup(alias).name, codec_name)
could be added somewhere to check that all the aliases in the dict map to the correct codec.
msg136507 - (view) Author: Michele Orrù (maker) * Date: 2011-05-22 06:47
Well, actually encodings.aliases links to the encoding _module name_, as
described in the doc:
""" Encoding Aliases Support
    This module is used by the encodings package search function to
    map encodings names to module names.
"""
So I've adjusted your snippet according to this, as you can see in the
attachment.

I've also slightly changed the imports as pep8 says:
"""
Yes: import os
import sys

No: import sys, os
"""

Anyway, the test failed for two encodings, so there are indeed two bugs
there:
- mbcs has something broken in its imports;
- the tactis module is not present.

Since they are really easy to fix, I haven't yet reported them to the bug tracker.
Let me know what I should do.
Should I post a bug and patch on bugs.python.org? Should I add any new tests
specifically for the email module?
msg136511 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-22 11:13
Michele Orrù wrote:
> 
> Michele Orrù <maker.py@gmail.com> added the comment:
> 
> Well, actually encodings.aliases links to the encoding _module name_, as
> described in the doc:
> """ Encoding Aliases Support
>     This module is used by the encodings package search function to
>     map encodings names to module names.
> """
> So I've adjusted your snippet according to this, as you can see in the
> attachment.
> 
> I've also slightly changed the imports as pep8 says:
> """
> Yes: import os
> import sys
> 
> No: import sys, os
> """
> 
> Anyway, running the test failed for two encodings, there are two bugs there,
> indeed.
> - mcbs has something broken in its imports;

mbcs is only available on Windows.

> - tactis module is not present.

I'm not sure what happened here: either the alias entry is wrong
or the codec module was not committed.

In either case, no one has complained about this encoding not working,
so we can probably just remove it from the alias table. See
http://bugs.python.org/issue1251921 for a similar report and
discussion.
msg136514 - (view) Author: Michele Orrù (maker) * Date: 2011-05-22 11:32
So, what do you prefer? Add a check for sys.platform, or just skip it?

discussion on python-dev. So I'm +1 for just skipping it for now (with a XXX
comment on the right maybe).
msg136515 - (view) Author: Michele Orrù (maker) * Date: 2011-05-22 11:37
Sorry, I was told that replying to the bug tracker by email might not work properly.


> > - mcbs has something broken in its imports;

> mbcs is only available on Windows.
So, what do you prefer? Add a check for sys.platform, or just skip it?

> > - tactis module is not present.

> I'm not sure what happened here: either the alias entry is wrong
> or the codec module was not committed.

> In either case, no one has complained about this encoding not working,
> so we can probably just remove it from the alias table. See
> http://bugs.python.org/issue1251921 for a similar report and
> discussion.

I don't have such authority, and such a choice will probably require a discussion on python-dev. So I'm +1 for just skipping it for now (maybe with an XXX comment next to it).
msg136518 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-22 11:58
Michele Orrù wrote:
> 
> Michele Orrù <maker.py@gmail.com> added the comment:
> 
> Sorry, I was told that email the bugtracker could not work properly.
> 
> 
>>> - mcbs has something broken in its imports;
> 
>> mbcs is only available on Windows.
>
> So, what do you prefer? Add a check for sys.platform, or just skip it?

The test suite provides ways to implement known failures on
specific platforms, so I'd suggest to use those mechanisms.
I've never used those, so can't comment on how much work it is
to use them.

If that's too difficult, just use sys.platform.

>>> - tactis module is not present.
> 
>> I'm not sure what happened here: either the alias entry is wrong
>> or the codec module was not committed.
> 
>> In either case, no one has complained about this encoding not working,
>> so we can probably just remove it from the alias table. See
>> http://bugs.python.org/issue1251921 for a similar report and
>> discussion.
> 
> I don't have such autority, and probably such a choice will require a discussion on python-dev. So I'm +1 for just skipping it for now (with a XXX comment on the right maybe).

Given the old discussion on the other ticket, it's fine to
remove the alias entry:

    # tactis codec
    'tis260'             : 'tactis',
msg136519 - (view) Author: Michele Orrù (maker) * Date: 2011-05-22 12:27
unittest.skip* are decorators, so they are useless in this case; also, AFAICS,
Lib/test/ uses sys.platform.

I would suggest putting a try statement in encodings.mbcs and raising an
error in case the imported modules are not found.
But this is another story.
msg136520 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-05-22 12:36
Something like:
if name == 'mbcs' and not sys.platform.startswith('win'):
    continue
should be enough.
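
As a rough sketch of how the pieces discussed so far could fit together: the loop below walks encodings.aliases.aliases (which maps alias names to encoding module names, as noted above), skips mbcs outside Windows, and collects every alias whose target module cannot be imported. The function name find_broken_aliases is hypothetical, and which aliases it reports depends on the Python version and platform it runs on.

```python
import importlib
import sys

import encodings.aliases


def find_broken_aliases():
    """Return (alias, module_name) pairs whose encodings module fails to import.

    Hypothetical helper for illustration only.
    """
    broken = []
    for alias, module_name in sorted(encodings.aliases.aliases.items()):
        # mbcs is only available on Windows, so skip it elsewhere.
        if module_name == "mbcs" and not sys.platform.startswith("win"):
            continue
        try:
            importlib.import_module("encodings." + module_name)
        except ImportError:
            broken.append((alias, module_name))
    return broken


for alias, module_name in find_broken_aliases():
    print("alias {!r} points to missing module {!r}".format(alias, module_name))
```

Inside a unittest test case the same loop could assert that the returned list is empty instead of printing it.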
msg136521 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-05-22 12:54
I suggest to:
  1) remove the alias for tactis;
  2) add the aliases for latin_* and the tests for the aliases;
  3) fix the email.charset to use the new aliases instead of its own dict.

2) and 3) should go on 3.3 only, 1) could be considered a bug and fixed on 2.7/3.2 too, but since the codec is already missing, removing the alias won't change anything (i.e. it will raise a LookupError with or without alias).
msg136533 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-22 14:54
Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti@gmail.com> added the comment:
> 
> I suggest to:
>   1) remove the alias for tactis;
>   2) add the aliases for latin_* and the tests for the aliases;
>   3) fix the email.charset to use the new aliases instead of its own dict.
> 
> 2) and 3) should go on 3.3 only, 1) could be considered a bug and fixed on 2.7/3.2 too, but since the codec is already missing, removing the alias won't change anything (i.e. it will raise a LookupError with or without alias).

+1
msg136539 - (view) Author: Michele Orrù (maker) * Date: 2011-05-22 15:34
Do you mean that the alias for 'tactis' should also be removed in 2.7 and 3.2?
msg136550 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-22 18:12
euc_jp and euc_kr seem to be backward (that is, codecs translates them to the _ version, instead of translating the _ version to the - version).  I worry that there might be other deviations from the standard email names.  I would suggest we pull the list of preferred MIME names from the IANA charset registry and make a test out of them in the email package.  If changing the name returned by codecs is determined to not be acceptable, then those entries will need to remain in the charset module ALIASES table and the codecs-check logic adjusted accordingly.

Unfortunately the IANA registry does not list MIME names for all of the charsets in common use, and the canonical names are not always the ones commonly used in email.  Hopefully the codecs registry is using the most common name for those, and hopefully if there are differences it won't break any user code, since any reasonable email code should be coping with the aliases in any case.

Ezio, if you want to steal this one from me, that's fine by me.
msg136551 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-22 18:16
Hmm.  Must have misread.  Looks like all the common charsets do have MIME entries in the IANA table.
msg136553 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-22 18:19
On second thought the resolution order ought to be swapped anyway: if the user has added an ALIAS, they are going to want that used, not the one from codecs.
msg136614 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-23 11:48
The way I understand the patch was that the email package will
start to use the encoding aliases for determining the codec
name instead of its own list. That is: only for decoding the
input data, not for creating a correct MIME encoding name in
output data.
msg136636 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-23 13:43
Well, it turns out that back when I opened this issue I misunderstood what the ALIASES table was used for.  It *is* used before doing a codecs lookup, but it is also used to convert whatever charset name the programmer specifies into the standard MIME name for the codec when generating emails.

Clearly the email module needs to base its transformation on the IANA table.  I think the ideal would be to have a program that pulls the IANA table and generates the ALIASES table.  On the other hand, codecs should already have all of those aliases (this theoretical program could be used to ensure that), so another alternative is to use codecs to look up the "python canonical" name for the charset, and have the email ALIASES table just map the ones where that isn't the preferred MIME name into the MIME name.
msg136764 - (view) Author: Michele Orrù (maker) * Date: 2011-05-24 16:29
After discussing this on IRC, it turned out that the best choice would be to use normalize_encoding plus ALIASES, as the attached patch does.
msg136984 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-26 17:01
What is not-a-charset?

I apparently misunderstood what normalize_encoding does.  It isn't doing a lookup in the codecs registry and returning the canonical name for the codec.  Does that mean we actually have to fetch the codec in order to get the canonical name?  I suspect so, and that is probably OK, since in most cases the codec is eventually going to get called while processing the email that triggered the ALIASES lookup.

I also notice that there is a table of aliases in the codec module documentation, so that will need to be updated as well.
msg136989 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-26 17:57
As far as the aliases.py part of the patch goes, I'm fine with that
since it corrects a few real bugs and adds the missing Latin-N
codec names.

Regarding using this table in the email package, I'm not really
clear on what you want to achieve.

If you are looking for a way to determine whether Python has a codec
installed for a certain charset name, then codecs.lookup() will
tell you this (and it also applies all the aliasing and normalization
needed).

If you want to avoid the actual codec module import (codecs.lookup()
imports the module), you can mimic the logic used by the lookup function
of the encodings package. Not sure, whether that's worth it, though,
since it is rather likely that you're going to use the codec you've
just looked up soon after the test and codecs.lookup() caches the
found codecs.

If you want to convert an arbitrary encoding name to a registered
standard IANA MIME charset name, then the aliases.py module is not
going to be of much help, since we are using our own canonical
names which do not necessarily map to the MIME charset names.

You'd have to add a new mime_alias map to the email package
for that. I'd suggest to use the same approach as for the
aliases.py module, which is to first normalize the encoding
name using normalize_encoding() and then running that through
the mime_alias map.
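
A small sketch of that approach, assuming a hand-written table. The MIME_ALIAS contents and the function name mime_charset_name are hypothetical; only encodings.normalize_encoding() is an existing function. Note that normalize_encoding() does not lowercase its input, so the sketch lowercases the result before using it as a key.

```python
from encodings import normalize_encoding

# Hypothetical table: normalized, lowercased encoding name -> preferred MIME name.
MIME_ALIAS = {
    "latin_1": "iso-8859-1",
    "iso8859_1": "iso-8859-1",
    "euc_jp": "euc-jp",
    "euc_kr": "euc-kr",
    "ascii": "us-ascii",
}


def mime_charset_name(charset):
    """Map charset to a preferred MIME name, passing unknown names through."""
    # normalize_encoding() collapses punctuation such as '-' into '_'.
    key = normalize_encoding(charset).lower()
    return MIME_ALIAS.get(key, charset)


print(mime_charset_name("Latin-1"))
print(mime_charset_name("EUC_JP"))
print(mime_charset_name("x-unknown"))
```

This variant never imports a codec module, at the cost of needing one table entry per spelling that normalization alone does not unify.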

Hope that helps.
msg136994 - (view) Author: Michele Orrù (maker) * Date: 2011-05-26 19:12
+1

What do you think? Ezio, David?
msg136996 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-26 19:21
Well, my thought was to avoid having multiple charset alias lists in the stdlib, and reusing the one in codecs, which is larger than the one in email, seemed to make sense.  This came up because a bug was reported where email (silently) failed to encode a string because the charset alias, while present in codecs, wasn't present in the email ALIASES table.

I suppose that as an alternative I could add full support for the IANA aliases list to email.  Email is the most likely place to run in to variant charset aliases anyway.

If that's the way we go, then this issue should be changed over to covering just updating codecs with the missing aliases, and a new issue opened for adding full IANA alias support to email.
msg136998 - (view) Author: Michele Orrù (maker) * Date: 2011-05-26 19:35
In that case, I could still take care of it; it would be really easy to do.

So, it's up to you to tell me what is the best design choice. (:
msg136999 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-26 19:44
I think it would be useful to have a mapping from the Python
canonical name (the one the encodings package uses) to the
"preferred MIME name" as referenced in the IANA list:

http://www.iana.org/assignments/character-sets

This mapping could also be added to the encodings package
together with a function that translates a given encoding
name to its canonical Python name (codec_module_name())
and another one to translate it to the "preferred MIME name"
according to the above list (encoding_mime_name()).

Note that we don't support all the aliases mentioned in the IANA
list because many of them are outdated and some have proved to be
wrong (the aliased encodings are actually different in a few
places). There are also a few encodings in the list which we
don't support at all.

Since we only rarely get requests for supporting new aliases or
encodings, I think it's safe to say that the existing set
is fairly complete from a practical point of view.
msg137007 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-26 20:35
I agree that since we get very few requests to add aliases our current tables are probably what we want.  So adding the MIME_preferred_name mapping *somewhere* is indeed what I would like to see happen.  It doesn't matter to me whether it is in the codecs module or the email module.
msg137048 - (view) Author: Michele Orrù (maker) * Date: 2011-05-27 12:47
Any idea about how to unittest mime.aliases?

Also, since I've just created a new file, are there any bureaucratic issues? I mean, do I have to add something at the top of the file?
(I'm just signing the Contributor Agreement)
msg137049 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-27 13:12
Michele Orrù wrote:
> 
> Michele Orrù <maker.py@gmail.com> added the comment:
> 
> Any idea about how to unittest mime.aliases?

Test the APIs you probably created for accessing it.

> Also, since I've just created a new file, are there any bureaucratic issues? I mean, do I have to add something at the top of the file?
> (I'm just signing the Contributor Agreement)

You just need to put the usual copyright line at the top of
the file, together with the sentence from the agreement.

Apart from that, you also need to make sure that the other build
setups include the new file (PCbuild, Makefile.pre.in, etc.). If you
don't know how to do this, you can ask someone else to take
care of this, since it usually requires domain knowledge (e.g.
to add the file to the Windows builds).
msg137051 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-27 13:59
Your new file isn't in the patch.  I'm imagining it is a table and a couple methods, so I think perhaps putting it either in charset or in utils would be better than creating a new file.

As for testing it, what I'd love to see is a test that downloads the current IANA table (there are routines in test.support for doing this in a way that respects the test suite's 'resources' settings), pulls out the preferred MIME aliases, and makes sure that all of them are mapped to some canonical Python codec.  Then you can invert that and make sure all of the results returned by that test map back to the correct MIME alias.
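
The round-trip half of that test might look roughly like the sketch below. It does not download anything: SAMPLE_MIME_NAMES is a small hand-picked list standing in for the names that would be extracted from the IANA table, and build_maps is a hypothetical helper. A real test would fetch the registry through the test.support machinery instead.

```python
import codecs

# Assumption: a few preferred MIME names, hard-coded instead of downloaded.
SAMPLE_MIME_NAMES = [
    "US-ASCII",
    "ISO-8859-1",
    "UTF-8",
    "EUC-JP",
    "EUC-KR",
    "Shift_JIS",
]


def build_maps(mime_names):
    """Return (mime_to_python, python_to_mime) for the given MIME names."""
    mime_to_python = {}
    for mime_name in mime_names:
        # Raises LookupError if a MIME name has no Python codec.
        mime_to_python[mime_name] = codecs.lookup(mime_name).name
    # Invert the mapping so canonical Python names map back to MIME names.
    python_to_mime = {py: mime for mime, py in mime_to_python.items()}
    return mime_to_python, python_to_mime


mime_to_python, python_to_mime = build_maps(SAMPLE_MIME_NAMES)
for mime_name in SAMPLE_MIME_NAMES:
    # Each MIME name should survive the round trip unchanged.
    assert python_to_mime[mime_to_python[mime_name]] == mime_name
```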
msg137056 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-27 14:24
Prompted on IRC, I see I missed the file because it was so short.

This still isn't what I'm looking for.  We are assuming that email is going to use the codec eventually so that it is not a bad thing to have charset pre-populate the codec cache.  So what I'm looking for is:

    try:
        python_name = codecs.lookup(input_charset).name
        mime_name = ALIASES.get(python_name, input_charset)
    except LookupError:
        mime_name = input_charset

MAL's idea was to implement the ALIASES step via a two-way mapping in the encodings module (python-canonical-name <=> MIME-preferred-name).  That would be fine, too, but the email.charset logic should look like the above however the table is implemented.
msg137060 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-27 15:24
The second line in that try: block should have been:

  mime_name = ALIASES.get(python_name, python_name)
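
Put together with that correction, the logic could be sketched as a standalone function. The ALIASES table below is a hypothetical subset keyed by Python's canonical codec name, not the actual contents of email.charset.ALIASES, and the name mime_charset is made up for the example.

```python
import codecs

# Hypothetical subset: Python canonical codec name -> preferred MIME name.
ALIASES = {
    "iso8859-1": "iso-8859-1",
    "ascii": "us-ascii",
    "euc_jp": "euc-jp",
    "euc_kr": "euc-kr",
}


def mime_charset(input_charset):
    """Return the MIME name to use for input_charset."""
    try:
        python_name = codecs.lookup(input_charset).name
    except LookupError:
        # Unknown charsets are passed through unchanged.
        return input_charset
    # Fall back to the canonical Python name when no MIME override exists.
    return ALIASES.get(python_name, python_name)


for name in ("latin1", "EUC-JP", "utf8", "no-such-charset"):
    print(name, "->", mime_charset(name))
```

Keeping the ALIASES lookup outside the try block makes it clear that only the codecs lookup is expected to raise LookupError.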
msg137072 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-05-27 16:08
> email (silently) failed to encode a string

Is this silent error another bug to fix?
msg137082 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-27 16:42
Not in email5.  The RFC says that if the charset parameter isn't known you just pass it through.  In email6 we will be making a more careful distinction between errors that should be passed silently per the RFC, and ones that should be noisy because the API in question is being used to create the message ab initio.  (In email5 the exact same machinery is used to create a message from parsed source as is used to create a message programmatically, resulting in the silent passing of certain errors that should really be noisy.)
History
Date User Action Args
2012-05-24 03:15:34 r.david.murray set assignee: r.david.murray ->
    components: + email, - Library (Lib)
    nosy: + barry
2011-05-27 16:42:52 r.david.murray set messages: + msg137082
2011-05-27 16:08:50 eric.araujo set messages: + msg137072
2011-05-27 15:59:56 maker set files: + issue8898_3.patch
2011-05-27 15:24:12 r.david.murray set messages: + msg137060
2011-05-27 14:24:06 r.david.murray set messages: + msg137056
2011-05-27 13:59:13 r.david.murray set messages: + msg137051
2011-05-27 13:12:16 lemburg set messages: + msg137049
2011-05-27 12:47:46 maker set files: + issue8898_2.patch
    messages: + msg137048
2011-05-26 20:35:34 r.david.murray set messages: + msg137007
2011-05-26 19:44:12 lemburg set messages: + msg136999
2011-05-26 19:35:35 maker set messages: + msg136998
2011-05-26 19:21:30 r.david.murray set messages: + msg136996
2011-05-26 19:12:36 maker set messages: + msg136994
2011-05-26 17:57:43 lemburg set messages: + msg136989
2011-05-26 17:01:40 r.david.murray set messages: + msg136984
2011-05-24 16:29:21 maker set files: + issue8898_normalize.patch
    messages: + msg136764
2011-05-23 13:43:46 r.david.murray set messages: + msg136636
2011-05-23 11:48:58 lemburg set messages: + msg136614
2011-05-22 18:19:59 r.david.murray set messages: + msg136553
2011-05-22 18:16:57 r.david.murray set messages: + msg136551
2011-05-22 18:12:26 r.david.murray set messages: + msg136550
2011-05-22 15:35:36 maker set files: - unnamed
2011-05-22 15:34:53 maker set messages: + msg136539
2011-05-22 14:54:27 lemburg set messages: + msg136533
2011-05-22 12:54:00 ezio.melotti set messages: + msg136521
2011-05-22 12:44:08 maker set files: + issue8898_skip.patch
2011-05-22 12:43:54 maker set files: - issue8898_skip.patch
2011-05-22 12:36:01 ezio.melotti set messages: + msg136520
2011-05-22 12:27:10 maker set files: + issue8898_skip.patch
    messages: + msg136519
2011-05-22 11:58:44 lemburg set messages: + msg136518
2011-05-22 11:38:01 maker set files: - unnamed
2011-05-22 11:37:16 maker set messages: + msg136515
2011-05-22 11:32:52 maker set files: + unnamed
    messages: + msg136514
2011-05-22 11:13:10 lemburg set messages: + msg136511
2011-05-22 06:49:03 maker set files: + fail_mcbs.txt
2011-05-22 06:48:39 maker set files: + issue8898_withtests.patch
2011-05-22 06:48:09 maker set files: + fail_tactis.txt
2011-05-22 06:47:41 maker set files: + unnamed
    messages: + msg136507
2011-05-21 22:51:59 ezio.melotti set messages: + msg136488
2011-05-21 22:19:48 maker set files: + issue8898.patch
2011-05-21 22:19:20 maker set files: - issue8898.patch
2011-05-21 19:49:12 maker set nosy: + eric.araujo
2011-05-21 14:27:54 maker set files: + issue8898.patch
    nosy: + ezio.melotti, maker
    messages: + msg136443
    keywords: + patch
2010-12-27 16:54:09 r.david.murray set nosy: lemburg, r.david.murray, l0nwlf
    messages: + msg124713
    versions: + Python 3.3, - Python 3.2
2010-06-04 21:14:17 lemburg set messages: + msg107102
2010-06-04 20:53:02 l0nwlf set messages: + msg107100
2010-06-04 20:43:37 lemburg set messages: + msg107098
2010-06-04 20:13:38 l0nwlf set messages: + msg107093
2010-06-04 18:53:49 r.david.murray create