Add utf8 alias for email charsets #48737

Closed
maxua mannequin opened this issue Dec 2, 2008 · 13 comments
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@maxua
Mannequin

maxua mannequin commented Dec 2, 2008

BPO 4487
Nosy @malemburg, @warsaw, @amauryfa, @merwok, @bitdancer
Files
  • charset-utf8-alias.patch
  • email_accept_codec_aliases.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/bitdancer'
    closed_at = <Date 2010-10-05.23:17:10.719>
    created_at = <Date 2008-12-02.13:17:11.456>
    labels = ['type-bug', 'library']
    title = 'Add utf8 alias for email charsets'
    updated_at = <Date 2011-04-03.17:58:05.542>
    user = 'https://bugs.python.org/maxua'

    bugs.python.org fields:

    activity = <Date 2011-04-03.17:58:05.542>
    actor = 'l0nwlf'
    assignee = 'r.david.murray'
    closed = True
    closed_date = <Date 2010-10-05.23:17:10.719>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2008-12-02.13:17:11.456>
    creator = 'maxua'
    dependencies = []
    files = ['12191', '17532']
    hgrepos = []
    issue_num = 4487
    keywords = ['patch']
    message_count = 13.0
    messages = ['76738', '85329', '87254', '102511', '103007', '106964', '106965', '107018', '107071', '107082', '107091', '118042', '118044']
    nosy_count = 8.0
    nosy_names = ['lemburg', 'barry', 'amaury.forgeotdarc', 'tony_nelson', 'eric.araujo', 'maxua', 'r.david.murray', 'bgamari']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4487'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2']


    When using the MIME email package you can specify "utf8" as the encoding.
    It will be accepted, but it is not rendered correctly in some MUAs. E.g. Mac
    OS X Mail.app doesn't display it properly, while Google Gmail does.

    It is confusing since Python itself happily understands both utf8 and utf-8.
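
A minimal check of that claim, written in Python 3 syntax (the variable names are only for illustration): both spellings select the same codec, and `codecs.lookup()` reports one canonical name for all of them.

```python
import codecs

text = "\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430"

# Both spellings resolve to the same codec, so the encoded bytes are identical.
same_bytes = text.encode("utf8") == text.encode("utf-8")

# CodecInfo.name is the canonical spelling Python reports for a codec.
canonical_names = {
    name: codecs.lookup(name).name for name in ("utf8", "utf-8", "utf_8")
}
```

This only shows what Python's codec machinery accepts; it says nothing about which spelling a mail client expects in the `charset` parameter.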

    The patch adds "utf8" as an alias for the "utf-8" encoding, which means the
    user won't need to think twice.

    Test case:
    from email.MIMEText import MIMEText

    msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
    msg.set_charset('utf8')
    print msg.as_string()

    @maxua maxua mannequin added the stdlib Python modules in the Lib dir label Dec 2, 2008
    @tonynelson
    Mannequin

    tonynelson mannequin commented Apr 3, 2009

    This seems entirely reasonable, helpful, and in accord with the mapping
    of ascii to us-ascii. I recommend accepting this patch or a slightly
    fancier one that would also do "utf_8".

    There are probably other encoding names with the same issue of being
    accepted by Python but not understood by other email clients.

    This issue also affects 2.6.1 and 2.7trunk. I haven't checked 3.x.

    @bgamari
    Mannequin

    bgamari mannequin commented May 5, 2009

    Has this patch been merged yet?

    @l0nwlf
    Mannequin

    l0nwlf mannequin commented Apr 7, 2010

    I tested it on Python 2.5, 2.6, 2.7 trunk and 3.2, varying msg.set_charset(x) with x = 'utf8' and 'utf-8'.
    Here are the results. Apparently the test case fails on Python 2.x and passes on 3.2, but I guess that is unrelated to this issue.

    07:35:40 l0nwlf-MBP:~ $ python2.5
    Python 2.5.4 (r254:67916, Jul  7 2009, 23:51:24) 
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from email.MIMEText import MIMEText
    >>> msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
    >>> msg.set_charset('utf8')
    >>> print msg.as_string()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/message.py", line 131, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 84, in flatten
        self._write(msg)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 109, in _write
        self._dispatch(msg)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 135, in _dispatch
        meth(msg)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 178, in _handle_text
        self._fp.write(payload)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
    >>> msg.set_charset('utf-8')
    >>> print msg.as_string()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/message.py", line 131, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 84, in flatten
        self._write(msg)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 109, in _write
        self._dispatch(msg)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 135, in _dispatch
        meth(msg)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 178, in _handle_text
        self._fp.write(payload)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
    
    07:36:17 l0nwlf-MBP:~ $ python2.6
    Python 2.6.5 (r265:79063, Apr  6 2010, 21:34:21) 
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from email.MIMEText import MIMEText
    >>> msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
    >>> msg.set_charset('utf8')
    >>> print msg.as_string()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/message.py", line 135, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 84, in flatten
        self._write(msg)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 109, in _write
        self._dispatch(msg)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 135, in _dispatch
        meth(msg)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 178, in _handle_text
        self._fp.write(payload)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
    >>> msg.set_charset('utf-8')
    >>> print msg.as_string()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/message.py", line 135, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 84, in flatten
        self._write(msg)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 109, in _write
        self._dispatch(msg)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 135, in _dispatch
        meth(msg)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 178, in _handle_text
        self._fp.write(payload)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
     
    07:36:37 l0nwlf-MBP:~ $ python2.7
    Python 2.7a4+ (trunk:78750, Mar  7 2010, 08:09:00) 
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from email.MIMEText import MIMEText
    >>> msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
    >>> msg.set_charset('utf8')
    >>> print msg.as_string()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/email/message.py", line 135, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "/usr/local/lib/python2.7/email/generator.py", line 83, in flatten
        self._write(msg)
      File "/usr/local/lib/python2.7/email/generator.py", line 108, in _write
        self._dispatch(msg)
      File "/usr/local/lib/python2.7/email/generator.py", line 134, in _dispatch
        meth(msg)
      File "/usr/local/lib/python2.7/email/generator.py", line 180, in _handle_text
        self._fp.write(payload)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
    >>> msg.set_charset('utf-8')
    >>> print msg.as_string()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/email/message.py", line 135, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "/usr/local/lib/python2.7/email/generator.py", line 83, in flatten
        self._write(msg)
      File "/usr/local/lib/python2.7/email/generator.py", line 108, in _write
        self._dispatch(msg)
      File "/usr/local/lib/python2.7/email/generator.py", line 134, in _dispatch
        meth(msg)
      File "/usr/local/lib/python2.7/email/generator.py", line 180, in _handle_text
        self._fp.write(payload)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
    
    07:37:06 l0nwlf-MBP:~ $ python3.2
    Python 3.2a0 (py3k:79532, Apr  1 2010, 01:48:52) 
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from email.mime.text import MIMEText
    >>> msg = MIMEText('\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
    >>> msg.set_charset('utf8')
    >>> print (msg.as_string())
    MIME-Version: 1.0
    Content-Transfer-Encoding: 8bit
    Content-Type: text/plain; charset="utf8"
    
    кирилица
    >>> msg.set_charset('utf-8')
    >>> print (msg.as_string())
    MIME-Version: 1.0
    Content-Transfer-Encoding: 8bit
    Content-Type: text/plain; charset="utf-8"

    кирилица

    Aliasing 'utf8', and maybe 'utf_8', to 'utf-8' seems reasonable IMO; however, the test case fails on Python 2.x.

    @l0nwlf l0nwlf mannequin added the type-feature A feature request or enhancement label Apr 7, 2010
    @l0nwlf
    Mannequin

    l0nwlf mannequin commented Apr 13, 2010

    MIMEText doesn't support unicode input. This is why the OP's test case failed. For reference: http://bugs.python.org/issue1368247
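
For comparison, a small Python 3 sketch, where `str` is already Unicode. It assumes the charset is passed at construction time via the `_charset` argument and uses the canonical "utf-8" spelling, so it does not depend on any alias handling.

```python
from email.mime.text import MIMEText

text = "\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430"

# Passing the charset at construction lets MIMEText encode the payload itself.
msg = MIMEText(text, _charset="utf-8")

# The charset declared in the Content-Type header.
declared = msg.get_content_charset()

# Undo the transfer encoding, then decode the bytes with the declared charset.
round_tripped = msg.get_payload(decode=True).decode(declared)
```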

    @bitdancer
    Member

    For various reasons the email module has a table of character sets. What might be most effective would be for the email module to look a character set name up in the codecs module and find out the canonical name of the character set, and then look that up in its table (i.e. remove the aliases table from email completely, and instead depend on codecs to resolve the canonical name). Unfortunately the codecs module does not recognize all of the aliases used by email, nor is there necessarily any guarantee that the two modules will agree on the proper canonical name.

    The attached patch instead uses the codecs module as a fallback if the charset name does not appear in the email package's ALIASES or CHARSETS tables. It therefore makes both utf8 and utf_8 work, as well as all the other variants the codecs module accepts. The unit test just tests 'utf8', since if that one works all the others should too.
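
The fallback described above can be sketched roughly as follows. This is not the patch itself: `resolve_charset` is a hypothetical helper, and the two tables are heavily trimmed stand-ins for the email package's real ALIASES and CHARSETS.

```python
import codecs

# Hypothetical, trimmed stand-ins for the email package's tables;
# the real ALIASES and CHARSETS contain many more entries.
ALIASES = {"ascii": "us-ascii", "latin_1": "iso-8859-1", "latin-1": "iso-8859-1"}
CHARSETS = {"us-ascii", "iso-8859-1", "utf-8"}


def resolve_charset(name):
    """Return the charset name to use, consulting codecs only as a fallback."""
    name = name.lower()
    if name not in ALIASES and name not in CHARSETS:
        try:
            # CodecInfo.name is the canonical name Python reports for the codec.
            name = codecs.lookup(name).name
        except LookupError:
            # Unknown to Python as well: keep the caller's spelling.
            pass
    return ALIASES.get(name, name)
```

With these tables, `resolve_charset("utf8")` and `resolve_charset("utf_8")` both give `"utf-8"`, while a name that neither the tables nor codecs know is passed through unchanged.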

    I'm tentatively reclassifying this as a bug rather than a feature request, since I think it is a reasonable expectation that email would support at least the same set of encoding names that the rest of Python does.

    @bitdancer bitdancer self-assigned this Jun 3, 2010
    @bitdancer bitdancer added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Jun 3, 2010
    @merwok
    Member

    merwok commented Jun 3, 2010

    Idea: Import the aliases mapping from codecs and extend it with email-specific aliases. Alternate idea: Add email’s names to codecs.
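
A rough sketch of the first idea, assuming the mapping meant here is `encodings.aliases.aliases`; the `EMAIL_ALIASES` entry is a hypothetical example of an email-specific addition.

```python
from encodings.aliases import aliases as CODEC_ALIASES

# Hypothetical email-specific addition (a MIME-preferred spelling).
EMAIL_ALIASES = {"ascii": "us-ascii"}

# Start from Python's alias table, then layer the email entries on top.
merged = dict(CODEC_ALIASES)
merged.update(EMAIL_ALIASES)
```

One caveat shows up immediately: the values in `encodings.aliases.aliases` are Python module names such as `utf_8`, not MIME names such as `utf-8`, so the merged table would still need a translation step before its values could be used in headers.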

    Side note: “charset” stands for “character encoding”, not “character set”. See <http://www.w3.org/International/questions/qa-what-is-encoding#what>

    @malemburg
    Member

    R. David Murray wrote:

    R. David Murray <rdmurray@bitdance.com> added the comment:

    For various reasons the email module has a table of character sets. What might be most effective would be for the email module to look a character set name up in the codecs module and find out the canonical name of the character set, and then look that up in its table (i.e. remove the aliases table from email completely, and instead depend on codecs to resolve the canonical name). Unfortunately the codecs module does not recognize all of the aliases used by email, nor is there necessarily any guarantee that the two modules will agree on the proper canonical name.

    I think that the encodings package should be the only source of
    valid aliases and encoding names - after all, you wouldn't be
    able to process email content using names or aliases not
    appearing in the encodings package tables.

    If there are aliases missing, then we can add them there.

    If the email package needs different canonical names, it can apply
    its own map on the canonical names returned by the encodings package.
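
A minimal sketch of that arrangement, using `codecs.lookup()` as the entry point to the encodings package. The helper name `email_charset_name` and the `EMAIL_NAMES` map are hypothetical and hold only a few illustrative entries.

```python
import codecs

# Hypothetical map from Python's canonical codec names to the names
# preferred in MIME headers; only a few illustrative entries.
EMAIL_NAMES = {"ascii": "us-ascii", "iso8859-1": "iso-8859-1"}


def email_charset_name(name):
    """Resolve any alias Python knows, then apply the email-specific map."""
    # codecs.lookup() raises LookupError for names Python does not know.
    canonical = codecs.lookup(name).name
    return EMAIL_NAMES.get(canonical, canonical)
```

Here alias resolution lives entirely in the encodings package, and the email side only renames the handful of canonical names that differ, for example Python's `iso8859-1` versus `iso-8859-1`.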

    @bitdancer
    Member

    Mark, any objection to my putting this patch in now, and then we'll fix the aliases implementation in 3.2?

    @malemburg
    Member

    R. David Murray wrote:

    R. David Murray <rdmurray@bitdance.com> added the comment:

    Mark, any objection to my putting this patch in now, and then we'll fix the aliases implementation in 3.2?

    No. Please open a new issue targeting Python 3.2 for this.

    Thanks,

    Marc-Andre Lemburg
    eGenix.com



    @bitdancer
    Member

    Patch committed to trunk in r81705. Leaving issue open pending porting to the other branches, but I've also opened bpo-8898 to further change things so that codecs becomes the sole authority for aliases in 3.2.

    @amauryfa
    Member

    amauryfa commented Oct 5, 2010

    David, can this issue be closed?

    @bitdancer
    Member

    Yes. Benjamin merged this to py3k in r82292. If someone wants to explain to me how to cherry pick the changeset into 3.1 I'd be happy to do it, otherwise I think I'm done with this one :)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022