unicode in email.MIMEText and email/Charset.py #42634

Closed
gdamjan mannequin opened this issue Nov 28, 2005 · 13 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@gdamjan
Mannequin

gdamjan mannequin commented Nov 28, 2005

BPO 1368247
Nosy @loewis, @warsaw, @orsenthil, @vstinner, @bitdancer
Files
  • Charset.patch
  • mimetext-unicode.patch: unicode mimetext support
  • mimetext_unicode_input.patch
  • mimetext_unicode_input.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/bitdancer'
    closed_at = <Date 2010-06-03.02:06:49.885>
    created_at = <Date 2005-11-28.14:15:40.000>
    labels = ['type-bug', 'library']
    title = 'unicode in email.MIMEText and email/Charset.py'
    updated_at = <Date 2010-06-03.02:06:49.884>
    user = 'https://bugs.python.org/gdamjan'

    bugs.python.org fields:

    activity = <Date 2010-06-03.02:06:49.884>
    actor = 'r.david.murray'
    assignee = 'r.david.murray'
    closed = True
    closed_date = <Date 2010-06-03.02:06:49.885>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2005-11-28.14:15:40.000>
    creator = 'gdamjan'
    dependencies = []
    files = ['6898', '12190', '17513', '17514']
    hgrepos = []
    issue_num = 1368247
    keywords = ['patch']
    message_count = 13.0
    messages = ['49137', '49138', '76737', '76740', '76741', '87253', '87292', '103008', '104416', '106835', '106837', '106924', '106930']
    nosy_count = 9.0
    nosy_names = ['loewis', 'barry', 'orsenthil', 'vstinner', 'gdamjan', 'maxua', 'r.david.murray', 'bgamari', 'l0nwlf']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1368247'
    versions = ['Python 3.1', 'Python 3.2']

    @gdamjan
    Mannequin Author

    gdamjan mannequin commented Nov 28, 2005

    This is the test case that fails in Python 2.4.1:
    from email.MIMEText import MIMEText
    msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
    msg.set_charset('utf-8')
    msg.as_string()

    And attached is a patch to correct it.

    @gdamjan gdamjan mannequin assigned warsaw Nov 28, 2005
    @gdamjan gdamjan mannequin added the stdlib Python modules in the Lib dir label Nov 28, 2005
    @loewis
    Mannequin

    loewis mannequin commented Mar 5, 2007

    Your proposed patch doesn't seem to work in Python 2.5 or on the trunk (i.e. it won't prevent an exception from occurring). Can you please revise it?

    @maxua
    Mannequin

    maxua mannequin commented Dec 2, 2008

    How about this version?

    @vstinner
    Member

    vstinner commented Dec 2, 2008

    It was proposed to rewrite MIMEText in Python 3.1 (and 2.7?) to use
    unicode characters in the internals and reconvert to bytes to send it
    to a socket (or a file).

    @gdamjan
    Mannequin Author

    gdamjan mannequin commented Dec 2, 2008

    The patch by maxua works fine with 2.6 too and solves the problem.
    I'd suggest it be applied to the 2.6 branch, even if email is rewritten
    for 2.7/3.x.

    @bgamari
    Mannequin

    bgamari mannequin commented May 5, 2009

    What is the status of this?

    @bitdancer
    Member

    It looks to me like MIMEText doesn't actually support unicode input.

    One way to get the example to work is to do this:

    MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430'.encode('utf-8'), 'plain', 'utf-8')

    The above call produces valid output from as_string:

    'Content-Type: text/plain; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: base64\n\n0LrQuNGA0LjQu9C40YbQsA==\n'

    How you'd get it to use 8bit, I have no idea. Still, I'm inclined to
    close this as invalid unless Barry tells me my analysis is wrong.

    (See http://mg.pov.lt/blog/unicode-emails-in-python for a good example
    of handling unicode using the email package, which I found after
    figuring out the above.)

    Clearly, the documentation of this could be better, but I suspect the
    developers would rather spend their time fixing the email module in py3.
    A doc patch would certainly be accepted. (Maybe someone could ask the
    above blogger if we could borrow his example for the docs.)
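
For readers on Python 3, where `str` is already unicode and the module lives at `email.mime.text`, the same workaround can be sketched roughly as follows. This is only an illustration of passing the charset up front, not the code from any of the attached patches; the variable names are made up for the example.

```python
from email.mime.text import MIMEText

# The sample text from the original report, as a Python 3 str.
text = "\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430"

# Passing the charset to the constructor lets the email package
# choose the transfer encoding for the body.
msg = MIMEText(text, "plain", "utf-8")

content_type = msg["Content-Type"]
transfer_encoding = msg["Content-Transfer-Encoding"]

# get_payload(decode=True) undoes the transfer encoding and returns bytes.
decoded = msg.get_payload(decode=True).decode("utf-8")

print(content_type)
print(transfer_encoding)
```

Decoding the payload again should give back the original text, which is a quick way to check that nothing was lost on the way.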

    @bitdancer bitdancer added invalid type-bug An unexpected behavior, bug, or error labels May 5, 2009
    @l0nwlf
    Mannequin

    l0nwlf mannequin commented Apr 13, 2010

    After applying maxua's patch we no longer get the unicode error, but as David stated, the support is not there. Here is the test:

    >>> import email
    >>> msg = email.MIMEText.MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
    >>> msg.set_charset('utf8')
    >>> msg.as_string()
    'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type: text/plain; charset="utf8"\n\n\xd0\xba\xd0\xb8\xd1\x80\xd0\xb8\xd0\xbb\xd0\xb8\xd1\x86\xd0\xb0'

    This does not seem to be a viable general solution to the problem.

    I guess this issue should be closed, and the emphasis should now be on the development of 'email 6.0'. By the way, I mailed Marius, the author of the blog post http://mg.pov.lt/blog/unicode-emails-in-python, to ask whether I can borrow his example for the doc patch.

    @orsenthil
    Member

    Hi David,
    The attached patch for this issue:

    + if isinstance(payload, unicode):
    +     payload = payload.encode(msg.get_charset().output_charset or 'us-ascii')

    looks fine enough to me. Are you worried about the /or 'us-ascii'/ part of this patch?

    IMHO, the patch may prevent the straightforward exception for which the issue was raised.

    But on a larger scale, it is advisable to document MIMEText usage with encoding, as you mentioned.
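
The effect of the `or 'us-ascii'` fallback in the quoted fragment can be sketched in isolation. The helper below is hypothetical and written for Python 3 (`str` instead of `unicode`); it only mirrors the logic of the two quoted lines and is not the patch itself.

```python
from email.charset import Charset

text = "\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430"


def encode_payload(payload, charset_name=None):
    """Hypothetical helper mirroring the quoted patch lines.

    Encodes text to the charset's output_charset, falling back to
    'us-ascii' when no charset has been set on the message.
    """
    output_charset = Charset(charset_name).output_charset if charset_name else None
    if isinstance(payload, str):
        payload = payload.encode(output_charset or "us-ascii")
    return payload


# With an explicit charset the text is encoded as expected.
utf8_bytes = encode_payload(text, "utf-8")

# Without one, the us-ascii fallback cannot represent Cyrillic text.
try:
    encode_payload(text)
    fallback_failed = False
except UnicodeEncodeError:
    fallback_failed = True
```

So the fallback is harmless for ASCII-only text, but for non-ASCII text without a charset it still ends in a `UnicodeEncodeError`.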

    @warsaw warsaw assigned bitdancer and unassigned warsaw May 5, 2010
    @bitdancer
    Member

    I don't think either of the previous patches is correct. I found a note in bpo-1685453 that Barry would like for this to work, and after poking around in the code for a bit I think it can be done without breaking anything.

    Attached is a patch that adds unicode support to MIMEText, including unit tests and docs updates. Note that it is necessary to specify a charset if you have non-ASCII text in your unicode string, since the default charset is us-ascii. The unit tests confirm this behavior.

    Now the question is, is this a bug-fix or an enhancement? I *think* it is safe to apply and backport, since I think the only behavior it changes is to make unicode input work, whereas before it would give a traceback. But I've been wrong before :(

    @bitdancer
    Member

    Ah, it's not 100% true that it doesn't change "working" behavior. Before the patch, the first example in this ticket doesn't raise an error until the as_string call. After this patch, the error is raised as soon as MIMEText is called without the charset parameter. Since without the patch the code still fails eventually, I think this is an acceptable behavior change for a bug fix, but it does make me a little nervous :)

    Updated patch with doc change to Message.set_charset attached.

    @bitdancer
    Member

    Committed to 2.7 in r81658 and 2.6 in r81659. I'm leaving this open for the moment because while py3 doesn't have this problem, the tests should still pass and they don't.

    @bitdancer
    Member

    OK, py3k now passes all but one of the tests, and I've disabled that one pending email6 since fixing it would break backward compatibility within the 3.x series. The fix is different, doing the encoding to output_charset just before calling base64mime. This makes me think that quopri in py3k probably isn't working right, but that's a different issue. I did not forward port the doc changes as they are inappropriate for py3k.

    py3k patched in r81660, 3.1 in r81661.
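
The order of operations described for py3k (encode the text to the charset's `output_charset` first, then hand the bytes to `base64mime`) can be sketched as below. This is a standalone illustration under that reading of the comment, not the committed change; it assumes Python 3 and uses made-up variable names.

```python
import base64
from email import base64mime
from email.charset import Charset

text = "\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430"
charset = Charset("utf-8")

# Step 1: turn the str into bytes using the charset's output encoding.
raw = text.encode(charset.output_charset)

# Step 2: only then apply the base64 transfer encoding to the bytes.
body = base64mime.body_encode(raw)

# Round trip: decoding the body gives back the encoded bytes.
round_tripped = base64.b64decode(body)
print(body)
```

Doing the character encoding before the transfer encoding matters because base64 operates on bytes, so the text has to be turned into bytes first.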

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022