classification
Title: email package with unicode subject/body
Type:
Stage:
Components: Library (Lib)
Versions: Python 3.0

process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: barry
Nosy List: barry, haypo
Priority: release blocker
Keywords: patch

Created on 2008-11-12 13:16 by haypo, last changed 2008-11-20 22:48 by barry. This issue is now closed.

Files
File name Uploaded Description
email_mime_unicode.patch haypo, 2008-11-12 13:16
email_example.patch haypo, 2008-11-12 13:30
Messages (7)
msg75784 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-11-12 13:16
I have never used the email package, so this may not be a bug. I'm 
trying to send an email with diacritics in the subject and the body. 
I'm French, so it's natural to use characters outside the ASCII range. 
I wrote this small program:

def main():
    # coding: utf8
    ADDRESS = 'victor.stinner@haypocalc.com'
    from email.mime.text import MIMEText
    msg = MIMEText('accent éôŁ', 'plain', 'utf-8')
    msg['Subject'] = 'sujet éôł'
    msg['From'] = ADDRESS
    msg['To'] = ADDRESS
    text = msg.as_string()
    print("--- FLATTEN ---")
    print(text)
    return
    import smtplib
    client=smtplib.SMTP('smtp.free.fr')
    client.sendmail(ADDRESS, ADDRESS, text)
    client.quit()
main()

(remove the "return" to really send the email)

The problem:
  (...)
  File "/home/haypo/prog/py3k/Lib/email/generator.py", line 141, in _write_headers
    header_name=h, continuation_ws='\t')
  File "/home/haypo/prog/py3k/Lib/email/header.py", line 189, in __init__
    self.append(s, charset, errors)
  File "/home/haypo/prog/py3k/Lib/email/header.py", line 262, in append
    input_bytes = s.encode(input_charset, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-8: ordinal not in range(128)

I don't understand why it uses ASCII when I specified that I would 
like to use the UTF-8 charset.

My attached patch reuses the message charset to encode the headers, 
but uses ASCII if the header can be encoded as ASCII. The patch 
includes a unit test.
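
The patch itself is not reproduced in this thread, so the following is only a rough sketch of the behaviour described above, written against a current Python 3 rather than the 3.0 tree. The helper name header_value is hypothetical and is not part of the email package; it keeps ASCII-only values as plain strings and wraps anything else in a Header that uses the message charset.

```python
from email.header import Header
from email.mime.text import MIMEText


def header_value(value, charset="utf-8"):
    """Hypothetical helper: plain str if ASCII-only, else a Header in `charset`."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII text: let Header produce an RFC 2047 encoded-word.
        return Header(value, charset)
    return value


ADDRESS = "victor.stinner@haypocalc.com"
msg = MIMEText("accent \xe9\xf4\u0142", "plain", "utf-8")
msg["Subject"] = header_value("sujet \xe9\xf4\u0142", "utf-8")
msg["From"] = header_value(ADDRESS)
msg["To"] = header_value(ADDRESS)
text = msg.as_string()
```

With this sketch the caller still has to pass the charset explicitly; the patch as described would instead pick it up from the message.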
msg75785 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-11-12 13:30
The first email example in the library documentation (the one using a 
file) opens a text file in binary mode and uses the ASCII charset. 
That is quite strange, because I expect a text to contain only 
characters, something like:
    charset = 'ASCII'
    # Create a text/plain message
    with open(textfile, 'r', encoding=charset) as fp:
        msg = MIMEText(fp.read(), 'plain', charset)

... and the example doesn't work:
Traceback (most recent call last):
  File "y.py", line 11, in <module>
    msg = MIMEText(fp.read())
  File "/home/haypo/prog/py3k/Lib/email/mime/text.py", line 30, in __init__
    self.set_payload(_text, _charset)
  File "/home/haypo/prog/py3k/Lib/email/message.py", line 234, in set_payload
    self.set_charset(charset)
  File "/home/haypo/prog/py3k/Lib/email/message.py", line 269, in set_charset
    cte(self)
  File "/home/haypo/prog/py3k/Lib/email/encoders.py", line 60, in encode_7or8bit
    orig.encode('ascii')
AttributeError: 'bytes' object has no attribute 'encode'

Solutions:
 - Message.set_payload() has to reject any type other than str
   => or would it be possible to use bytes as payload???
 - Fix the example to use characters

The new attached patch fixes the example and checks the type in 
Message.set_payload().
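
As an illustration only (the attached patch is not shown here), the type check proposed above could look roughly like the wrapper below. set_text_payload is a hypothetical name, not an email package function, and the sketch wraps Message.set_payload() instead of modifying it.

```python
from email.message import Message


def set_text_payload(msg, payload, charset=None):
    """Hypothetical wrapper: refuse anything that is not str before set_payload()."""
    if not isinstance(payload, str):
        raise TypeError("payload must be str, not %s" % type(payload).__name__)
    msg.set_payload(payload, charset)


# A str payload is passed through unchanged.
msg = Message()
set_text_payload(msg, "hello", "us-ascii")

# A bytes payload is rejected up front with a clear error.
rejected = None
try:
    set_text_payload(Message(), b"hello", "us-ascii")
except TypeError as exc:
    rejected = str(exc)
```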
msg75786 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-11-12 14:24
"Please make this a release blocker and I will look at it this 
weekend.   -Barry"
msg76114 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-11-20 16:15
This example works though, and it also works in earlier Pythons.


from email.header import Header

def main():
    # coding: utf8
    ADDRESS = 'victor.stinner@haypocalc.com'
    from email.mime.text import MIMEText
    msg = MIMEText('accent \xe9\xf4\u0142', 'plain', 'utf-8')
    msg['Subject'] = Header('sujet \xe9\xf4\u0142'.encode('utf-8'),
                            'utf-8')
    msg['From'] = ADDRESS
    msg['To'] = ADDRESS
    text = msg.as_string()
    print("--- FLATTEN ---")
    print(text)
    return

main()
msg76115 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-11-20 16:21
I'm rejecting the patch because the old way of making this work still
works in Python 3.0.  Any larger changes to the API need to be made in
the context of redesigning the email package to be byte/str aware.
msg76143 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-11-20 22:07
> I'm rejecting the patch because the old way of making 
> this work still works in Python 3.0.

I checked the documentation and there is a section about "email: 
Internationalized headers". I didn't read this section. I just 
expected that Python uses the right encoding because it was already 
specified in the MIMEText() constructor...
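
For reference, a small sketch of the internationalized-header approach, assuming a current Python 3: the subject is wrapped in email.header.Header with an explicit charset, and decode_header() plus make_header() recover the original text. The sample subject is the one from the program above.

```python
from email.header import Header, decode_header, make_header

subject = "sujet \xe9\xf4\u0142"

# Encode the header explicitly; the result is an ASCII-only encoded-word.
encoded = Header(subject, "utf-8").encode()

# Decode it again: decode_header() yields (bytes, charset) pairs,
# and make_header() reassembles them into a Header object.
decoded = str(make_header(decode_header(encoded)))
```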

> Any larger changes to the API need to be made in
> the context of redesigning the email package to be byte/str aware.

Right.
msg76145 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-11-20 22:48
On Nov 20, 2008, at 5:07 PM, STINNER Victor wrote:

> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
>> I'm rejecting the patch because the old way of making
>> this work still works in Python 3.0.
>
> I checked the documentation and there is a section about "email:
> Internationalized headers". I didn't read this section. I just
> expected that Python uses the right encoding because it was already
> specified in the MIMEText() constructor...

Yes.  This is a stupid API (tm). :)

-Barry
History
Date                 User               Action  Args
2008-11-20 22:48:06  barry              set     messages: + msg76145
2008-11-20 22:07:51  haypo              set     messages: + msg76143
2008-11-20 16:21:57  barry              set     status: open -> closed; resolution: rejected; messages: + msg76115
2008-11-20 16:15:19  barry              set     messages: + msg76114
2008-11-12 16:27:31  benjamin.peterson  set     assignee: barry; nosy: + barry
2008-11-12 14:24:53  haypo              set     priority: release blocker; messages: + msg75786
2008-11-12 13:30:38  haypo              set     files: + email_example.patch; messages: + msg75785
2008-11-12 13:16:31  haypo              create