classification
Title: email package should work better with unicode
Type: behavior
Stage: resolved
Components: Library (Lib), Unicode
Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To: r.david.murray
Nosy List: ajaksu2, barry, bgamari, eric.araujo, l0nwlf, ocean-city, pebbe, r.david.murray, sivang
Priority: normal
Keywords:

Created on 2007-03-21 18:39 by barry, last changed 2011-12-05 18:14 by r.david.murray. This issue is now closed.

Messages (7)
msg31612 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2007-03-21 18:39
This is a catch-all issue for improving the email package's handling of unicode.  For now, please add issues/problems you find with email & unicode to this tracker item.

For example:

MIMEText()'s first argument should accept a unicode string if _charset is also given.  It should not be necessary to manually encode the first argument into an 8-bit string.
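
As a rough illustration of the behavior requested above, here is a minimal sketch that passes a text string and a charset straight to MIMEText with no manual encoding step. It assumes a current Python 3 release, where `str` is the unicode type; the sample text is made up for the example, and the versions listed on this issue may behave differently.

```python
from email.mime.text import MIMEText

# Sample non-ASCII text (illustrative only).
text = "H\u00e9llo"

# Pass the text and the charset directly; no manual encode() beforehand.
msg = MIMEText(text, "plain", "utf-8")

# Round-trip: undo the transfer encoding, then decode with the declared charset.
decoded = msg.get_payload(decode=True).decode(msg.get_content_charset())
print(decoded == text)
```

The round-trip check at the end is only there to confirm that the payload survives encoding and decoding unchanged.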
msg84700 - (view) Author: Daniel Diniz (ajaksu2) Date: 2009-03-30 22:56
Link to #1681333, #4487, #1443875, #1555842, #4661, #1078919, #963906,
#1379416 and #1368247.
msg84753 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-03-31 06:05
Probably these are related too. #5259 #5304
msg100550 - (view) Author: Peter Kleiweg (pebbe) Date: 2010-03-06 22:45
In Python 3.1.1, email.mime.text.MIMEText accepts an 8-bit charset, but not utf-8.

I think you should not have to specify a charset. All strings are unicode now, so I think the package should choose an appropriate charset based on the characters in the text: us-ascii, some iso-8859 charset, or utf-8, whichever fits.


Python 3.1.1 (r311:74480, Oct  2 2009, 11:50:52)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.mime.text import MIMEText
>>> text = 'H\u00e9'
>>> msg = MIMEText(text, 'plain', 'iso-8859-1')
>>> print(msg.as_string())
Content-Type: text/plain; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

H=E9
>>> msg = MIMEText(text, 'plain', 'utf-8')
Traceback (most recent call last):
  File "/my/opt/Python-3/lib/python3.1/email/message.py", line 269, in set_charset
    cte(self)
TypeError: 'str' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/my/opt/Python-3/lib/python3.1/email/mime/text.py", line 30, in __init__
    self.set_payload(_text, _charset)
  File "/my/opt/Python-3/lib/python3.1/email/message.py", line 234, in set_payload
    self.set_charset(charset)
  File "/my/opt/Python-3/lib/python3.1/email/message.py", line 271, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "/my/opt/Python-3/lib/python3.1/email/charset.py", line 380, in body_encode
    return email.base64mime.body_encode(string)
  File "/my/opt/Python-3/lib/python3.1/email/base64mime.py", line 94, in body_encode
    enc = b2a_base64(s[i:i + max_unencoded]).decode("ascii")
TypeError: must be bytes or buffer, not str
>>>
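
The suggestion in this message, that a charset could be picked from the characters in the text, might be sketched like this. It is a hypothetical helper, not something the email package provides: `pick_charset`, `make_text_part`, and the candidate list are names and choices made up for illustration, and the sketch assumes a current Python 3 release.

```python
from email.mime.text import MIMEText

# Candidate charsets, narrowest first (an assumption for this sketch).
CANDIDATES = ("us-ascii", "iso-8859-1", "utf-8")


def pick_charset(text, candidates=CANDIDATES):
    """Return the first candidate charset that can encode all of `text`."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    raise ValueError("no candidate charset can encode the text")


def make_text_part(text, subtype="plain"):
    """Build a MIMEText part using the narrowest charset that fits."""
    return MIMEText(text, subtype, pick_charset(text))


print(pick_charset("Hello"))
print(pick_charset("H\u00e9"))
print(pick_charset("\u4e2d\u6587"))
```

Trying the candidates from narrowest to widest keeps plain ASCII messages as us-ascii and only falls back to utf-8 when nothing smaller fits.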
msg124715 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-27 17:04
Now that we are primarily focused on Python 3 development, collecting "unicode" issues is not really all that useful (at least not to me, and I'm currently doing the email maintenance), so I'm closing this.  All the relevant issues are assigned to me anyway, so I'll be dealing with them by and by.
msg148880 - (view) Author: Sivan Greenberg (sivang) Date: 2011-12-05 17:12
I am having a hard time parsing all the text/html and text/plain parts of a message and concatenating them into a string. I am thinking of writing some custom code to handle this manually...

If this could be fixed that would be great. The issues are converting from and to ascii/unicode or whatever encoding/charset the part uses.
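
One possible shape for the custom code mentioned above is sketched here. It is an assumption about what such a workaround could look like, not a fix from the email package: `collect_text` is a hypothetical helper, the fallback charsets are arbitrary choices, and the sample message is built only to exercise it. It relies on `walk()`, `get_payload(decode=True)`, and `get_content_charset()` as found in current Python 3 releases.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def collect_text(msg):
    """Concatenate the decoded text/plain and text/html parts of `msg`."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and non-text attachments
        if part.get_content_subtype() not in ("plain", "html"):
            continue
        raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
        if raw is None:
            continue
        # Fall back to us-ascii when the part declares no charset (assumption).
        charset = part.get_content_charset() or "us-ascii"
        try:
            chunks.append(raw.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset name: decode leniently as utf-8 instead.
            chunks.append(raw.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


# Sample message with two parts in different charsets (illustrative only).
outer = MIMEMultipart("alternative")
outer.attach(MIMEText("H\u00e9", "plain", "iso-8859-1"))
outer.attach(MIMEText("<p>H\u00e9</p>", "html", "utf-8"))

combined = collect_text(outer)
print(combined)
```

Decoding with `errors="replace"` trades strictness for robustness: a part with bad bytes yields replacement characters instead of aborting the whole message.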
msg148882 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-12-05 18:14
That particular problem will get fixed in the next version of the email package (hopefully in Python 3.3), but that isn't ready yet.
History
Date User Action Args
2011-12-05 18:14:12 r.david.murray set messages: + msg148882
2011-12-05 17:12:44 sivang set nosy: + sivang
    messages: + msg148880
2010-12-27 17:04:58 r.david.murray set status: open -> closed
    nosy: barry, ocean-city, ajaksu2, eric.araujo, r.david.murray, bgamari, l0nwlf, pebbe
    messages: + msg124715
    dependencies: - Add utf8 alias for email charsets, email.parser: impossible to read messages encoded in a different encoding, smtplib is broken in Python3, email/base64mime.py cannot work, Add decode_header_as_string method to email.utils, Unicode email address helper, email.Header (via add_header) encodes non-ASCII content incorrectly, unicode in email.MIMEText and email/Charset.py, email.Header encode() unicode P2.6, email/charset.py convert() patch, email package and Unicode strings handling, email.header unicode fix
    resolution: out of date
    stage: test needed -> resolved
2010-07-17 10:20:11 eric.araujo set nosy: + eric.araujo
2010-06-24 10:40:09 l0nwlf set nosy: + l0nwlf
2010-05-05 13:34:46 barry set assignee: barry -> r.david.murray
    nosy: + r.david.murray
2010-03-06 22:45:26 pebbe set nosy: + pebbe
    messages: + msg100550
2009-06-18 01:37:46 r.david.murray set dependencies: + Add decode_header_as_string method to email.utils
    versions: + Python 3.2, - Python 3.0
2009-05-01 16:00:58 bgamari set nosy: + bgamari
2009-03-31 06:05:28 ocean-city set nosy: + ocean-city
    dependencies: + smtplib is broken in Python3, email/base64mime.py cannot work
    messages: + msg84753
2009-03-30 22:56:23 ajaksu2 set dependencies: + Add utf8 alias for email charsets, email.parser: impossible to read messages encoded in a different encoding, Unicode email address helper, email.Header (via add_header) encodes non-ASCII content incorrectly, unicode in email.MIMEText and email/Charset.py, email.Header encode() unicode P2.6, email/charset.py convert() patch, email package and Unicode strings handling, email.header unicode fix
    type: behavior
    components: + Unicode
    versions: + Python 3.0, Python 3.1, Python 2.7
    nosy: + ajaksu2
    messages: + msg84700
    stage: test needed
2007-03-21 18:39:03 barry create