classification
Title: unicode in email.MIMEText and email/Charset.py
Type: behavior Stage: committed/rejected
Components: Library (Lib) Versions: Python 3.1, Python 3.0, Python 2.7, Python 2.6
process
Status: pending Resolution: invalid
Dependencies: Superseder:
Assigned To: barry Nosy List: barry, bgamari, gdamjan, haypo, loewis, maxua, r.david.murray (7)
Priority: normal Keywords patch

Created on 2005-11-28 14:15 by gdamjan, last changed 2009-05-05 21:54 by r.david.murray.

Files
File name Uploaded Description
Charset.patch gdamjan, 2005-11-28 14:15
mimetext-unicode.patch maxua, 2008-12-02 13:10 unicode mimetext support
Messages (7)
msg49137 - (view) Author: Damjan Georgievski (gdamjan) Date: 2005-11-28 14:15
This is the test case that fails in python 2.4.1:
from email.MIMEText import MIMEText
msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
msg.set_charset('utf-8')
msg.as_string()

And attached is a patch to correct it.
msg49138 - (view) Author: Martin v. Löwis (loewis) Date: 2007-03-05 13:13
Your proposed patch doesn't seem to work in Python 2.5 or the trunk (i.e. it won't prevent an exception from occurring). Can you please revise it?
msg76737 - (view) Author: (maxua) Date: 2008-12-02 13:10
How about this version?
msg76740 - (view) Author: STINNER Victor (haypo) Date: 2008-12-02 13:53
It was proposed to rewrite MIMEText in Python 3.1 (and 2.7?) to use 
unicode characters in the internals and reconvert to bytes to send it 
to a socket (or a file).
msg76741 - (view) Author: Damjan Georgievski (gdamjan) Date: 2008-12-02 13:56
The patch by maxua works fine with 2.6 too and solves the problem.
I'd suggest it be applied to the 2.6 branch, even if email is rewritten
for 2.7/3.x.
msg87253 - (view) Author: Ben Gamari (bgamari) Date: 2009-05-05 16:52
What is the status of this?
msg87292 - (view) Author: R. David Murray (r.david.murray) Date: 2009-05-05 21:54
It looks to me like MIMEText doesn't actually support unicode input.

One way to get the example to work is to do this:

 MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430'.encode('utf-8'), 'plain', 'utf-8')

The above call produces valid output from as_string:

'Content-Type: text/plain; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: base64\n\n0LrQuNGA0LjQu9C40YbQsA==\n'
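For readers on a current Python 3, here is a minimal sketch of the same call. It assumes the modern `email.mime.text` import path (not the 2.x `email.MIMEText` module discussed in this issue), where `str` is already unicode and no manual `.encode('utf-8')` step is needed. The variable names are illustrative only.

```python
from email.mime.text import MIMEText

# Same Cyrillic test string as in the original report.
text = '\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430'

# Pass the charset to the constructor instead of calling set_charset() later.
msg = MIMEText(text, 'plain', 'utf-8')

# Serialize the message; with the utf-8 charset the body is base64-encoded.
serialized = msg.as_string()
print(serialized)

# Round-trip check: decode the transfer encoding, then decode the bytes.
decoded = msg.get_payload(decode=True).decode('utf-8')
```

The body in the serialized message should match the base64 payload shown in the output above, since the underlying UTF-8 bytes are the same.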

How you'd get it to use 8bit, I have no idea.  Still, I'm inclined to
close this as invalid unless Barry tells me my analysis is wrong.
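On the 8bit question, one possible route on a recent Python 3 (an assumption; this API does not exist in the 2.x and 3.0/3.1 versions this issue covers) is `email.message.EmailMessage.set_content`, which accepts a `cte` argument. This is a sketch, not a fix for `MIMEText` itself.

```python
from email.message import EmailMessage

# Same Cyrillic test string as in the original report.
text = '\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430'

msg = EmailMessage()
# Request the 8bit content transfer encoding explicitly
# instead of the base64 default that MIMEText picks for utf-8.
msg.set_content(text, charset='utf-8', cte='8bit')

cte = msg['Content-Transfer-Encoding']

# set_content() appends a trailing newline to the body if it is missing.
body = msg.get_content()

# as_bytes() emits the raw UTF-8 body, suitable for an 8bit-clean transport.
raw = msg.as_bytes()
```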

(Cf. http://mg.pov.lt/blog/unicode-emails-in-python for a good example
of handling unicode using the email package, which I found after
figuring out the above.)

Clearly, the documentation of this could be better, but I suspect the
developers would rather spend their time fixing the email module in py3.
 A doc patch would certainly be accepted.  (Maybe someone could ask the
above blogger if we could borrow his example for the docs.)
History
Date User Action Args
2009-05-05 21:54:33 r.david.murray set status: open -> pending
nosy: + r.david.murray
versions: + Python 2.6, Python 3.0, Python 3.1, Python 2.7
messages: + msg87292
resolution: invalid
type: behavior
stage: committed/rejected
2009-05-05 16:52:43 bgamari set nosy: + bgamari
messages: + msg87253
2009-03-30 22:56:23 ajaksu2 link issue1685453 dependencies
2008-12-02 13:56:57 gdamjan set messages: + msg76741
2008-12-02 13:53:18 haypo set messages: + msg76740
2008-12-02 13:52:17 haypo set nosy: + haypo
2008-12-02 13:10:54 maxua set files: + mimetext-unicode.patch
nosy: + maxua
messages: + msg76737
2005-11-28 14:15:40 gdamjan create