classification
Title: email package quoted printable behaviour changed
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 2.4
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: r.david.murray
Nosy List: ThomasAH, barry, paulproteus, r.david.murray, rdemetrescu
Priority: normal
Keywords:

Created on 2006-07-20 14:22 by ThomasAH, last changed 2010-12-14 19:15 by r.david.murray.

Messages (5)
msg29229 - (view) Author: Thomas Arendsen Hein (ThomasAH) Date: 2006-07-20 14:22
from email.Message import Message
from email.Charset import Charset, QP
text = "="
msg = Message()
charset = Charset("utf-8")
charset.header_encoding = QP
charset.body_encoding = QP
msg.set_charset(charset)
msg.set_payload(text)
print msg.as_string()

Gives

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

=3D


With the email package from Python 2.4.3 and 2.4.4c0, the
final '=3D' becomes just '=', so an extra call to
charset.body_encode(text) is needed before msg.set_payload(text).
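
For reference, the expected encoding of the payload can be checked in isolation. This is only a minimal sketch, assuming Python 3 module names (email.charset rather than the Python 2.4 email.Charset used above). It calls Charset.body_encode() directly and does not exercise Message.set_payload(), so it shows what the body should look like, not which versions produce it automatically. A literal '=' should come out as '=3D'.

```python
from email.charset import Charset, QP

# Same charset setup as in the report, but with Python 3 module names
# (an assumption; the report itself targets Python 2.4).
charset = Charset("utf-8")
charset.header_encoding = QP
charset.body_encoding = QP

# Explicitly quoted-printable-encode the body text.
encoded = charset.body_encode("=")
print(encoded)
```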
msg29230 - (view) Author: Thomas Arendsen Hein (ThomasAH) Date: 2006-07-20 16:01
One program which got hit by this is MoinMoin, see
http://moinmoin.wikiwikiweb.de/MoinMoinBugs/ResetPasswordEmailImproperlyEncoded
msg58248 - (view) Author: Roger Demetrescu (rdemetrescu) Date: 2007-12-06 16:53
I am not sure if it is related, but anyway...

MIMEText behaviour has changed from python 2.4 to 2.5.


# Python 2.4

>>> from email.MIMEText import MIMEText
>>> m = MIMEText(None, 'html', 'iso-8859-1')
>>> m.set_payload('abc ' * 50)
>>> print m
From nobody Thu Dec  6 12:52:40 2007
Content-Type: text/html; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc=
 abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc ab=
c abc abc abc abc abc abc abc abc abc abc abc abc=20

# Python 2.5

>>> from email.MIMEText import MIMEText
>>> m = MIMEText(None, 'html', 'iso-8859-1')
>>> m.set_payload('abc ' * 50)
>>> print m
From nobody Thu Dec  6 14:46:07 2007
Content-Type: text/html; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc
abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc
abc abc abc abc abc abc abc abc abc abc abc abc abc abc

However, if we initialize MIMEText with the text, we get the correct output:

# python 2.5

>>> from email.MIMEText import MIMEText
>>> m = MIMEText('abc ' * 50, 'html', 'iso-8859-1')
>>> print m
From nobody Thu Dec  6 13:01:17 2007
Content-Type: text/html; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc=
 abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc ab=
c abc abc abc abc abc abc abc abc abc abc abc abc=20



If I want to set payload after MIMEText is already created, I need to
use this workaround::

# Python 2.5
from email.MIMEText import MIMEText
m = MIMEText(None, 'html', 'iso-8859-1')
m.set_payload(m._charset.body_encode('abc ' * 50))
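
The explicit encoding step used in this workaround can also be checked without MIMEText. A sketch, assuming Python 3 module names (email.charset); the names text, encoded, longest and decoded are only illustrative. It encodes the text by hand and verifies that the result respects the 76-character line limit and decodes back to the original.

```python
import quopri
from email.charset import Charset

text = 'abc ' * 50

# iso-8859-1 uses quoted-printable as its default body encoding.
charset = Charset('iso-8859-1')
encoded = charset.body_encode(text)

# Every encoded line should fit the 76-character quoted-printable limit.
longest = max(len(line) for line in encoded.splitlines())

# Decoding the encoded body should give back the original text.
decoded = quopri.decodestring(encoded.encode('ascii')).decode('iso-8859-1')
print(longest, decoded == text)
```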


PS: The issue's Versions field is set to "Python 2.4". Shouldn't it
be "Python 2.5"?
msg73949 - (view) Author: Asheesh Laroia (paulproteus) Date: 2008-09-27 23:59
Another way to see this issue is that the email module double-encodes
when one attempts to use quoted-printable encoding.  This has to be
worked around by e.g. MoinMoin.

It's easy to get proper base64-encoded output of email.mime.text:

 	>>> mt = email.mime.text.MIMEText('Ta mère', 'plain', 'utf-8')
 	>>> 'Content-Transfer-Encoding: base64' in mt.as_string()
 	True
 	>>> mt.as_string().split('\n')[-2]
 	'VGEgbcOocmU='

There we go, all nice and base64'd.

I can *not* figure out how to get quoted-printable encoding.  I found 
http://docs.python.org/lib/module-email.encoders.html , so I thought
great - I'll just encode my MIMEText object:

 	>>> email.encoders.encode_quopri(mt)
 	>>> 'Content-Transfer-Encoding: quoted-printable' in mt.as_string()
 	True

Great!  Except it's actually double-encoded, and the headers admit to as
much.  You see here that, in addition to the quoted-printable header
just discovered, there is also a base64-related header, and the result
is not strictly QP encoding but QP(base64(payload)).

 	>>> 'Content-Transfer-Encoding: base64' in mt.as_string()
 	True
 	>>> mt.as_string().split('\n')[-2]
 	'VGEgbcOocmU=3D'

It should look like:

 	>>> quopri.encodestring('Ta mère')
 	'Ta m=C3=A8re'
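
The double encoding can be reproduced with the standard base64 and quopri modules alone. A sketch, assuming Python 3 (where both functions take bytes rather than str): quoted-printable-encoding an already base64-encoded payload turns the trailing '=' padding into '=3D', which matches the output seen above.

```python
import base64
import quopri

payload = 'Ta mère'.encode('utf-8')

# The base64 form, as seen in the MIMEText output above.
b64 = base64.b64encode(payload)

# Applying quoted-printable on top of base64 escapes the '=' padding.
double_encoded = quopri.encodestring(b64)

# Quoted-printable applied directly to the raw payload.
qp_only = quopri.encodestring(payload)

print(b64, double_encoded, qp_only)
```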

I raised this issue on the Baypiggies list
<http://mail.python.org/pipermail/baypiggies/2008-September/003983.html>, but
luckily I then found this bug report.  This is with Python 2.5.2-0ubuntu1
from Ubuntu 8.04.

 	paulproteus@alchemy:~ $ python --version
 	Python 2.5.2

If we can come to a decision as to how this *should* work, I could
contribute a patch and/or tests to fix it.  I could even perhaps write a
new section of the Python documentation of the email module explaining this.
msg105045 - (view) Author: Thomas Arendsen Hein (ThomasAH) Date: 2010-05-05 14:59
Roger Demetrescu, I filed the issue with "Python 2.4" because the behavior changed somewhere between 2.4.2 and 2.4.3.

The updated link to the MoinMoin bug entry is:
http://moinmo.in/MoinMoinBugs/ResetPasswordEmailImproperlyEncoded

The workaround I use to be compatible with <= 2.4.2 and >= 2.4.3 is:

    msg.set_payload('=')
    if msg.as_string().endswith('='):
        text = charset.body_encode(text)
    msg.set_payload(text)
History
Date                 User            Action  Args
2010-12-14 19:15:29  r.david.murray  set     type: behavior
2010-05-05 14:59:24  ThomasAH        set     messages: + msg105045
2010-05-05 13:51:12  barry           set     assignee: barry -> r.david.murray; nosy: + r.david.murray
2008-09-27 23:59:46  paulproteus     set     nosy: + paulproteus; messages: + msg73949
2007-12-06 16:53:42  rdemetrescu     set     nosy: + rdemetrescu; messages: + msg58248
2006-07-20 14:22:21  ThomasAH        create