classification
Title: email package content-transfer-encoding behaviour changed
Type: behavior
Stage:
Components: Documentation, email, Library (Lib)
Versions: Python 3.11, Python 3.10, Python 3.9

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: ThomasAH, barry, docs@python, iritkatriel, littleq0903, paulproteus, python-dev, r.david.murray, rdemetrescu
Priority: normal
Keywords:

Created on 2006-07-20 14:22 by ThomasAH, last changed 2022-04-11 14:56 by admin.

Messages (11)
msg29229 - (view) Author: Thomas Arendsen Hein (ThomasAH) Date: 2006-07-20 14:22
from email.Message import Message
from email.Charset import Charset, QP
text = "="
msg = Message()
charset = Charset("utf-8")
charset.header_encoding = QP
charset.body_encoding = QP
msg.set_charset(charset)
msg.set_payload(text)
print msg.as_string()

Gives

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

=3D


With the email package from python2.4.3 and 2.4.4c0 the
last '=3D' becomes just '=', so an extra
msg.body_encode(text) is needed.
msg29230 - (view) Author: Thomas Arendsen Hein (ThomasAH) Date: 2006-07-20 16:01
One program which got hit by this is MoinMoin, see
http://moinmoin.wikiwikiweb.de/MoinMoinBugs/ResetPasswordEmailImproperlyEncoded
msg58248 - (view) Author: Roger Demetrescu (rdemetrescu) Date: 2007-12-06 16:53
I am not sure if it is related, but anyway...

MIMEText behaviour has changed from python 2.4 to 2.5.


# Python 2.4

>>> from email.MIMEText import MIMEText
>>> m = MIMEText(None, 'html', 'iso-8859-1')
>>> m.set_payload('abc ' * 50)
>>> print m
From nobody Thu Dec  6 12:52:40 2007
Content-Type: text/html; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc=
 abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc ab=
c abc abc abc abc abc abc abc abc abc abc abc abc=20








# Python 2.5

>>> from email.MIMEText import MIMEText
>>> m = MIMEText(None, 'html', 'iso-8859-1')
>>> m.set_payload('abc ' * 50)
>>> print m
From nobody Thu Dec  6 14:46:07 2007
Content-Type: text/html; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc
abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc
abc abc abc abc abc abc abc abc abc abc abc abc abc abc





However, if we initialize MIMEText with the text, we get the correct output:

# python 2.5

>>> from email.MIMEText import MIMEText
>>> m = MIMEText('abc ' * 50, 'html', 'iso-8859-1')
>>> print m
From nobody Thu Dec  6 13:01:17 2007
Content-Type: text/html; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc=
 abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc ab=
c abc abc abc abc abc abc abc abc abc abc abc abc=20



If I want to set payload after MIMEText is already created, I need to
use this workaround::

#python 2.5
from email.MIMEText import MIMEText
m = MIMEText(None, 'html', 'iso-8859-1')
m.set_payload(m._charset.body_encode('abc' * 50))


PS: The issue's versions field is filled with "Python 2.4". Shouldn't it
be "Python 2.5" ?
msg73949 - (view) Author: Asheesh Laroia (paulproteus) * Date: 2008-09-27 23:59
Another way to see this issue is that the email module double-encodes
when one attempts to use quoted-printable encoding.  This has to be
worked around by e.g. MoinMoin.

It's easy to get proper base64-encoded output of email.mime.text:

 	>>> mt = email.mime.text.MIMEText('Ta mère', 'plain', 'utf-8')
 	>>> 'Content-Transfer-Encoding: base64' in mt.as_string()
 	True
 	>>> mt.as_string().split('\n')[-2]
 	'VGEgbcOocmU='

There we go, all nice and base64'd.

I can *not* figure out how to get quoted-printable-encoding.  I found 
http://docs.python.org/lib/module-email.encoders.html , so I thought
great - I'll just encode my MIMEText object:

 	>>> email.encoders.encode_quopri(mt)
 	>>> 'Content-Transfer-Encoding: quoted-printable' in mt.as_string()
 	True

Great!  Except it's actually double-encoded, and the headers admit to as
much.  You see here that, in addition to the quoted-printable header
just discovered, there is also a base64-related header, and the result
is not strictly QP encoding but QP(base64(payload)).

 	>>> 'Content-Transfer-Encoding: base64' in mt.as_string()
 	True
 	>>> mt.as_string().split('\n')[-2]
 	'VGEgbcOocmU=3D'

It should look like:

 	>>> quopri.encodestring('Ta mère')
 	'Ta m=C3=A8re'

I raised this issue on the Baypiggies list
<http://mail.python.org/pipermail/baypiggies/2008-September/003983.html>, but
luckily I found this here bug.  This is with Python 2.5.2-0ubuntu1 from
Ubuntu 8.04.

 	paulproteus@alchemy:~ $ python --version
 	Python 2.5.2

If we can come to a decision as to how this *should* work, I could
contribute a patch and/or tests to fix it.  I could even perhaps write a
new section of the Python documentation of the email module explaining this.
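
The double encoding described in this message can be reproduced with the standard library alone. The following is only a minimal sketch for Python 3, assuming the text is UTF-8; the variable names are illustrative and not part of the email package:

```python
import base64
import quopri

raw = 'Ta mère'.encode('utf-8')

# What MIMEText stores for a utf-8 body: the base64 form of the bytes.
b64_body = base64.b64encode(raw)

# Quoted-printable applied on top of base64 escapes the '=' padding,
# which gives the QP(base64(payload)) result reported above.
double_encoded = quopri.encodestring(b64_body)

# Quoted-printable applied to the raw bytes is what was intended.
qp_body = quopri.encodestring(raw)

print(double_encoded)
print(qp_body)
```

Comparing the two printed values shows that the trailing '=3D' comes from the base64 padding character being escaped a second time.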
msg105045 - (view) Author: Thomas Arendsen Hein (ThomasAH) Date: 2010-05-05 14:59
Roger Demetrescu, I filed the issue with "Python 2.4" because the behavior changed somewhere between 2.4.2 and 2.4.3.

The updated link to the MoinMoin bug entry is:
http://moinmo.in/MoinMoinBugs/ResetPasswordEmailImproperlyEncoded

The workaround I use to be compatible with <= 2.4.2 and >= 2.4.3 is:

    msg.set_payload('=')
    if msg.as_string().endswith('='):
        text = charset.body_encode(text)
    msg.set_payload(text)
msg184663 - (view) Author: Colin Su (littleq0903) * Date: 2013-03-19 19:04
Confirmed with David; we worked on this together at the sprints.

This is not a bug. If you call set_payload() directly, you need to encode the payload yourself, because set_payload() doesn't encode the payload if a 'Content-Transfer-Encoding' header already exists.
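
As a rough illustration of that rule on Python 3 (a sketch only, using the lowercase module names email.message and email.charset; the variable names are made up for this example), compare setting the charset before the payload with passing both to set_payload() at once:

```python
from email.charset import Charset, QP
from email.message import Message

charset = Charset('utf-8')
charset.body_encoding = QP

# Charset first, payload second: set_charset() adds the
# Content-Transfer-Encoding header, and the later set_payload()
# stores the text untouched.
unencoded = Message()
unencoded.set_charset(charset)
unencoded.set_payload('=')

# Payload and charset together: the body is encoded at the point
# where the header is added.
encoded = Message()
encoded.set_payload('=', charset)

print(repr(unencoded.get_payload()))
print(repr(encoded.get_payload()))
```

Both messages carry a quoted-printable header, but only the second one has a body that is actually quoted-printable, so passing the charset along with the payload avoids the manual body_encode() step.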
msg184685 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-03-19 21:43
Reviewing this again, it seems to me that there are two separate issues reported here: (1) set_payload on an existing MIMEText object no longer encodes (though it has been a long time since that changed), and (2) the functions in the encoders module, given an already encoded message, double encode.

(1) is now set in stone.  That is, it is documented as working this way, implicitly, if you read the set_payload and set_charset docs, and it has been working that way for a while now.  An explicit note should be added to the MIMEText docs, with a workaround.

(2) could be fixed, I think, since it is unlikely that anyone would be depending on such behavior.
msg184692 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-03-19 22:22
New changeset ba500b179c3a by R David Murray in branch '3.2':
#1525919: Document MIMEText+set_payload encoding behavior.
http://hg.python.org/cpython/rev/ba500b179c3a

New changeset fcbc28ef96a3 by R David Murray in branch '3.3':
Merge: #1525919: Document MIMEText+set_payload encoding behavior.
http://hg.python.org/cpython/rev/fcbc28ef96a3

New changeset b9e07f20832e by R David Murray in branch 'default':
Merge: #1525919: Document MIMEText+set_payload encoding behavior.
http://hg.python.org/cpython/rev/b9e07f20832e
msg184698 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-03-19 22:47
I've committed the doc change.  I'm going to be lazy and leave this issue open to deal with the encoders module fix.
msg408387 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-12-12 14:51
The encoding functions are now doing

orig = msg.get_payload(decode=True)

Does this fix the double-encoding issue?


This change was made in 
https://github.com/python/cpython/commit/00ae435deef434f471e39bea3f3ab3a3e3cd90fe
msg408433 - (view) Author: Thomas Arendsen Hein (ThomasAH) Date: 2021-12-13 09:53
Default python3 on Debian buster:
$ python3
Python 3.7.3 (default, Jan 22 2021, 20:04:44) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.mime.text
>>> mt = email.mime.text.MIMEText('Ta mère', 'plain', 'utf-8')
>>> print(mt.as_string())
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

VGEgbcOocmU=

>>> email.encoders.encode_quopri(mt)
>>> print(mt.as_string())
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Transfer-Encoding: quoted-printable

Ta=20m=C3=A8re

So the encoded text looks good now, but there are still duplicate headers.

Old output (python2.7) is identical to what Asheesh Laroia (paulproteus)
reported for python2.5:
---
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Transfer-Encoding: quoted-printable

VGEgbcOocmU=3D
---
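
Until the duplicate header is addressed in the library, one possible workaround on Python 3 is to rewrite the header after calling encode_quopri(). This is only a sketch of that idea, not a fix that was proposed or committed in this issue:

```python
from email import encoders
from email.mime.text import MIMEText

mt = MIMEText('Ta mère', 'plain', 'utf-8')   # body is stored as base64

# encode_quopri() decodes the body via the existing base64 header,
# re-encodes it, and then appends a second Content-Transfer-Encoding
# header.
encoders.encode_quopri(mt)

# Drop every Content-Transfer-Encoding header and add back only the one
# that matches the payload. This has to happen after encode_quopri(),
# which still needs the old header to decode the body.
del mt['Content-Transfer-Encoding']
mt['Content-Transfer-Encoding'] = 'quoted-printable'

print(mt.as_string())
```

The order matters: deleting the base64 header before the call would make encode_quopri() treat the base64 text as the raw payload.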
History
Date                 User            Action  Args
2022-04-11 14:56:19  admin           set     github: 43702
2021-12-13 09:55:31  iritkatriel     set     versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.2, Python 3.3, Python 3.4
2021-12-13 09:53:07  ThomasAH        set     status: pending -> open
                                             messages: + msg408433
2021-12-12 14:51:26  iritkatriel     set     status: open -> pending
                                             nosy: + iritkatriel
                                             messages: + msg408387
2013-03-19 22:48:10  r.david.murray  set     title: email package quoted printable behaviour changed -> email package content-transfer-encoding behaviour changed
2013-03-19 22:47:33  r.david.murray  set     messages: + msg184698
2013-03-19 22:22:03  python-dev      set     nosy: + python-dev
                                             messages: + msg184692
2013-03-19 21:43:00  r.david.murray  set     messages: + msg184685
                                             components: + Library (Lib), email
2013-03-19 19:05:55  littleq0903     set     assignee: docs@python
                                             components: + Documentation, - Library (Lib), email
                                             nosy: + docs@python
2013-03-19 19:04:52  littleq0903     set     nosy: + littleq0903
                                             messages: + msg184663
2013-03-19 18:06:11  littleq0903     set     versions: + Python 2.7, Python 3.2, Python 3.3, Python 3.4, - Python 2.4
2012-05-16 01:22:03  r.david.murray  set     assignee: r.david.murray -> (no value)
                                             components: + email
2010-12-14 19:15:29  r.david.murray  set     type: behavior
2010-05-05 14:59:24  ThomasAH        set     messages: + msg105045
2010-05-05 13:51:12  barry           set     assignee: barry -> r.david.murray
                                             nosy: + r.david.murray
2008-09-27 23:59:46  paulproteus     set     nosy: + paulproteus
                                             messages: + msg73949
2007-12-06 16:53:42  rdemetrescu     set     nosy: + rdemetrescu
                                             messages: + msg58248
2006-07-20 14:22:21  ThomasAH        create