
classification
Title: Possible to set invalid Content-Transfer-Encoding on email.mime.multipart.MIMEMultipart
Type: behavior
Stage: resolved
Components: email, Library (Lib)
Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To: barry
Nosy List: Sharebear, barry, christian.heimes, cjw296, loewis, michaelanckaert, nanjekyejoannah, r.david.murray
Priority: low
Keywords:

Created on 2008-01-14 11:17 by Sharebear, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (16)
msg59896 - (view) Author: Jonathan Share (Sharebear) Date: 2008-01-14 11:17
Steps to Reproduce
==================
>>> from email.mime.multipart import MIMEMultipart
>>> from email.mime.text import MIMEText
>>> multipart = MIMEMultipart()
>>> multipart.set_charset('UTF-8')
>>> text = MIMEText("sample text")
>>> multipart.attach(text)
>>> print multipart.as_string()
MIME-Version: 1.0
Content-Type: multipart/mixed; charset="utf-8";
        boundary="===============0973828728=="
Content-Transfer-Encoding: base64

--===============0973828728==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

sample text
--===============0973828728==--
>>> multipart = MIMEMultipart()
>>> multipart.attach(text)
>>> multipart.set_charset('UTF-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/message.py", line 262, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/charset.py", line 384, in body_encode
    return email.base64mime.body_encode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/base64mime.py", line 148, in encode
    enc = b2a_base64(s[i:i + max_unencoded])
TypeError: b2a_base64() argument 1 must be string or read-only buffer, not list

Explanation
===========
The first example above demonstrates that if you call
set_charset('UTF-8') on a MIMEMultipart instance before adding child
parts, it is possible to generate a multipart/* message with a
Content-Transfer-Encoding that RFC 2045[1] forbids: "If an entity is
of type "multipart" the Content-Transfer-Encoding is not permitted to
have any value other than "7bit", "8bit" or "binary"."

In the second example, I demonstrate that if you try to call
set_charset after adding child parts, the code raises an unhelpful
TypeError. The user should at least be given a more targeted exception.
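
The RFC 2045 rule quoted above can be checked mechanically on a finished message. The following is only a minimal sketch for Python 3: the helper name `invalid_multipart_ctes` and the constant `ALLOWED_MULTIPART_CTES` are made up for illustration and are not part of the email package.

```python
from email.message import Message

# Values RFC 2045 section 6.4 permits on a multipart entity.
ALLOWED_MULTIPART_CTES = {"7bit", "8bit", "binary"}


def invalid_multipart_ctes(msg):
    """Return the offending CTE values found on multipart parts of msg.

    Hypothetical helper for illustration; not part of the email package.
    """
    offending = []
    # walk() yields the message itself and then every nested part.
    for part in msg.walk():
        if part.get_content_maintype() != "multipart":
            continue
        cte = part.get("Content-Transfer-Encoding")
        # A missing header is fine: the default encoding is 7bit.
        if cte is None:
            continue
        if str(cte).strip().lower() not in ALLOWED_MULTIPART_CTES:
            offending.append(str(cte))
    return offending
```

Applied to the message from the first example, which carries "Content-Transfer-Encoding: base64" on the multipart/mixed container, the helper would be expected to return ['base64'].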

Notes
=====
Where should this be fixed? The smallest fix would be to add a check to 
set_charset to see if it is dealing with a multipart message, but 
as I explain in issue1822 I feel the better design would be to move 
this subtype-specific logic into the appropriate subclass.

Again, this is something I'm willing to work on at next Saturday's bug 
day if I can get some feedback on my architectural concerns.

[1] http://tools.ietf.org/html/rfc2045#section-6.4
msg59922 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-01-14 21:36
I'd like to question whether anything needs to be fixed at all, i.e.
whether it is the responsibility of the email package to reject all
kinds of nonsensical data. Garbage in, garbage out.

Barry, can you take a look?
msg60139 - (view) Author: Jonathan Share (Sharebear) Date: 2008-01-19 10:02
Martin,

I could almost agree with you _if_ I were setting the
Content-Transfer-Encoding myself; however, I am not. I am setting the
charset, and the library chooses an appropriate
Content-Transfer-Encoding to represent the MIME part with. Currently I
can't see any way for a developer to find out which
Content-Transfer-Encoding is going to be used, other than reading the
source or writing a test case (and that would require understanding
what the email.mime module is doing "under the hood").

Also, just from a usability point of view, I would expect creating 
an invalid MIME part to be a little more difficult, especially 
considering the fix should be as small as adding "if not encoding in 
valid encodings: raise SensibleException".
msg60155 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-01-19 12:44
Please provide a unit test which verifies the bug and a fix for the bug.
msg60189 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-01-19 16:28
You're right that this should probably be fixed in the subclass, but you
also have to remember that the parser generally doesn't create subclass
instances.  It only creates instances of Message.  As long as you can
make it work properly with the parser and generator, I'm okay with
overriding set_charset() in the subclass to do the right thing.
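
The point about the parser can be seen with a quick round trip. This is a small sketch for Python 3 using the default legacy (compat32) policy; the variable names `built` and `parsed` are only illustrative.

```python
import email
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a multipart message with the MIME helper classes.
built = MIMEMultipart()
built.attach(MIMEText("sample text"))

# Round-trip it through the generator and the parser.
parsed = email.message_from_string(built.as_string())

print(type(built).__name__)               # MIMEMultipart
print(type(parsed).__name__)              # Message
print(isinstance(parsed, MIMEMultipart))  # False
```

Any check that lives only in a MIMEMultipart.set_charset() override is therefore skipped for messages that came out of the parser, which is why the fix has to be tested against both code paths.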
msg60193 - (view) Author: Jonathan Share (Sharebear) Date: 2008-01-19 16:48
I'm beginning to realise this is slightly bigger than I first thought
 ;-)

Trying to make a nice test case for this issue, I thought it would be a 
good idea for the parser to register a defect for an invalid 
Content-Transfer-Encoding, so that I can test against that in the test 
case rather than rely on fragile substring tests. Unfortunately the 
parser code isn't the easiest code to get your head around on a first look.
msg60208 - (view) Author: Jonathan Share (Sharebear) Date: 2008-01-19 18:40
I ran out of time to look at this today. In order to write a nice test 
case for this issue I need the parser to notice this error in messages. 
I've filed issue1874 for the parser not reporting the invalid CTE in 
msg.defects.
msg83194 - (view) Author: Chris Withers (cjw296) * (Python committer) Date: 2009-03-05 13:06
Okay, splitting this out a little. I've moved the exception raised when 
setting the character set after adding parts out to [Issue5423].

Here's a simpler example of the problem with setting character sets on 
multiparts:

>>> from email.MIMEMultipart import MIMEMultipart
>>> msg = MIMEMultipart()
>>> msg.set_charset('iso-8859-15')
>>> print msg.as_string()
MIME-Version: 1.0
Content-Type: multipart/mixed; charset="iso-8859-15";
        boundary="===============1300027372=="
Content-Transfer-Encoding: quoted-printable

As a programmer, I don't think I've done anything wrong, but that mail 
is not valid and causes some fussy MTAs to barf and show the message as 
blank.

That said, when would you ever need or want to set the character set on 
a MIMEMultipart? I have this in my code, but I suspect I was just 
sheep/paranoia programming. When would just making set_charset on a 
MIMEMultipart raise an exception cause problems?
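
For comparison, here is a sketch (Python 3, legacy API) of putting the charset where it has meaning, on the text part, and leaving the container alone. The sample text and variable names are only illustrative.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart()

# The charset describes the text part, so pass it to MIMEText
# rather than calling set_charset() on the multipart container.
part = MIMEText("sample text with a non-ASCII character: \u00e9", _charset="utf-8")
msg.attach(part)

print(part.get_content_charset())        # utf-8
print(msg.get_content_charset())         # None
print(msg["Content-Transfer-Encoding"])  # None
```

Built this way, the multipart/mixed container never gets a charset parameter or a Content-Transfer-Encoding header, so the invalid header from the example above does not arise.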
msg124761 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-28 04:56
As far as I can tell it is simply wrong per the RFC to put a charset parameter on a multipart content-type.  So I think this should, indeed, raise an error on the Multipart subtype.

If someone sets any charset, the CTE is set wrong.  So code that sets charset is already broken, even though tolerant mailers will accept the resulting message (but some mailers won't, as described in this issue).

So, I think set_charset on MIMEMultipart should be deprecated and turned into a no-op in 3.2.
msg124772 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-12-28 09:35
Fine with me to fix this API during beta.
msg348646 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-29 12:01
This issue is not newcomer-friendly, so I am removing the easy keyword.
msg349714 - (view) Author: Michael Anckaert (michaelanckaert) * Date: 2019-08-14 16:12
This issue is still present on Python 3.7 and above. 

As David suggested set_charset could be turned into a no-op on MIMEMultipart. 

I traced set_charset back to inheritance from email.message.Message, would overriding set_charset (and possibly raising a deprecation warning) be an acceptable fix?
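
A rough sketch of what such an override could look like, written as a standalone subclass rather than a patch to the standard library. The class name `NoCharsetMultipart` and the warning text are assumptions made for illustration only.

```python
import warnings
from email.mime.multipart import MIMEMultipart


class NoCharsetMultipart(MIMEMultipart):
    """Hypothetical subclass; not part of the standard library."""

    def set_charset(self, charset):
        # Deliberately do nothing except warn: no charset parameter and
        # no Content-Transfer-Encoding header are added to the container.
        warnings.warn(
            "set_charset() has no effect on multipart messages",
            DeprecationWarning,
            stacklevel=2,
        )


mp = NoCharsetMultipart()
mp.set_charset("utf-8")  # warns (if DeprecationWarning is shown) and changes nothing
```

As noted earlier in this thread, the parser creates plain Message instances, so an override like this would only cover messages built through the MIME classes.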
msg373841 - (view) Author: Joannah Nanjekye (nanjekyejoannah) * (Python committer) Date: 2020-07-17 17:44
I agree with @Victor. I removed the easy tag on this issue.
msg379061 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2020-10-19 23:30
Updating the Python versions to the only active ones on which this bug could conceivably be fixed.  I haven't validated that it's still a problem, and I haven't decided whether it's appropriate to backport to 3.9 and 3.8.

I'll work on a patch and see how it goes.
msg379065 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2020-10-19 23:58
The other question is what to do about `EmailMessage` objects, which don't have a `set_charset()` method.  For now, I'll ignore that.
msg379066 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2020-10-20 00:06
Actually, I think I am going to close this as won't fix, for two reasons.

First, this only potentially affects the legacy API, and second, in Python 3, the error you get when you do it like the original repro example seems obvious to me.

```
>>> mp = MIMEMultipart()
>>> t = MIMEText('sample text')
>>> mp.attach(t)
>>> mp.set_charset('utf-8')
Traceback (most recent call last):
  File "/Users/barry/projects/python/cpython/Lib/email/message.py", line 356, in set_charset
    cte(self)
TypeError: 'str' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/barry/projects/python/cpython/Lib/email/message.py", line 364, in set_charset
    payload = payload.encode('ascii', 'surrogateescape')
AttributeError: 'list' object has no attribute 'encode'
```
History
Date | User | Action | Args
2022-04-11 14:56:29 | admin | set | github: 46148
2020-10-20 00:06:35 | barry | set | status: open -> closed; resolution: wont fix; messages: + msg379066; stage: needs patch -> resolved
2020-10-19 23:58:59 | barry | set | messages: + msg379065
2020-10-19 23:31:04 | barry | set | assignee: barry
2020-10-19 23:30:58 | barry | set | messages: + msg379061; versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.3
2020-09-21 08:51:58 | vstinner | set | nosy: - vstinner
2020-09-19 19:02:11 | georg.brandl | set | nosy: - georg.brandl
2020-07-17 17:44:12 | nanjekyejoannah | set | nosy: + nanjekyejoannah; messages: + msg373841
2020-07-17 17:43:25 | nanjekyejoannah | set | keywords: - easy
2019-08-14 16:12:41 | michaelanckaert | set | nosy: + michaelanckaert; messages: + msg349714
2019-07-29 12:01:00 | vstinner | set | nosy: + vstinner; messages: + msg348646
2012-05-16 01:58:44 | r.david.murray | set | assignee: r.david.murray -> (no value); components: + email
2011-03-14 03:32:37 | r.david.murray | set | nosy: loewis, barry, georg.brandl, christian.heimes, cjw296, Sharebear, r.david.murray; versions: + Python 3.3, - Python 3.2
2010-12-28 09:35:40 | georg.brandl | set | nosy: loewis, barry, georg.brandl, christian.heimes, cjw296, Sharebear, r.david.murray; messages: + msg124772
2010-12-28 05:01:54 | r.david.murray | link | issue5423 superseder
2010-12-28 04:56:22 | r.david.murray | set | versions: - Python 3.1, Python 2.7; nosy: + georg.brandl; messages: + msg124761; keywords: + easy; stage: needs patch
2010-08-04 23:49:20 | terry.reedy | set | versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.5
2010-05-05 13:53:44 | barry | set | assignee: barry -> r.david.murray; nosy: + r.david.murray
2009-03-05 13:06:54 | cjw296 | set | nosy: + cjw296; messages: + msg83194
2008-01-19 18:40:59 | Sharebear | set | messages: + msg60208
2008-01-19 16:48:11 | Sharebear | set | messages: + msg60193
2008-01-19 16:28:05 | barry | set | messages: + msg60189
2008-01-19 12:44:11 | christian.heimes | set | priority: low; nosy: + christian.heimes; messages: + msg60155
2008-01-19 10:02:41 | Sharebear | set | messages: + msg60139
2008-01-14 21:36:04 | loewis | set | assignee: barry; messages: + msg59922; nosy: + barry, loewis
2008-01-14 11:17:11 | Sharebear | create |