Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible to set invalid Content-Transfer-Encoding on email.mime.multipart.MIMEMultipart #46148

Closed
Sharebear mannequin opened this issue Jan 14, 2008 · 16 comments
Closed
Assignees
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir topic-email type-bug An unexpected behavior, bug, or error

Comments

@Sharebear
Copy link
Mannequin

Sharebear mannequin commented Jan 14, 2008

BPO 1823
Nosy @loewis, @warsaw, @tiran, @cjw296, @bitdancer, @MichaelAnckaert, @nanjekyejoannah

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/warsaw'
closed_at = <Date 2020-10-20.00:06:35.388>
created_at = <Date 2008-01-14.11:17:11.916>
labels = ['type-bug', '3.8', 'expert-email', '3.10', 'library', '3.9']
title = 'Possible to set invalid Content-Transfer-Encoding on email.mime.multipart.MIMEMultipart'
updated_at = <Date 2020-10-20.00:06:35.388>
user = 'https://bugs.python.org/Sharebear'

bugs.python.org fields:

activity = <Date 2020-10-20.00:06:35.388>
actor = 'barry'
assignee = 'barry'
closed = True
closed_date = <Date 2020-10-20.00:06:35.388>
closer = 'barry'
components = ['Library (Lib)', 'email']
creation = <Date 2008-01-14.11:17:11.916>
creator = 'Sharebear'
dependencies = []
files = []
hgrepos = []
issue_num = 1823
keywords = []
message_count = 16.0
messages = ['59896', '59922', '60139', '60155', '60189', '60193', '60208', '83194', '124761', '124772', '348646', '349714', '373841', '379061', '379065', '379066']
nosy_count = 8.0
nosy_names = ['loewis', 'barry', 'christian.heimes', 'cjw296', 'Sharebear', 'r.david.murray', 'michaelanckaert', 'nanjekyejoannah']
pr_nums = []
priority = 'low'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue1823'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

@Sharebear
Copy link
Mannequin Author

Sharebear mannequin commented Jan 14, 2008

Steps to Reproduce
==================

>>> from email.mime.multipart import MIMEMultipart
>>> from email.mime.text import MIMEText
>>> multipart = MIMEMultipart()
>>> multipart.set_charset('UTF-8')
>>> text = MIMEText("sample text")
>>> multipart.attach(text)
>>> print multipart.as_string()
MIME-Version: 1.0
Content-Type: multipart/mixed; charset="utf-8";
        boundary="===============0973828728=="
Content-Transfer-Encoding: base64

--===============0973828728==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

sample text
--===============0973828728==--
>>> multipart = MIMEMultipart()
>>> multipart.attach(text)
>>> multipart.set_charset('UTF-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/
email/message.py", line 262, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/
email/charset.py", line 384, in body_encode
    return email.base64mime.body_encode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/
email/base64mime.py", line 148, in encode
    enc = b2a_base64(s[i:i + max_unencoded])
TypeError: b2a_base64() argument 1 must be string or read-only buffer, 
not list

Explanation
===========
The first example above demonstrates that if you call set_charset('UTF-
8') on a MIMEMultipart instance before adding child parts then it is
possible to generate a multipart/* message with an illegal Content-
Transfer-Encoding as specified by RFC 2045[1] "If an entity is
of type "multipart" the Content-Transfer-Encoding is not permitted to
have any value other than "7bit", "8bit" or "binary"."

In the second example, I demonstrate that if you try and call
set_charset after adding child parts, the code exceptions. The user
should at least be provided with a more targeted exception.

Notes
=====
Where should this be fixed? The smallest fix would be to add a check to
set_charset to see if it is dealing with with a multipart message but
as I express in bpo-1822 I feel the better design would be to move
this subtype specific logic into the appropriate subclass.

Again, this is something I'm willing to work on in next saturday's bug
day if I can get some feedback on my architectural concerns.

[1] http://tools.ietf.org/html/rfc2045#section-6.4

@Sharebear Sharebear mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jan 14, 2008
@loewis
Copy link
Mannequin

loewis mannequin commented Jan 14, 2008

I'd like to question whether anything needs to be fixed at all, i.e.
whether it is the responsibility of the email package to reject all
kinds of non-sensical data. Garbage in, garbage out.

Barry, can you take a look?

@loewis loewis mannequin assigned warsaw Jan 14, 2008
@Sharebear
Copy link
Mannequin Author

Sharebear mannequin commented Jan 19, 2008

Martin,

I can almost agree with you _if_ I was setting the Content-Transfer-
Encoding myself, however I am not. I am setting the charset and the
library chooses an appropriate Content-Transfer-Encoding to represent
the mime part with. Currently I can't see any way other than reading
the source or writing a test case (and that would require understanding
what the email.mime module was doing "under the hood") for a developer
to find out which Content-Transfer-Encoding was going to be used.

Also, just from a usability point of view I would expect that creating
an invalid mime part would be a little more difficult. Especially
considering the fix should be as small as adding "if not encoding in
valid encodings: raise SensibleException".

@tiran
Copy link
Member

tiran commented Jan 19, 2008

Please provide an unit test which verifies the bug and a fix for the bug.

@warsaw
Copy link
Member

warsaw commented Jan 19, 2008

You're right that this should probably be fixed in the subclass, but you
also have to remember that the parser generally doesn't create subclass
instances. It only creates instances of Message. As long as you can
make it work properly with the parser and generator, I'm okay with
overriding set_charset() in the subclass to do the right thing.

@Sharebear
Copy link
Mannequin Author

Sharebear mannequin commented Jan 19, 2008

I'm beginning to realise this is slightly bigger than I first thought
;-)

Trying to make a nice test case for this issue, I thought it would be a
good idea for the parser to register a defect for invalid content-
transfer-encoding so I can test against that in the test case rather
than fragile substring tests. Unfortunately the parser code isn't the
easiest code to get your head around on a first look.

@Sharebear
Copy link
Mannequin Author

Sharebear mannequin commented Jan 19, 2008

Run out of time to look at this today. In order to write a nice test
case for this issue I need the parser to notice this error in messages.
I've filed bpo-1874 for the parser not reporting the invalid cte in
the msg.defects

@cjw296
Copy link
Contributor

cjw296 commented Mar 5, 2009

Okay, splitting this out a little. I've moved the exception when setting
character set after adding parts out to [bpo-5423].

Here's a simpler example of the problem with setting character sets on
multiparts:

>>> from email.MIMEMultipart import MIMEMultipart
>>> msg = MIMEMultipart()
>>> msg.set_charset('iso-8859-15')
>>> print msg.as_string()
MIME-Version: 1.0
Content-Type: multipart/mixed; charset="iso-8859-15";
        boundary="===============1300027372=="
Content-Transfer-Encoding: quoted-printable

As a programmer, I don't think I've done anything wrong, but that mail
is not valid and causes some fussy MTAs to barf and show the message as
blank.

That said, when would you ever need or want to set the character set on
a MIMEMultipart? I have this in my code, but I suspect I was just
sheep/paranoia programming. When would just making set_charset on a
MIMEMultipart raise an exception cause problems?

@warsaw warsaw assigned bitdancer and unassigned warsaw May 5, 2010
@bitdancer
Copy link
Member

As far as I can tell it is simply wrong per-RFC to put a charset parameter on a mulitpart content-type. So I think this should, indeed, raise an error on the Multipart subtype.

If someone sets any charset, the CTE is set wrong. So code that sets charset is already broken, even though tolerant mailers will accept the resulting message (but some mailers won't, as described in this issue).

So, I think set_charset on MIMEMultipart should be deprecated and turned into a no-op in 3.2.

@bitdancer bitdancer added the easy label Dec 28, 2010
@birkenfeld
Copy link
Member

Fine with me to fix this API during beta.

@bitdancer bitdancer removed their assignment May 16, 2012
@vstinner
Copy link
Member

This issue is not newcomer friendly, I remove the easy keyword.

@MichaelAnckaert
Copy link
Mannequin

MichaelAnckaert mannequin commented Aug 14, 2019

This issue is still present on Python 3.7 and above.

As David suggested set_charset could be turned into a no-op on MIMEMultipart.

I traced set_charset back to inheritance from email.message.Message, would overriding set_charset (and possibly raising a deprecation warning) be an acceptable fix?

@nanjekyejoannah
Copy link
Member

I agree with @victor. I removed the easy tag on this easy

@warsaw
Copy link
Member

warsaw commented Oct 19, 2020

Updating the Python versions to the only active ones on which this bug could conceivably be fixed. I haven't validated that it's still a problem, and I haven't decided whether it's appropriate to backport to 3.9 and 3.8.

I'll work on a patch and see how it goes.

@warsaw warsaw added 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes labels Oct 19, 2020
@warsaw warsaw self-assigned this Oct 19, 2020
@warsaw
Copy link
Member

warsaw commented Oct 19, 2020

The other question is what to do about EmailMessage objects, which don't have a set_charset() method. For now, I'll ignore that.

@warsaw
Copy link
Member

warsaw commented Oct 20, 2020

Actually, I think I am going to close this as won't fix, for two reasons.

First, this only potentially affects the legacy API, and second, in Python 3, the error you get when you do it like the original repro example seems obvious to me.

>>> mp = MIMEMultipart()
>>> t = MIMEText('sample text')
>>> mp.attach(t)
>>> mp.set_charset('utf-8')
Traceback (most recent call last):
  File "/Users/barry/projects/python/cpython/Lib/email/message.py", line 356, in set_charset
    cte(self)
TypeError: 'str' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/barry/projects/python/cpython/Lib/email/message.py", line 364, in set_charset
    payload = payload.encode('ascii', 'surrogateescape')
AttributeError: 'list' object has no attribute 'encode'

@warsaw warsaw closed this as completed Oct 20, 2020
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir topic-email type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

7 participants