classification
Title: mboxMessage.get_payload throws TypeError on malformed content type
Type: behavior Stage: test needed
Components: email Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, enrico, mapreri, r.david.murray, xtreak
Priority: normal Keywords:

Created on 2019-03-04 11:03 by enrico, last changed 2019-04-14 06:40 by xtreak.

Files
File name Uploaded Description
broken.zip enrico, 2019-03-04 11:03
Messages (2)
msg337091 - Author: Enrico Zini (enrico) Date: 2019-03-04 11:03
This simple code:

```
import mailbox

mbox = mailbox.mbox("broken.mbox")
for msg in mbox:
    msg.get_payload()
```

fails rather unexpectedly:

```
$ python3 broken.py 
Traceback (most recent call last):
  File "broken.py", line 5, in <module>
    msg.get_payload()
  File "/usr/lib/python3.7/email/message.py", line 267, in get_payload
    payload = bpayload.decode(self.get_param('charset', 'ascii'), 'replace')
TypeError: decode() argument 1 must be str, not tuple
```

(I'm attaching a zip with code and mailbox)

I would have expected either that the part past text/plain is ignored if it doesn't make sense, or that content-type is completely ignored.

I have to process a large mailbox archive, and the commit below is my current workaround for this issue. It forces me to skip email content that would otherwise be reasonably accessible:

https://salsa.debian.org/nm-team/echelon/commit/617ce935a31f6256257ffb24e11a5666306406c3
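For anyone hitting the same traceback, here is a minimal sketch of a more lenient workaround that keeps the body instead of skipping the message. It is not the approach taken in the linked commit and not a proposed fix for `email`. The helper names `charset_name` and `safe_text_payload` are hypothetical, and the sample message is an assumed, cleaned-up variant of the broken header (ASCII only in the charset slot). It relies on `get_payload(decode=True)` returning raw bytes and on `email.utils.collapse_rfc2231_value` to flatten a tuple-valued parameter.

```python
import email
from email.utils import collapse_rfc2231_value


def charset_name(msg, default="ascii"):
    """Return the charset parameter as a plain string (hypothetical helper)."""
    value = msg.get_param("charset", default)
    if isinstance(value, tuple):
        # RFC 2231 encoded parameters come back as (charset, language, value).
        try:
            value = collapse_rfc2231_value(value)
        except (LookupError, ValueError):
            return default
    # Drop stray whitespace such as a trailing no-break space.
    return value.strip() or default


def safe_text_payload(msg):
    """Decode a non-multipart payload without raising on a bad charset."""
    raw = msg.get_payload(decode=True)  # bytes, or None for multipart messages
    if raw is None:
        return ""
    try:
        return raw.decode(charset_name(msg), "replace")
    except (LookupError, ValueError):
        # Unknown or unusable charset name: fall back to ASCII with replacement.
        return raw.decode("ascii", "replace")


# Assumed sample: an RFC 2231 encoded charset whose value ends in %C2%A0.
SAMPLE = (
    b"Content-Type: text/plain; charset*=utf-8''utf-8%C2%A0\n"
    b"\n"
    b"tr\xc3\xa9s int\xc3\xa9ress\xc3\xa9\n"
)
text = safe_text_payload(email.message_from_bytes(SAMPLE))
```

In a loop over `mailbox.mbox(...)`, calling `safe_text_payload(msg)` in place of `msg.get_payload()` should avoid the `TypeError` for non-multipart messages. Multipart messages would still need to be walked part by part.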
msg340187 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-04-14 06:40
A simplified reproducer is below. The tuple is returned from here: https://github.com/python/cpython/blob/830b43d03cc47a27a22a50d777f23c8e60820867/Lib/email/message.py#L93. Perhaps this is an untested code path? The charset gets a tuple value of ('utf-8��', '', '"utf-8Â\xa0"').


```
import mailbox
import tempfile

broken_message = """
From list@murphy.debian.org Wed Sep 24 01:22:15 2003
Date: Wed, 24 Sep 2003 07:05:50 +0200
From: Test test <test@example.or>
To: debian-devel-french@lists.debian.org
Subject: Re: Test
Mime-Version: 1.0
Content-Type: text/plain; charset*=utf-8†''utf-8%C2%A0

trés intéressé
"""

with tempfile.NamedTemporaryFile() as f:
    f.write(broken_message.encode())
    f.seek(0)
    msg = mailbox.mbox(f.name)
    for m in msg:
        print(m.get_payload())
```

```
$ ../cpython/python.exe bpo36180.py
Traceback (most recent call last):
  File "bpo36180.py", line 21, in <module>
    print(m.get_payload())
  File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/email/message.py", line 267, in get_payload
    payload = bpayload.decode(self.get_param('charset', 'ascii'), 'replace')
TypeError: decode() argument 1 must be str, not tuple
sys:1: ResourceWarning: unclosed file <_io.BufferedRandom name='/var/folders/2b/mhgtnnpx4z943t4cc9yvw4qw0000gn/T/tmp4ddavb6g'>
```
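To illustrate where the tuple comes from without needing an mbox file, here is a small sketch using `email.message_from_string`. The header is an assumed, ASCII-only variant of the one in the reproducer (the stray character in the charset slot is left out), so the exact tuple contents differ from the ones reported above. `Message.get_param` is documented to return a 3-tuple of (charset, language, value) when a parameter is RFC 2231 encoded, and `email.utils.collapse_rfc2231_value` turns such a tuple back into a single string.

```python
import email
from email.utils import collapse_rfc2231_value

# Assumed header modeled on the reproducer; the trailing "*" on the
# parameter name marks it as RFC 2231 encoded.
RAW = "Content-Type: text/plain; charset*=utf-8''utf-8%C2%A0\n\nbody\n"

msg = email.message_from_string(RAW)

# For an RFC 2231 encoded parameter this is a tuple, not a str, which is
# what bytes.decode() later rejects in get_payload().
charset = msg.get_param("charset", "ascii")

# Flatten the tuple into one string; surrounding whitespace, including a
# no-break space, can then be stripped before using it as a codec name.
collapsed = collapse_rfc2231_value(charset)
usable = collapsed.strip()
```

This only shows the shape of the value. Whether `get_payload` should collapse the tuple, ignore the parameter, or fall back to ASCII is the open question in this issue.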
History
Date User Action Args
2019-04-14 06:40:20 xtreak set nosy: + xtreak; messages: + msg340187
2019-03-04 11:53:02 SilentGhost set versions: + Python 3.7, Python 3.8; nosy: + barry, r.david.murray; components: + email; type: behavior; stage: test needed
2019-03-04 11:04:54 mapreri set nosy: + mapreri
2019-03-04 11:03:37 enrico create