classification
Title: Email parser creates a message object that can't be flattened
Type: behavior Stage: patch review
Components: email Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Johannes Löthberg, barry, edeca, maciej.szulik, matrixise, msapiro, r.david.murray
Priority: normal Keywords: patch

Created on 2016-06-14 18:52 by msapiro, last changed 2018-11-03 12:57 by matrixise.

Files
File name Uploaded Description Edit
bad_email msapiro, 2016-06-14 18:52 The problem message
generator.patch msapiro, 2016-06-14 22:54 Suggested fix
Pull Requests
URL Status Linked Edit
PR 1977 open Johannes Löthberg, 2017-06-06 22:48
Messages (16)
msg268580 - (view) Author: Mark Sapiro (msapiro) * Date: 2016-06-14 18:52
The attached file, bad_email, can be parsed via

msg = email.message_from_binary_file(open('bad_email', 'rb'))

but then msg.as_string() prodices the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
    self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 189, in _write
    msg.replace_header('content-transfer-encoding', munge_cte[0])
  File "/usr/lib/python3.5/email/message.py", line 559, in replace_header
    raise KeyError(_name)
KeyError: 'content-transfer-encoding'
msg268589 - (view) Author: Mark Sapiro (msapiro) * Date: 2016-06-14 22:54
One additional observation. The original message contained no Content-Transfer-Encoding header even though the message body was raw koi8-r characters. Adding

Content-Transfer-Encoding: 8bit

to the message headers avoids the issue, but that is not a practical solution as the message was Russian spam received by a Mailman list and the resultant KeyError caused problems in Mailman.

We can work on defending against this in Mailman, but I suggest that the munge_cte code in generator._write() avoid the documented possible KeyError raised by replace_header() by using __delitem__() and __setitem__() instead as in the attached generator.patch.
msg279172 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2016-10-22 00:21
While I was reviewing https://gitlab.com/mailman/mailman/merge_requests/197/diffs I noticed the KeyError and it made me thing "hmm, I wonder if this should be turned into one of the email package errors"?
msg295080 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-03 15:24
Any updates on this?  I'm having the same problem with some non-spam emails while trying to use some mail-handling tools written in Python.
msg295093 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-03 18:10
replace_header has a different semantic than del-and-set (replace_header leaves the header in the same location in the list, rather than appending it to the end...that's it's purpose).  If replace_header is throwing a key error, then I guess we need a look-before-you-leap if statement.

And a test :)

Note that a correct fix would preserve the broken input email, but since we're obviously already doing munging I'm fine with just making this work for now.
msg295101 - (view) Author: Mark Sapiro (msapiro) * Date: 2017-06-03 19:58
I considered look before you leap, but I decided since we're munging the headers anyway, preserving their order is not that critical, but the patch is easy enough. I'll work on that and a test.
msg295105 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-03 20:59
Fix: https://github.com/kyrias/cpython/commit/a986a8274a522c73d87360da6930e632a3eb4ebb
Testcase: https://github.com/kyrias/cpython/commit/9a510426522e1d714cd0ea238b14de0fc76862b2

Can start a PR once my CLA signature goes through I guess.
msg295107 - (view) Author: Mark Sapiro (msapiro) * Date: 2017-06-03 21:25
It looks like Johannes beat me to it. Thanks for that, but see my comments in the diff at https://github.com/kyrias/cpython/commit/a986a8274a522c73d87360da6930e632a3eb4ebb
msg295108 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-03 22:05
Ah, didn't even see your comment before I did it!  Fix to the comments are on the same branch, will be rebased before PR is up.
msg295430 - (view) Author: Stéphane Wirtel (matrixise) * (Python triager) Date: 2017-06-08 12:35
maybe we could merge the PR, and I could propose a backport for 3.5 and 3.6.

2.7 is affected ?
msg295442 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-08 14:20
I'm going to try to review this this weekend.
msg295443 - (view) Author: Stéphane Wirtel (matrixise) * (Python triager) Date: 2017-06-08 14:48
ok, I have tested on 3.6 and 3.5, just with the test. and in this case, we get the errors on both.
if we apply the patch of Johannes, the test passes and there is no issues.

+1 the backports for 3.5 and 3.6 is just a git cherry-picking.
msg296293 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-18 20:02
Ping?
msg310513 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-23 17:33
Note: I reviewed this a while ago but the review comments haven't been addressed.
msg311272 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-30 13:52
Requested a small additional change to the new tests, and then this will be ready to go in.
msg327136 - (view) Author: David Cannings (edeca) Date: 2018-10-05 14:00
Ping on an ETA for this fix?
History
Date User Action Args
2018-11-03 12:57:37matrixisesetversions: + Python 3.8, - Python 3.5
2018-10-05 14:00:41edecasetnosy: + edeca
messages: + msg327136
2018-01-30 13:52:58r.david.murraysetmessages: + msg311272
2018-01-23 17:33:32r.david.murraysetmessages: + msg310513
2018-01-23 10:28:23martin.panterlinkissue32634 superseder
2017-06-18 20:02:09Johannes Löthbergsetmessages: + msg296293
2017-06-08 14:48:36matrixisesetmessages: + msg295443
2017-06-08 14:20:53r.david.murraysetmessages: + msg295442
2017-06-08 12:35:18matrixisesetstage: needs patch -> patch review
2017-06-08 12:35:05matrixisesetnosy: + matrixise
messages: + msg295430
2017-06-06 22:48:03Johannes Löthbergsetpull_requests: + pull_request2043
2017-06-03 22:05:29Johannes Löthbergsetmessages: + msg295108
2017-06-03 21:25:35msapirosetmessages: + msg295107
2017-06-03 20:59:48Johannes Löthbergsetmessages: + msg295105
2017-06-03 19:58:24msapirosetmessages: + msg295101
2017-06-03 18:11:07r.david.murraysetstage: needs patch
type: behavior
versions: + Python 3.6, Python 3.7, - Python 3.4
2017-06-03 18:10:12r.david.murraysetmessages: + msg295093
2017-06-03 15:24:32Johannes Löthbergsetnosy: + Johannes Löthberg
messages: + msg295080
2016-10-22 00:21:27barrysetmessages: + msg279172
2016-06-14 22:54:21msapirosetfiles: + generator.patch
keywords: + patch
messages: + msg268589
2016-06-14 21:22:59maciej.szuliksetnosy: + maciej.szulik
2016-06-14 18:52:52msapirocreate