This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Email parser creates a message object that can't be flattened
Type: behavior Stage: resolved
Components: email Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Johannes Löthberg, barry, cheryl.sabella, diekhans, edeca, maciej.szulik, matrixise, miss-islington, msapiro, r.david.murray
Priority: normal Keywords: patch

Created on 2016-06-14 18:52 by msapiro, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
bad_email msapiro, 2016-06-14 18:52 The problem message
generator.patch msapiro, 2016-06-14 22:54 Suggested fix
Pull Requests
URL Status Linked Edit
PR 1977 closed Johannes Löthberg, 2017-06-06 22:48
PR 18074 merged msapiro, 2020-01-20 03:02
PR 22796 merged miss-islington, 2020-10-19 22:49
PR 22797 merged miss-islington, 2020-10-19 22:49
Messages (22)
msg268580 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2016-06-14 18:52
The attached file, bad_email, can be parsed via

msg = email.message_from_binary_file(open('bad_email', 'rb'))

but then msg.as_string() prodices the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
    self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 189, in _write
    msg.replace_header('content-transfer-encoding', munge_cte[0])
  File "/usr/lib/python3.5/email/message.py", line 559, in replace_header
    raise KeyError(_name)
KeyError: 'content-transfer-encoding'
msg268589 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2016-06-14 22:54
One additional observation. The original message contained no Content-Transfer-Encoding header even though the message body was raw koi8-r characters. Adding

Content-Transfer-Encoding: 8bit

to the message headers avoids the issue, but that is not a practical solution as the message was Russian spam received by a Mailman list and the resultant KeyError caused problems in Mailman.

We can work on defending against this in Mailman, but I suggest that the munge_cte code in generator._write() avoid the documented possible KeyError raised by replace_header() by using __delitem__() and __setitem__() instead as in the attached generator.patch.
msg279172 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2016-10-22 00:21
While I was reviewing https://gitlab.com/mailman/mailman/merge_requests/197/diffs I noticed the KeyError and it made me thing "hmm, I wonder if this should be turned into one of the email package errors"?
msg295080 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-03 15:24
Any updates on this?  I'm having the same problem with some non-spam emails while trying to use some mail-handling tools written in Python.
msg295093 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-03 18:10
replace_header has a different semantic than del-and-set (replace_header leaves the header in the same location in the list, rather than appending it to the end...that's it's purpose).  If replace_header is throwing a key error, then I guess we need a look-before-you-leap if statement.

And a test :)

Note that a correct fix would preserve the broken input email, but since we're obviously already doing munging I'm fine with just making this work for now.
msg295101 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2017-06-03 19:58
I considered look before you leap, but I decided since we're munging the headers anyway, preserving their order is not that critical, but the patch is easy enough. I'll work on that and a test.
msg295105 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-03 20:59
Fix: https://github.com/kyrias/cpython/commit/a986a8274a522c73d87360da6930e632a3eb4ebb
Testcase: https://github.com/kyrias/cpython/commit/9a510426522e1d714cd0ea238b14de0fc76862b2

Can start a PR once my CLA signature goes through I guess.
msg295107 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2017-06-03 21:25
It looks like Johannes beat me to it. Thanks for that, but see my comments in the diff at https://github.com/kyrias/cpython/commit/a986a8274a522c73d87360da6930e632a3eb4ebb
msg295108 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-03 22:05
Ah, didn't even see your comment before I did it!  Fix to the comments are on the same branch, will be rebased before PR is up.
msg295430 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2017-06-08 12:35
maybe we could merge the PR, and I could propose a backport for 3.5 and 3.6.

2.7 is affected ?
msg295442 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-08 14:20
I'm going to try to review this this weekend.
msg295443 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2017-06-08 14:48
ok, I have tested on 3.6 and 3.5, just with the test. and in this case, we get the errors on both.
if we apply the patch of Johannes, the test passes and there is no issues.

+1 the backports for 3.5 and 3.6 is just a git cherry-picking.
msg296293 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-18 20:02
Ping?
msg310513 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-23 17:33
Note: I reviewed this a while ago but the review comments haven't been addressed.
msg311272 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-30 13:52
Requested a small additional change to the new tests, and then this will be ready to go in.
msg327136 - (view) Author: David Cannings (edeca) Date: 2018-10-05 14:00
Ping on an ETA for this fix?
msg336930 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-03-01 16:43
@r.david.murray, it appears that all your requested changes have been addressed on the PR.  Please re-review this when you get a chance.  Thanks!
msg378083 - (view) Author: Mark Diekhans (diekhans) Date: 2020-10-06 01:32
any chance of getting this merged?  A work-around is not obvious
msg378088 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2020-10-06 02:54
I work around it with
```
class Message(email.message.Message):

    def as_string(self):
        # Work around for https://bugs.python.org/issue27321 and
        # https://bugs.python.org/issue32330.
        try:
            value = email.message.Message.as_string(self)
        except (KeyError, LookupError, UnicodeEncodeError):
            value = email.message.Message.as_bytes(self).decode(
                'ascii', 'replace')
        # Also ensure no unicode surrogates in the returned string.
        return email.utils._sanitize(value)
```
This is easy for me because it's Mailman which already subclasses email.message.Message for other reasons. It is perhaps more difficult if you aren't already subclassing email.message.Message for other purposes.
msg379052 - (view) Author: miss-islington (miss-islington) Date: 2020-10-19 22:49
New changeset bf838227c35212709dc43b3c3c57f8e1655c1d24 by Mark Sapiro in branch 'master':
bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074)
https://github.com/python/cpython/commit/bf838227c35212709dc43b3c3c57f8e1655c1d24
msg379056 - (view) Author: miss-islington (miss-islington) Date: 2020-10-19 23:07
New changeset 371146a3f8a989964e2a9c0efc7d776815410fac by Miss Skeleton (bot) in branch '3.8':
bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074)
https://github.com/python/cpython/commit/371146a3f8a989964e2a9c0efc7d776815410fac
msg379057 - (view) Author: miss-islington (miss-islington) Date: 2020-10-19 23:11
New changeset 72ce82abcf9051b18a05350936de7ecab7306662 by Miss Skeleton (bot) in branch '3.9':
bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074)
https://github.com/python/cpython/commit/72ce82abcf9051b18a05350936de7ecab7306662
History
Date User Action Args
2022-04-11 14:58:32adminsetgithub: 71508
2020-10-19 23:11:42miss-islingtonsetmessages: + msg379057
2020-10-19 23:07:26miss-islingtonsetmessages: + msg379056
2020-10-19 23:02:01barrysetstatus: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: + Python 3.9, Python 3.10, - Python 3.7
2020-10-19 22:49:44miss-islingtonsetpull_requests: + pull_request21753
2020-10-19 22:49:35miss-islingtonsetpull_requests: + pull_request21752
2020-10-19 22:49:22miss-islingtonsetnosy: + miss-islington
messages: + msg379052
2020-10-06 02:54:38msapirosetmessages: + msg378088
2020-10-06 01:32:25diekhanssetnosy: + diekhans
messages: + msg378083
2020-01-20 03:02:16msapirosetpull_requests: + pull_request17467
2019-03-01 16:43:46cheryl.sabellasetnosy: + cheryl.sabella

messages: + msg336930
versions: - Python 3.6
2018-11-03 12:57:37matrixisesetversions: + Python 3.8, - Python 3.5
2018-10-05 14:00:41edecasetnosy: + edeca
messages: + msg327136
2018-01-30 13:52:58r.david.murraysetmessages: + msg311272
2018-01-23 17:33:32r.david.murraysetmessages: + msg310513
2018-01-23 10:28:23martin.panterlinkissue32634 superseder
2017-06-18 20:02:09Johannes Löthbergsetmessages: + msg296293
2017-06-08 14:48:36matrixisesetmessages: + msg295443
2017-06-08 14:20:53r.david.murraysetmessages: + msg295442
2017-06-08 12:35:18matrixisesetstage: needs patch -> patch review
2017-06-08 12:35:05matrixisesetnosy: + matrixise
messages: + msg295430
2017-06-06 22:48:03Johannes Löthbergsetpull_requests: + pull_request2043
2017-06-03 22:05:29Johannes Löthbergsetmessages: + msg295108
2017-06-03 21:25:35msapirosetmessages: + msg295107
2017-06-03 20:59:48Johannes Löthbergsetmessages: + msg295105
2017-06-03 19:58:24msapirosetmessages: + msg295101
2017-06-03 18:11:07r.david.murraysetstage: needs patch
type: behavior
versions: + Python 3.6, Python 3.7, - Python 3.4
2017-06-03 18:10:12r.david.murraysetmessages: + msg295093
2017-06-03 15:24:32Johannes Löthbergsetnosy: + Johannes Löthberg
messages: + msg295080
2016-10-22 00:21:27barrysetmessages: + msg279172
2016-06-14 22:54:21msapirosetfiles: + generator.patch
keywords: + patch
messages: + msg268589
2016-06-14 21:22:59maciej.szuliksetnosy: + maciej.szulik
2016-06-14 18:52:52msapirocreate