Issue 32330: Email parser creates a message object that can't be flattened


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/76511

classification

Title:	Email parser creates a message object that can't be flattened
Type:	behavior	Stage:	patch review
Components:	email	Versions:	Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	barry, msapiro, r.david.murray
Priority:	normal	Keywords:	patch

Created on 2017-12-15 00:25 by msapiro, last changed 2022-04-11 14:58 by admin.

Files
File name	Uploaded	Description
bad_email_2.eml	msapiro, 2017-12-15 00:25	Sample message triggering issue

Pull Requests
URL	Status	Linked
PR 18059	open	msapiro, 2020-01-19 06:38

Messages (5)
msg308353	Author: Mark Sapiro (msapiro) *	Date: 2017-12-15 00:25
This is related to https://bugs.python.org/issue27321, but a different exception is thrown for a different reason. It is caused by a defective spam message. I don't actually have the offending message from the wild, but the attached bad_email_2.eml illustrates the problem. The defect is that the message declares the content charset as us-ascii, but the body contains non-ASCII characters. When the message is parsed into an email.message.Message object and the object's as_string() method is called, UnicodeEncodeError is thrown as follows:

>>> import email
>>> with open('bad_email_2.eml', 'rb') as fp:
...     msg = email.message_from_binary_file(fp)
...
>>> msg.as_string()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
    self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.5/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.5/email/generator.py", line 243, in _handle_text
    msg.set_payload(payload, charset)
  File "/usr/lib/python3.5/email/message.py", line 316, in set_payload
    payload = payload.encode(charset.output_charset)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-33: ordinal not in range(128)
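Since bad_email_2.eml itself is not reproduced on this page, the following sketch builds a hypothetical message with the same defect in memory: the addresses, subject and body bytes in RAW are made up, and try_flatten() is an illustrative helper. It checks that as_bytes() still writes the original octets out, then reports whether as_string() raises. The outcome may differ between Python versions, so the sketch reports it rather than assuming it.

```python
import email

# Hypothetical stand-in for bad_email_2.eml: the Content-Type claims
# us-ascii, but the body contains three 8-bit (Latin-1) bytes.
RAW = (
    b"From: sender@example.com\n"
    b"To: rcpt@example.com\n"
    b"Subject: charset mismatch\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=us-ascii\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Body with non-ASCII bytes: \xe4\xf6\xfc\n"
)


def try_flatten(raw):
    """Parse raw bytes; return (as_string() output, None) or (None, exception)."""
    msg = email.message_from_bytes(raw)
    try:
        return msg.as_string(), None
    except UnicodeEncodeError as exc:
        return None, exc


msg = email.message_from_bytes(RAW)
# Serializing back to bytes works: the original octets are written out as-is.
assert b"\xe4\xf6\xfc" in msg.as_bytes()

text, error = try_flatten(RAW)
if error is not None:
    print("as_string() failed:", error)
else:
    print("as_string() succeeded on this Python version")
```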
msg308361	Author: R. David Murray (r.david.murray) *	Date: 2017-12-15 02:34
What would you like to see happen in that situation? Should we use errors=replace like we do for headers? (That seems reasonable to me.) Note that it can be re-serialized as binary.
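To show what errors=replace means for the body, here is a sketch using the same hypothetical RAW message. The original octets stay available through get_payload(decode=True), and decoding them in the declared charset with errors="replace" yields a usable str in which each invalid byte becomes U+FFFD. This is only an application-level illustration of the idea, not the library change being discussed.

```python
import email

# Hypothetical message: declared us-ascii, but the body has 8-bit bytes.
RAW = (
    b"From: sender@example.com\n"
    b"To: rcpt@example.com\n"
    b"Subject: charset mismatch\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=us-ascii\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Body with non-ASCII bytes: \xe4\xf6\xfc\n"
)

msg = email.message_from_bytes(RAW)

# decode=True returns the body's original octets (for an 8bit CTE,
# nothing further is decoded).
raw_body = msg.get_payload(decode=True)

# Decoding in the declared charset with errors="replace" substitutes
# U+FFFD for every byte that is not valid in that charset.
charset = msg.get_content_charset() or "us-ascii"
text_body = raw_body.decode(charset, "replace")
print(repr(text_body))
```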
msg308362	Author: Mark Sapiro (msapiro) *	Date: 2017-12-15 03:23
Yes. I think errors=replace is a good solution. In Mailman, we have our own mailman.email.message.Message class, which is a subclass of email.message.Message, and what we do to work around this and issue27321 is override as_string() with:

def as_string(self):
    # Work around for https://bugs.python.org/issue27321 and
    # https://bugs.python.org/issue32330.
    try:
        value = email.message.Message.as_string(self)
    except (KeyError, UnicodeEncodeError):
        value = email.message.Message.as_bytes(self).decode(
            'ascii', 'replace')
    return value
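A self-contained, runnable sketch of the same workaround follows. HeldSafeMessage and RAW are hypothetical names (Mailman's real class is mailman.email.message.Message). Unlike the quoted override, this version passes as_string()'s optional arguments through to the base class. That is an assumption about what callers might need, not part of Mailman's code.

```python
import email
import email.message

# Hypothetical message with the same defect as bad_email_2.eml.
RAW = (
    b"From: sender@example.com\n"
    b"To: rcpt@example.com\n"
    b"Subject: charset mismatch\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=us-ascii\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Body with non-ASCII bytes: \xe4\xf6\xfc\n"
)


class HeldSafeMessage(email.message.Message):
    """Hypothetical subclass: as_string() falls back instead of raising."""

    def as_string(self, unixfrom=False, maxheaderlen=0, policy=None):
        try:
            # First try the normal string generator.
            return super().as_string(unixfrom, maxheaderlen, policy)
        except (KeyError, UnicodeEncodeError):
            # Fall back to the byte generator, which writes the original
            # octets back out, then decode with errors='replace' so each
            # non-ASCII byte becomes U+FFFD.
            return self.as_bytes(unixfrom, policy).decode("ascii", "replace")


# _class makes the parser build HeldSafeMessage objects instead of Message.
msg = email.message_from_bytes(RAW, _class=HeldSafeMessage)
text = msg.as_string()
print(text.splitlines()[-1])
```

The fallback trades fidelity for robustness: the returned str is safe to handle as text, but the original bytes can only be recovered from as_bytes(), not from this string.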
msg308395	Author: R. David Murray (r.david.murray) *	Date: 2017-12-15 14:40
I do wonder where you are using the string version of messages :) I actually thought I'd already done this (errors=replace), but obviously not. I don't have time now to work on a patch for this, and the patch in the other issue hasn't been updated to reflect the review I did :(
msg308421	Author: Mark Sapiro (msapiro) *	Date: 2017-12-15 19:16
> I do wonder where you are using the string version of messages :)

Probably in some places where we could use bytes, but one of the problem areas is where we save the content of a message held for moderation.
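For a case like saving a held message, one option is to store the bytes form and re-parse it later, which avoids as_string() entirely. The following is a minimal sketch under that assumption. save_held_message(), load_held_message(), RAW and the file name are hypothetical, and this is not Mailman's actual storage code.

```python
import email
import os
import tempfile

# Hypothetical held message: declared us-ascii, body has 8-bit bytes.
RAW = (
    b"From: sender@example.com\n"
    b"To: rcpt@example.com\n"
    b"Subject: charset mismatch\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=us-ascii\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Body with non-ASCII bytes: \xe4\xf6\xfc\n"
)


def save_held_message(msg, path):
    """Hypothetical helper: persist the message as bytes (no re-encoding)."""
    with open(path, "wb") as fp:
        fp.write(msg.as_bytes())


def load_held_message(path):
    """Hypothetical helper: re-parse the bytes written by save_held_message()."""
    with open(path, "rb") as fp:
        return email.message_from_binary_file(fp)


msg = email.message_from_bytes(RAW)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "held.eml")
    save_held_message(msg, path)
    restored = load_held_message(path)

# The restored message keeps the original body octets intact.
print(restored["Subject"], restored.get_payload(decode=True))
```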

History
Date	User	Action	Args
2022-04-11 14:58:55	admin	set	github: 76511
2020-01-19 06:38:24	msapiro	set	keywords: + patch stage: patch review pull_requests: + pull_request17453
2020-01-19 06:34:55	msapiro	set	versions: + Python 3.7, Python 3.8, Python 3.9
2017-12-15 19:16:50	msapiro	set	messages: + msg308421
2017-12-15 14:40:10	r.david.murray	set	messages: + msg308395
2017-12-15 03:23:27	msapiro	set	messages: + msg308362
2017-12-15 02:34:11	r.david.murray	set	messages: + msg308361
2017-12-15 00:25:27	msapiro	create