Message 361469 - Python tracker

Author	msapiro
Recipients	barry, msapiro, r.david.murray
Date	2020-02-06.04:30:03
Message-id	<1580963403.86.0.928943285577.issue39384@roundup.psfhosted.org>

Content
I've researched this further, and I know how this happens. The original message contains a text/html part (in my case, the only part) whose base64 or quoted-printable body contains non-ASCII characters once decoded. It is parsed correctly by email.message_from_bytes.

It is then processed by Mailman's content filtering, which retrieves the html payload via

    part.get_payload(decode=True).decode(ctype, errors='replace')

where part is the text/html part and ctype is 'utf-8' in this case. It then uses elinks, lynx or some other configured command to convert the html payload to plain text, and that plain text still contains non-ASCII characters.
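To make the retrieval step concrete, here is a minimal sketch of it. The sample message is made up for illustration, and deriving ctype with get_content_charset() is an assumption on my part, not necessarily how Mailman obtains it.

```python
import base64
import email

# Hypothetical sample: a single text/html part whose base64 body
# decodes to text containing a non-ASCII character.
html = "<p>caf\u00e9</p>"
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.encodebytes(html.encode("utf-8"))
)

part = email.message_from_bytes(raw)

# Assumption: take the charset from the part's own Content-Type header,
# falling back to us-ascii when none is declared.
ctype = part.get_content_charset("us-ascii")

# get_payload(decode=True) undoes the base64 transfer encoding and returns
# bytes; the second decode turns those bytes into a str.
html_payload = part.get_payload(decode=True).decode(ctype, errors="replace")
```

After this, html_payload is an ordinary str, so the charset information lives only in the part's headers and has to be supplied again when the payload is replaced.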

It then replaces the payload and sets the content type via

    del part['content-transfer-encoding']
    part.set_payload(plain_text)
    part.set_type('text/plain')

This results in a message that can't be flattened with as_bytes().

The issue is that set_payload() should encode the payload appropriately, and in fact it does if an appropriate charset is given. So this is our error in not providing a charset= argument to set_payload().
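The difference between the two calls can be sketched as follows. This is an illustration under assumptions, not Mailman's actual code: make_html_part() and the sample text are hypothetical, and plain_text stands in for the output of the external html-to-text command.

```python
import base64
import email


def make_html_part():
    """Hypothetical helper: parse a one-part text/html message."""
    body = base64.encodebytes("<p>caf\u00e9</p>".encode("utf-8"))
    raw = (
        b"MIME-Version: 1.0\r\n"
        b"Content-Type: text/html; charset=utf-8\r\n"
        b"Content-Transfer-Encoding: base64\r\n"
        b"\r\n" + body
    )
    return email.message_from_bytes(raw)


# Stand-in for the converter output; it still contains non-ASCII.
plain_text = "caf\u00e9\n"

# Without a charset, set_payload() stores the str as-is, and flattening
# later tries to encode it as ASCII.
broken = make_html_part()
del broken["content-transfer-encoding"]
broken.set_payload(plain_text)
broken.set_type("text/plain")
try:
    broken.as_bytes()
    flatten_failed = False
except UnicodeEncodeError:
    flatten_failed = True

# With a charset, set_payload() encodes the text and adds a matching
# Content-Transfer-Encoding header, so flattening succeeds.
fixed = make_html_part()
del fixed["content-transfer-encoding"]
fixed.set_payload(plain_text, charset="utf-8")
fixed.set_type("text/plain")
flattened = fixed.as_bytes()
round_trip = fixed.get_payload(decode=True).decode("utf-8")
```

Deleting the old Content-Transfer-Encoding header first still matters in the second variant, because set_payload() only picks a new transfer encoding when that header is absent.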

Closing this and the corresponding PR.
