Message 400764 - Python tracker

Author	anarcat
Recipients	anarcat, barry, r.david.murray
Date	2021-08-31.18:04:32
Message-id	<1630433072.94.0.61440036856.issue45066@roundup.psfhosted.org>
In-reply-to

Content

If an email message has a message/rfc822 part *and* that part is
quoted-printable encoded, Python's email parser fails to parse it.

Consider this code:

import email.parser
import email.policy

# python 3.9.2 cannot decode this message, it fails with
# "email.errors.StartBoundaryNotFoundDefect"

mail = """Mime-Version: 1.0
Content-Type: multipart/report;
 boundary=aaaaaa
Content-Transfer-Encoding: 7bit


--aaaaaa
Content-Type: message/rfc822
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary=3D"=3Dbbbbbb"


--=3Dbbbbbb
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=3Dutf-8

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
x

--=3Dbbbbbb--

--aaaaaa--
"""

msg_abuse = email.parser.Parser(policy=email.policy.default + email.policy.strict).parsestr(mail)

That crashes with: email.errors.StartBoundaryNotFoundDefect

This should normally work: the sub-message is valid once its
quoted-printable content is decoded. The parser apparently does not
decode it, so it probably takes the multipart boundary to be something
like `3D"=3Dbbbbbb"`, never finds a matching start boundary, and
raises the exception above.
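
To illustrate the "assuming you decode the content" part, here is a minimal sketch that decodes the inner message by hand with the standard `quopri` module and then parses the result on its own. The `encoded_inner` value is a shortened, hypothetical copy of the inner part from the example above; how to extract those raw bytes from the outer message is not shown, and this is not a claim about how the email package should fix the problem.

```python
import email.parser
import email.policy
import quopri

# Hypothetical input: the still-encoded body of the message/rfc822 part,
# shortened from the example above.
encoded_inner = b"""MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary=3D"=3Dbbbbbb"


--=3Dbbbbbb
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=3Dutf-8

xxxxxxxx=
x

--=3Dbbbbbb--
"""

# Undo the quoted-printable transfer encoding: "=3D" becomes "=" and a
# trailing "=" (soft line break) joins a line with the next one.
decoded_inner = quopri.decodestring(encoded_inner)

# Parse the decoded bytes as a message of their own.
parser = email.parser.BytesParser(policy=email.policy.default)
inner = parser.parsebytes(decoded_inner)

print(inner.get_content_type())
print(inner.get_boundary())
print(inner.is_multipart())
```

Once decoded, the boundary parameter reads `"=bbbbbb"`, which matches the `--=bbbbbb` delimiter lines, so the parser can split the parts.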

If you remove the quoted-printable encoding from the equation, the parser actually behaves:

import email.parser
import email.policy

# unlike the quoted-printable version above, python 3.9.2 parses this
# message without raising a defect

mail = """Mime-Version: 1.0
Content-Type: multipart/report;
 boundary=aaaaaa
Content-Transfer-Encoding: 7bit


--aaaaaa
Content-Type: message/rfc822
Content-Disposition: inline

MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="=bbbbbb"


--=bbbbbb
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

--=bbbbbb--

--aaaaaa--
"""

msg_abuse = email.parser.Parser(policy=email.policy.default + email.policy.strict).parsestr(mail)

The above correctly parses the message.

This problem causes all sorts of weird issues. In one real-world
example, the parser would just stop parsing headers inside the email
because long header lines (typical of Received headers) would get
broken up by the encoding. So it would not actually fail completely.
Or, to be more accurate, by *default* (i.e. if you do not use strict),
it does not crash and instead produces invalid data (e.g. a message
without a Message-ID or From).
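
Since the default policy records defects instead of raising them, one way to notice the problem without strict mode is to walk the parsed message and list the defects attached to each part. This is a rough sketch: `collect_defects` is a hypothetical helper, not part of the email package, and which defects show up may vary between Python versions.

```python
import email.parser
import email.policy

# Same message as in the first example above.
mail = """Mime-Version: 1.0
Content-Type: multipart/report;
 boundary=aaaaaa
Content-Transfer-Encoding: 7bit


--aaaaaa
Content-Type: message/rfc822
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary=3D"=3Dbbbbbb"


--=3Dbbbbbb
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=3Dutf-8

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
x

--=3Dbbbbbb--

--aaaaaa--
"""


def collect_defects(message):
    """Return (content type, defect class name) pairs for every part."""
    found = []
    for part in message.walk():
        for defect in part.defects:
            found.append((part.get_content_type(), type(defect).__name__))
    return found


# No strict policy here: defects are recorded on the parts, not raised.
msg = email.parser.Parser(policy=email.policy.default).parsestr(mail)
for content_type, defect_name in collect_defects(msg):
    print(content_type, defect_name)
```

An empty result does not prove the message is fine, but a non-empty one is a cheap signal that the parsed data should not be trusted.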

On most messages that are encoded this way, the strict mode will
actually fail with email.errors.MissingHeaderBodySeparatorDefect
instead, because it will stumble upon a line that should be a header
continuation but is treated like a new header line, and that line is
missing a colon (":").
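
The header case can be reproduced in isolation. The sketch below uses a made-up header block (the host names and Message-ID are placeholders) in which a quoted-printable soft line break splits a Received header, and compares the default policy with the strict one.

```python
import email.errors
import email.parser
import email.policy

# Hypothetical header block: the "=" at the end of the Received line is a
# quoted-printable soft line break, so the next line starts in column 0
# instead of with the whitespace that marks a header continuation.
raw = (
    "Subject: demo\n"
    "Received: from mail.example.org by mx.exam=\n"
    "ple.net with ESMTP\n"
    "Message-ID: <1@example.org>\n"
    "\n"
    "body\n"
)

# Default (lenient) policy: the defect is recorded and header parsing
# stops at the broken line, so the headers after it are lost.
lenient = email.parser.Parser(policy=email.policy.default).parsestr(raw)
defect_names = [type(d).__name__ for d in lenient.defects]
print(defect_names)
print(lenient["Message-ID"])

# Strict policy: the same defect is raised as an exception.
try:
    email.parser.Parser(policy=email.policy.strict).parsestr(raw)
    strict_error = None
except email.errors.MissingHeaderBodySeparatorDefect as exc:
    strict_error = exc
print(type(strict_error).__name__)
```

The line after the soft break has no colon and no leading whitespace, so the parser treats it as the start of the body.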

History
Date	User	Action	Args
2021-08-31 18:04:32	anarcat	set	recipients: + anarcat, barry, r.david.murray
2021-08-31 18:04:32	anarcat	set	messageid: <1630433072.94.0.61440036856.issue45066@roundup.psfhosted.org>
2021-08-31 18:04:32	anarcat	link	issue45066 messages
2021-08-31 18:04:32	anarcat	create