I have noticed an issue (Python 3.8.5) where an email can be read in as bytes, but will not be returned as such with the as_bytes() call if the message is multipart, has a boundary which is not properly quoted, and the message has non-ascii text. It seems to be a result of how multipart messages are treated if the NoBoundaryInMultipartDefect is encountered [See Test #1].
I would argue that attempting to output the test message in the sample script with an 8bit, utf-8 enabled policy should return the original bytes as the 8bit policy should be applied to the "body" portion (any part after the null lines) of the email (as would be the case if it were not multipart) [See Test #4]
Currently it appears that the entire message is treated as headers, applying the strict 7bit, ascii requirement to the entire contents of the message. Furthermore, the msg.preamble is None.
I am also uncertain that attempting to leverage the handle_defect hooks would be helpful as correcting the boundary doesn't seem to work unless you re-parse the message [See Tests #2 and #3]
So the requested change would be to apply the encoding output policy to all portions of a message after the null line ending the headers.
|