Message 106100 - Python tracker


Author	akuchling
Recipients	akuchling, barry
Date	2010-05-19.20:55:17
Message-id	<1274302519.83.0.146401120937.issue8769@psf.upfronthosting.co.za>
In-reply-to

Content
The attached test program shows that parsing an e-mail message with the email package and then converting the resulting message back to a string fails to round-trip properly: the encoding of the subject line gets broken.

The root of the problem: the subject is RFC 2047 encoded, long enough to require line wrapping, and contains one of the splitchars used by Header.encode(), that is, a semicolon or comma.  In my example, the header is:

Subject: =?utf-8?Q?2010_Foundation_Salary_and_Benefits_Report;_Important_Legislative_Efforts?=
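For reference, here is a small sketch of what that header value decodes to, using the standard email.header helpers. The name decode_subject is only illustrative and is not taken from the attached test program; the 78-character figure is the email package's default line length.

```python
from email.header import decode_header, make_header

# The encoded subject value from the example above (a single encoded word).
raw_subject = ("=?utf-8?Q?2010_Foundation_Salary_and_Benefits_Report;"
               "_Important_Legislative_Efforts?=")

def decode_subject(value):
    """Decode an RFC 2047 encoded header value into a plain str."""
    return str(make_header(decode_header(value)))

decoded = decode_subject(raw_subject)
print(decoded)

# The full header line is longer than 78 characters, which is why
# the generator tries to wrap it when writing the message back out.
header_line = "Subject: " + raw_subject
print(len(header_line) > 78)
```

Note that the decoded text contains a semicolon, and the same character appears unescaped inside the encoded word.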

Parsing the message turns that header value into a string S.  generator.Generator._write_headers() then outputs Header(S).encode(), which treats the value as a plain ASCII string and therefore breaks the header at the semicolon, resulting in:

Subject: =?utf-8?Q?2010_Foundation_Salary_and_Benefits_Report;<NEWLINE><SPACE>_Important_Legislative_Efforts?=

Newline and space aren't legal inside a Q-encoded word, so MUAs give up on decoding and display the raw =?utf-8?Q? text as-is.
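A rough sketch of the difference between the two values, plus one possible way to avoid the bad fold by decoding the value and re-encoding it with an explicit charset. The regular expression and the is_encoded_word helper are assumptions made for illustration (a simplified reading of the RFC 2047 encoded-word syntax), not code from the email package, and the re-encoding step is a workaround sketch rather than the fix for this issue.

```python
import re
from email.header import Header, decode_header, make_header

# Illustrative pattern for one RFC 2047 encoded word: neither '?' nor
# whitespace may appear inside the charset or the encoded text.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")

def is_encoded_word(value):
    """Return True if value is exactly one well-formed encoded word."""
    return ENCODED_WORD.fullmatch(value) is not None

intact = ("=?utf-8?Q?2010_Foundation_Salary_and_Benefits_Report;"
          "_Important_Legislative_Efforts?=")
# The same value after being folded at the semicolon (newline + space).
broken = ("=?utf-8?Q?2010_Foundation_Salary_and_Benefits_Report;"
          "\n _Important_Legislative_Efforts?=")

print(is_encoded_word(intact))
print(is_encoded_word(broken))

# Workaround sketch: decode to text first, then let Header encode and
# fold it with a known charset instead of folding the raw encoded word.
text = str(make_header(decode_header(intact)))
reencoded = Header(text, charset="utf-8", header_name="Subject").encode()
print(reencoded)
```

Re-encoding from the decoded text lets Header split the value into separate encoded words, so a fold never lands in the middle of one.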

History
Date	User	Action	Args
2010-05-19 20:55:20	akuchling	set	recipients: + akuchling, barry
2010-05-19 20:55:19	akuchling	set	messageid: <1274302519.83.0.146401120937.issue8769@psf.upfronthosting.co.za>
2010-05-19 20:55:18	akuchling	link	issue8769 messages
2010-05-19 20:55:17	akuchling	create