classification
Title: Straightforward usage of email package fails to round-trip
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: email.header.Header doesn't fold headers correctly (issue 11492)
Assigned To: r.david.murray
Nosy List: akuchling, barry, r.david.murray
Priority: normal
Keywords: patch

Created on 2010-05-19 20:55 by akuchling, last changed 2011-04-18 14:45 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
email-roundtrip-failure.py akuchling, 2010-05-19 20:55
issue8769.txt akuchling, 2010-05-19 20:59
Messages (4)
msg106100 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2010-05-19 20:55
The attached test program shows how parsing an e-mail message with the email package, then converting the resulting message to a string, fails to round-trip properly.  Instead it breaks the encoding of the subject line.

The root of the problem: the subject is RFC 2047 Q-encoded, long enough to require line wrapping, and it contains one of the splitchars used by Header.encode() -- meaning a semicolon or comma.  In my example, this is:

Subject: =?utf-8?Q?2010_Foundation_Salary_and_Benefits_Report;_Important_Legislative_Efforts?=

Parsing the message turns that into a string S.  generator.Generator._write_headers() then outputs Header(S).encode(), so it keeps treating the value as an ASCII string, and therefore breaks the header at the semicolon, resulting in:
  
Subject: =?utf-8?Q?2010_Foundation_Salary_and_Benefits_Report;<NEWLINE><SPACE>_Important_Legislative_Efforts?=

Newline and space aren't legal in Q encoding, so MUAs give up and display all the =?utf-8?Q? stuff.
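
A minimal sketch of why the folded form ends up displayed raw, using only email.header.decode_header() from the standard library on a current Python 3. The names intact and broken are introduced here for illustration and are built from the subject above; this is not the attached test program, and the folding is simulated by hand rather than produced by the generator.

```python
from email.header import decode_header

# The encoded subject from the report, as a single RFC 2047 encoded word.
intact = ("=?utf-8?Q?2010_Foundation_Salary_and_Benefits_Report;"
          "_Important_Legislative_Efforts?=")

# The same value after being folded at the semicolon, i.e. with a
# newline and a space inserted inside the encoded word (simulated here).
broken = intact.replace(";", ";\n ", 1)

for label, value in (("intact", intact), ("broken", broken)):
    # decode_header() returns a list of (text, charset) pairs; a charset
    # of None means the chunk was not recognized as an encoded word.
    print(label, decode_header(value))
```

The intact value decodes to UTF-8 bytes with a charset attached, while the folded value comes back as a single undecoded chunk, which is roughly what a mail client falls back to showing.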
msg106101 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2010-05-19 20:59
The attached patch is a possible fix; it uses the decode_header() and make_header() functions to figure out the encoding properly; it fixes my example, at least.  But does it increase the odds of crashing on messages with malformed headers?  Should it go into 2.7 given that we're at the RC stage?  What about 2.6?
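
The decode_header()/make_header() idea can be sketched as follows. This is an illustration of the approach on a current Python 3, not the contents of the attached patch (issue8769.txt); the variable names are made up for the example, and only the subject value from the report is reused.

```python
from email.header import decode_header, make_header

encoded = ("=?utf-8?Q?2010_Foundation_Salary_and_Benefits_Report;"
           "_Important_Legislative_Efforts?=")

# decode_header() recovers the (bytes, charset) pairs, and make_header()
# rebuilds a Header object that knows the value is UTF-8, not plain ASCII.
header = make_header(decode_header(encoded))

# str() gives the decoded text; encode() produces the folded wire form.
text = str(header)
folded = header.encode()
print(text)
print(folded)

# Decoding the folded form again should give back the same text.
roundtrip = str(make_header(decode_header(folded)))
print(roundtrip == text)
```

Because the Header carries its charset, folding happens between encoded words rather than inside one, so the value survives another decode. Whether this is safe on malformed headers is exactly the open question raised above; the sketch does not address it.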

(BTW, Barry, I noticed this because messages being sent through Mailman were coming out with broken subject lines.  The system generating the messages is slightly weird -- doing the UTF-8 quoting is unnecessary since the subject contains no special characters -- but I think Mailman shouldn't be breaking subject lines.  I haven't verified that this Python fix actually fixes Mailman, but I think this is a Python bug, not a Mailman bug.)
msg106102 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2010-05-19 21:00
Minor fix to the patch: the import of Header could actually be removed, since the class is no longer referenced at all with this change.
msg133972 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2011-04-18 14:45
This is fixed in 3.2/3.3 by the fix for issue 11492.  The suggested fix for 2.7 is more radical than I'm comfortable with for a point release. I'm open to argument on that, but in the meantime I'm closing the issue with 11492 as the superseder.
History
Date User Action Args
2011-04-18 14:45:33 r.david.murray set status: open -> closed
resolution: duplicate
messages: + msg133972
superseder: email.header.Header doesn't fold headers correctly
stage: resolved
2011-03-14 03:30:26 r.david.murray set versions: + Python 3.3
2010-12-27 18:27:41 r.david.murray set versions: + Python 3.1, Python 3.2
2010-12-14 18:41:06 r.david.murray set type: behavior
2010-10-25 19:35:22 barry set assignee: barry -> r.david.murray
nosy: + r.david.murray
2010-05-19 21:00:28 akuchling set messages: + msg106102
2010-05-19 20:59:43 akuchling set files: + issue8769.txt
messages: + msg106101
2010-05-19 20:55:18 akuchling create