classification
Title: email.parser clips trailing \n of multipart/mixed part if part ends in \r\n
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.0, Python 3.1, Python 3.2, Python 2.7, Python 2.6, Python 2.5
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: email feedparser.py CRLFLF bug: $ vs \Z (issue 5610)
Assigned To:
Nosy List: barry, gvanrossum, r.david.murray
Priority: normal
Keywords: patch

Created on 2009-08-10 22:58 by gvanrossum, last changed 2010-01-09 17:50 by r.david.murray. This issue is now closed.

Files
File name | Uploaded | Description
barry.py | gvanrossum, 2009-08-10 22:58 | demo
issue6681.diff | r.david.murray, 2009-08-14 22:12 |
Messages (6)
msg91466 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-08-10 22:58
I am using an edge case of multipart/mixed and find that the
multipart/mixed parser in the email package is broken.  See attached
example.  Similar code using cgi.FieldStorage (!) works fine.

The problem happens through the following combination of factors:

1. Content-Length given
2. Content-Transfer-Encoding: 8bit
3. Last two bytes of the part body are '\r\n'

In this case, the final '\n' is removed from the part body, leaving it a
byte short.  Note that interior occurrences of '\r\n' work fine, as does
any other binary data -- it's only a trailing '\r\n' that breaks.

Note that technically perhaps the use of 8bit is invalid; but the same
problem happens when using binary instead.

The problem can be reproduced in Python 3.x using nearly the same demo
by changing the cStringIO import to "import io".
msg91467 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-08-10 23:38
Older Python versions too.
msg91477 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2009-08-11 15:21
Note that the headers in the subpart don't matter at all.  I'm sure this
is not a problem with MIME parsing, but with line ending issues.  It
might be related to mixing line endings, but we know that the email
package has some line ending problems.
msg91540 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-08-14 03:51
Looks like it is a regular expression issue.  The code is trying to
delete the last line ending before the boundary, which belongs to the
boundary according to the RFC, but it does so with the following RE:

    (\r\n|\r|\n)$

This RE matches '\r\n' in '\r\n\n', which is what Guido's message had. 
The code then deletes the number of characters equal to the length of
the match.  So yes, it is a mixed line ending problem.
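
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the feedparser code itself: the sample string and variable names are made up for illustration, and the \Z variant is included only for comparison because the superseding issue's title contrasts $ with \Z.

```python
import re

# The pattern quoted above, plus a variant anchored with \Z for comparison.
EOL_DOLLAR = re.compile(r'(\r\n|\r|\n)$')
EOL_END = re.compile(r'(\r\n|\r|\n)\Z')

# Hypothetical part body 'abc\r\n' followed by the '\n' that belongs
# to the boundary line.
text = 'abc\r\n' + '\n'

m_dollar = EOL_DOLLAR.search(text)
m_end = EOL_END.search(text)

# '$' also matches just before a final '\n', so the leftmost match is the
# body's own '\r\n' (two characters), not the trailing '\n'.
stripped_dollar = text[:-len(m_dollar.group(0))]

# '\Z' matches only at the very end, so the match is the final '\n'.
stripped_end = text[:-len(m_end.group(0))]

print(repr(m_dollar.group(0)), repr(stripped_dollar))
print(repr(m_end.group(0)), repr(stripped_end))
```

With the $ pattern two characters are cut from the end, so the body loses its final '\n'; with \Z only the boundary's '\n' is removed.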
msg91575 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-08-14 22:12
The only way I can think of to fix this that won't fail in the case
where the body ends with just '\r' (rather than '\r\n' the way the test
body does) is to have feedparser keep track of what the overall line
endings for the stream being parsed are, i.e., to basically outlaw
mixed-line-ending input (except insofar as such alternate line endings
are encoded in a non-text multipart).  That seems like something that
should only be considered in the context of email 6.0 rather than in a
bug fix.

cgi doesn't use a RE, by the way, it just looks at the last two
chars...and is subject to the same bug if a part ends with '\r' in input
with '\n' line terminators.
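
The ambiguity can be sketched as follows. This is not cgi's actual code: strip_boundary_eol is a hypothetical helper that only assumes the "look at the last two chars" strategy described above.

```python
def strip_boundary_eol(data):
    """Hypothetical helper: drop the line ending that precedes a boundary
    by inspecting only the last two characters of the data."""
    if data[-2:] == '\r\n':
        return data[:-2]
    if data[-1:] in ('\r', '\n'):
        return data[:-1]
    return data

# Body ends in '\r\n', input uses '\n' line terminators: body is preserved.
crlf_body = strip_boundary_eol('abc\r\n' + '\n')

# Body ends in a bare '\r', input uses '\n' line terminators: the '\r' plus
# the boundary's '\n' look like a single '\r\n', so the '\r' is lost.
cr_body = strip_boundary_eol('abc\r' + '\n')

print(repr(crlf_body))
print(repr(cr_body))
```

Without knowing which line ending the surrounding stream uses, the two trailing characters alone cannot tell a body ending in '\r' from a '\r\n' terminator.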

I've attached a patch that turns Guido's test into a test case, and
fixes his edge case.  I did not touch the other places where the eol RE
is used, since in those cases there should not be binary data preceding
the line ending characters.
msg97464 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-01-09 17:50
This turns out to be a duplicate of issue 5610, which has a better solution.
History
Date | User | Action | Args
2010-01-09 17:50:05 | r.david.murray | set | status: open -> closed; resolution: duplicate; messages: + msg97464; superseder: email feedparser.py CRLFLF bug: $ vs \Z; stage: patch review -> resolved
2009-08-14 22:12:04 | r.david.murray | set | files: + issue6681.diff; priority: normal; messages: + msg91575; keywords: + patch; stage: patch review
2009-08-14 03:51:11 | r.david.murray | set | nosy: + r.david.murray; messages: + msg91540
2009-08-11 15:21:03 | barry | set | nosy: + barry; messages: + msg91477
2009-08-10 23:38:01 | gvanrossum | set | messages: + msg91467; versions: + Python 2.5
2009-08-10 22:58:32 | gvanrossum | create |