classification
Title: email feedparser.py CRLFLF bug: $ vs \Z
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: barry, r.david.murray, tony_nelson
Priority: normal Keywords: patch

Created on 2009-03-30 17:59 by tony_nelson, last changed 2010-06-16 17:29 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description Edit
feedparser_crlflf.patch tony_nelson, 2009-03-30 17:59 patch to fix feedparser CRLFLF problem, no tests
feedparser_crlflf_test.patch tony_nelson, 2009-03-31 04:21 Test for feedparser CRLFLF problem.
Messages (5)
msg84595 - (view) Author: Tony Nelson (tony_nelson) Date: 2009-03-30 17:59
feedparser.py does not pares mixed newlines properly.  NLCRE_eol, which
is used to search for the various newlines at End Of Line, uses $ to
match the end of string, but $ also matches \n$, due to a wise long-ago
patch by the Effbot.  This causes feedparser to match '\r\n\n' at
'\r\n', and then to remove the last two characters, leaving '\r', thus
eating up a line.  Such mixed line endings can occur if a message with
CRLF line endings is parsed, written out, and then parsed again.

When explicitly searching for various newlines, the \Z end-of-string
marker should be used instead.  There are two improper uses of $ in
feedparser.py.  I don't see any others in the email package.

NLCRE_eol = re.compile('(\r\n|\r|\n)$')

should be:

NLCRE_eol = re.compile('(\r\n|\r|\n)\Z')

and boundary_re also needs the fix.

I can write a test.  Where exactly should it be put?
msg84746 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2009-03-31 03:34
I think this is a good idea.  Does the existing test suite still pass
with this change?

For a long time, email's philosophy was to use native line endings and
never expected mixed eol, and it definitely never expected mixed line
endings, so we'll need at least a few tests for this.  Add them to
Lib/email/test_email.py (for now, I want to split this file up soon).
msg84751 - (view) Author: Tony Nelson (tony_nelson) Date: 2009-03-31 04:21
make test still passes all tests except test_httpservers on my Python
2.6.1 build.  The network resource was not enabled and tk is not available.

The new test for CRLFLF at the end of a message body is added to
Lib/email/test_email at the end of the TestParsers class.  It passes
with the fix patch and fails without it.

What other tests do you want?
msg106960 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-03 15:46
I've applied the NLCRE_eol part of the patch, and the test, in r81675.  Tony, can you think of a test case that would demonstrate the problem with the boundaryre?
msg107939 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-16 17:29
NLCR_eol patch merged to 2.6 in r82010, py3k in r82011, and 3.1 in r82012.

Since there's been no further feedback I'll close this.  If someone comes up with a test case for the boundary_re, they can open a new issue.
History
Date User Action Args
2011-02-02 21:41:36r.david.murraylinkissue6465 superseder
2010-06-16 17:29:41r.david.murraysetstatus: open -> closed
versions: + Python 3.1, Python 2.7, Python 3.2
type: behavior
messages: + msg107939

resolution: fixed
stage: test needed -> resolved
2010-06-03 15:46:44r.david.murraysetmessages: + msg106960
stage: test needed
2010-05-05 13:46:47barrysetassignee: barry -> r.david.murray

nosy: + r.david.murray
2010-01-09 17:50:05r.david.murraylinkissue6681 superseder
2009-03-31 04:21:18tony_nelsonsetfiles: + feedparser_crlflf_test.patch

messages: + msg84751
versions: - Python 3.1
2009-03-31 03:34:43barrysetmessages: + msg84746
2009-03-31 03:07:17barrysetassignee: barry
versions: + Python 3.1
2009-03-30 17:59:36tony_nelsoncreate