classification
Title: email.feedparser regular expression bug (NLCRE_crack)
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder: email feedparser.py CRLFLF bug: $ vs \Z (issue 5610)
Assigned To: Nosy List: barry, jkg, pitrou, r.david.murray, sandro.tosi, terry.reedy, tony_nelson
Priority: normal Keywords: patch

Created on 2009-07-11 19:58 by jkg, last changed 2011-02-02 21:41 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
test_nlcre.py jkg, 2009-07-11 19:58 unittest-based test
nlcre.patch jkg, 2009-07-11 20:01 unified diff for Python 3.1 source
nlcre_full.patch jkg, 2009-07-12 18:08 Combined patch and unit test.
Messages (10)
msg90433 - (view) Author: (jkg) Date: 2009-07-11 19:58
If the parser is fed a chunk which ends with '\r' and the next chunk
begins with '\n', it incorrectly parses this into a line ending with
'\r' and an empty line ending with '\n' instead of a single line ending
with '\r\n'.

Test attached. Patch to follow.
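
For illustration, a minimal sketch of the scenario described above (this is not the attached test_nlcre.py). It assumes a Python 3 interpreter; the header values and the helper name parse_in_chunks are made up for the example. The CRLF that ends the first header is split across two feed() calls, and the result is compared with parsing the same text in one piece.

```python
from email.feedparser import FeedParser


def parse_in_chunks(chunks):
    """Feed each chunk to a fresh FeedParser and return the parsed message.

    Hypothetical helper for this example; it is not part of the email package.
    """
    parser = FeedParser()
    for chunk in chunks:
        parser.feed(chunk)
    return parser.close()


# Same text both times; in the first case the CRLF ending the Subject
# header is split so that '\r' ends one chunk and '\n' starts the next.
split_msg = parse_in_chunks(
    ["Subject: hi\r", "\nTo: someone@example.com\r\n\r\nbody\r\n"]
)
whole_msg = parse_in_chunks(
    ["Subject: hi\r\nTo: someone@example.com\r\n\r\nbody\r\n"]
)

# With the bug, the stray '\n' would be read as the blank line that ends
# the headers, so the 'To' header would be missing from split_msg.
print(split_msg["To"] == whole_msg["To"])
print(split_msg.get_payload() == whole_msg.get_payload())
```

On an interpreter that has the fix, both comparisons should print True.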
msg90434 - (view) Author: (jkg) Date: 2009-07-11 20:01
Patch.
msg90449 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-07-12 15:45
Can you include your unit test in your patch rather than as a separate
script? Existing unit tests are in Lib/test.
msg90454 - (view) Author: (jkg) Date: 2009-07-12 18:07
Combined patch as requested by pitrou.

(Sorry. This is my first submission.)
msg90463 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-07-12 23:42
Sorry for giving you a slightly wrong indication. The email tests are
called from Lib/test/test_email.py, but it redirects to
Lib/email/test/*. In any case, there's no point in creating a separate
test script for such a detail, you should add your test to one of the
existing test scripts instead.
msg90643 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2009-07-17 22:34
I believe both 2.4 and 3.0 are no longer maintained. 2.5 only gets
security fixes. On the other hand, the fix should go into 3.2 ;-).
msg127756 - (view) Author: Sandro Tosi (sandro.tosi) * (Python committer) Date: 2011-02-02 19:56
I was looking at this bug and tried to reproduce it, but I can't :( I extracted this code:


part1 = 'Content-Type: multipart/related; start=<op.mhtml.1247227666422.e6e72d4c344a2503@192.168.1.20>; boundary=----------1JBOHhxKNnWgkmE17ZJ2Cy\r\nContent-Location: http://localhost/page1.html\r\nSubject: =?utf-8?Q?test?=\r\nMIME-Version: 1.0\r\n\r\n------------1JBOHhxKNnWgkmE17ZJ2Cy\r'
part2 = '\nContent-Disposition: inline; filename=page1.html\r\nContent-Type: text/html; charset=UTF-8; name=page1.html\r\nContent-Id: <op.mhtml.1247227666422.e6e72d4c344a2503@192.168.1.20>\r\nContent-Location: http://localhost/page1.html\r\nContent-Transfer-Encoding: 8bit\r\n\r\n<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\r\n<html><head><title>Page 1</title></head><body><p>page 1</p></body></html>\r\n------------1JBOHhxKNnWgkmE17ZJ2Cy--\r\n'
expected = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\r\n<html><head><title>Page 1</title></head><body><p>page 1</p></body></html>'
from email.feedparser import FeedParser
feedparser = FeedParser()
feedparser.feed(part1)
feedparser.feed(part2)
m = feedparser.close()
mm = m.get_payload()
mm[0].get_payload() == expected

from the test attached to this bug, and tried on:

* py3k
* release3.1-maint
* release2.7-maint
* debian 2.6.6

(the first 3 recompiled just before the test) and in all of the cases the last instruction returns True, so I'm actually quite skeptical this is still a bug, or there is something I'm missing.

I'm not closing this bug yet, since I'd like to hear first from the people involved back then.
msg127763 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-02 21:23
My notes say that this bug is "similar to issue 5610", the fix for which made it into 2.6.  I meant to come back and see if that fix fixed this bug, but I forgot.  The fix is different, so it is worth verifying that this test case fails in 2.5 but works subsequently.
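
For readers unfamiliar with the distinction named in the title of issue 5610 ("$ vs \Z"), here is a small sketch of how Python's re module treats the two anchors on a string that ends in CRLF followed by LF. The two patterns and the helper trailing_eol are illustrative assumptions; they are not claimed to be the exact expressions used in feedparser.py or in either patch.

```python
import re

# Illustrative patterns only; not taken from feedparser.py.
EOL_DOLLAR = re.compile(r'(\r\n|\r|\n)$')
EOL_SLASH_Z = re.compile(r'(\r\n|\r|\n)\Z')


def trailing_eol(pattern, text):
    """Return the line ending that `pattern` reports for `text`, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


text = 'line\r\n\n'  # CRLF followed by a bare LF

# '$' also matches just before a newline at the end of the string,
# while '\Z' matches only at the very end of the string.
print(repr(trailing_eol(EOL_DOLLAR, text)))
print(repr(trailing_eol(EOL_SLASH_Z, text)))
```

With '$' the search stops at the CRLF and reports '\r\n'; with '\Z' it reports the final '\n'.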
msg127765 - (view) Author: Sandro Tosi (sandro.tosi) * (Python committer) Date: 2011-02-02 21:34
Lucky as I can be, I have 2.5.5 here and I can confirm that with this version the code above fails - so I think this issue can be closed as fixed by issue5610 (I hope I correctly set all the fields, if not please let me know :)).
msg127768 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-02 21:41
Looks good to me.  I've added #5610 as superseder, though I doubt we'll ever make use of that info :)
History
Date | User | Action | Args
2011-02-02 21:41:36 | r.david.murray | set | superseder: email feedparser.py CRLFLF bug: $ vs \Z
    messages: + msg127768
    nosy: barry, terry.reedy, pitrou, tony_nelson, r.david.murray, jkg, sandro.tosi
2011-02-02 21:34:26 | sandro.tosi | set | status: open -> closed
    nosy: barry, terry.reedy, pitrou, tony_nelson, r.david.murray, jkg, sandro.tosi
    messages: + msg127765
    resolution: fixed
    stage: resolved
2011-02-02 21:23:14 | r.david.murray | set | nosy: + r.david.murray
    messages: + msg127763
2011-02-02 19:56:39 | sandro.tosi | set | nosy: + sandro.tosi
    messages: + msg127756
2009-07-17 22:34:44 | terry.reedy | set | nosy: + terry.reedy
    messages: + msg90643
    versions: + Python 3.2, - Python 2.5, Python 2.4, Python 3.0
2009-07-12 23:42:46 | pitrou | set | messages: + msg90463
2009-07-12 18:08:03 | jkg | set | files: + nlcre_full.patch
    messages: + msg90454
2009-07-12 15:45:20 | pitrou | set | nosy: + pitrou
    messages: + msg90449
2009-07-11 20:01:27 | jkg | set | files: + nlcre.patch
    keywords: + patch
    messages: + msg90434
2009-07-11 19:58:38 | jkg | create |