Message84595
feedparser.py does not parse mixed newlines properly. NLCRE_eol, which
is used to search for the various newlines at End Of Line, uses $ to
match the end of string, but $ also matches just before a trailing \n
at the end of the string, due to a wise long-ago patch by the Effbot.
Given a line ending in '\r\n\n', feedparser therefore matches only the
'\r\n' and then removes the last two characters, leaving a trailing
'\r' and thus eating up a line. Such mixed line endings can occur if a
message with CRLF line endings is parsed, written out, and then parsed
again.
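The behaviour described above can be reproduced outside the email package. The following is only a minimal sketch: the pattern is the one quoted in this report, while the sample line and the stripping step (`line[:-len(eol)]`) are assumptions that mirror the description, not a copy of the feedparser.py code.

```python
import re

# Pattern as quoted in this report (current behaviour).
NLCRE_eol = re.compile('(\r\n|\r|\n)$')

# Hypothetical sample line: CRLF followed by a stray LF.
line = 'abc\r\n\n'

mo = NLCRE_eol.search(line)
eol = mo.group(0)

# $ matched just before the final '\n', so only '\r\n' was captured
# and the match stops one character short of the end of the string.
match_end = mo.end()

# Stripping len(eol) characters from the end removes '\n\n', not the
# matched '\r\n', which leaves a dangling '\r'.
stripped = line[:-len(eol)]

print(repr(eol), match_end, len(line), repr(stripped))
```

The match ends at index 5 of a 6-character string, so slicing by the match length from the end of the line removes the wrong characters.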
When explicitly searching for various newlines, the \Z end-of-string
marker should be used instead. There are two improper uses of $ in
feedparser.py. I don't see any others in the email package.
NLCRE_eol = re.compile('(\r\n|\r|\n)$')
should be:
NLCRE_eol = re.compile('(\r\n|\r|\n)\Z')
and boundary_re also needs the fix.
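As a quick check of the proposed change, the sketch below runs the \Z pattern against the same kind of input. The sample line and the stripping step are assumptions for illustration, and the raw-string prefix is an addition here so that \Z is not treated as a string escape on newer Python versions.

```python
import re

# Proposed pattern: \Z matches only at the very end of the string.
NLCRE_eol_fixed = re.compile(r'(\r\n|\r|\n)\Z')

# Hypothetical sample line: CRLF followed by a stray LF.
line = 'abc\r\n\n'

mo = NLCRE_eol_fixed.search(line)
eol_fixed = mo.group(0)

# Only the final '\n' is matched, and the match ends at len(line),
# so stripping len(eol_fixed) characters removes exactly what matched.
stripped_fixed = line[:-len(eol_fixed)]

print(repr(eol_fixed), mo.end() == len(line), repr(stripped_fixed))
```

With \Z the earlier '\r\n' is left in place, so no line is swallowed.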
I can write a test. Where exactly should it be put?

Date | User | Action | Args
2009-03-30 17:59:37 | tony_nelson | set | recipients: + tony_nelson, barry
2009-03-30 17:59:37 | tony_nelson | set | messageid: <1238435977.23.0.604038870745.issue5610@psf.upfronthosting.co.za>
2009-03-30 17:59:36 | tony_nelson | link | issue5610 messages
2009-03-30 17:59:35 | tony_nelson | create |