Message64864
Opening the file in universal newline mode doesn't work when the 'file'
contains multipart MIME data (e.g. multipart/form-data) in which one of
the parts is binary data (e.g. application/octet-stream). In that case,
blind translation of CRLF to LF may corrupt the binary data. (Thanks to
Thomas Guettler for pointing that out to me.)
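To make the corruption concrete, here is a minimal sketch. The payload bytes and the translate_newlines() helper are made up for illustration; the helper only imitates what universal newline translation does to a byte stream.

```python
# Hypothetical binary payload that happens to contain the byte pair 0x0D 0x0A.
payload = bytes([0x89, 0x50, 0x0D, 0x0A, 0x1A, 0x0A])


def translate_newlines(data: bytes) -> bytes:
    """Imitate universal newline translation: CRLF and lone CR become LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


translated = translate_newlines(payload)

# The 0x0D byte was part of the data, not a line break, and it is now gone.
print(len(payload), len(translated))  # 6 5
print(translated == payload)          # False
```

Nothing in the translation step can tell that the CR LF pair was data rather than a line ending, so the damage is silent.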
FeedParser goes to considerable trouble to split on any conceivable line
boundary but retain whatever line boundary existed in the stream when
putting things back together. (Look at BufferedSubFile's push() code in
feedparser.py.) It was not written on the assumption that it would be
getting LFs only.
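A quick way to see this behaviour is to feed CRLF-terminated text straight into FeedParser and look at what comes back. This is only a sketch, written against the Python 3 email package (where feed() takes str), which is an assumption on my part rather than the version this report was filed against; the message text is made up.

```python
from email.feedparser import FeedParser

# Hypothetical message whose lines are all CRLF-terminated.
raw = "Subject: test\r\n\r\nline one\r\nline two\r\n"

parser = FeedParser()
parser.feed(raw)
msg = parser.close()

body = msg.get_payload()

# The parser recognised the CRLF line boundaries (headers and body were
# separated correctly) but kept the original CRLFs in the body.
print(repr(msg["Subject"]))
print(repr(body))
```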
The only code that knows enough to know which CRLFs are really line
breaks is the code that is breaking the stream up based on the boundary
markers -- that is the FeedParser code. It isn't safe for the caller to
do any CRLF conversions before calling the Parser. Therefore I believe
the fix needs to be made to the parser.py code, not the docs.
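As an illustration of why the conversion cannot safely happen before parsing, the sketch below hands the parser the raw bytes and compares that with a caller who "helpfully" converts CRLF to LF first. It uses email.message_from_bytes from Python 3, which is an assumption on my part (it did not exist when this was reported); the boundary, field names and payload are all made up.

```python
from email import message_from_bytes

# Hypothetical binary upload containing a CR LF pair that is data.
binary_part = b"\x00\x01\r\n\xfe\xff"

# Hypothetical multipart/form-data body with one text and one binary part.
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/form-data; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="note"\r\n'
    b"\r\n"
    b"hello\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="blob"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: binary\r\n"
    b"\r\n"
    + binary_part
    + b"\r\n--XYZ--\r\n"
)


def blob_bytes(data: bytes) -> bytes:
    """Parse the message and return the second part's payload as bytes."""
    parts = message_from_bytes(data).get_payload()
    return parts[1].get_payload(decode=True)


# Parser sees the untouched stream: only the structural CRLFs are consumed.
intact = blob_bytes(raw)

# Caller converts CRLF to LF before parsing: the CRLF inside the blob is hit too.
damaged = blob_bytes(raw.replace(b"\r\n", b"\n"))

print(intact == binary_part)
print(damaged == binary_part)
```

Only the code that splits on the boundary markers can tell the structural CRLFs from the one inside the blob, which is why the caller-side conversion loses data.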
Two people that I know of independently rediscovered this bug in the
last couple of weeks (both running Django). I rediscovered it myself
about three months ago, a year after Jeremy Dunck rediscovered it, which
in turn was three months after the issue was originally opened. It may
be a corner case, but it is quite difficult for people to track down and
the fix is trivial, so it would be nice if the fix could be put in.
Date | User | Action | Args
2008-04-02 15:37:50 | kmtracey | set | spambayes_score: 0.00407611 -> 0.0040761116 recipients: + kmtracey, barry, paul.moore, guettli, anadelonbrin, graham_king, jdunck
2008-04-02 15:37:50 | kmtracey | set | spambayes_score: 0.00407611 -> 0.00407611 messageid: <1207150670.35.0.572485002245.issue1555570@psf.upfronthosting.co.za>
2008-04-02 15:37:48 | kmtracey | link | issue1555570 messages
2008-04-02 15:37:48 | kmtracey | create |