classification
Title: MultiFile.read() includes CRLF boundary
Type: Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: gvanrossum, loewis, mjpieters
Priority: normal Keywords:

Created on 2001-04-18 22:22 by mjpieters, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Files
File name Uploaded Description
multifile-diff.txt gvanrossum, 2001-09-13 19:59 context diff
Messages (9)
msg4346 - (view) Author: Martijn Pieters (mjpieters) * Date: 2001-04-18 22:22
multifile.MultiFile.readlines() and .read() return the 
body of a multipart message part including the line 
delimiter that is to be regarded as part of the boundary.

In a partial multipart message like:

--BoundaryHere
Content-Type: text/plain

1
2
3
4
--BoundaryHere

the message within the delimiters does not include the 
final line delimiter (CRLF or LF or whatnot) after the 
line reading '4'; it is considered part of the 
boundary. MultiFile however, returns it as part of the 
body.

See RFC2046 section 5.1.1. In the usual text 
formatting of the RFC, you'll find the definition and 
explanation in the first two paragraphs of page 19.
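
To make the expected behaviour concrete, here is a minimal sketch (not the multifile implementation) that splits a multipart body so that the line break before each boundary line counts as part of the delimiter. The function name split_parts and the regular expression are assumptions made for illustration; an empty part directly between two boundary lines is not handled.

```python
import re


def split_parts(body, boundary):
    """Return the raw parts (headers plus content) of a multipart body.

    The line break immediately before each "--boundary" line is consumed
    as part of the delimiter, following RFC 2046 section 5.1.1.
    """
    delimiter = re.compile(
        r"(?:\A|\r\n|\n)--"          # start of text or the preceding line break
        + re.escape(boundary)
        + r"(?P<close>--)?[ \t]*"    # optional close marker, trailing blanks
        + r"(?:\r\n|\n|\Z)"          # end of the boundary line
    )
    parts = []
    start = None
    for match in delimiter.finditer(body):
        if start is not None:
            parts.append(body[start:match.start()])
        if match.group("close"):
            break
        start = match.end()
    return parts


sample = "--BoundaryHere\nContent-Type: text/plain\n\n1\n2\n3\n4\n--BoundaryHere\n"
parts = split_parts(sample, "BoundaryHere")
print(repr(parts[0]))
```

With the example message above, the single part ends with the character '4' and carries no trailing line delimiter.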
msg4347 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2001-09-05 17:54
Logged In: YES 
user_id=6380

I wrote that code and I'm probably culpable.  It's also
always bothered me.

Unassigning it from Barry (it has nothing to do with Barry).
msg4348 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2001-09-13 19:59
Logged In: YES 
user_id=6380

Martijn, here's a fix. Can you test this?

The fix works (how else) by reading ahead one line and
stripping the final newline if the next line is empty.
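
The read-ahead idea can be sketched roughly as follows. This is not the attached patch; read_section is a hypothetical helper that assumes a readline() callable which returns '' once the section ends, as a file object does at end of input.

```python
import io


def read_section(readline):
    """Collect lines until readline() returns '', dropping the final "\n"."""
    lines = []
    current = readline()
    while current:
        following = readline()           # read ahead one line
        if not following and current.endswith("\n"):
            current = current[:-1]       # section ended: newline belongs to the boundary
        lines.append(current)
        current = following
    return lines


section = io.StringIO("1\n2\n3\n4\n")
print(read_section(section.readline))
```

Only a bare "\n" is removed in this sketch, so a line ending in "\r\n" would keep its "\r".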
msg4349 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2001-09-18 14:34
Logged In: YES 
user_id=6380

I've checked in the patch now. Still waiting for Martijn's
feedback before I close the report.
msg4350 - (view) Author: Martijn Pieters (mjpieters) * Date: 2001-09-18 15:09
Logged In: YES 
user_id=116747

Your patch looks sound, apart from the fact it'll only
remove a LF. The Spec says the CRLF is part of the boundary,
and, to account for broken implementations, it should
probably remove any of 'CRLF', 'LF', or 'CR' at the end.
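
A small sketch of what such tolerant stripping could look like. strip_final_eol is a hypothetical helper, not part of multifile; it removes at most one trailing line ending and checks CRLF before the single-character forms.

```python
def strip_final_eol(text):
    """Remove one trailing CRLF, LF, or CR from text, if present."""
    for eol in ("\r\n", "\n", "\r"):     # CRLF first so "\r\n" is removed as a unit
        if text.endswith(eol):
            return text[:-len(eol)]
    return text


for candidate in ("4\r\n", "4\n", "4\r", "4"):
    print(repr(strip_final_eol(candidate)))
```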
msg4351 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2001-09-18 15:40
Logged In: YES 
user_id=6380

I think that CRLF support in this case isn't worth it. It's
not done elsewhere in this module -- it assumes that line
endings have already been converted to Unix style. Lone CR
is definitely not supported -- none of the code would work.
msg4352 - (view) Author: Martijn Pieters (mjpieters) * Date: 2001-09-18 16:16
Logged In: YES 
user_id=116747

Okay, if all the code depends on line-endings being
Unix-style, the patch has my blessings.
msg4353 - (view) Author: Martijn Pieters (mjpieters) * Date: 2001-10-05 20:59
Logged In: YES 
user_id=116747

I just found again where I ran into this problem; in the
Zope HTTP Range header test suite. The code generates RFC
compliant multi-part mime responses and the test suite uses
MessageFile to see if the correct parts are returned.

See expectMultipleRanges in:

 
http://cvs.zope.org/Zope/lib/python/OFS/tests/testRanges.py?rev=1.3&content-type=text/vnd.viewcvs-markup

Right now there is code there that catches the extra line
ending that's part of the boundary and strips it off; this
fails with Python 2.2a4 because now the \n is stripped but
the \r is still attached!

I am more and more convinced that MessageFile should not
expect that the line endings have been normalized to UNIX
only. Instead, it should handle at least the UNIX \n and the
RFC-compliant \r\n situations.
msg4354 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-09-22 09:30
Logged In: YES 
user_id=21627

Revisions 1.19 and 1.20 have been backed out of multifile.py,
to fix #514676, for Python 2.2.2 and 2.3. The resolution is
that you should use the email package to get fully
RFC-conforming processing, and that backwards-compatibility
is the priority for multifile. Thus changing the status from
Fixed to Wont Fix.
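
For readers following that recommendation, here is a minimal sketch of parsing the example from the original report with the email package. The outer Content-Type header and the closing "--BoundaryHere--" line are assumptions added to make the message complete; the behaviour described is that of current Python 3 releases.

```python
from email import message_from_string

raw = (
    'Content-Type: multipart/mixed; boundary="BoundaryHere"\n'
    "\n"
    "--BoundaryHere\n"
    "Content-Type: text/plain\n"
    "\n"
    "1\n2\n3\n4\n"
    "--BoundaryHere--\n"
)

msg = message_from_string(raw)
part = msg.get_payload()[0]      # first (and only) subpart
body = part.get_payload()        # text of that subpart
print(repr(body))
```

The parser attributes the line break before the boundary to the boundary itself, so the payload ends with '4' rather than with a newline.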
History
Date User Action Args
2022-04-10 16:03:58 admin set github: 34367
2001-04-18 22:22:25 mjpieters create