Title: cgi.parse_multipart is broken on 3.x
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.2, Python 3.3
Status: open Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: MHordecki, efosmark, eric.araujo, flox, glyph, gvanrossum, haypo, jonas.wagner, milesck, orsenthil, python-dev, quentel, r.david.murray
Priority: normal Keywords: patch

Created on 2011-06-25 15:09 by jonas.wagner, last changed 2014-02-04 12:15 by taleinat.

File name Uploaded Description Edit
cgi-coverage.diff jonas.wagner, 2011-06-25 15:09 review
cgi-coverage-2.diff jonas.wagner, 2011-07-04 11:26 review
Messages (13)
msg139082 - (view) Author: Jonas Wagner (jonas.wagner) Date: 2011-06-25 15:09
While writing tests for the cgi module I came across what looks like a conversion bug.

cgi.parse_multipart is comparing values it reads from a binary file like with a string literal:
line = fp.readline()
if line.startswith("--"):

This patch adds fixes the issue and adds test for it.
msg139649 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-02 13:53
This looks like a conversion bug indeed; network I/O should use bytes.  Strange that no-one caught this, but if there was no test and no users, then bugs can slip.

See also #11066, #8077, #4953, #6234 (also adding some people from those bugs’ nosy fields).
msg139668 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-07-03 01:26
Indeed, Victor's comments on his patch say that he changed code that was in the posted patch to say 'line.startswith(b'--')', and the original patch did use b'--', but the code he checked in is missing the 'b'.  He also asked for more tests.

Victor, any chance you can review this patch, since you were the last one to work on the code in question?
msg139742 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2011-07-04 07:55
The patch seems broken to me.

In cgi.parse_multipart(), the 'boundary' variable can be a string even though it is concatenated to bytes. Its default value is a string, and a string can be given via the pdict argument. There is no validity check other than valid_boundary(), which allows both string and bytes.

Most of the changes to are entirely unrelated. The one test added which tests cgi.parse_multipart() should fail since it uses a string (not bytes) boundary, while the correct boundary for the test is commented out.

I short this patch seems half-baked. IMO reject this patch and fix just the bytes/strings issue with cgi.parse_multipart. Or, as mentioned in the comments, use FieldStorage to implement it and be done with it.
msg139752 - (view) Author: Jonas Wagner (jonas.wagner) Date: 2011-07-04 11:26
Hi Tal,

Thanks a lot for your feedback.

My primary objective was to increase the test coverage for If it is a problem to have the additional tests in this patch I'm happy to create a new issue with a separate patch. 

The default value for the boundary was an oversight, sorry for that.

You are right regarding the commented out boundary as well, I forgot to refresh the patch. Again, sorry.

Do you think valid_boundary should contain a check to ensure it is a byte object?
msg139757 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2011-07-04 12:11
Yes, please submit the other additional tests in a separate issue.

The default value for boundary should surely be b"". A simple test should be added where cgi.parse_multipart() uses the default boundary.

If valid_boundary() is used only for cgi.parse_multipart() then it should be changed to validate that the boundary is a bytes instance (which would also make it simpler). Otherwise (if vaild_boundary() is also used elsewhere) cgi.parse_multipart() should itself check that the boundary is indeed a bytes instance, throwing a TypeError otherwise.

Tip: You should run the relevant tests, making sure they all pass, before submitting a patch. That way you really know that the patch actually works (as far as passing all of the tests).

Thanks for adding more stdlib tests :)
msg139803 - (view) Author: Pierre Quentel (quentel) Date: 2011-07-04 20:44
When the FieldStorage class was fixed there was a discussion in issue 4953 about the module-level functions parse() and parse_multipart(). The code was very similar to methods of the FieldStorage class so the idea was to use FieldStorage inside the functions

The patch proposed in issue 11066 replaced the code in parse_multipart by just :

def parse_multipart(fp, pdict):
    return FieldStorage(fp,environ=pdict)

Did anyone test it ?
msg180417 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-22 18:39
Twisted would really like to see this bug fixed.
msg180425 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-22 19:16
Does anyone who was on this bug previously (e.g. the original author
or the reviewers) know what was holding up the patch? Does it need
more review? More tests? Is there any reason to reject fixing this at
all? (I hope not.) As far as replacing the whole thing with a call
into the other code goes, I'm hesitant if only because we don't have
enough unit tests for the edge cases of the implementation that would
be deleted, so if the wholesale replacement were to break user code we
wouldn't find out until after it's been released. Fixing it seems less
msg180428 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-01-22 19:22
I personally think, that the "grey area" of multipart form encoding and trying to use email's updated features for parsing was holding it, not the tests. This can be submitted IMO after looking at the "related bugs", I shall do a review on this one today.
msg180429 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-22 19:23
Thank you very much Senthil!
msg180456 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-23 11:01
New changeset a46a0dafcb7a by Senthil Kumaran in branch '3.2':
Issue #12411: Fix to cgi.parse_multipart to correctly use bytes boundaries and

New changeset 59ea872d8b6b by Senthil Kumaran in branch '3.3':
merge from 3.2

New changeset 3d7000549eb1 by Senthil Kumaran in branch 'default':
merge from 3.3
msg180457 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-01-23 11:05
I updated the patch addressing Ezio's comments in the review system and also condensed the tests. This fixes the parse_multipart's byte handling at "some" level. 

The docstring of parse_multipart say that, this should be deprecated in favor of FieldStorage completely. I will have to trace through the argument and see what should be done here and then I shall close this bug.
Date User Action Args
2014-02-04 12:15:19taleinatsetnosy: - taleinat
2013-01-23 11:05:33orsenthilsetassignee: orsenthil
resolution: fixed
messages: + msg180457
2013-01-23 11:01:36python-devsetnosy: + python-dev
messages: + msg180456
2013-01-22 19:23:25gvanrossumsetmessages: + msg180429
2013-01-22 19:22:17orsenthilsetmessages: + msg180428
2013-01-22 19:16:07gvanrossumsetmessages: + msg180425
2013-01-22 18:41:29glyphsetnosy: + glyph
2013-01-22 18:39:04gvanrossumsetnosy: + gvanrossum
messages: + msg180417
2012-07-21 14:15:03floxsetnosy: + orsenthil
2011-07-04 20:44:02quentelsetmessages: + msg139803
2011-07-04 12:11:23taleinatsetmessages: + msg139757
2011-07-04 11:26:35jonas.wagnersetfiles: + cgi-coverage-2.diff

messages: + msg139752
2011-07-04 07:55:58taleinatsetnosy: + taleinat
messages: + msg139742
2011-07-03 01:26:27r.david.murraysetnosy: + r.david.murray
messages: + msg139668
2011-07-02 13:53:27eric.araujosetnosy: + eric.araujo, efosmark, milesck, MHordecki, flox, quentel

messages: + msg139649
versions: + Python 3.2
2011-06-25 15:16:37r.david.murraysetnosy: + haypo
2011-06-25 15:09:07jonas.wagnercreate