This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: patch 1079734 broke cgi.FieldStorage w/ multipart post req.
Type: Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: anthonybaxter, arigo, hdiogenes, irmen, jlgijsbers, joshhoyt
Priority: normal Keywords:

Created on 2005-01-31 00:58 by irmen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
cgibug.py irmen, 2005-01-31 00:58 test program
cgi-email-multipart.diff jlgijsbers, 2005-02-04 10:15 patch
server.py irmen, 2005-02-08 21:35 test for freezing problem (server program)
post.html irmen, 2005-02-08 21:36 test for freezing problem (html form for browser)
Messages (15)
msg24095 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2005-01-31 00:58
Patch 1079734 "Make cgi.py use email instead of rfc822
or mimetools" seems to have broken the cgi.FieldStorage
in cases where the request is a multipart post (for
instance, when a file upload form field is used).
See the attached test program.

With cgi.py revision <1.83 (python 2.4 for instance) I
get the expected results;

374
FieldStorage(None, None, [FieldStorage('param1', None,
'Value of param1'), FieldStorage('param2', None, 'Value
of param2'), FieldStorage('file', '', ''),
FieldStorage(None, None, '')])

but with cgi.py rev 1.83 (current) I get this:

374
FieldStorage(None, None, [FieldStorage('param1', None,
'')])


Another thing that I observed (which isn't reproduced
by this test program) is that cgi.FieldStorage.__init__
never completes when the fp is a socket-file (and the
request is again a multipart post). It worked fine with
the old cgi.py.
msg24096 - (view) Author: Josh Hoyt (joshhoyt) Date: 2005-02-04 07:21
Logged In: YES 
user_id=693077

The email.HeaderParser module that is now used for parsing
headers consumes the rest of the body of the message when it
is used. If there was some way of telling the parser to stop
reading when it got to the end of the headers, I think this
problem would be fixed. Unfortunately, there is no external
interface to the email module's parser that lets us do this.
I hope we can come up with a reasonable solution that does
not involve reverting to using deprecated modules.

Thanks for the test case.
msg24097 - (view) Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2005-02-04 10:15
Logged In: YES 
user_id=469548

Here's a patch. We're interested in two things in the
patched loop: 

* the rest of the multipart/form-data, including the headers
for the current part, for consumption by HeaderParser (this
is the tail variable)
* the rest of the multipart/form-data without the headers
for the current part, for consumption by FieldStorage (this
is the message.get_payload() call)

Josh, Irmen, do you see any problems with this patch?

(BTW, this fix should be ported to the parse_multipart
function as well, when I check it in (and I should make
cgibug.py into a test))
msg24098 - (view) Author: Josh Hoyt (joshhoyt) Date: 2005-02-04 15:45
Logged In: YES 
user_id=693077

Johannes, your patch looks fine to me. It would be nice if
we didn't have to keep reading back each part from the
parsed message, though.

I had an idea for another approach. Use email to parse the
MIME message fully, then convert it to FieldStorage fields.
Parsing could go something like:

== CODE ==

from email.FeedParser import FeedParser
parser = FeedParser()
# Create bogus content-type header...
parser.feed('Content-type: %s ; boundary=%s \r\n\r\n' %
(self.type, self.innerboundary))
parser.feed(self.fp.read())
message = parser.close()

# Then take parsed message and convert to FieldStorage fields

== END CODE ==

This lets the email parser handle all of the complexities of
MIME, but it does mean that we have to accurately re-create
all of the necessary headers. I can cook up a full patch if
anyone thinks this would fly.
msg24099 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2005-02-04 22:06
Logged In: YES 
user_id=129426

Johannes: while your patch makes my cgibug.py test case run
fine, it has 2 problems:
1- it runs much slower than the python2.4 code (probably
because of the reading back thing Josh is talking about);
2- it still doesn't fix the second problem that I observed:
cgi.FieldStorage never completes when fp is a socket. I
don't have a separate test case for this yet, sorry.

So Josh: perhaps your idea doesn't have these 2 problems?
msg24100 - (view) Author: Josh Hoyt (joshhoyt) Date: 2005-02-04 23:02
Logged In: YES 
user_id=693077

Irmen, can you try to create a test case where the
cgi.FieldStorage never completes, so I can make sure that
any fix I come up with resolves it?

I will try to put together an implementation where the email
parser parses the whole multipart message.
msg24101 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2005-02-06 22:58
Logged In: YES 
user_id=129426

Yes, I'll try to make a test case for that within the next
few days. 
msg24102 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2005-02-08 21:40
Logged In: YES 
user_id=129426

I've added a test that shows the 'freezing' problem I talked
about.
Start the server.py, it will listen on port 9000. Open the
post.html in your web browser, enter some form data, and
submit the form.
It will POST to the server.py and if you started that one
with cvs-python (2.5a0) it will freeze on the marked line.
If you start server.py with Python 2.4, it will work fine.
msg24103 - (view) Author: Josh Hoyt (joshhoyt) Date: 2005-04-17 23:39
Logged In: YES 
user_id=693077

I've been playing with using the email parser to do the
whole job of parsing the data, and I think it's too big of a
change. It'd be very hard to ensure API compatibility and
the same behaviour. I think that in the long run, it's the
right thing to do, but I think that the API should change
when that change is made. My recommendation for the time
being is to revert the original patch that I made.
msg24104 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-09-16 11:52
Logged In: YES 
user_id=4771

The problem is still there now (5 months later), so should I go ahead and revert the cgi.py changes of r1.83?  I does break user code out there.
msg24105 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2005-09-16 13:40
Logged In: YES 
user_id=129426

I vote for revert, so that my code may run again on Python 2.5. 
Also see Josh's comment ,  with which I agree.
msg24106 - (view) Author: Josh Hoyt (joshhoyt) Date: 2005-09-16 15:58
Logged In: YES 
user_id=693077

Yes, please revert the patch.
msg24107 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-09-19 09:12
Logged In: YES 
user_id=4771

Reverted (Lib/cgi.py r1.85).
msg24108 - (view) Author: Anthony Baxter (anthonybaxter) (Python triager) Date: 2005-09-20 03:18
Logged In: YES 
user_id=29957

Er - while reverting it makes the code work again, I'm
_really_ unhappy with this as a long-term solution. We
really need to be getting rid of the old, unmaintained, and
generally awful code of things like mimetools. 
msg67625 - (view) Author: Humberto Diógenes (hdiogenes) * Date: 2008-06-02 03:15
> Er - while reverting it makes the code work again, I'm
> _really_ unhappy with this as a long-term solution.

I've addressed almost everything that's discussed here in issue 2849, 
implementing Josh's suggestion of using FeedParser.

It removes rfc822 (but not mimetools [yet]) dependency from the cgi 
module, without the parsing problem pointed by cgibug.py and without 
hanging, as shown in server.py + post.html. Also, preliminary tests 
revealed that the new FieldStorage.read_multi is about 10 times faster 
than the old one.
History
Date User Action Args
2022-04-11 14:56:09adminsetgithub: 41505
2008-06-02 03:15:15hdiogenessetnosy: + hdiogenes
messages: + msg67625
2005-01-31 00:58:24irmencreate