Issue 1112856: patch 1079734 broke cgi.FieldStorage w/ multipart post req.

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/41505

classification

Title:	patch 1079734 broke cgi.FieldStorage w/ multipart post req.
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	anthonybaxter, arigo, hdiogenes, irmen, jlgijsbers, joshhoyt
Priority:	normal	Keywords:

Created on 2005-01-31 00:58 by irmen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
cgibug.py	irmen, 2005-01-31 00:58	test program
cgi-email-multipart.diff	jlgijsbers, 2005-02-04 10:15	patch
server.py	irmen, 2005-02-08 21:35	test for freezing problem (server program)
post.html	irmen, 2005-02-08 21:36	test for freezing problem (html form for browser)

Messages (15)
msg24095 - (view)	Author: Irmen de Jong (irmen)	Date: 2005-01-31 00:58
Patch 1079734 "Make cgi.py use email instead of rfc822 or mimetools" seems to have broken the cgi.FieldStorage in cases where the request is a multipart post (for instance, when a file upload form field is used). See the attached test program. With cgi.py revision <1.83 (python 2.4 for instance) I get the expected results; 374 FieldStorage(None, None, [FieldStorage('param1', None, 'Value of param1'), FieldStorage('param2', None, 'Value of param2'), FieldStorage('file', '', ''), FieldStorage(None, None, '')]) but with cgi.py rev 1.83 (current) I get this: 374 FieldStorage(None, None, [FieldStorage('param1', None, '')]) Another thing that I observed (which isn't reproduced by this test program) is that cgi.FieldStorage.__init__ never completes when the fp is a socket-file (and the request is again a multipart post). It worked fine with the old cgi.py.
msg24096 - (view)	Author: Josh Hoyt (joshhoyt)	Date: 2005-02-04 07:21
Logged In: YES user_id=693077 The email.HeaderParser module that is now used for parsing headers consumes the rest of the body of the message when it is used. If there was some way of telling the parser to stop reading when it got to the end of the headers, I think this problem would be fixed. Unfortunately, there is no external interface to the email module's parser that lets us do this. I hope we can come up with a reasonable solution that does not involve reverting to using deprecated modules. Thanks for the test case.
msg24097 - (view)	Author: Johannes Gijsbers (jlgijsbers) *	Date: 2005-02-04 10:15
Logged In: YES user_id=469548 Here's a patch. We're interested in two things in the patched loop: * the rest of the multipart/form-data, including the headers for the current part, for consumption by HeaderParser (this is the tail variable) * the rest of the multipart/form-data without the headers for the current part, for consumption by FieldStorage (this is the message.get_payload() call) Josh, Irmen, do you see any problems with this patch? (BTW, this fix should be ported to the parse_multipart function as well, when I check it in (and I should make cgibug.py into a test))
msg24098 - (view)	Author: Josh Hoyt (joshhoyt)	Date: 2005-02-04 15:45
Logged In: YES user_id=693077 Johannes, your patch looks fine to me. It would be nice if we didn't have to keep reading back each part from the parsed message, though. I had an idea for another approach. Use email to parse the MIME message fully, then convert it to FieldStorage fields. Parsing could go something like: == CODE == from email.FeedParser import FeedParser parser = FeedParser() # Create bogus content-type header... parser.feed('Content-type: %s ; boundary=%s \r\n\r\n' % (self.type, self.innerboundary)) parser.feed(self.fp.read()) message = parser.close() # Then take parsed message and convert to FieldStorage fields == END CODE == This lets the email parser handle all of the complexities of MIME, but it does mean that we have to accurately re-create all of the necessary headers. I can cook up a full patch if anyone thinks this would fly.
msg24099 - (view)	Author: Irmen de Jong (irmen)	Date: 2005-02-04 22:06
Logged In: YES user_id=129426 Johannes: while your patch makes my cgibug.py test case run fine, it has 2 problems: 1- it runs much slower than the python2.4 code (probably because of the reading back thing Josh is talking about); 2- it still doesn't fix the second problem that I observed: cgi.FieldStorage never completes when fp is a socket. I don't have a separate test case for this yet, sorry. So Josh: perhaps your idea doesn't have these 2 problems?
msg24100 - (view)	Author: Josh Hoyt (joshhoyt)	Date: 2005-02-04 23:02
Logged In: YES user_id=693077 Irmen, can you try to create a test case where the cgi.FieldStorage never completes, so I can make sure that any fix I come up with resolves it? I will try to put together an implementation where the email parser parses the whole multipart message.
msg24101 - (view)	Author: Irmen de Jong (irmen)	Date: 2005-02-06 22:58
Logged In: YES user_id=129426 Yes, I'll try to make a test case for that within the next few days.
msg24102 - (view)	Author: Irmen de Jong (irmen)	Date: 2005-02-08 21:40
Logged In: YES user_id=129426 I've added a test that shows the 'freezing' problem I talked about. Start the server.py, it will listen on port 9000. Open the post.html in your web browser, enter some form data, and submit the form. It will POST to the server.py and if you started that one with cvs-python (2.5a0) it will freeze on the marked line. If you start server.py with Python 2.4, it will work fine.
msg24103 - (view)	Author: Josh Hoyt (joshhoyt)	Date: 2005-04-17 23:39
Logged In: YES user_id=693077 I've been playing with using the email parser to do the whole job of parsing the data, and I think it's too big of a change. It'd be very hard to ensure API compatibility and the same behaviour. I think that in the long run, it's the right thing to do, but I think that the API should change when that change is made. My recommendation for the time being is to revert the original patch that I made.
msg24104 - (view)	Author: Armin Rigo (arigo) *	Date: 2005-09-16 11:52
Logged In: YES user_id=4771 The problem is still there now (5 months later), so should I go ahead and revert the cgi.py changes of r1.83? I does break user code out there.
msg24105 - (view)	Author: Irmen de Jong (irmen)	Date: 2005-09-16 13:40
Logged In: YES user_id=129426 I vote for revert, so that my code may run again on Python 2.5. Also see Josh's comment , with which I agree.
msg24106 - (view)	Author: Josh Hoyt (joshhoyt)	Date: 2005-09-16 15:58
Logged In: YES user_id=693077 Yes, please revert the patch.
msg24107 - (view)	Author: Armin Rigo (arigo) *	Date: 2005-09-19 09:12
Logged In: YES user_id=4771 Reverted (Lib/cgi.py r1.85).
msg24108 - (view)	Author: Anthony Baxter (anthonybaxter)	Date: 2005-09-20 03:18
Logged In: YES user_id=29957 Er - while reverting it makes the code work again, I'm _really_ unhappy with this as a long-term solution. We really need to be getting rid of the old, unmaintained, and generally awful code of things like mimetools.
msg67625 - (view)	Author: Humberto Diógenes (hdiogenes) *	Date: 2008-06-02 03:15
> Er - while reverting it makes the code work again, I'm > _really_ unhappy with this as a long-term solution. I've addressed almost everything that's discussed here in issue 2849, implementing Josh's suggestion of using FeedParser. It removes rfc822 (but not mimetools [yet]) dependency from the cgi module, without the parsing problem pointed by cgibug.py and without hanging, as shown in server.py + post.html. Also, preliminary tests revealed that the new FieldStorage.read_multi is about 10 times faster than the old one.

History
Date	User	Action	Args
2022-04-11 14:56:09	admin	set	github: 41505
2008-06-02 03:15:15	hdiogenes	set	nosy: + hdiogenes messages: + msg67625
2005-01-31 00:58:24	irmen	create