Message 125884 - Python tracker



Author	v+python
Recipients	r.david.murray, v+python
Date	2011-01-10.08:34:34
Message-id	<1294648490.09.0.987257947586.issue10879@psf.upfronthosting.co.za>

Content
In attempting to review issue 4953, I discovered a conundrum in the handling of multipart/form-data.

cgi.py has claimed for some time (at least since 2.4) that it "handles" file storage for uploading large files. I looked at the 2.6 code that does this, and it uses the rfc822.Message class, which parses headers from any object supporting readline(). It does not attempt to read message bodies; cgi.py has its own code to do that.

There is still code in 3.2 cgi.py to read message bodies, but... rfc822 has gone away and been replaced by the email package. In theory this is good, but the FieldStorage read_multi method now parses the whole CGI input first, and then iterates over the result, parceling out the parts to FieldStorage instances. This is a significant difference: the email package reads everything into memory (if I understand it correctly). That will never work for uploading large files, or many files, when the web server launches CGI programs with memory limits.
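To illustrate the concern, here is a minimal sketch (not cgi.py's actual code; the form body, the boundary XyZ, and the field names are made up). When a complete multipart body is handed to the email package's BytesParser, each part comes back as a Message whose payload is an ordinary in-memory string, with no file object anywhere.

```python
from email.parser import BytesParser

# A tiny hand-built form submission; boundary and field names are made up.
RAW_FORM = (
    b"Content-Type: multipart/form-data; boundary=XyZ\r\n"
    b"\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="upload"; filename="big.bin"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b"pretend this is many megabytes\r\n"
    b"--XyZ--\r\n"
)


def parse_whole_form(raw: bytes):
    """Parse an entire multipart body in one go and return its parts.

    parsebytes() takes the complete input as one bytes object, and every
    part's payload ends up as an in-memory string inside a Message object;
    no temporary files are created, however large the upload is.
    """
    return BytesParser().parsebytes(raw).get_payload()


if __name__ == "__main__":
    for part in parse_whole_form(RAW_FORM):
        payload = part.get_payload()
        print(part.get_filename(), type(payload).__name__, len(payload))
```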

I see several possible actions that could be taken:
1) Documentation. While it is doubtful that anyone is using CGI with 3.x, and this makes it even more doubtful, the present code does not match the documentation: the documentation claims that file uploads are handled as files rather than as in-memory blobs, but the current code does not do that.

2) If there is anything in the email package that corresponds to rfc822.Message, parsing only the headers, I couldn't find it. Perhaps it is possible to feed just the headers to BytesFeedParser, stop, and get the same effect. However, that is not how cgi.py is presently coded. And if there is a better API for parsing only headers, one that email already exposes or could expose, that might be handy.
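A minimal sketch of the "feed just the headers to BytesFeedParser, and stop" idea, assuming fp is a binary file-like object positioned at the start of one part's headers. read_part_headers is a hypothetical helper, not an existing email or cgi API; it plays roughly the role rfc822.Message played in 2.6, leaving the body unread.

```python
import io
from email.feedparser import BytesFeedParser
from email.message import Message
from typing import BinaryIO


def read_part_headers(fp: BinaryIO) -> Message:
    """Parse only the header block of one MIME part, then stop (hypothetical helper).

    Lines are read with fp.readline() and fed to BytesFeedParser until the
    blank line that ends the headers. The part's body is never read, so it
    is still waiting in fp for the caller to stream wherever it likes.
    """
    parser = BytesFeedParser()
    while True:
        line = fp.readline()
        if not line:                    # EOF before the blank line
            break
        parser.feed(line)
        if line in (b"\r\n", b"\n"):    # blank line: end of the header block
            break
    return parser.close()               # a Message holding just the headers


if __name__ == "__main__":
    stream = io.BytesIO(
        b'Content-Disposition: form-data; name="upload"; filename="big.bin"\r\n'
        b"Content-Type: application/octet-stream\r\n"
        b"\r\n"
        b"...many megabytes of file data...\r\n"
    )
    headers = read_part_headers(stream)
    print(headers.get_filename(), headers.get_content_type())
    print(stream.read())  # the body, still unread by the parser
```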

3) The 2.6 cgi.py does not claim to support nested multipart/ content, only one level. I'm not sure whether any present or planned web browsers send nested multipart/ content... I suppose it would require a nested <form> tag, which is illegal HTML last I checked. So perhaps the general logic flow of the 2.6 cgi.py could be reinstated, using a technique to feed only the headers to BytesFeedParser, together with reinstating the MIME body parsing in cgi.py, and that could make a solution that works.
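The body-parsing half of that flow might look roughly like this. copy_part_to_file is a hypothetical helper, not cgi.py's actual read_lines_to_outerboundary or make_file; it assumes the part's headers have already been consumed and that fp is a binary stream positioned at the start of the body. The body is copied line by line into a temporary file, so memory use stays bounded regardless of upload size.

```python
import io
import tempfile
from typing import BinaryIO, Tuple


def copy_part_to_file(fp: BinaryIO, boundary: bytes,
                      chunk_limit: int = 1 << 16) -> Tuple[BinaryIO, bool]:
    """Stream one part's body from fp into a temporary file (hypothetical helper).

    Copying stops at the next delimiter line, b"--" + boundary, or the
    closing delimiter, b"--" + boundary + b"--". Per RFC 2046 the line break
    just before a delimiter belongs to the delimiter, so each line ending is
    held back until the next line is known not to be a delimiter.
    Returns (file rewound to the start, True if this was the last part).
    """
    delimiter = b"--" + boundary
    close_delimiter = delimiter + b"--"
    out = tempfile.TemporaryFile("w+b")
    pending = b""          # line ending withheld from the previous line
    at_line_start = True   # False when the previous read stopped mid-line
    last_part = True       # EOF without any delimiter also ends the form
    while True:
        # Bounded read; chunk_limit must exceed the delimiter length
        # (RFC 2046 boundaries are at most 70 characters).
        line = fp.readline(chunk_limit)
        if not line:
            break
        if at_line_start and line.startswith(b"--"):
            marker = line.rstrip()  # drop the line break and transport padding
            if marker == close_delimiter:
                break
            if marker == delimiter:
                last_part = False
                break
        at_line_start = line.endswith(b"\n")
        out.write(pending)
        if line.endswith(b"\r\n"):
            out.write(line[:-2])
            pending = b"\r\n"
        elif line.endswith(b"\n"):
            out.write(line[:-1])
            pending = b"\n"
        else:
            out.write(line)
            pending = b""
    out.seek(0)
    return out, last_part


if __name__ == "__main__":
    body = io.BytesIO(b"line one\r\nline two\r\n--XyZ\r\nnext part...\r\n")
    part_file, is_last = copy_part_to_file(body, b"XyZ")
    print(part_file.read(), is_last)
```

Holding back each line ending, rather than stripping it afterwards, keeps the copy strictly streaming: nothing already written to the file ever has to be undone when the delimiter shows up.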

I discovered this because I couldn't figure out where several of the methods in cgi.py were called from, particularly read_lines_to_outerboundary and make_file. They seemed to be called much too late in the process. It wasn't until I looked back at the 2.6 code that I saw the transition from using rfc822 only for headers to using email to parse the whole data stream, and that this transition was why the documentation did not seem to match the code logic. I have no idea whether this problem exists in 2.7, as I don't have it installed here for easy reference, and I'm personally much more interested in 3.2.

History
Date	User	Action	Args
2011-01-10 08:34:50	v+python	set	recipients: + v+python, r.david.murray
2011-01-10 08:34:50	v+python	set	messageid: <1294648490.09.0.987257947586.issue10879@psf.upfronthosting.co.za>
2011-01-10 08:34:34	v+python	link	issue10879 messages
2011-01-10 08:34:34	v+python	create