Issue 10879: cgi memory usage

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/55088

classification

Title:	cgi memory usage
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.3

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	ethan.furman, r.david.murray, v+python
Priority:	normal	Keywords:

Created on 2011-01-10 08:34 by v+python, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg125884	Author: Glenn Linderman (v+python)	Date: 2011-01-10 08:34
In attempting to review issue 4953, I discovered a conundrum in the handling of multipart/form-data. cgi.py has claimed for some time (at least since 2.4) that it "handles" file storage for uploading large files. I looked at the code in 2.6 that handles such, and it uses the rfc822.Message method, which parses headers from any object supporting readline(). In particular, it doesn't attempt to read message bodies, and there is code in cgi.py to perform that.

There is still code in 3.2 cgi.py to read message bodies, but... rfc822 has gone away, and been replaced with the email package. Theoretically this is good, but the cgi FieldStorage read_multi method now parses the whole CGI input, and then iteration parcels out items to FieldStorage instances. There is a significant difference here: email reads everything into memory (if I understand it correctly). That will never work to upload large or many files when combined with a Web server that launches CGI programs with memory limits.

I see several possible actions that could be taken:

1) Documentation. While it is doubtful that anyone is using 3.x CGI, and this makes it more doubtful, the present code does not match the documentation: the documentation claims to handle file uploads as files, rather than in-memory blobs, but the current code does not do that.

2) If there is a method in the email package that corresponds to rfc822.Message, parsing only headers, I couldn't find it. Perhaps it is possible to feed just headers to BytesFeedParser, and stop, and get the same sort of effect. However, this is not the way cgi.py is presently coded. And if there is a better API for parsing only headers that is or could be exposed by email, that might be handy.

3) The 2.6 cgi.py does not claim to support nested multipart/ stuff, only one level. I'm not sure if any present or planned web browsers use nested multipart/ stuff... I guess it would require a nested <form> tag? Which is illegal HTML, last I checked.

So perhaps the general logic flow of 2.6 cgi.py could be reinstated, with a technique to feed only headers to BytesFeedParser, together with reinstating the MIME body parsing in cgi.py, and this could make a solution that works.

I discovered this because I couldn't figure out where a bunch of the methods in cgi.py were called from, particularly read_lines_to_outerboundary and make_file. They seemed to be called much too late in the process. It wasn't until I looked back at the 2.6 code that I could see that there was a transition from using rfc822 only for headers to using email for parsing the whole data stream, and that this was the cause of the documentation not seeming to match the code logic.

I have no idea if this problem is in 2.7, as I don't have it installed here for easy reference, and I'm personally much more interested in 3.2.
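For illustration, the "feed only headers to BytesFeedParser, and stop" idea might look roughly like the sketch below. The helper name read_part_headers and the sample data are hypothetical, and it assumes the input supports readline() and that headers end at the first blank line; it is not claimed to be the code cgi.py uses.

```python
import io
from email.parser import BytesFeedParser


def read_part_headers(fp):
    """Feed header lines to the email parser and stop at the blank line.

    The body is never passed to the parser, so fp is left positioned at
    the first byte of the body for the caller to read incrementally.
    """
    parser = BytesFeedParser()
    while True:
        line = fp.readline()
        if not line:
            break  # input ended before the headers did
        parser.feed(line)
        if line in (b"\r\n", b"\n"):
            break  # blank line: end of headers
    return parser.close()


# Hypothetical single part of a multipart/form-data upload.
stream = io.BytesIO(
    b'Content-Disposition: form-data; name="upload"; filename="a.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"file body that is never handed to the email parser\r\n"
)
headers = read_part_headers(stream)
rest = stream.read()  # the body is still unread in the stream
```

With this shape, the email package only ever sees a few header lines, and whatever reads the body can do so in bounded chunks.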
msg125888	Author: Glenn Linderman (v+python)	Date: 2011-01-10 09:45
Trying to code some of this, it would be handy if BytesFeedParser.feed would return a status, indicating if it has seen the end of the headers yet. But that would only work if it is parsing as it goes, rather than just buffering, with all the real parsing work being done at .close time.
msg125902	Author: R. David Murray (r.david.murray)	Date: 2011-01-10 13:40
The email package does have a 'parse headers only' mode, but it doesn't do what you want, since it reads the remainder of the file and sets it as the payload of the single, un-nested Message object it returns. Adding a flag to tell it to stop parsing instead of doing that will probably be fairly simple, but is a feature request. However, I'm not clear on how that helps. Doesn't FieldStorage also load everything into memory? There's an open feature request for providing a way to use alternate backing stores for the bodies of message parts in the email package, which would address this issue.
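A small sketch of the behavior described here, using the headersonly argument of BytesParser.parsebytes. The boundary, field name, and file name are made-up sample data.

```python
from email.parser import BytesParser

# Hypothetical CGI-style input: top-level headers, then one uploaded part.
raw = (
    b"Content-Type: multipart/form-data; boundary=XYZ\r\n"
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="upload"; filename="big.bin"\r\n'
    b"\r\n"
    + b"x" * 1000
    + b"\r\n--XYZ--\r\n"
)

msg = BytesParser().parsebytes(raw, headersonly=True)

# The parts are not split out, but the whole remainder is still read and
# kept in memory as the payload of the single Message object.
payload = msg.get_payload()
```

So headersonly avoids building nested Message objects, but it does not avoid holding the upload in memory.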
msg125923	Author: Glenn Linderman (v+python)	Date: 2011-01-10 20:17
R. David said: "However, I'm not clear on how that helps. Doesn't FieldStorage also load everything into memory?"

I say: FieldStorage in 2.x (for x <= 6, at least) copies incoming file data to a file, using limited-size read/write operations. Non-file data is buffered in memory. In 3.x, FieldStorage doesn't work. The code that is there, though, for multipart/ data, would call email to do all the parsing, which would happen to include file data, which always comes in as part of a multipart/ data stream. This would prevent cgi from being used to accept large files in a limited environment. Sadly, there is code in place that would then copy the memory buffers to files, and act like they were buffered... but process limits do not care that the memory usage is only temporary...
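To make "limited-size read/write operations" concrete, here is a minimal sketch of copying a known number of bytes to a temporary file in bounded chunks. The function name spool_to_file and the default bufsize of 8192 are assumptions for illustration, not the actual cgi.py code, which also has to watch for the multipart boundary.

```python
import io
import tempfile


def spool_to_file(src, length, bufsize=8192):
    """Copy `length` bytes from src to a temp file, bufsize bytes at a time.

    At most bufsize bytes of the upload are held in memory at once.
    """
    out = tempfile.TemporaryFile("w+b")
    remaining = length
    while remaining > 0:
        chunk = src.read(min(bufsize, remaining))
        if not chunk:
            break  # input ended early
        out.write(chunk)
        remaining -= len(chunk)
    out.seek(0)  # rewind so the caller can read the spooled data
    return out


# Hypothetical input: a 20000-byte upload followed by unrelated data.
source = io.BytesIO(b"x" * 20000 + b"trailing")
spooled = spool_to_file(source, 20000)
copied = spooled.read()
```

Peak memory here depends on bufsize rather than on the size of the upload, which is the property a memory-limited CGI process needs.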
msg126968	Author: Glenn Linderman (v+python)	Date: 2011-01-25 00:25
Issue 4953 has somewhat resolved this issue by using email only for parsing headers (more like 2.x did). So this issue could be closed, or could be left open to point out the required additional features needed from email before cgi.py can use it for handling body parts as well as headers.

History
Date	User	Action	Args
2022-04-11 14:57:11	admin	set	github: 55088
2020-07-21 04:43:59	methane	set	status: open -> closed resolution: fixed stage: resolved
2020-07-20 20:51:15	Rhodri James	set	nosy: - Rhodri James
2019-08-03 14:53:33	Rhodri James	set	nosy: + ethan.furman, Rhodri James
2011-01-25 00:25:15	v+python	set	nosy: v+python, r.david.murray messages: + msg126968
2011-01-10 20:17:18	v+python	set	nosy: v+python, r.david.murray messages: + msg125923
2011-01-10 13:41:07	r.david.murray	set	nosy: v+python, r.david.murray versions: - Python 3.1, Python 3.2
2011-01-10 13:40:55	r.david.murray	set	type: enhancement messages: + msg125902 nosy: v+python, r.david.murray
2011-01-10 09:45:45	v+python	set	nosy: v+python, r.david.murray messages: + msg125888
2011-01-10 08:34:34	v+python	create