Issue 1112549: cgi.FieldStorage memory usage can spike in line-oriented ops

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/41503

classification

Title:	cgi.FieldStorage memory usage can spike in line-oriented ops
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.3

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	gvanrossum	Nosy List:	chrism, georg.brandl, gvanrossum, nnorwitz
Priority:	release blocker	Keywords:

Created on 2005-01-30 13:40 by chrism, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
FieldStorage-readline.patch	chrism, 2005-01-30 13:40	Fieldstorage readline patch.
FieldStorage-readline.patch	chrism, 2005-04-01 02:56	FieldStorage readline patch with proper boundary recognition (after Guido's comment)
test_cgi.py	chrism, 2005-04-03 03:42	Updated test_cgi.py
FieldStorage-readline-svn-50879.patch	chrism, 2006-07-27 21:34	FieldStorage readline patch against just-post-2.5-beta1 svn checkout
test_cgi-svn-50879.patch	chrism, 2006-07-27 21:35	Patch to test_cgi.py which exercises readline patch just-post-2.5b1
test_output_test_cgi-svn-50879.patch	chrism, 2006-08-07 17:51	Patch to tests/output/test_cgi to make regrtest.py pass when two below patches are applied.
cgi-module-fixes-bundle-svn-50879.patch	chrism, 2006-08-10 11:08

Messages (16)
msg24079 - (view)	Author: Chris McDonough (chrism)	Date: 2005-01-30 13:40
Various parts of cgi.FieldStorage call its "read_lines_to_outerboundary", "read_lines" and "skip_lines" methods. These methods use the "readline" method of the file object that represents an input stream. The input stream is typically data supplied by an untrusted source (such as a user uploading a file from a web browser). The input data is not required by the RFC 822/1521/1522/1867 specifications to contain any newline characters. For example, it is within the bounds of the specification to supply a a multipart/form-data input stream with a "file-data" part that consists of a 2GB string composed entirely of "x" characters (which happens to be something I did that led me to noticing this bug). The simplest fix is to make use of the "size" argument of the readline method of the file object where it is used within all parts of FieldStorage that make use of it. A patch against the Python 2.3.4 cgi.py module that does this is attached.
msg24080 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2005-03-31 22:13
Logged In: YES user_id=6380 Methinks that the fix isn't quite right: it would incorrectly recognize as a boundary a very long line starting with "--" followed by the appropriate random string at offset 2**16. This could probably be taken care of by adding a flag that is true initially and after that keeps track of whether the previous line ended in \n. Also, there's a call to fp.readline() in parse_multipart() that you didn't patch -- it wouldn't help because that code is saving the lines in a list anyway, but isn't that code vulnerable as well? Or is it not used?
msg24081 - (view)	Author: Chris McDonough (chrism)	Date: 2005-04-01 02:56
Logged In: YES user_id=32974 Re: parse_multipart.. yes, it looks like there's no use fixing that as it just turns around and puts the line into a list.. it is vulnerable but just by virtue of its non-use of a tempfile, it appears doomed anyway for large requests. I don't know of anything that uses it. Good catch wrt boundary recognition bug, I'm uploading another patch.
msg24082 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2005-04-01 04:48
Logged In: YES user_id=6380 Can I tweak you into uploading a unit test?
msg24083 - (view)	Author: Chris McDonough (chrism)	Date: 2005-04-03 03:42
Logged In: YES user_id=32974 An updated test_cgi.py is attached. I test both the readline behavior and add a test for basic multipart parsing.
msg24084 - (view)	Author: Chris McDonough (chrism)	Date: 2005-04-03 04:00
Logged In: YES user_id=32974 FYI, I'd be happy to do the merging here if you wanted to give me checkin access.
msg24085 - (view)	Author: Chris McDonough (chrism)	Date: 2006-07-27 21:42
Logged In: YES user_id=32974 The files I've just uploaded are revisions to the cgi and test_cgi modules for the current state of the SVN trunk. If someone could apply these, it would be appreciated, or give me access and I'll be happy to. FTR, this is a bug which exposes systems which use the cgi.FieldStorage class (most Python web frameworks do) to a denial of service potential.
msg24086 - (view)	Author: Neal Norwitz (nnorwitz) *	Date: 2006-08-07 05:50
Logged In: YES user_id=33168 Doesn't this require a change to test/output/test_cgi or something like that? Does this test pass when run under regrtest.py ? The verify(x == y) should be vereq(x, y). type(a) != type(0), should be not isinstance(a, int). The last chunk of the patch in cgi.py should be: last_line_lfend = line.endswith('\n') rather than the 4 lines of if/else. I don't know if this patch really addresses the problem or creates other problems. However, if someone more knowledgable is confident about this patch, I'm fine with this going in. At this point, it might be better to wait for 2.5.1 though.
msg24087 - (view)	Author: Chris McDonough (chrism)	Date: 2006-08-07 17:51
Logged In: YES user_id=32974 Yup, test/output/test_cgi did need fixing. Apologies, I did not understand the test regime. A new patch file named test_output_test_cgi- svn-50879.patch has been uploaded with the required change. regrtest.py now passes. As far as verify vs. vereq, the test_cgi module uses verify all over the place. I'm apt to not change all of those places, so to change it in just that one place is probably ineffective. Same for type comparison vs. isinstance. I'm trying to make the smallest change possible as opposed to refactoring the test module. I've uploaded a patch which contains a) the fix to cgi.py, b) the fix to test_cgi.py, and the fix to output/test_cgi. The stylistic change wrt to last_line_lfend is fine with me, but I'll leave that jugment to someone else. I'm not sure how to ensure the fix doesn't create other problems other than what has already been done by proving it still passes the tests it was subjected to in the "old" test suite and adds new tests that prove it no longer has the denial of service problem.
msg24088 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2006-08-09 21:58
Logged In: YES user_id=6380 +1. minor nits: in the main patch: instead of + if line.endswith('\n'): + last_line_lfend = True + else: + last_line_lfend = False you can just use last_line_lfend = line.endswith('\n') in the unit test: instead of if type(a) != type(0): use if not isinstance(a, int): so that if some future release changes file.closed to return a bool (as it should :-) this test won't break. Is tehre a reason why you're not patching the fp.readline() call in parse_multipart()? It would seem to have the same issue (even if it isn't used in Zope :-).
msg24089 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2006-08-09 22:23
Logged In: YES user_id=6380 BTW it would be better if all patches were in a single file -- then you can delete the older patches (if SF lets you do that).
msg24090 - (view)	Author: Chris McDonough (chrism)	Date: 2006-08-10 11:08
Logged In: YES user_id=32974 wrt parse_multipart: this function just turns around and puts the output from readline() into a list, as opposed to FieldStorage, which writes its chunked output to a tempfile, so just adding an argument to readline within it would not be useful. We'd only be able to fix the memory consumption issue in parse_multipart if we were to make it also use a tempfile, but it's not clear that not writing a tempfile isn't its expected behavior as the docs for parse_multipart state: Returns a dictionary just like parse_qs() keys are the field names, each value is a list of values for that field. This is easy to use but not much good if you are expecting megabytes to be uploaded -- in that case, use the FieldStorage class instead which is much more flexible. Is it OK to write a tempfile in this function (e.g. does that make it not useful on stuff like embedded systems)? If not, maybe we should just deprecate parse_multipart? I do find things that use it if I google hard enough but there are only 187 hits when I google for "cgi.parse_multipart" and 53,600 hits when I google for "cgi.FieldStorage". I'm uploading another file with your style change suggestions. It bundles all fixes to cgi.py, test_cgi.py, and test_cgi and includes the style changes, but does nothing about parse_multipart.
msg24091 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2006-08-10 14:24
Logged In: YES user_id=6380 OK, let's forget about parse_multipart(). Feel free to add a doc patch and a comment that deprecate it (actual warnings are not a good idea IMO).
msg24092 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2006-08-10 17:44
Logged In: YES user_id=6380 Checked in as r51190 (Chris's patch plus a warning added to an XXX comment for parse_multipart()) and r51191 (Misc/NEWS).
msg24093 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2006-08-10 17:46
Logged In: YES user_id=849994 Is this a backportable fix?
msg24094 - (view)	Author: Neal Norwitz (nnorwitz) *	Date: 2006-08-10 18:12
Logged In: YES user_id=33168 Yes, I believe so.

History
Date	User	Action	Args
2022-04-11 14:56:09	admin	set	github: 41503
2005-01-30 13:40:48	chrism	create