This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: cgi.FieldStorage forgets to unquote field names when parsing multipart/form-data
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: mlk, r.david.murray, v+python
Priority: normal Keywords: patch

Created on 2011-02-21 10:18 by mlk, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
tcgi.py mlk, 2011-02-21 10:18 test for cgi.FieldStorage parsing
cgi-patch.patch mlk, 2011-02-21 10:31
Messages (9)
msg128950 - (view) Author: Sergey Schetinin (mlk) Date: 2011-02-21 10:18
Tested on Python 2.7, but probably affects all versions. Test case is attached.

The reason this went unnoticed until now is that browsers are very conservative when quoting field names, so most field names are the same after their quoting.

Related bug in WebOb: https://bitbucket.org/ianb/webob/issue/2
msg128951 - (view) Author: Sergey Schetinin (mlk) Date: 2011-02-21 10:31
And here's a patch.
msg129019 - (view) Author: Sergey Schetinin (mlk) Date: 2011-02-22 00:48
I've dug into the RFCs and tested various browsers.

RFC 2388 (the one defining multipart/form-data) says: 

Field names originally in non-ASCII character sets may be encoded
within the value of the "name" parameter using the standard method
described in RFC 2047.

RFC 2047 in turn defines the coding sometimes seen in email headers ("=?iso-8859-1?q?this is some text?=").

That means that this report is invalid. And I was misled by the bug that belongs to Google Chrome (which is the browser I was doing initial testing with).

I tested this with the following html form:

<form action="handle" method="POST" enctype="multipart/form-data">
<button name='"%22' type="submit" value="">Test</button>
</form>

Here are the headers submitted by various browsers:

IE 8: 
  Content-Disposition: form-data; name=""%22"
Firefox 4.0b11:
  Content-Disposition: form-data; name="\"%22"
Chrome 9:
  Content-Disposition: form-data; name="%22%22"

And the Chrome one is the one clearly invalid.

cgi still does no decoding of parameters as per RFC 2047, but browsers do not use that encoding for non-ASCII field names anyway (they just put the field names in UTF-8), so that might be unnecessary.

Please close this bug at your own judgement.
msg129022 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-22 01:21
Thanks for doing the research.  As far as I know we've never had a request to support RFC2047 in FieldStorage, so presumably no browsers actually generate it.
msg129398 - (view) Author: Glenn Linderman (v+python) * Date: 2011-02-25 18:36
Just some comments for the historical record:
During the discussion of issue 4953 research and testing revealed that browsers send back their cgi data using the same charset as the page that they are responding to.  So the only way that quoting would be necessary on field names would be if they were quoted funny, as in your example here.  It is somewhat unlikely that people would go to the trouble of coding field names that contain " and ' and % characters, just to mess themselves up (which ones do that, depend on which quote character is used for the name in the HTML and whether the enctype is "multipart/form-data" or URL encoding).

And Firefox 3.6... provides

name=""%22"

and that presently works with Python 3.2 CGI!  But that might mean that for Firefox 4.x, providing the "\"%22", CGI might pass through the "\"?  And really, the dequoting must be incorrectly coded for the Firefox 3.6 to "work".
msg129400 - (view) Author: Sergey Schetinin (mlk) Date: 2011-02-25 18:46
It does work (Python 2.7.1 here):

>>> import cgi
>>> cgi.parse_header('Content-Disposition: form-data; name=""%22"')
('Content-Disposition: form-data', {'name': '"%22'})
>>> cgi.parse_header('Content-Disposition: form-data; name="\\"%22"')
('Content-Disposition: form-data', {'name': '"%22'})

However as the unescaping is done sequential .replace, one can construct a header to make it unescape incorrectly:

>>> cgi.parse_header('Content-Disposition: form-data; name="\\\\"%22"')
('Content-Disposition: form-data', {'name': '"%22'})

Which should be:
('Content-Disposition: form-data', {'name': '\\"%22'})

That probably doesn't matter anyway.
msg129401 - (view) Author: Sergey Schetinin (mlk) Date: 2011-02-25 18:58
I wanted to add that the fact that browsers encode the field names in the page encoding does not change that they should escape the header according to RFC 2047.

The percent-encoding used in the field name has nothing to do with multipart/form-data or headers encoding or even html attribute value escaping. There's no reason for Chrome to percent-escape the quotation mark in the field name and my use of the percent sign in the field name is only to show that Chrome does not escape the percent sign itself and that there's no way to recover the data from the header sent by Chrome.

I imagine there could be a non-ASCII field name that, when encoded in some encoding, will produce something SQL-injection-like: '"; other="xx"'. That string would make the header parse into something completely different. With IE8 and FF 3.6 it looks like it would be very simple. The same applies to uploaded files names too, so it's not just a  matter of choosing sane field names.

That's all a browsers' problem though.
msg129415 - (view) Author: Glenn Linderman (v+python) * Date: 2011-02-25 20:09
Sergey says:
I wanted to add that the fact that browsers encode the field names in the page encoding does not change that they should escape the header according to RFC 2047.

I respond:
True, but RFC 2047 is _so_ weird, that it seems that browsers have a better solution.  RFC 2047 is needed for 7-bit mail servers, from which it seems to have been inherited by specs for HTTP (but I've never seen it used by a browser, have you?).  It would be nicer if HTTP had a header that allowed definition of the charset used for subsequent headers.  Right now, the code processing form data has to assume a particular encoding for headers & data, and somehow make sure that all the <form>s that use the same code have the same encoding.

Sergey says:
I imagine there could be a non-ASCII field name that, when encoded in some encoding, will produce something SQL-injection-like: '"; other="xx"'. That string would make the header parse into something completely different. With IE8 and FF 3.6 it looks like it would be very simple. The same applies to uploaded files names too, so it's not just a  matter of choosing sane field names.

That's all a browsers' problem though.

I respond:
Perhaps there is, although it depends on how the parser is written what injection techniques would work, and it also depends on having a followon parameter with dangerous semantics to incorrectly act on.

It isn't just a problem for the browsers, but for every CGI script that parses such parameters.
msg129428 - (view) Author: Sergey Schetinin (mlk) Date: 2011-02-25 21:06
I don't think that's a security issue, just something that would break with certain filenames. And I said that it's a browsers' problem in the sense that it can only be fixed / caught on their side.
History
Date User Action Args
2022-04-11 14:57:13adminsetgithub: 55478
2011-02-25 21:06:55mlksetnosy: v+python, r.david.murray, mlk
messages: + msg129428
2011-02-25 20:09:39v+pythonsetnosy: v+python, r.david.murray, mlk
messages: + msg129415
2011-02-25 18:58:41mlksetnosy: v+python, r.david.murray, mlk
messages: + msg129401
2011-02-25 18:46:57mlksetnosy: v+python, r.david.murray, mlk
messages: + msg129400
2011-02-25 18:36:25v+pythonsetnosy: + v+python
messages: + msg129398
2011-02-22 01:21:47r.david.murraysetstatus: open -> closed

messages: + msg129022
resolution: not a bug
stage: resolved
2011-02-22 00:48:26mlksetmessages: + msg129019
2011-02-21 21:54:52r.david.murraysetnosy: + r.david.murray
2011-02-21 10:31:24mlksetfiles: + cgi-patch.patch

messages: + msg128951
keywords: + patch
2011-02-21 10:18:23mlkcreate