classification
Title: cgi.FieldStorage can't parse multipart part headers with Content-Length and no filename in Content-Disposition
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Peter Landry, X-Istence, berker.peksag, pradeep, python-dev, quentel, vstinner
Priority: normal Keywords: patch

Created on 2015-07-31 16:07 by Peter Landry, last changed 2016-11-18 05:21 by X-Istence. This issue is now closed.

Files
File name Uploaded Description Edit
cgi_multipart.patch Peter Landry, 2015-07-31 16:07 Test case and naive bugfix review
cgi_multipart.patch Peter Landry, 2015-08-07 15:12 Improved patch that removes the header when present review
Messages (14)
msg247751 - (view) Author: Peter Landry (Peter Landry) * Date: 2015-07-31 16:07
`cgi.FieldStorage` can't parse a multipart with a `Content-Length` header set on a part:

```Python 3.4.3 (default, May 22 2015, 15:35:46)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.49)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cgi
>>> from io import BytesIO
>>>
>>> BOUNDARY = "JfISa01"
>>> POSTDATA = """--JfISa01
... Content-Disposition: form-data; name="submit-name"
... Content-Length: 5
...
... Larry
... --JfISa01"""
>>> env = {
...     'REQUEST_METHOD': 'POST',
...     'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY),
...     'CONTENT_LENGTH': str(len(POSTDATA))}
>>> fp = BytesIO(POSTDATA.encode('latin-1'))
>>> fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 571, in __init__
    self.read_multi(environ, keep_blank_values, strict_parsing)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 726, in read_multi
    self.encoding, self.errors)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 573, in __init__
    self.read_single()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 736, in read_single
    self.read_binary()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 758, in read_binary
    self.file.write(data)
TypeError: must be str, not bytes
>>>
```

This happens because of a mismatch between the code that creates a temp file to write to and the code that chooses to read in binary mode or not:

* the presence of `filename` in the `Content-Disposition` header triggers creation of a binary mode file
* the present of a `Content-Length` header for the part triggers a binary read

When `Content-Length` is present but `filename` is absent, `bytes` are written to the non-binary temp file, causing the error above.

I've reviewed the relevant RFCs, and I'm not really sure what the correct way to handle this is. I don't believe `Content-Length` is addressed for part bodies in the MIME spec[0], and HTTP has its own semantics[1].

At the very least, I think this behavior is confusing and unexpected. Some libraries, like Retrofit[2], will by default include `Content-Length`, and break when submitting POST data to a python server.

I've made an attempt to work in the way I'd expect, and attached a patch, but I'm really not sure if it's the proper decision. My patch kind of naively accepts the existing semantics of `Content-Length` that presume bytes, and treats the creation of a non-binary file as the "bug".

[0]: http://www.ietf.org/rfc/rfc2045.txt
[1]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4
[2]: http://square.github.io/retrofit/
msg247752 - (view) Author: Peter Landry (Peter Landry) * Date: 2015-07-31 16:08
I realized my formatting was poor, making it hard to quickly test the issue. Here's a cleaner version:

    import cgi
    from io import BytesIO

    BOUNDARY = "JfISa01"
    POSTDATA = """--JfISa01
    Content-Disposition: form-data; name="submit-name"
    Content-Length: 5

    Larry
    --JfISa01"""
    env = {
        'REQUEST_METHOD': 'POST',
        'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY),
        'CONTENT_LENGTH': str(len(POSTDATA))}
    fp = BytesIO(POSTDATA.encode('latin-1'))
    fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1")
msg247753 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-31 16:13
@Pierre Quentel: Hi! Are you still working on CGI? Can you please review this patch? Thanks.

--

Previous large change in the cgi module: issue #4953. Pierre helped a lot on this one.
msg247799 - (view) Author: Pierre Quentel (quentel) * Date: 2015-08-01 07:37
Yes, I will be able to review the patch next week

2015-07-31 18:13 GMT+02:00 STINNER Victor <report@bugs.python.org>:

>
> STINNER Victor added the comment:
>
> @Pierre Quentel: Hi! Are you still working on CGI? Can you please review
> this patch? Thanks.
>
> --
>
> Previous large change in the cgi module: issue #4953. Pierre helped a lot
> on this one.
>
> ----------
> nosy: +quentel
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue24764>
> _______________________________________
>
msg248185 - (view) Author: Pierre Quentel (quentel) * Date: 2015-08-07 10:47
I don't really see why there is a Content-Length in the headers of a
multipart form data. The specification at
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2 doesn't
mention it, and it is absent in the example that looks like the one tested
by Peter :

Content-Type: multipart/form-data; boundary=AaB03x
--AaB03x
Content-Disposition: form-data; name="submit-name"

Larry
--AaB03x
Content-Disposition: form-data; name="files"; filename="file1.txt"
Content-Type: text/plain
... contents of file1.txt ...
--AaB03x--

In case a user agent would insert it, I think the best would be to
ignore it. That is, inside read_multi(), after

            headers = parser.close()

add these lines :

            if 'content-length' in headers:
                del headers['content-length']

It's hard to see the potential side effects but I think it's cleaner
than the proposed patch, which is not correct anyway for another
reason : the attribute value is set to a bytes objects, instead of a
string.

Peter, does this make sense ? If so, can you submit another patch ?
msg248198 - (view) Author: Peter Landry (Peter Landry) * Date: 2015-08-07 15:06
Yeah, I think that makes the most sense to me as well. I tried to make a minimum-impact patch, but this feels cleaner.

If we remove the Content-Length header, the `limit` kwarg might occur at an odd place in the part itself, but that feels unavoidable if someone submits an incorrect Content-Length for the request itself.

I'll re-work the patch and make sure the tests I added still add value. Thanks!
msg248199 - (view) Author: Peter Landry (Peter Landry) * Date: 2015-08-07 15:12
A new patch that simply removes Content-Length from part headers when present.
msg248258 - (view) Author: Pierre Quentel (quentel) * Date: 2015-08-08 09:43
Victor, you can apply the patch and close the issue.
Le 7 août 2015 17:12, "Peter Landry" <report@bugs.python.org> a écrit :

>
> Peter Landry added the comment:
>
> A new patch that simply removes Content-Length from part headers when
> present.
>
> ----------
> Added file: http://bugs.python.org/file40145/cgi_multipart.patch
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue24764>
> _______________________________________
>
msg248306 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-08-08 22:56
Not today. I'm in holiday. Ping me in two weeks or find another core dev.
msg248720 - (view) Author: Pradeep (pradeep) Date: 2015-08-17 07:09
Hi Victor, 
I found similar problem at 
https://bugs.launchpad.net/barbican/+bug/1485452, 
is problem mentioned in the bug is same with mentioned bug?
msg248729 - (view) Author: Peter Landry (Peter Landry) * Date: 2015-08-17 14:20
Pradeep, that error seems to be in Barbican. This bug and patch only addresses content-length headers in MIME multipart messages.
msg248778 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-08-18 17:24
New changeset 11e9f34169d1 by Victor Stinner in branch '3.4':
cgi.FieldStorage.read_multi ignores Content-Length
https://hg.python.org/cpython/rev/11e9f34169d1

New changeset 5b9209e4c3e4 by Victor Stinner in branch '3.5':
(Merge 3.4) cgi.FieldStorage.read_multi ignores Content-Length
https://hg.python.org/cpython/rev/5b9209e4c3e4

New changeset 0ff1acc89cf0 by Victor Stinner in branch 'default':
(Merge 3.5) cgi.FieldStorage.read_multi ignores Content-Length
https://hg.python.org/cpython/rev/0ff1acc89cf0
msg248779 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-08-18 17:26
I applied the fix, thanks Peter for the report and the fix, thanks Pierre for the review.

> https://bugs.launchpad.net/barbican/+bug/1485452, 
> is problem mentioned in the bug is same with mentioned bug?

I don't know, but you can try to apply the patch locally if you want, or download the Python 3.4 using Mercurial.
msg281076 - (view) Author: Bert JW Regeer (X-Istence) * Date: 2016-11-18 05:21
I've uploaded a patchset to bug #27777 that fixes this issue by fixing make_file, and doesn't cause Python to throw out the content-length information.

It also fixes FieldStorage for when you provide it a non-multipart form submission and there is no list in FS.

Please see http://bugs.python.org/issue27777
History
Date User Action Args
2016-11-18 05:21:51X-Istencesetnosy: + X-Istence
messages: + msg281076
2016-06-13 20:03:47berker.peksaglinkissue27308 superseder
2015-08-18 17:45:31vstinnersetstatus: open -> closed
resolution: fixed
2015-08-18 17:26:58vstinnersetmessages: + msg248779
2015-08-18 17:24:39python-devsetnosy: + python-dev
messages: + msg248778
2015-08-17 14:20:35Peter Landrysetmessages: + msg248729
2015-08-17 07:09:04pradeepsetnosy: + pradeep
messages: + msg248720
2015-08-10 21:16:26berker.peksagsetnosy: + berker.peksag
stage: patch review
type: behavior

versions: - Python 3.2, Python 3.3
2015-08-08 22:56:01vstinnersetmessages: + msg248306
2015-08-08 09:43:05quentelsetmessages: + msg248258
2015-08-07 15:12:11Peter Landrysetfiles: + cgi_multipart.patch

messages: + msg248199
2015-08-07 15:06:48Peter Landrysetmessages: + msg248198
2015-08-07 10:48:00quentelsetmessages: + msg248185
2015-08-01 07:37:06quentelsetmessages: + msg247799
2015-07-31 16:13:48vstinnersetnosy: + quentel
messages: + msg247753
2015-07-31 16:08:47Peter Landrysetmessages: + msg247752
2015-07-31 16:07:21Peter Landrycreate