
Author Peter Landry
Recipients Peter Landry, vstinner
Date 2015-07-31.16:07:20
Message-id <1438358841.56.0.603079767616.issue24764@psf.upfronthosting.co.za>
In-reply-to
Content
`cgi.FieldStorage` can't parse a multipart body when a `Content-Length` header is set on a part:

```
Python 3.4.3 (default, May 22 2015, 15:35:46)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.49)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cgi
>>> from io import BytesIO
>>>
>>> BOUNDARY = "JfISa01"
>>> POSTDATA = """--JfISa01
... Content-Disposition: form-data; name="submit-name"
... Content-Length: 5
...
... Larry
... --JfISa01"""
>>> env = {
...     'REQUEST_METHOD': 'POST',
...     'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY),
...     'CONTENT_LENGTH': str(len(POSTDATA))}
>>> fp = BytesIO(POSTDATA.encode('latin-1'))
>>> fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 571, in __init__
    self.read_multi(environ, keep_blank_values, strict_parsing)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 726, in read_multi
    self.encoding, self.errors)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 573, in __init__
    self.read_single()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 736, in read_single
    self.read_binary()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 758, in read_binary
    self.file.write(data)
TypeError: must be str, not bytes
>>>
```

This happens because of a mismatch between the code that creates the temp file to write to and the code that decides whether to read in binary mode:

* the presence of `filename` in the `Content-Disposition` header triggers creation of a binary mode file
* the presence of a `Content-Length` header for the part triggers a binary read

When `Content-Length` is present but `filename` is absent, `bytes` are written to the non-binary temp file, causing the error above.
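
The mismatch can be modeled in a few lines without importing `cgi` at all. The following is a simplified sketch, not the actual `cgi.py` code: `store_part` and its two boolean parameters are hypothetical names, and `io.BytesIO`/`io.StringIO` stand in for the binary and text temp files.

```python
import io


def store_part(body, has_filename, has_content_length):
    """Hypothetical model of how one multipart part ends up in a file."""
    # Decision 1: the file is binary only when the part has a filename.
    binary_file = has_filename
    target = io.BytesIO() if binary_file else io.StringIO()

    # Decision 2: a Content-Length header selects the binary read path,
    # which writes raw bytes without checking the file's mode.
    if has_content_length:
        target.write(body)
    else:
        # The line-based path matches its output to the file's mode.
        target.write(body if binary_file else body.decode("latin-1"))
    return target.getvalue()


if __name__ == "__main__":
    for has_filename in (False, True):
        for has_content_length in (False, True):
            try:
                outcome = repr(store_part(b"Larry", has_filename, has_content_length))
            except TypeError as exc:
                outcome = "TypeError: {}".format(exc)
            print(has_filename, has_content_length, outcome)
```

Of the four combinations, only "no `filename`, `Content-Length` present" fails in this model, which is the case in the traceback above.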

I've reviewed the relevant RFCs, and I'm not sure what the correct way to handle this is. I don't believe the MIME spec[0] addresses `Content-Length` for part bodies, and HTTP has its own semantics[1].

At the very least, I think this behavior is confusing and unexpected. Some libraries, such as Retrofit[2], include `Content-Length` by default, and break when submitting POST data to a Python server.

I've attached a patch that makes this work the way I'd expect, but I'm not sure it's the right decision. The patch naively accepts the existing semantics of `Content-Length`, which presume bytes, and treats the creation of a non-binary file as the "bug".
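
To illustrate the idea only (this is not the attached patch, and the names below are hypothetical), a minimal sketch of "treat the non-binary file as the bug" makes the file-mode decision take `Content-Length` into account. `io.BytesIO`/`io.StringIO` again stand in for the temp files.

```python
import io


def wants_binary_file(has_filename, has_content_length):
    # Assumption for this sketch: either condition is enough to make the
    # file binary, so the binary read path always has a bytes-mode target.
    return has_filename or has_content_length


def store_part_fixed(body, has_filename, has_content_length):
    """Hypothetical model in which file mode and read mode always agree."""
    binary_file = wants_binary_file(has_filename, has_content_length)
    target = io.BytesIO() if binary_file else io.StringIO()
    # Bytes go to a binary file, decoded text goes to a text file.
    target.write(body if binary_file else body.decode("latin-1"))
    return target.getvalue()


if __name__ == "__main__":
    # The previously failing combination now stores the raw bytes.
    print(store_part_fixed(b"Larry", False, True))
```

One consequence of this choice, at least in this model, is that a plain form field carrying a `Content-Length` header is stored as `bytes` rather than `str`, which is part of why it is unclear whether this is the proper decision.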

[0]: http://www.ietf.org/rfc/rfc2045.txt
[1]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4
[2]: http://square.github.io/retrofit/

History

| Date | User | Action | Args |
|---|---|---|---|
| 2015-07-31 16:07:21 | Peter Landry | set | recipients: + Peter Landry, vstinner |
| 2015-07-31 16:07:21 | Peter Landry | set | messageid: <1438358841.56.0.603079767616.issue24764@psf.upfronthosting.co.za> |
| 2015-07-31 16:07:21 | Peter Landry | link | issue24764 messages |
| 2015-07-31 16:07:21 | Peter Landry | create | |