Message247751
`cgi.FieldStorage` can't parse a multipart with a `Content-Length` header set on a part:
```
Python 3.4.3 (default, May 22 2015, 15:35:46)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.49)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cgi
>>> from io import BytesIO
>>>
>>> BOUNDARY = "JfISa01"
>>> POSTDATA = """--JfISa01
... Content-Disposition: form-data; name="submit-name"
... Content-Length: 5
...
... Larry
... --JfISa01"""
>>> env = {
... 'REQUEST_METHOD': 'POST',
... 'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY),
... 'CONTENT_LENGTH': str(len(POSTDATA))}
>>> fp = BytesIO(POSTDATA.encode('latin-1'))
>>> fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 571, in __init__
    self.read_multi(environ, keep_blank_values, strict_parsing)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 726, in read_multi
    self.encoding, self.errors)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 573, in __init__
    self.read_single()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 736, in read_single
    self.read_binary()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 758, in read_binary
    self.file.write(data)
TypeError: must be str, not bytes
>>>
```
This happens because of a mismatch between the code that creates the temp file to write to and the code that decides whether to read in binary mode:
* the presence of `filename` in the `Content-Disposition` header triggers creation of a binary mode file
* the presence of a `Content-Length` header for the part triggers a binary read
When `Content-Length` is present but `filename` is absent, `bytes` are written to the non-binary temp file, causing the error above.
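To illustrate the mismatch without depending on `cgi.py` itself, here is a simplified model of the two independent decisions. It is only a sketch: `store_part` and its parameters are names invented for this example, and in-memory buffers stand in for the temp file.

```python
import io


def store_part(body, filename=None, content_length=None):
    """Simplified model of how one multipart part gets stored.

    Hypothetical helper for illustration only; not the cgi.py code.
    """
    # Decision 1: the target is binary only if the part has a filename.
    binary_file = filename is not None
    target = io.BytesIO() if binary_file else io.StringIO()

    # Decision 2: a Content-Length header means raw bytes are written,
    # regardless of which kind of target was created above.
    if content_length is not None:
        target.write(body[:content_length])
    elif binary_file:
        target.write(body)
    else:
        target.write(body.decode("latin-1"))
    return target
```

In this model, `store_part(b"Larry", content_length=5)` raises `TypeError`, because bytes are written to a text buffer, while the same call with a `filename` succeeds.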
I've reviewed the relevant RFCs, and I'm not really sure what the correct way to handle this is. I don't believe `Content-Length` is addressed for part bodies in the MIME spec[0], and HTTP has its own semantics[1].
At the very least, I think this behavior is confusing and unexpected. Some libraries, like Retrofit[2], include `Content-Length` by default, and so break when submitting POST data to a Python server.
I've attempted to make this work the way I'd expect and attached a patch, but I'm really not sure it's the right decision. My patch somewhat naively accepts the existing semantics of `Content-Length`, which presume bytes, and treats the creation of a non-binary file as the "bug".
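The idea behind that approach can be sketched roughly as follows. This is not the attached patch, just a self-contained model of the decision it changes; `store_part_binary_aware` is a hypothetical name, and in-memory buffers stand in for the temp file.

```python
import io


def store_part_binary_aware(body, filename=None, content_length=None):
    """Model of the proposed behavior: Content-Length implies a binary target.

    Hypothetical helper for illustration only; not the cgi.py code.
    """
    # Treat the part as binary if it has a filename OR a Content-Length,
    # so the kind of target always matches the kind of read.
    binary_file = filename is not None or content_length is not None
    target = io.BytesIO() if binary_file else io.StringIO()

    if content_length is not None:
        target.write(body[:content_length])   # bytes into a binary target
    elif binary_file:
        target.write(body)                    # bytes into a binary target
    else:
        target.write(body.decode("latin-1"))  # str into a text target
    return target
```

One consequence of this choice, at least in this model, is that a plain form field sent with `Content-Length` comes back as `bytes` rather than `str`, which is part of why I'm unsure it's the right fix.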
[0]: http://www.ietf.org/rfc/rfc2045.txt
[1]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4
[2]: http://square.github.io/retrofit/

| Date | User | Action | Args |
|---|---|---|---|
| 2015-07-31 16:07:21 | Peter Landry | set | recipients: + Peter Landry, vstinner |
| 2015-07-31 16:07:21 | Peter Landry | set | messageid: `<1438358841.56.0.603079767616.issue24764@psf.upfronthosting.co.za>` |
| 2015-07-31 16:07:21 | Peter Landry | link | issue24764 messages |
| 2015-07-31 16:07:21 | Peter Landry | create | |