classification
Title: cgi.FieldStorage fails to handle multipart/form-data when \r\n appears at end of 65535 bytes without other newlines
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Ben.Hearsum, catlee, flox, orsenthil, python-dev, rail, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-06-08 00:46 by catlee, last changed 2013-06-17 13:39 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
cgi-test-cpython.patch catlee, 2013-06-08 00:46 test for handling of 65,535 byte files without newlines
cgi-cpython.patch catlee, 2013-06-08 00:52 proposed fix
test_cgi_server.py catlee, 2013-06-10 15:23 simple http server that demonstrates failure
issue18167-2.7.patch serhiy.storchaka, 2013-06-12 10:35 Patch for 2.7 review
issue18167-3.3.patch serhiy.storchaka, 2013-06-12 10:36 Patch for 3.3 review
Messages (7)
msg190784 - (view) Author: Chris AtLee (catlee) Date: 2013-06-08 00:46
cgi.FieldStorage uses fp.readline(1 << 16) to read in POSTed file data if no content length has been specified. All HTTP clients I've looked at terminate the file body with CRLF and then the final MIME boundary. If the file body is 65,535 bytes, and doesn't contain \n or \r\n, then fp.readline(1 << 16) will return the original 65,535 bytes of the file plus the \r from the final \r\n sequence before the final boundary string. Since \r isn't considered a line ending, it gets considered as part of the POSTed file data, and you end up with an extra \r at the end of the file data.
msg190785 - (view) Author: Chris AtLee (catlee) Date: 2013-06-08 00:52
This is a possible fix to this issue. It's not as clean as I'd like, but the simpler versions I tried could end up with the entire file contents in memory for degenerate (or malicious) inputs.

The trick is handling the case where the current line ends with \r. We can't know if this is just a normal character in the file, or represents the end of a line until we see the start of the next line.
msg190914 - (view) Author: Chris AtLee (catlee) Date: 2013-06-10 15:23
To demonstrate how to hit this in a real use case, run the attached script which implements a simple http server that saves POSTed files to a local file "got_data". It returns the sha1sum of the POSTed file as the http response.

Then, create a test file consisting of 65,535 null bytes, and submit with curl.

-> % dd if=/dev/zero of=data bs=1 count=65535                                                                                           
65535+0 records in
65535+0 records out
65535 bytes (66 kB) copied, 0.0890487 s, 736 kB/s

-> % sha1sum data
391edab7225a1de662ebc3a1a670a20d8e6a226b  data

-> % curl -Fdata=@data http://localhost:8080/
8dd623ef130a8cd3e97086101a6e1255a91fb916%
msg191019 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-06-12 10:35
Thank you for your report, but your patch looks overcomplicated, it fails on 'x'*65535+'\r'+'y'*65535 and hangs on 'x'*65535+'\r'.

Here is a simpler patch with tests.
msg191208 - (view) Author: Chris AtLee (catlee) Date: 2013-06-15 14:09
Thanks, your patch is definitely much simpler!

I was worried about the case where you have interrupted \r\n that appears in the middle of the content. But that case is handled by the next readline(), which returns a single \n.

One question about the tests you've attached - would it be better to be explicit about the line endings in check()? Do triple quoted strings in python always use \n for EOL regardless of the source code EOL format?
msg191212 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-06-15 14:56
> Do triple quoted strings in python always use \n for EOL regardless of the source code EOL format?

Python parser always interprets EOL as \n in string literals.
msg191333 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-17 13:38
New changeset 63058453a4cc by Serhiy Storchaka in branch '2.7':
Issue #18167: cgi.FieldStorage no more fails to handle multipart/form-data
http://hg.python.org/cpython/rev/63058453a4cc

New changeset a48f65bac986 by Serhiy Storchaka in branch '3.3':
Issue #18167: cgi.FieldStorage no more fails to handle multipart/form-data
http://hg.python.org/cpython/rev/a48f65bac986

New changeset 17ec73a3a854 by Serhiy Storchaka in branch 'default':
Issue #18167: cgi.FieldStorage no more fails to handle multipart/form-data
http://hg.python.org/cpython/rev/17ec73a3a854
History
Date User Action Args
2013-06-17 13:39:17serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2013-06-17 13:38:26python-devsetnosy: + python-dev
messages: + msg191333
2013-06-17 04:26:21railsetnosy: + rail
2013-06-15 14:56:21serhiy.storchakasetmessages: + msg191212
2013-06-15 14:09:26catleesetmessages: + msg191208
2013-06-12 10:36:23serhiy.storchakasetfiles: + issue18167-3.3.patch
versions: + Python 3.3, Python 3.4
2013-06-12 10:35:18serhiy.storchakasetfiles: + issue18167-2.7.patch

nosy: + serhiy.storchaka
messages: + msg191019

assignee: serhiy.storchaka
2013-06-12 06:33:58serhiy.storchakasetstage: patch review
2013-06-12 03:25:37Ben.Hearsumsetnosy: + Ben.Hearsum
2013-06-10 15:23:49catleesetfiles: + test_cgi_server.py

messages: + msg190914
2013-06-09 08:29:35floxsetnosy: + flox
2013-06-08 03:41:22orsenthilsetnosy: + orsenthil
2013-06-08 00:52:33catleesetfiles: + cgi-cpython.patch

messages: + msg190785
2013-06-08 00:46:59catleecreate