cgi module cannot handle POST with multipart/form-data in 3.x #49203

oopos · 2009-01-15T09:02:46Z

BPO	4953
Nosy	@warsaw, @birkenfeld, @amauryfa, @vstinner, @merwok, @bitdancer, @florentx, @PierreQuentel
Files	full_source_and_error.zip: full source and error show opsuper.pl: Perl code for process post files from form unittest.zip: cgi.FieldStorage unittest for Multipart form-data and associated files. http.zip: Module cgi_new.py and tests cgitest-python3.py cgi_tests.zip: cgi_test.py and associated files adder.html: uses get, works adderpost.html: just hanged get to post, makes adder.cgi hang localCGIServer.py: my wrapper around http.server to handle localhost adder.cgi: action script used by adder.html. adderpost.html, hangs with adderpost.html

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2011-01-14.13:12:27.362>
created_at = <Date 2009-01-15.09:02:45.958>
labels = ['type-bug', 'library', 'expert-unicode']
title = 'cgi module cannot handle POST with multipart/form-data in 3.x'
updated_at = <Date 2011-01-15.01:19:29.678>
user = 'https://bugs.python.org/oopos'

bugs.python.org fields:

activity = <Date 2011-01-15.01:19:29.678>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2011-01-14.13:12:27.362>
closer = 'vstinner'
components = ['Library (Lib)', 'Unicode']
creation = <Date 2009-01-15.09:02:45.958>
creator = 'oopos'
dependencies = []
files = ['12753', '12768', '14235', '20217', '20270', '20385', '20393', '20394', '20395', '20396']
hgrepos = []
issue_num = 4953
keywords = ['patch']
message_count = 130.0
messages = ['79892', '79893', '79894', '79896', '79898', '79901', '79939', '79981', '80000', '88960', '88962', '89112', '91444', '91449', '91711', '91741', '95288', '95292', '95330', '107959', '108221', '121864', '125035', '125065', '125086', '125088', '125100', '125105', '125106', '125108', '125114', '125152', '125153', '125158', '125159', '125178', '125181', '125201', '125217', '125235', '125237', '125241', '125402', '125403', '125410', '125419', '125426', '125428', '125474', '125481', '125483', '125501', '125524', '125533', '125543', '125546', '125556', '125557', '125558', '125563', '125570', '125583', '125629', '125633', '125637', '125698', '125839', '125840', '125885', '125886', '125892', '125893', '125900', '125901', '125921', '125926', '125928', '125930', '125931', '125935', '125952', '125992', '125993', '125994', '126035', '126060', '126062', '126065', '126066', '126075', '126117', '126124', '126140', '126145', '126152', '126160', '126161', '126162', '126164', '126165', '126167', '126173', '126175', '126199', '126205', '126207', '126210', '126212', '126214', '126215', '126219', '126222', '126232', '126233', '126242', '126249', '126251', '126253', '126256', '126257', '126262', '126264', '126266', '126267', '126268', '126269', '126289', '126293', '126301', '126307']
nosy_count = 17.0
nosy_names = ['barry', 'georg.brandl', 'amaury.forgeotdarc', 'ggenellina', 'vstinner', 'andyharrington', 'eric.araujo', 'grahamd', 'v+python', 'r.david.murray', 'oopos', 'tcourbon', 'tobias', 'flox', 'pebbe', 'quentel', 'erob']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'needs patch'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue4953'
versions = ['Python 3.2']

oopos · 2009-01-15T09:02:40Z

Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit
(Intel)] on win32
---------------------------------------------
Hi,I user Python!

UnicodeDecodeError Python 3.0: G:\cgi\python\python.exe
Thu Jan 15 16:46:34 2009

A problem occurred in a Python script. Here is the sequence of function
calls leading up to the error, in the order they occurred.
G:\webserver\xampp\cgi-bin\testupload.py in ()
107
108 # get form
109 opsform = cgi.FieldStorage()
110
111 print ("<br>","form-data:","<br>",opsform,"<br>")
opsform undefined, cgi = <module 'cgi' from 'G:\cgi\python\lib\cgi.py'>,
cgi.FieldStorage = <class 'cgi.FieldStorage'>
G:\cgi\python\lib\cgi.py in __init__(self=FieldStorage(None, None, []),
fp=None, headers={'content-length': '671631', 'content-type':
'multipart/form-data;
boundary=---------------------------9699301019407'}, outerboundary='',
environ=<os._Environ object at 0x00C90C50>, keep_blank_values=0,
strict_parsing=0)
477 self.read_urlencoded()
478 elif ctype[:10] == 'multipart/':
479 self.read_multi(environ, keep_blank_values,
strict_parsing)
480 else:
481 self.read_single()
self = FieldStorage(None, None, []), self.read_multi = <bound method
FieldStorage.read_multi of FieldStorage(None, None, [])>, environ =
<os._Environ object at 0x00C90C50>, keep_blank_values = 0,
strict_parsing = 0
G:\cgi\python\lib\cgi.py in read_multi(self=FieldStorage(None, None,
[]), environ=<os._Environ object at 0x00C90C50>, keep_blank_values=0,
strict_parsing=0)
597 # Create bogus content-type header for proper multipart
parsing
598 parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' %
(self.type, ib))
599 parser.feed(self.fp.read())
600 full_msg = parser.close()
601 # Get subparts
parser = <email.feedparser.FeedParser object at 0x00DD5650>, parser.feed
= <bound method FeedParser.feed of <email.feedparser.FeedParser object
at 0x00DD5650>>, self = FieldStorage(None, None, []), self.fp =
<io.TextIOWrapper object at 0x00BE3FB0>, self.fp.read = <bound method
TextIOWrapper.read of <io.TextIOWrapper object at 0x00BE3FB0>>
G:\cgi\python\lib\io.py in read(self=<io.TextIOWrapper object at
0x00BE3FB0>, n=-1)
1722 # Read everything.
1723 result = (self._get_decoded_chars() +
1724 decoder.decode(self.buffer.read(), final=True))
1725 self._set_decoded_chars('')
1726 self._snapshot = None
decoder = <encodings.gbk.IncrementalDecoder object at 0x00DB7AB0>,
decoder.decode = <built-in method decode of IncrementalDecoder object at
0x00DB7AB0>, self = <io.TextIOWrapper object at 0x00BE3FB0>, self.buffer
= <io.BufferedReader object at 0x00BE3F90>, self.buffer.read = <bound
method BufferedReader.read of <io.BufferedReader object at 0x00BE3F90>>,
final undefined

UnicodeDecodeError: 'gbk' codec can't decode bytes in position 157-158:
illegal multibyte sequence
args = ('gbk',
b'-----------------------------9699301019407\r\n...-----------------------------9699301019407--\r\n',
157, 159, 'illegal multibyte sequence')
encoding = 'gbk'
end = 159
object =
b'-----------------------------9699301019407\r\n...-----------------------------9699301019407--\r\n'
reason = 'illegal multibyte sequence'
start = 157
with_traceback = <built-in method with_traceback of
UnicodeDecodeError object at 0x00DB7BF0>

l:\tmp_dir\tmpxyeojf.html contains the description of this error.

-------------------------------------------
Hi,
I am newbie for python under the windows.
I find that the cgi module always work wrong for the binary files to upload.
I find that it cannot auto to discern the files' mode and alway use the
default mode : 'TEXT'.
So I want to change the sys.stdin 's mode to BINARY to support the
binary files.
I got this way:
import msvcrt,os
msvcrt.setmode(0,os.O_BINARY) # for stdin
msvcrt.setmode(1,os.O_BINARY) # for stdout
but it isn't work,too.
I know use C progam language can use this function:
freopen("somefilename","mode","stdin or stdout") to redirect the file
flow.
Can every one help me ?

Best Regards
oopos

I

amauryfa · 2009-01-15T10:59:25Z

Does it work if you change your script like this:
opsform = cgi.FieldStorage(open(sys.stdin.fileno(), 'rb'))

oopos · 2009-01-15T11:58:30Z

To Amaury Forgeot d'Arc :

Thank you.

That error have sloved with your way:
[quote]Does it work if you change your script like this:
opsform = cgi.FieldStorage(open(sys.stdin.fileno(), 'rb'))[/quote]

Now,The new problem come out:
[code] 97 """Push some new data into this object."""

98 # Handle any previous leftovers

99 data, self._partial = self._partial + data, ''

100 # Crack into lines, but preserve the newlines on the end
of each

101 parts = NLCRE_crack.split(data)

data = b'-----------------------------7d91f41a302f4
\nCo...\x0e\x0f\x0c\x10\x17\x14\x18\x18\x17\x14\x16\x16', self =
<email.feedparser.BufferedSubFile object at 0x00DD5270>, self._partial
= ''

TypeError: Can't convert 'bytes' object to str implicitly
[/code]

I find that the CGI LIB didn't use bytes flow, it always use string
flow.

More info in the attch file:

amauryfa · 2009-01-15T12:43:51Z

OK, another try. Please replace the previous version with these 3 lines:

  encoding = os.environ.get('HTTP_TRANSFER_ENCODING')
  stdin = open(sys.stdin.fileno(), 'r', encoding=encoding)
  opsform = cgi.FieldStorage(stdin)

oopos · 2009-01-15T14:34:45Z

Thank you for time.
Now,I try with you saied.Bu it is taken wrong as before.

See the files:

amauryfa · 2009-01-15T16:45:23Z

Thanks for the test case. I reproduced it easily.
There is indeed a real problem in CGI streams.

The first thing to do is to start python with the -u option (add it to
the end of the first #! line), so that stdin yields bytes instead of
unicode chars, and \r\n are not translated on Windows.

Even then, I noticed that in the multipart/form-data section, text
fields are utf-8 encoded, but the file content is raw binary.
(FWIW, I use Firefox and Apache on Windows)
No encoding seems to be specified, neither in the content, nor in the
environment (no HTTP_TRANSFER_ENCODING)

And of course, the email.parser.FeedParser object used to parse it
accepts only unicode, not bytes.
Help needed.

ggenellina · 2009-01-16T08:13:34Z

An attempt to more accurately describe the issue, to attract more
knowledgeable people, I hope...

oopos · 2009-01-16T22:08:58Z

Hehe.
I only want to use python to slove the upload files instead of PHP.but
it is hard to me or I have no much time to leran it.
Now,I learn Perl quickly and use it upload files, it works ok.

Thank you . I will take more time to learn Python language well.
Best Regards!

This is my code: (Perl Language)

ggenellina · 2009-01-17T03:40:52Z

You should stick to Python 2.6 (or even 2.5) for web programming - 3.0
is not mature enough. I thought this was a feasibility study on porting
an existing application to Python 3.0 -- not your first steps in the
language.

tercero12 · 2009-06-05T18:29:41Z

I'm working on a web framework for Python 3. Naturally this is a
blocker for me. I was kinda expecting this to be addressed in 3.1 but
now that rc1 is out and I don't see anything about it, I'm wondering
about the status of this bug. Can we get a status update?

bitdancer · 2009-06-05T18:46:19Z

Can you provide a test case that clearly demonstrates the problem
(preferably a unit test, but anything easily reproducible will do)? I'm
not sure what to do with the code attached to the case.

tercero12 · 2009-06-08T20:39:29Z

I've attached unittest.zip. Simply unzip it to a directory and run it.
I've included a Python2.x version of the unittest for the sake of
clarity. The 2.x version was developed on 2.4. The 3.x version was
developed on 3.0.1 and 3.1rc1 (with identical results).

It seems that there are several issues with cgi.FieldStorage and
multi-part form data.

Does Formstation read in a Bytes or String?
-- It seems to expect a String but this yields invalid results for
uploading files.
-- A stream of Bytes would make more sense but loses it Pythonic
"Batteries included" nature if the user has to decode the encoding
manually for each form field.

tobias · 2009-08-10T13:50:35Z

Actually, I think this whole issue is more complex. For example,
consider a (fictious) CGI script where users can upload an image and a
description and the script sends a success/error message in return.

In this case, one has to:

read the HTTP request header from stdin as US-ASCII
read the image from stdin as raw binary data
read the description from stdin as a string in some encoding
write the HTTP response header to stdout as US-ASCII
write the response message to stdout in some (other) encoding
write error messages to server log via stderr as US-ASCII
Also, there are cases when a cgi script should return binary data
instead (e.g., images or archive files) or apply a transfer encoding
(e.g., gzip).

Although FieldStorage only cares about reading, it still has to cope
with intermixed textual and binary data. So the only practical way in my
opinion is to use raw binary data and have FieldStorage decode strings
on demand, since only the programmer knows whether a field should
contain text or binary data. FieldStorage should offer two methods for
this purpose: one for reading binary data and another for reading and
decoding strings on-the-fly (possibly using a default encoding passed to
its constructor).

tercero12 · 2009-08-10T14:42:24Z

I think you hit the nail on the head. Now we just need (someone) to
code it.

tercero12 · 2009-08-18T18:58:20Z

I thought I'd take a crack at this today. I soon figured out the real
issue. It is the email.parser module that handles the decoding of
Multipart/form-data things...and it is also still quote broken w.r.t.
handling Bytes. So this issue is dependent on
http://bugs.python.org/issue4661 before it can be fixed.

warsaw · 2009-08-19T20:19:25Z

Please, please, please contact the email-sig and help pitch in. For
many reasons I simply haven't had the cycles to work on this and I don't
see that happening any time soon. There are folks willing to work on
the package in the email-sig and I will add my $0.02 with design
suggestions, but we really need manpower and motivation to get the email
package into shape.

tcourbon · 2009-11-15T13:09:03Z

*bump*

Hi there Pythoners !

As Timothy Farrell I'm currently working (or rather, toying since it
just for fun) on a Python3 web framework. I just started but when it
come to file upload I experience issues which I believe are connected to
that issue.

Just want if their was any progress since last messages or I should jump
in and try writing my own multipart parser (as I doubt I have the
required skill to contribute to the email parser).

tercero12 · 2009-11-15T14:18:11Z

Perhaps this update should go in the linked email bug. The email team
has a goal of fixing the email module in time for the 3.2 release. I
also, feel as though I lack the skill to fix the email module, but it
goes beyond that since they're potentially having to change some
interfaces. See this document:
http://wiki.python.org/moin/Email%20SIG/DesignThoughts

Once the email module is fixed, the cgi module will be trivial to fix.
I'm confident enough to handle it. I don't think anyone can give you a
date so you'll have to make the custom solution decision based on your
timeframe and patience.

tcourbon · 2009-11-16T08:47:31Z

It seems that there wasn't work on that issue (which look complicated by
the way). I'll wait, there is so much other aspects of a web framework
to play with :)
Thank anyway for the pointer.

gvanrossum · 2010-06-16T22:24:21Z

Could this be related to bpo-8077?

tercero12 · 2010-06-20T01:35:08Z

Yes, they are related but not quite the same. The solution to this problem will likely include a fix to the problem described in bpo-8077.

vpython · 2010-11-21T05:24:38Z

Regarding http://bugs.python.org/issue4953#msg91444 POST with multipart/form-data encoding can use UTF-8, other stuff is restricted to ASCII!

From http://www.w3.org/TR/html401/interact/forms.html:
Note. The "get" method restricts form data set values to ASCII characters. Only the "post" method (with enctype="multipart/form-data") is specified to cover the entire [ISO10646] character set.

Hence cgi formdata can safely decode text fields using UTF-8 decoding (experimentally, that is the encoding used by Firefox to support the entire ISO10646 character set).

PierreQuentel · 2011-01-02T08:51:37Z

Hi,

I have started working on the port of a simplified version of Karrigell (a web framework) to Python3. I experienced the same problem as the other posters : in the current version, file upload doesn't work. So I've been working on the cgi module for a few days and now have a version which correctly manages file uploads in the tests I made

The problem in the current version (3.2b2) is that all data is read from sys.stdin, which reads strings, not bytes. This obviously can't work properly to upload binary files. In the proposed version, for multipart/form-data type, all data is read as bytes from sys.stdin.buffer ; in the CGI script, the Python interpreter must be launched with the -u option, as suggested by Amaury, otherwise sys.stdin.buffer.read() only returns the beginning of the data stream

The headers inside the multipart/form-data are decoded to a string using sys.stdin.encoding and passed to a FeedParser (which requires strings) ; then the data is read from sys.stdin.buffer (bytes) until a boundary is found

If the field is a file, the file object in self.file stores bytes, and the attribute "value" is a byte string. If it is not a file, the value is decoded to a string, always using sys.stdin.encoding, as for all other fields for other types of forms

Other cosmetic changes :

replaced "while 1" by "while True"
replaced "if type(value) == type([])" by "if isintance(value,list)"

Attached file : zip with cgi_new.py and tests in a folder called "http"
Tested with Python 3.2b2 (r32b2:87398, Dec 19 2010, 22:51:00) [MSC v.1500 32 bit (Intel)] on win32 ; files and CGI scripts served by Apache 2.2

bitdancer · 2011-01-02T16:19:48Z

Thank you very much for working on this! I'll try to take a look at the patch soon. A couple quick comments based on your posting: first, the email module now has a BytesFeedparser that will accept a byte stream, which I hope might simplify your patch. Second, it would be very helpful if you could upload your patch as an 'svn diff' against the current py3k trunk (see python.org/dev for details on how to do that). That will make review and application of the patch much much simpler. (This would be true even if more of the code in cgi.py has changed than not.) If you don't want to set up an svn checkout, then a context diff against the copy of cgi.py you started with would be second best. Please post any files individually as .patch or .diff or .txt files...these are preferred in the tracker over .zip files because they can be viewed without downloading.

PierreQuentel · 2011-01-13T22:09:05Z

Comment ça, no up to date patch ? cgi_32.patch is up to date, the API changes are documented, the unittests work, what else do you want ?

pitrou · 2011-01-13T22:14:56Z

Comment ça, no up to date patch ? cgi_32.patch is up to date, the API
changes are documented, the unittests work, what else do you want ?

The O_BINARY stuff looks obsolete to me.

vpython · 2011-01-13T22:27:14Z

The O_BINARY stuff was probably necessary because bpo-10841 is not yet in the build Pierre was using? I agree it in not necessary with the fix for that issue, but neither does it hurt.

It could be stripped out, if you think that is best, Antoine.

But there is a working patch.

merwok · 2011-01-13T22:39:57Z

Can one person please

Sum up the discussion and its outcome briefly
Remove all patches and replace them with one diff with docs, tests and code (even if you have new files, you don’t have to put them in a zip, use svn add and they will show up in the svn diff, which is really easier to review)

erob · 2011-01-13T22:43:36Z

+1

vstinner · 2011-01-13T23:54:54Z

I tested cgi_32.patch on Windows with Apache:

a test with a binary file works: I get a binary file instead of a text file
a test with a non-ASCII character (a\xe9b) works: the text is correctly decoded

I used the test script from full_source_and_error.zip.

Comments on cgi_32.patch:

 - I don't understand why FieldStorage changes sys.stdout and sys.stderr (see remarks about IOMix above): please remove the charset argument (it is also confusing to have two encoding arguments). it should be done somewhere else
 - please remove the O_BINARY hack: the patch is for Python 3.2 and I closed issue python/issues-test-cpython#10841. If you would like a backport, another patch should be written later
 - "encoding = 'latin-1' # ?": write a real comment or remove it
 - 'self.fp.read(...) # bytes': you should add a test on the type if you are not sure that fp.read() gives bytes
 - "file: the file(-like) object from which you can read the data *as bytes*": you should mention that TextIOWrapper is also tolerated (accepted?)
 - you may set fp directly to sys.stdin.buffer (instead of sys.stdin) if fp is None (it will be easier after removing the O_BINARY thing)
 - the patch adds a tab in an empty line, please don't do that :-)
 - you should add a (private?) attribute to FieldStorage to decide if it works on bytes or unicode, instead of using "self.filename is not None" test (eg. self._use_bytes = (self.filename is not None)
 - i don't like the idea of having a generic self.__write() method supporting bytes and unicode. it would prefer two methods, eg. self.__write_text() and self.__write_binary() (they can share a third private method)
 - i don't like "stream_encoding" name: what is the "stream" here? do you process a "file", a "string" or a "stream"? why not just "self.encoding"?
 - "import email.parser,email.feedparser" one import is useless here. I prefer "from email.feedparser import FeedParser" because you get directly a ImportError if the symbol is missing. And it's already faster to get FeedParser instead of email.feedparser.FeedParser in a loop (dummy micro-optimization)
 - even I like the following change, please do it in a separated patch:
-            if type(value) is type([]):
+            if isinstance(value,list):

I really don't like the IOMix thing:

sys.stdout.write() should not accept bytes
FieldStorage should not replace sys.stdout and sys.stderr: if you want to set the encoding of these files, set PYTHONIOENCODING environment variable before running your program (it changes also the encoding of sys.stdio)
IOMix should not accept bytes *and* unicode. It's better to have an explicit API like stdout.write('unicode') and stdout.buffer.write(b'bytes)

Most parts of the patch are correct and fix real bugs. Since cgi is broken currently (eg. it doesn't handle binary files correctly), anything making the situation better would be nice.

I vote +0 to commit the patch now (if the release manager agrees), and +1 if all of my remarks are fixed.

vpython · 2011-01-14T01:52:02Z

Victor, thanks for your comments, and interest in this bug. Other than the existence of the charset parameter, and whether or not to include IOMix, I think all of the others could be fixed later, and do not hurt at present. So I will just comment on those two comments.

I would prefer to see FieldStorage not have the charset attribute also, but I don't have the practice to produce an alternate patch, and I can see that it would be a convenience for some CGI scripts to specify that parameter, and have one API call do all the work necessary to adjust the IO streams, and read all the parameters, and then the rest of the logic of the web app can follow. Personally, I adjust the stdout/stderr streams earlier in my scripts, and only optionally call FieldStorage, if I determine the request needs such.

I've been using IOMix for some months (I have a version for both Python 2 and 3), and it solves a real problem in generating web page data streams... the data stream should be bytes, but a lot of the data is manipulated using str, which would then need to be decoded. The default encoding of stdout is usually wrong, so must somehow be changed. And when you have chunks of bytes (in my experience usually from a database or file) to copy to the output stream, if your prior write was str, and then you write bytes to sys.stdout.binary, you have to also remember to flush the TextIOBuffer first. IOMix provides a convenient solution to all these problems, doing the flushing for you automatically, and just taking what comes and doing the right thing. If I hadn't already invented IOMix to help write web pages, I would want to :)

grahamd · 2011-01-14T06:08:08Z

FWIW, keep in mind that cgi.FieldStorage is also quite often used in WSGI scripts in arbitrary WSGI servers which have got nothing to do with CGI. Having cgi.FieldStorage muck around with stdout/stderr under WSGI, even where using a CGI/WSGI bridge, would potentially be a bad thing to do, especially in embedded systems like mod_wsgi where sys.stdout and sys.stderr are replaced with file like objects that map onto Apache error logging. Even in non embedded systems, you could very well screw up any application logging done via stdout/stderr and break the application.

So, the default or common code paths should never play with sys.stdout or sys.stderr. It is already a PITA that the implementation falls back to using sys.argv when QUERY_STRING isn't defined which also could produce strange results under a WSGI server. In other words, please don't go adding any more code which makes the wrong assumption that this is only used in CGI scripts.

vpython · 2011-01-14T06:17:38Z

Graham, Thanks for your comments. Fortunately, if the new charset parameter is not supplied, no mucking with stdout or stderr is done, which is the only reason I cannot argue strongly against the feature, which I would have implemented as a separate API... it doesn't get in the way if you don't use it.

I would be happy to see the argv code removed, but it has been there longer than I have been a Python user, so I just live with it ... and don't pass arguments to my CGI scripts anyway. I've assumed that is some sort of a debug feature, but I also saw some code in the HTTPCGIServer and http.server that apparently, on some platforms, actually do pass parameters to CGI on the command lines. I would be happy to see that code removed too, but it also predates my Python experience. And no signs of "if debug:" by either of them!

PierreQuentel · 2011-01-14T08:23:41Z

@victor

Thanks for the comments

"- I don't understand why FieldStorage changes sys.stdout and sys.stderr (see remarks about IOMix above): please remove the charset argument (it is also confusing to have two encoding arguments). it should be done somewhere else"

done

"please remove the O_BINARY hack: the patch is for Python 3.2 and I closed issue bpo-10841. If you would like a backport, another patch should be written later"

done

""encoding = 'latin-1' # ?": write a real comment or remove it"

I removed this part

"'self.fp.read(...) # bytes': you should add a test on the type if you are not sure that fp.read() gives bytes"

added tests in read_urlencoded(), read_multi() and read_binary()

"file: the file(-like) object from which you can read the data *as bytes*": you should mention that TextIOWrapper is also tolerated (accepted?)"

not done : here "file" is not the argument passed to the FieldStorage constructor, but the attribute of values returned from calls to FieldStorage. In the new implementation, its read() method always returns bytes

"you may set fp directly to sys.stdin.buffer (instead of sys.stdin) if fp is None (it will be easier after removing the O_BINARY thing)"

done

"the patch adds a tab in an empty line, please don't do that :-)"

done (hopefully :-)

"you should add a (private?) attribute to FieldStorage to decide if it works on bytes or unicode, instead of using "self.filename is not None" test (eg. self._use_bytes = (self.filename is not None)"

done

"i don't like the idea of having a generic self.__write() method supporting bytes and unicode. it would prefer two methods, eg. self.__write_text() and self.__write_binary() (they can share a third private method)"

not done, the argument of __write is always bytes

"i don't like "stream_encoding" name: what is the "stream" here? do you process a "file", a "string" or a "stream"? why not just "self.encoding"?"

done

"import email.parser,email.feedparser" one import is useless here. I prefer "from email.feedparser import FeedParser" because you get directly a ImportError if the symbol is missing. And it's already faster to get FeedParser instead of email.feedparser.FeedParser in a loop (dummy micro-optimization)

done

" even I like the following change, please do it in a separated patch:

```
       if type(value) is type([]):
```

+ if isinstance(value,list):"

not done

"I really don't like the IOMix thing:"

removed

"I vote +0 to commit the patch now (if the release manager agrees), and +1 if all of my remarks are fixed."

should be close to +0.8 now ;-)

erob · 2011-01-14T09:25:56Z

+1

thanks for this input. I agree for the most part. However if the io
semantics in python 3 is radically different than on python 2, I could
have expected that WSGI scripts would similarly depend on a newer
type of file descriptor access using the sys module.

cheers!

vpython · 2011-01-14T10:04:20Z

Pierre, Thank you for the new patch, with the philosophy of "it's broke, so let's produce something the committers like to get it fixed".

I see you overlooked removing the second use of O_BINARY. Locally, I removed that also, and tested your newest patch, and it still functions great for me.

PierreQuentel · 2011-01-14T10:48:19Z

Glenn, you read my mind ;-)

Thanks for mentioning the O_BINARY thing. New (last !) patch attached

vstinner · 2011-01-14T13:12:26Z

r87996+r87997 adds encoding and errors argument to parse_qs() and parse_qsl() of urllib.parse. It is needed to decoded correctly %XX syntax in cgi.

r87998 is the patch on the cgi module.

Changes with cgi_32.patch:

Use TextIOWrapper instead of TextIOBase, because TextIOBase has no buffer
attribute
typo in a docstring: "it must must" => "must"
(docstring) default: sys.stdin => default: sys.stdin.buffer
PEP-8: hasattr(a,b) => hasattr(a, b) (same for isinstance) and
"encoding = 'utf-8'" => "encoding='utf-8'" (in the argument list)
"xxx.decode(...) # str": remove useless # str comment. decode() always give
unicode in Python 3 (same change for ".encode() # bytes")
Rename "next" variables to "next_boundary" because next is a builtin
function in Python 3 (unrelated change).
FieldStorage.innerboundary and FieldStorage.outerboundary are bytes objects:
encode innerboundary in the constructor, and raise an error if outerboundary
is not a bytes object
Rename _use_bytes to _binary_file
isinstance(bytes) test: write the type, not the value, in the error message
Replace line[:2] == b'--' by line.startswith(b'--'), and then replace
line.strip() by line.rstrip()
test_fieldstorage_multipart() uses ASCII (and specifiy the encoding to FieldStorage)
add FieldStorage.errors attribute: pass it to parse_qsl()
add errors attribute to FieldStorage: same default value than urllib.parse.unquote(): 'replace'
parse(): pass encoding argument to parse_qs()
FieldStorage: pass encoding and errors arguments to parse_qsl()

Because the patch on TextIOBase, it patched the docstring:
---
fp : file pointer; default: sys.stdin.buffer
(not used when the request method is GET)
Can be :
1. an instance of (a subclass of) TextIOWrapper, in this case it
must provide an attribute "buffer" = the binary layer that returns
bytes from its read() method, and preferably an attribute
"encoding" (defaults to latin-1)
2. an object whose read() and readline() methods return bytes
---
becomes
---
fp : file pointer; default: sys.stdin.buffer
(not used when the request method is GET)
Can be :
1. a TextIOWrapper object
2. an object whose read() and readline() methods return bytes
---

Replace "type(value) is type([])" is done in another commit: r87999.

I consider that the work on this issue is done, and so I close it. If I am wrong, explain why and repoen the issue.

Please test the cgi as much as possible before Python 3.2 final: reopen the issue if it doesn't work.

vstinner · 2011-01-14T13:14:10Z

Oh, I forgot to credit the author(s): who wrote the patch?

PierreQuentel · 2011-01-14T14:07:53Z

Thanks a lot Victor !

I wrote the patch : Pierre Quentel (pierre.quentel@gmail.com) with many
inputs by Glenn Linderman

2011/1/14 STINNER Victor <report@bugs.python.org>

STINNER Victor <victor.stinner@haypocalc.com> added the comment:

Oh, I forgot to credit the author(s): who wrote the patch?

----------

Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue4953\>

vstinner · 2011-01-14T14:15:23Z

TODO: Add more tests to test_cgi. What is the latest patch for test_cgi?

PierreQuentel · 2011-01-14T14:41:19Z

My latest patch for test_cgi is in cgi_32.patch

I will try to add more tests later

vstinner · 2011-01-14T14:57:57Z

haypo>> What is the latest patch for test_cgi?
quentel> My latest patch for test_cgi is in cgi_32.patch

Ok, but cgi_32.patch doesn't add any test. I only adapt existing tests for your other changes.

I remove cgi_32.patch because it was commited.

vstinner · 2011-01-14T15:00:04Z

Remove cgi_plus_tests.diff: it looks to be an old version of cgi_32.patch.

@r.david.murray: Did you write cgi_plus_tests.diff, or is it based on the work on Pierre Quentel?

vstinner · 2011-01-14T15:03:00Z

Remove tmpy44zj7.html and tmpav1vve.html: a similar file is included in full_source_and_error.zip.

bitdancer · 2011-01-14T19:11:10Z

Victor: we normally leave the patch file that was committed attached to the issue for future reference.

The _plus_tests file was just the original patch plus the existing cgi tests adjusted to pass in bytes instead of strings to cgi, if I recall correctly. Nor original code of mine in either part :)

vpython · 2011-01-14T19:41:03Z

Thanks to Pierre for producing patch after patch and testing testing testing, and to Victor for committing it, as well as others that contributed in smaller ways, as I tried to. I look forward to 3.2 rc1 so I can discard all my temporary patched copies of cgi.py

vstinner · 2011-01-14T21:36:09Z

Because I'm unable to read the whole history and analyze each file attached to this issue, I opened bpo-10911 to ask to write more tests for the cgi module.

vstinner · 2011-01-15T01:19:30Z

Le vendredi 14 janvier 2011 à 19:11 +0000, R. David Murray a écrit :

Victor: we normally leave the patch file that was committed attached
to the issue for future reference.

Sorry, but there were too much files. I was trying to figure out if
there is something useful in the files.

The _plus_tests file was just the original patch plus the existing cgi
tests adjusted to pass in bytes instead of strings to cgi, if I recall
correctly. Nor original code of mine in either part :)

Ok.

oopos mannequin added performance Performance or resource usage topic-unicode labels Jan 15, 2009

ggenellina mannequin added the stdlib Python modules in the Lib dir label Jan 16, 2009

ggenellina mannequin changed the title ~~Cannot upload binary file from form ?~~ cgi module cannot handle POST with multipart/form-data in 3.0 Jan 16, 2009

ggenellina mannequin added type-bug An unexpected behavior, bug, or error and removed performance Performance or resource usage labels Jan 16, 2009

pitrou removed the release-blocker label Jan 13, 2011

erob mannequin changed the title ~~cgi module cannot handle POST with multipart/form-data in 3.x~~ cgi module cannot handle POST with multipart/form-data in 3.x Jan 13, 2011

vstinner closed this as completed Jan 14, 2011

PierreQuentel mannequin changed the title ~~cgi module cannot handle POST with multipart/form-data in 3.x~~ cgi module cannot handle POST with multipart/form-data in 3.x Jan 14, 2011

ezio-melotti transferred this issue from another repository Apr 10, 2022

cgi module cannot handle POST with multipart/form-data in 3.x #49203

cgi module cannot handle POST with multipart/form-data in 3.x #49203

Comments

oopos mannequin commented Jan 15, 2009

oopos mannequin commented Jan 15, 2009

amauryfa commented Jan 15, 2009

oopos mannequin commented Jan 15, 2009

amauryfa commented Jan 15, 2009

oopos mannequin commented Jan 15, 2009

amauryfa commented Jan 15, 2009

ggenellina mannequin commented Jan 16, 2009

oopos mannequin commented Jan 16, 2009

ggenellina mannequin commented Jan 17, 2009

tercero12 mannequin commented Jun 5, 2009

bitdancer commented Jun 5, 2009

tercero12 mannequin commented Jun 8, 2009

tobias mannequin commented Aug 10, 2009

tercero12 mannequin commented Aug 10, 2009

tercero12 mannequin commented Aug 18, 2009

warsaw commented Aug 19, 2009

tcourbon mannequin commented Nov 15, 2009

tercero12 mannequin commented Nov 15, 2009

tcourbon mannequin commented Nov 16, 2009

gvanrossum commented Jun 16, 2010

tercero12 mannequin commented Jun 20, 2010

vpython mannequin commented Nov 21, 2010

PierreQuentel mannequin commented Jan 2, 2011

bitdancer commented Jan 2, 2011

PierreQuentel mannequin commented Jan 13, 2011

pitrou commented Jan 13, 2011

vpython mannequin commented Jan 13, 2011

merwok commented Jan 13, 2011

erob mannequin commented Jan 13, 2011

vstinner commented Jan 13, 2011

vpython mannequin commented Jan 14, 2011

grahamd mannequin commented Jan 14, 2011

vpython mannequin commented Jan 14, 2011

PierreQuentel mannequin commented Jan 14, 2011

erob mannequin commented Jan 14, 2011

vpython mannequin commented Jan 14, 2011

PierreQuentel mannequin commented Jan 14, 2011

vstinner commented Jan 14, 2011

vstinner commented Jan 14, 2011

PierreQuentel mannequin commented Jan 14, 2011

vstinner commented Jan 14, 2011

PierreQuentel mannequin commented Jan 14, 2011

vstinner commented Jan 14, 2011

vstinner commented Jan 14, 2011

vstinner commented Jan 14, 2011

bitdancer commented Jan 14, 2011

vpython mannequin commented Jan 14, 2011

vstinner commented Jan 14, 2011

vstinner commented Jan 15, 2011