New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cgi module cannot handle POST with multipart/form-data in 3.x #49203
Comments
Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit UnicodeDecodeError Python 3.0: G:\cgi\python\python.exe A problem occurred in a Python script. Here is the sequence of function UnicodeDecodeError: 'gbk' codec can't decode bytes in position 157-158: l:\tmp_dir\tmpxyeojf.html contains the description of this error. ------------------------------------------- Best Regards I |
Does it work if you change your script like this: |
To Amaury Forgeot d'Arc : Thank you. That error have sloved with your way: Now,The new problem come out: 98 # Handle any previous leftovers 99 data, self._partial = self._partial + data, '' 100 # Crack into lines, but preserve the newlines on the end 101 parts = NLCRE_crack.split(data) data = b'-----------------------------7d91f41a302f4 TypeError: Can't convert 'bytes' object to str implicitly I find that the CGI LIB didn't use bytes flow, it always use string More info in the attch file: |
OK, another try. Please replace the previous version with these 3 lines: encoding = os.environ.get('HTTP_TRANSFER_ENCODING')
stdin = open(sys.stdin.fileno(), 'r', encoding=encoding)
opsform = cgi.FieldStorage(stdin) |
Thank you for time. See the files: |
Thanks for the test case. I reproduced it easily. The first thing to do is to start python with the -u option (add it to Even then, I noticed that in the multipart/form-data section, text And of course, the email.parser.FeedParser object used to parse it |
An attempt to more accurately describe the issue, to attract more |
Hehe. Thank you . I will take more time to learn Python language well. This is my code: (Perl Language) |
You should stick to Python 2.6 (or even 2.5) for web programming - 3.0 |
I'm working on a web framework for Python 3. Naturally this is a |
Can you provide a test case that clearly demonstrates the problem |
I've attached unittest.zip. Simply unzip it to a directory and run it. It seems that there are several issues with cgi.FieldStorage and
|
Actually, I think this whole issue is more complex. For example, In this case, one has to:
Although FieldStorage only cares about reading, it still has to cope |
I think you hit the nail on the head. Now we just need (someone) to |
I thought I'd take a crack at this today. I soon figured out the real |
Please, please, please contact the email-sig and help pitch in. For |
*bump* Hi there Pythoners ! As Timothy Farrell I'm currently working (or rather, toying since it Just want if their was any progress since last messages or I should jump |
Perhaps this update should go in the linked email bug. The email team Once the email module is fixed, the cgi module will be trivial to fix. |
It seems that there wasn't work on that issue (which look complicated by |
Could this be related to bpo-8077? |
Yes, they are related but not quite the same. The solution to this problem will likely include a fix to the problem described in bpo-8077. |
Regarding http://bugs.python.org/issue4953#msg91444 POST with multipart/form-data encoding can use UTF-8, other stuff is restricted to ASCII! From http://www.w3.org/TR/html401/interact/forms.html: Hence cgi formdata can safely decode text fields using UTF-8 decoding (experimentally, that is the encoding used by Firefox to support the entire ISO10646 character set). |
Hi, I have started working on the port of a simplified version of Karrigell (a web framework) to Python3. I experienced the same problem as the other posters : in the current version, file upload doesn't work. So I've been working on the cgi module for a few days and now have a version which correctly manages file uploads in the tests I made The problem in the current version (3.2b2) is that all data is read from sys.stdin, which reads strings, not bytes. This obviously can't work properly to upload binary files. In the proposed version, for multipart/form-data type, all data is read as bytes from sys.stdin.buffer ; in the CGI script, the Python interpreter must be launched with the -u option, as suggested by Amaury, otherwise sys.stdin.buffer.read() only returns the beginning of the data stream The headers inside the multipart/form-data are decoded to a string using sys.stdin.encoding and passed to a FeedParser (which requires strings) ; then the data is read from sys.stdin.buffer (bytes) until a boundary is found If the field is a file, the file object in self.file stores bytes, and the attribute "value" is a byte string. If it is not a file, the value is decoded to a string, always using sys.stdin.encoding, as for all other fields for other types of forms Other cosmetic changes :
Attached file : zip with cgi_new.py and tests in a folder called "http" |
Thank you very much for working on this! I'll try to take a look at the patch soon. A couple quick comments based on your posting: first, the email module now has a BytesFeedparser that will accept a byte stream, which I hope might simplify your patch. Second, it would be very helpful if you could upload your patch as an 'svn diff' against the current py3k trunk (see python.org/dev for details on how to do that). That will make review and application of the patch much much simpler. (This would be true even if more of the code in cgi.py has changed than not.) If you don't want to set up an svn checkout, then a context diff against the copy of cgi.py you started with would be second best. Please post any files individually as .patch or .diff or .txt files...these are preferred in the tracker over .zip files because they can be viewed without downloading. |
Comment ça, no up to date patch ? cgi_32.patch is up to date, the API changes are documented, the unittests work, what else do you want ? |
The O_BINARY stuff looks obsolete to me. |
The O_BINARY stuff was probably necessary because bpo-10841 is not yet in the build Pierre was using? I agree it in not necessary with the fix for that issue, but neither does it hurt. It could be stripped out, if you think that is best, Antoine. But there is a working patch. |
Can one person please
|
+1 |
I tested cgi_32.patch on Windows with Apache:
I used the test script from full_source_and_error.zip. Comments on cgi_32.patch: - I don't understand why FieldStorage changes sys.stdout and sys.stderr (see remarks about IOMix above): please remove the charset argument (it is also confusing to have two encoding arguments). it should be done somewhere else
- please remove the O_BINARY hack: the patch is for Python 3.2 and I closed issue python/issues-test-cpython#10841. If you would like a backport, another patch should be written later
- "encoding = 'latin-1' # ?": write a real comment or remove it
- 'self.fp.read(...) # bytes': you should add a test on the type if you are not sure that fp.read() gives bytes
- "file: the file(-like) object from which you can read the data *as bytes*": you should mention that TextIOWrapper is also tolerated (accepted?)
- you may set fp directly to sys.stdin.buffer (instead of sys.stdin) if fp is None (it will be easier after removing the O_BINARY thing)
- the patch adds a tab in an empty line, please don't do that :-)
- you should add a (private?) attribute to FieldStorage to decide if it works on bytes or unicode, instead of using "self.filename is not None" test (eg. self._use_bytes = (self.filename is not None)
- i don't like the idea of having a generic self.__write() method supporting bytes and unicode. it would prefer two methods, eg. self.__write_text() and self.__write_binary() (they can share a third private method)
- i don't like "stream_encoding" name: what is the "stream" here? do you process a "file", a "string" or a "stream"? why not just "self.encoding"?
- "import email.parser,email.feedparser" one import is useless here. I prefer "from email.feedparser import FeedParser" because you get directly a ImportError if the symbol is missing. And it's already faster to get FeedParser instead of email.feedparser.FeedParser in a loop (dummy micro-optimization)
- even I like the following change, please do it in a separated patch:
- if type(value) is type([]):
+ if isinstance(value,list): I really don't like the IOMix thing:
Most parts of the patch are correct and fix real bugs. Since cgi is broken currently (eg. it doesn't handle binary files correctly), anything making the situation better would be nice. I vote +0 to commit the patch now (if the release manager agrees), and +1 if all of my remarks are fixed. |
Victor, thanks for your comments, and interest in this bug. Other than the existence of the charset parameter, and whether or not to include IOMix, I think all of the others could be fixed later, and do not hurt at present. So I will just comment on those two comments. I would prefer to see FieldStorage not have the charset attribute also, but I don't have the practice to produce an alternate patch, and I can see that it would be a convenience for some CGI scripts to specify that parameter, and have one API call do all the work necessary to adjust the IO streams, and read all the parameters, and then the rest of the logic of the web app can follow. Personally, I adjust the stdout/stderr streams earlier in my scripts, and only optionally call FieldStorage, if I determine the request needs such. I've been using IOMix for some months (I have a version for both Python 2 and 3), and it solves a real problem in generating web page data streams... the data stream should be bytes, but a lot of the data is manipulated using str, which would then need to be decoded. The default encoding of stdout is usually wrong, so must somehow be changed. And when you have chunks of bytes (in my experience usually from a database or file) to copy to the output stream, if your prior write was str, and then you write bytes to sys.stdout.binary, you have to also remember to flush the TextIOBuffer first. IOMix provides a convenient solution to all these problems, doing the flushing for you automatically, and just taking what comes and doing the right thing. If I hadn't already invented IOMix to help write web pages, I would want to :) |
FWIW, keep in mind that cgi.FieldStorage is also quite often used in WSGI scripts in arbitrary WSGI servers which have got nothing to do with CGI. Having cgi.FieldStorage muck around with stdout/stderr under WSGI, even where using a CGI/WSGI bridge, would potentially be a bad thing to do, especially in embedded systems like mod_wsgi where sys.stdout and sys.stderr are replaced with file like objects that map onto Apache error logging. Even in non embedded systems, you could very well screw up any application logging done via stdout/stderr and break the application. So, the default or common code paths should never play with sys.stdout or sys.stderr. It is already a PITA that the implementation falls back to using sys.argv when QUERY_STRING isn't defined which also could produce strange results under a WSGI server. In other words, please don't go adding any more code which makes the wrong assumption that this is only used in CGI scripts. |
Graham, Thanks for your comments. Fortunately, if the new charset parameter is not supplied, no mucking with stdout or stderr is done, which is the only reason I cannot argue strongly against the feature, which I would have implemented as a separate API... it doesn't get in the way if you don't use it. I would be happy to see the argv code removed, but it has been there longer than I have been a Python user, so I just live with it ... and don't pass arguments to my CGI scripts anyway. I've assumed that is some sort of a debug feature, but I also saw some code in the HTTPCGIServer and http.server that apparently, on some platforms, actually do pass parameters to CGI on the command lines. I would be happy to see that code removed too, but it also predates my Python experience. And no signs of "if debug:" by either of them! |
Thanks for the comments "- I don't understand why FieldStorage changes sys.stdout and sys.stderr (see remarks about IOMix above): please remove the charset argument (it is also confusing to have two encoding arguments). it should be done somewhere else" done "please remove the O_BINARY hack: the patch is for Python 3.2 and I closed issue bpo-10841. If you would like a backport, another patch should be written later" done ""encoding = 'latin-1' # ?": write a real comment or remove it" I removed this part "'self.fp.read(...) # bytes': you should add a test on the type if you are not sure that fp.read() gives bytes" added tests in read_urlencoded(), read_multi() and read_binary() "file: the file(-like) object from which you can read the data *as bytes*": you should mention that TextIOWrapper is also tolerated (accepted?)" not done : here "file" is not the argument passed to the FieldStorage constructor, but the attribute of values returned from calls to FieldStorage. In the new implementation, its read() method always returns bytes "you may set fp directly to sys.stdin.buffer (instead of sys.stdin) if fp is None (it will be easier after removing the O_BINARY thing)" done "the patch adds a tab in an empty line, please don't do that :-)" done (hopefully :-) "you should add a (private?) attribute to FieldStorage to decide if it works on bytes or unicode, instead of using "self.filename is not None" test (eg. self._use_bytes = (self.filename is not None)" done "i don't like the idea of having a generic self.__write() method supporting bytes and unicode. it would prefer two methods, eg. self.__write_text() and self.__write_binary() (they can share a third private method)" not done, the argument of __write is always bytes "i don't like "stream_encoding" name: what is the "stream" here? do you process a "file", a "string" or a "stream"? why not just "self.encoding"?" done
done " even I like the following change, please do it in a separated patch:
+ if isinstance(value,list):" not done "I really don't like the IOMix thing:" removed "I vote +0 to commit the patch now (if the release manager agrees), and +1 if all of my remarks are fixed." should be close to +0.8 now ;-) |
+1 thanks for this input. I agree for the most part. However if the io cheers! |
Pierre, Thank you for the new patch, with the philosophy of "it's broke, so let's produce something the committers like to get it fixed". I see you overlooked removing the second use of O_BINARY. Locally, I removed that also, and tested your newest patch, and it still functions great for me. |
Glenn, you read my mind ;-) Thanks for mentioning the O_BINARY thing. New (last !) patch attached |
r87996+r87997 adds encoding and errors argument to parse_qs() and parse_qsl() of urllib.parse. It is needed to decoded correctly %XX syntax in cgi. r87998 is the patch on the cgi module. Changes with cgi_32.patch:
Because the patch on TextIOBase, it patched the docstring: Replace "type(value) is type([])" is done in another commit: r87999. I consider that the work on this issue is done, and so I close it. If I am wrong, explain why and repoen the issue. Please test the cgi as much as possible before Python 3.2 final: reopen the issue if it doesn't work. |
Oh, I forgot to credit the author(s): who wrote the patch? |
Thanks a lot Victor ! I wrote the patch : Pierre Quentel (pierre.quentel@gmail.com) with many 2011/1/14 STINNER Victor <report@bugs.python.org>
|
TODO: Add more tests to test_cgi. What is the latest patch for test_cgi? |
My latest patch for test_cgi is in cgi_32.patch I will try to add more tests later |
haypo>> What is the latest patch for test_cgi? Ok, but cgi_32.patch doesn't add any test. I only adapt existing tests for your other changes. I remove cgi_32.patch because it was commited. |
Remove cgi_plus_tests.diff: it looks to be an old version of cgi_32.patch. @r.david.murray: Did you write cgi_plus_tests.diff, or is it based on the work on Pierre Quentel? |
Remove tmpy44zj7.html and tmpav1vve.html: a similar file is included in full_source_and_error.zip. |
Victor: we normally leave the patch file that was committed attached to the issue for future reference. The _plus_tests file was just the original patch plus the existing cgi tests adjusted to pass in bytes instead of strings to cgi, if I recall correctly. Nor original code of mine in either part :) |
Thanks to Pierre for producing patch after patch and testing testing testing, and to Victor for committing it, as well as others that contributed in smaller ways, as I tried to. I look forward to 3.2 rc1 so I can discard all my temporary patched copies of cgi.py |
Because I'm unable to read the whole history and analyze each file attached to this issue, I opened bpo-10911 to ask to write more tests for the cgi module. |
Le vendredi 14 janvier 2011 à 19:11 +0000, R. David Murray a écrit :
Sorry, but there were too much files. I was trying to figure out if
Ok. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: