Message 126219 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	amaury.forgeotdarc, andyharrington, barry, eric.araujo, erob, flox, georg.brandl, ggenellina, oopos, pebbe, pitrou, quentel, r.david.murray, tcourbon, tobias, v+python, vstinner
Date	2011-01-13.23:54:53
SpamBayes Score	0.0
Marked as misclassified	No
Message-id	<1294962895.05.0.719397761716.issue4953@psf.upfronthosting.co.za>
In-reply-to

Content
I tested cgi_32.patch on Windows with Apache: - a test with a binary file works: I get a binary file instead of a text file - a test with a non-ASCII character (a\xe9b) works: the text is correctly decoded I used the test script from full_source_and_error.zip. Comments on cgi_32.patch: - I don't understand why FieldStorage changes sys.stdout and sys.stderr (see remarks about IOMix above): please remove the charset argument (it is also confusing to have two encoding arguments). it should be done somewhere else - please remove the O_BINARY hack: the patch is for Python 3.2 and I closed issue #10841. If you would like a backport, another patch should be written later - "encoding = 'latin-1' # ?": write a real comment or remove it - 'self.fp.read(...) # bytes': you should add a test on the type if you are not sure that fp.read() gives bytes - "file: the file(-like) object from which you can read the data as bytes": you should mention that TextIOWrapper is also tolerated (accepted?) - you may set fp directly to sys.stdin.buffer (instead of sys.stdin) if fp is None (it will be easier after removing the O_BINARY thing) - the patch adds a tab in an empty line, please don't do that :-) - you should add a (private?) attribute to FieldStorage to decide if it works on bytes or unicode, instead of using "self.filename is not None" test (eg. self._use_bytes = (self.filename is not None) - i don't like the idea of having a generic self.__write() method supporting bytes and unicode. it would prefer two methods, eg. self.__write_text() and self.__write_binary() (they can share a third private method) - i don't like "stream_encoding" name: what is the "stream" here? do you process a "file", a "string" or a "stream"? why not just "self.encoding"? - "import email.parser,email.feedparser" one import is useless here. I prefer "from email.feedparser import FeedParser" because you get directly a ImportError if the symbol is missing. And it's already faster to get FeedParser instead of email.feedparser.FeedParser in a loop (dummy micro-optimization) - even I like the following change, please do it in a separated patch: - if type(value) is type([]): + if isinstance(value,list): I really don't like the IOMix thing: - sys.stdout.write() should not accept bytes - FieldStorage should not replace sys.stdout and sys.stderr: if you want to set the encoding of these files, set PYTHONIOENCODING environment variable before running your program (it changes also the encoding of sys.stdio) - IOMix should not accept bytes and unicode. It's better to have an explicit API like stdout.write('unicode') and stdout.buffer.write(b'bytes) Most parts of the patch are correct and fix real bugs. Since cgi is broken currently (eg. it doesn't handle binary files correctly), anything making the situation better would be nice. I vote +0 to commit the patch now (if the release manager agrees), and +1 if all of my remarks are fixed.

I tested cgi_32.patch on Windows with Apache:
 - a test with a binary file works: I get a binary file instead of a text file
 - a test with a non-ASCII character (a\xe9b) works: the text is correctly decoded

I used the test script from full_source_and_error.zip.

Comments on cgi_32.patch:

 - I don't understand why FieldStorage changes sys.stdout and sys.stderr (see remarks about IOMix above): please remove the charset argument (it is also confusing to have two encoding arguments). it should be done somewhere else
 - please remove the O_BINARY hack: the patch is for Python 3.2 and I closed issue #10841. If you would like a backport, another patch should be written later
 - "encoding = 'latin-1' # ?": write a real comment or remove it
 - 'self.fp.read(...) # bytes': you should add a test on the type if you are not sure that fp.read() gives bytes
 - "file: the file(-like) object from which you can read the data *as bytes*": you should mention that TextIOWrapper is also tolerated (accepted?)
 - you may set fp directly to sys.stdin.buffer (instead of sys.stdin) if fp is None (it will be easier after removing the O_BINARY thing)
 - the patch adds a tab in an empty line, please don't do that :-)
 - you should add a (private?) attribute to FieldStorage to decide if it works on bytes or unicode, instead of using "self.filename is not None" test (eg. self._use_bytes = (self.filename is not None)
 - i don't like the idea of having a generic self.__write() method supporting bytes and unicode. it would prefer two methods, eg. self.__write_text() and self.__write_binary() (they can share a third private method)
 - i don't like "stream_encoding" name: what is the "stream" here? do you process a "file", a "string" or a "stream"? why not just "self.encoding"?
 - "import email.parser,email.feedparser" one import is useless here. I prefer "from email.feedparser import FeedParser" because you get directly a ImportError if the symbol is missing. And it's already faster to get FeedParser instead of email.feedparser.FeedParser in a loop (dummy micro-optimization)
 - even I like the following change, please do it in a separated patch:
-            if type(value) is type([]):
+            if isinstance(value,list):


I really don't like the IOMix thing:

 - sys.stdout.write() should not accept bytes
 - FieldStorage should not replace sys.stdout and sys.stderr: if you want to set the encoding of these files, set PYTHONIOENCODING environment variable before running your program (it changes also the encoding of sys.stdio)
 - IOMix should not accept bytes *and* unicode. It's better to have an explicit API like stdout.write('unicode') and stdout.buffer.write(b'bytes)


Most parts of the patch are correct and fix real bugs. Since cgi is broken currently (eg. it doesn't handle binary files correctly), anything making the situation better would be nice.

I vote +0 to commit the patch now (if the release manager agrees), and +1 if all of my remarks are fixed.

History
Date	User	Action	Args
2011-01-13 23:54:55	vstinner	set	recipients: + vstinner, barry, georg.brandl, amaury.forgeotdarc, ggenellina, pitrou, andyharrington, eric.araujo, v+python, r.david.murray, oopos, tcourbon, tobias, flox, pebbe, quentel, erob
2011-01-13 23:54:55	vstinner	set	messageid: <1294962895.05.0.719397761716.issue4953@psf.upfronthosting.co.za>
2011-01-13 23:54:53	vstinner	link	issue4953 messages
2011-01-13 23:54:53	vstinner	create