Author vstinner
Recipients amaury.forgeotdarc, barry, eric.araujo, erob, flox, ggenellina, gvanrossum, oopos, pebbe, pitrou, quentel, r.david.murray, tcourbon, tercero12, tobias, v+python, vstinner
Date 2011-01-10.13:03:49
SpamBayes Score 0.0
Marked as misclassified No
Message-id <201101101403.48304.victor.stinner@haypocalc.com>
In-reply-to <1294575984.07.0.700034944239.issue4953@psf.upfronthosting.co.za>
Content
Some comments on cgi_diff_20110109.txt, especially on FieldStorage 
constructor.

Le dimanche 09 janvier 2011 13:26:24, vous avez écrit :
> Here is the diff file for the revised version of cgi.py

+            import msvcrt
+            msvcrt.setmode (0, os.O_BINARY) # stdin  = 0
+            msvcrt.setmode (1, os.O_BINARY) # stdout = 1
+            msvcrt.setmode (2, os.O_BINARY) # stderr = 2

Why do you change stdout and stderr mode? Is it needed? Instead of 0, you 
should use sys.stdin.fileno() with a try/except on .fileno() because stdin can 
be a StringIO object:

   >>> o=io.StringIO()    
   >>> o.fileno()
   io.UnsupportedOperation: fileno

I suppose that it's better to do nothing if sys.stdin has no .fileno() method.

More generally, I don't think that the cgi module should touch sys.stdin mode: 
it impacts the whole process, not only the cgi module. Eg. change sys.stdin 
mode in Python 3.1 will break the interperter because the Python parser in 
Pytohn 3.1 doesn't know how to handle \r\n end of line. If you need binary 
stdin, I should backport my patch for #10841 (for std*, FileIO and the 
parser).

----
def __init__(self, fp=None, headers=None, outerboundary="",
             environ=os.environ, keep_blank_values=0, strict_parsing=0,
             limit=None):
...
if 'QUERY_STRING' in environ:
   qs = environ['QUERY_STRING']
elif sys.argv[1:]:
   qs = sys.argv[1]
else:
   qs = ""
fp = BytesIO(qs.encode('ascii')) # bytes
----

With Python 3.2, you should use environ=environ.os.environb by default to 
avoid unnecessary conversion (os.environb --decode--> os.environ --encode--> 
qs). To decode sys.argv, ASCII is not the right encoding: you should use 
qs.encode(locale.getpreferredencoding(), 'surrogateescape') because Python 
decodes the environment and the command line arguments from 
locale.getpreferredencoding()+'surrogateescape', so it is the exact reverse 
operation and you get the original raw bytes.

For Python 3.1, use also qs.encode(locale.getpreferredencoding(), 
'surrogateescape') to encode the environment variable.

So for Python 3.2, it becomes something like:
----
def __init__(self, fp=None, headers=None, outerboundary="",
             environ=os.environb, keep_blank_values=0, strict_parsing=0,
             limit=None):
...
if 'QUERY_STRING' in environ:
   qs = environ[b'QUERY_STRING']
elif sys.argv[1:]:
   qs = sys.argv[1]
else:
   qs = b""
if isinstance(qs, str):
   encoding = locale.getpreferredencoding()
   qs = qs.encode(encoding, 'surrogateescape'))
fp = BytesIO(qs)
----
If you would like to support byte *and* Unicode environment (eg. 
environ=os.environ and environ=os.environb), you should do something a little 
bit more complex: see os.get_exec_path(). I can work on a patch if you would 
like to. A generic function should maybe be added to the os module, function 
with an optional environ argument (as os.get_exec_path()).

---
if fp is None:
   fp = sys.stdin
if fp is sys.stdin:
   ...
---
you should use sys.stdin.buffer if fp is None, and accept sys.stdin.buffer in 
the second test. Something like:
---
stdin = sys.stdin
if isinstance(fp,TextIOBase):
   stdin_buffer = stdin.buffer
else:
   stdin_buffer = stdin
if fp is None:
   fp = stdin_buffer
if fp is stdin or fp is stdin_buffer:
   ...
---

Don't you think that a warning would be appropriate if sys.stdin is passed 
here?
---
        # self.fp.read() must return bytes
        if isinstance(fp,TextIOBase):
            self.fp = fp.buffer
        else:
            self.fp = fp
---
Maybe a DeprecationWarning if we would like to drop support of TextIOWrapper 
later :-)

For the else case: you should maybe add a strict test on the type, eg. check 
for RawIOBase or BufferedIOBase subclass, isinstance(fp, (io.RawIOBase, 
io.BufferedIOBase)). It would avoid to check that fp.read() returns a bytes 
object (or get an ugly error later).

Set sys.stdin.buffer.encoding attribute is not a good idea. Why do you modify 
fp, instead of using a separated attribute on FieldStorage (eg. 
self.fp_encoding)?
---
        # field keys and values (except for files) are returned as strings
        # an encoding is required to decode the bytes read from self.fp
        if hasattr(fp,'encoding'):
            self.fp.encoding = fp.encoding
        else:
            self.fp.encoding = 'latin-1' # ?
---

I only read the constructor code.
History
Date User Action Args
2011-01-10 13:03:51vstinnersetrecipients: + vstinner, gvanrossum, barry, amaury.forgeotdarc, ggenellina, pitrou, eric.araujo, v+python, r.david.murray, oopos, tercero12, tcourbon, tobias, flox, pebbe, quentel, erob
2011-01-10 13:03:49vstinnerlinkissue4953 messages
2011-01-10 13:03:49vstinnercreate