
Author v+python
Recipients amaury.forgeotdarc, barry, eric.araujo, erob, flox, ggenellina, oopos, pebbe, pitrou, quentel, r.david.murray, tcourbon, tercero12, tobias, v+python, vstinner
Date 2011-01-10.23:07:23
Message-id <1294700847.11.0.053526051363.issue4953@psf.upfronthosting.co.za>
Content
Victor said:
I mean: you should pass sys.stdin.buffer instead of sys.stdin.

I say:
That would be possible, but then the input argument could no longer be left at its default, because sys.stdin is, by default, a text stream rather than a binary one. It is convenient for FieldStorage to have a useful default for its input, since RFC 3875 declares that the message body is obtained from "standard input".
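
A minimal sketch of how such a default could be resolved to a binary stream. The helper name binary_input is hypothetical and is not part of the cgi module; it only illustrates the idea of defaulting to standard input while still reading bytes.

```python
import sys


def binary_input(stream=None):
    """Return a binary stream to read the request body from.

    Hypothetical helper for illustration; not part of the cgi module.
    """
    if stream is None:
        # RFC 3875: the message body is obtained from standard input.
        stream = sys.stdin
    # Text wrappers such as sys.stdin expose the underlying binary stream
    # as .buffer; streams without that attribute are returned unchanged.
    return getattr(stream, "buffer", stream)
```

With something like this, a caller could omit the argument and still get bytes, while a caller who already holds a binary stream could pass it directly.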

Pierre said:
I wish it could be that simple, but I'm afraid it's not. On my PC, sys.stdin.encoding is cp-1252. I tested a multipart/form-data form with an INPUT field, and I entered the euro character, which is encoded as \x80 in cp-1252.

If I use the encoding defined for sys.stdin (cp-1252) to decode the bytes received on sys.stdin.buffer, I get the correct value in the cgi script; if I set the encoding to latin-1 in FieldStorage, since \x80 maps to undefined in latin-1, I get a UnicodeEncodeError if I try to print the value ("character maps to <undefined>").

I say:
Interesting. I'm curious which system (probably Windows, since you mention cp-), browser, and HTTP server you used for that test.  Is it possible to capture the data stream for that test?  If so, please describe how, and at what stage, the data stream was captured.  The most interesting capture point would be the interface between browser and HTTP server.
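
One way such a capture might be made, as a sketch only: read the undecoded body inside the CGI script and report it in an ASCII-only form. Note that this captures at the interface between HTTP server and script, not between browser and server, so it shows only what the server passed along. The function names capture_request_body and format_capture are made up for this illustration.

```python
def capture_request_body(environ, stdin_bytes):
    """Read CONTENT_LENGTH bytes of the request body without decoding them."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return stdin_bytes.read(length)


def format_capture(environ, body):
    """Describe the captured bytes using ASCII-only text."""
    lines = [
        "CONTENT_TYPE: " + environ.get("CONTENT_TYPE", ""),
        "CONTENT_LENGTH: " + str(len(body)),
        # repr() of a bytes object escapes every non-ASCII byte as \xNN,
        # so the output encoding cannot alter what is reported.
        "BODY: " + repr(body),
    ]
    return "\n".join(lines) + "\n"
```

In a CGI script these would be called with os.environ and sys.stdin.buffer, and the formatted text written out after a "Content-Type: text/plain" header.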

RFC 3875 states (section 4.1.3) what the default encodings should be, but I see that the first possibility is "system defined".  On the other hand, it seems to imply that this should be a default defined specifically for particular media types, not a general system setting such as the default encoding for file handles; after all, most Web communication crosses system boundaries.  So, lacking a system-defined default for text/ types, the RFC indicates that the default for text/ types is Latin-1.
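
The fallback order described in that section can be sketched as follows. This is an illustration under assumptions, not what FieldStorage does: default_charset is a hypothetical function, and system_defaults is a hypothetical per-media-type mapping standing in for the "system defined" step. The last lines apply the result to the \x80 byte from Pierre's test.

```python
from email.message import Message


def default_charset(content_type, system_defaults=None):
    """Pick a charset in the order given by RFC 3875 section 4.1.3.

    Hypothetical helper; system_defaults maps media type -> charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()
    if explicit:
        return explicit  # an explicit charset parameter always wins
    media_type = msg.get_content_type()
    if system_defaults and media_type in system_defaults:
        return system_defaults[media_type]  # 1. system-defined default
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"  # 2. default for text/ types
    return None  # no default stated for other media types


euro_byte = b"\x80"
as_cp1252 = euro_byte.decode("cp1252")  # U+20AC, the euro sign
as_latin1 = euro_byte.decode(default_charset("text/plain"))  # U+0080
```

In this sketch the Latin-1 decode itself succeeds and yields the control character U+0080. That character has no cp1252 encoding, so writing it to a cp1252 console would be where a "character maps to <undefined>" UnicodeEncodeError appears.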

I wonder what result you get with the same browser at the web page http://rishida.net/tools/conversion/ by entering the euro symbol into the Characters entry field and choosing convert.
History
Date                 User      Action  Args
2011-01-10 23:07:27  v+python  set     recipients: + v+python, barry, amaury.forgeotdarc, ggenellina, pitrou, vstinner, eric.araujo, r.david.murray, oopos, tercero12, tcourbon, tobias, flox, pebbe, quentel, erob
2011-01-10 23:07:27  v+python  set     messageid: <1294700847.11.0.053526051363.issue4953@psf.upfronthosting.co.za>
2011-01-10 23:07:23  v+python  link    issue4953 messages
2011-01-10 23:07:23  v+python  create