Message 78236 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	hdima
Recipients	hdima, pitrou, pje
Date	2008-12-23.14:07:39
SpamBayes Score	1.7993856e-10
Marked as misclassified	No
Message-id	<4950F0A8.2010304@hlabs.spb.ru>
In-reply-to	<1230039788.12104.20.camel@localhost>

Content

Antoine Pitrou wrote:
> On Tuesday, 23 December 2008 at 11:15 +0000, Dmitry Vasiliev wrote:
>> OK, I've attached PEP-333 compatible fixes for wsgiref.
> 
> I may be mistaken, but it seems that your patch forces iso-8859-1
> encoding of http bodies.

No. As the PEP says, str is used as a container for binary data. For 
example, to return UTF-8 encoded data you can use the following code:

     def app(environ, start_response):
         ...
         return [data.encode("utf-8").decode("iso-8859-1")]
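Here is a self-contained sketch of that round trip. The body text, the status and header values, and the `fake_server` helper are illustrative assumptions. `fake_server` only mimics what a server would have to do under the PEP's rule, namely encode each str chunk back to bytes with ISO-8859-1, which recovers the original UTF-8 bytes unchanged.

```python
# Sketch only: the body text, headers and fake_server are hypothetical,
# chosen to illustrate the ISO-8859-1 "bytes in a str" convention.

def app(environ, start_response):
    data = "Hello, \u4e16\u754c"  # contains code points above U+00FF
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    # UTF-8 encode, then map each byte 1:1 onto U+0000..U+00FF so the
    # resulting str only holds code points the PEP allows.
    return [data.encode("utf-8").decode("iso-8859-1")]


def fake_server(application):
    """Hypothetical stand-in for a server: run the app and turn each str
    chunk back into bytes with ISO-8859-1."""
    captured = []

    def start_response(status, headers):
        captured.append((status, headers))

    chunks = application({}, start_response)
    body = b"".join(chunk.encode("iso-8859-1") for chunk in chunks)
    return captured[0][0], body


status, body = fake_server(app)
print(status)                                           # 200 OK
print(body == "Hello, \u4e16\u754c".encode("utf-8"))    # True
```

Because ISO-8859-1 maps every byte value to exactly one code point and back, the decode/encode pair is lossless; the cost is that the str the application returns is not readable text.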

I don't like it, but I believe it strictly follows the PEP (I actually 
hadn't noticed this section before):

"""
On Python platforms where the str or StringType type is in fact 
Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all 
"strings" referred to in this specification must contain only code 
points representable in ISO-8859-1 encoding (\u0000 through \u00FF, 
inclusive). It is a fatal error for an application to supply strings 
containing any other Unicode character or code point. Similarly, servers 
and gateways must not supply strings to an application containing any 
other Unicode characters.

Again, all strings referred to in this specification must be of type str 
or StringType, and must not be of type unicode or UnicodeType. And, even 
if a given platform allows for more than 8 bits per character in 
str/StringType objects, only the lower 8 bits may be used, for any value 
referred to in this specification as a "string".
"""
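The rule quoted above can be expressed as a small check. `check_native_string` below is a hypothetical helper (not part of wsgiref or the PEP); it is a sketch of the constraint, assuming Python 3 where str is the Unicode-based type, and not the validation wsgiref actually performs.

```python
# Hypothetical helper illustrating the PEP 333 rule for Unicode-based str:
# only code points U+0000..U+00FF may appear in a "string".

def check_native_string(value, what="value"):
    if not isinstance(value, str):
        raise TypeError(f"{what} must be str, not {type(value).__name__}")
    for index, char in enumerate(value):
        if ord(char) > 0xFF:
            raise ValueError(
                f"{what} contains U+{ord(char):04X} at index {index}, "
                "which is outside ISO-8859-1"
            )
    return value


check_native_string("caf\u00e9")      # accepted: U+00E9 is in range
try:
    check_native_string("\u20ac")     # U+20AC (euro sign) is out of range
except ValueError as exc:
    print(exc)
```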

We definitely need to use bytes in the future, but that requires a PEP 
update and some kind of migration guide.

History
Date	User	Action	Args
2008-12-23 14:07:41	hdima	set	recipients: + hdima, pje, pitrou
2008-12-23 14:07:40	hdima	link	issue4718 messages
2008-12-23 14:07:39	hdima	create