Author grahamd
Recipients claudep, grahamd
Date 2012-12-14.09:11:34
You can't try UTF-8 and then fall back to ISO-8859-1. PEP 3333 requires that it always be ISO-8859-1. If an application needs it decoded as something else, it is the web application's job to do so.

The relevant part of the PEP is:

"""On Python platforms where the str or StringType type is in fact Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all "strings" referred to in this specification must contain only code points representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive). It is a fatal error for an application to supply strings containing any other Unicode character or code point. Similarly, servers and gateways must not supply strings to an application containing any other Unicode characters."""

By decoding as UTF-8 you would break the requirement that only code points representable in the ISO-8859-1 encoding (\u0000 through \u00FF, inclusive) are passed through.

So it is inconvenient if your expectation is that it will always be UTF-8, but that is how it has to work. The bytes could be in some encoding other than UTF-8, yet still decode successfully as UTF-8. In that case the application would get something totally different from the original, which is wrong.
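A minimal sketch of that failure mode, using a made-up header value for illustration: bytes whose true encoding is ISO-8859-1 can happen to be valid UTF-8, so a try-UTF-8-first strategy corrupts them silently rather than failing.

```python
# Hypothetical header bytes whose true encoding is ISO-8859-1 and whose
# intended text is "Ã©" (two characters).
raw = "Ã©".encode("iso-8859-1")    # b'\xc3\xa9'

# A try-UTF-8-first strategy decodes these bytes without error ...
guessed = raw.decode("utf-8")      # 'é' -- one character, not the original text

# ... silently producing something different from what was sent.
assert guessed != raw.decode("iso-8859-1")
```

No decode error is raised, so the fallback to ISO-8859-1 never triggers and the corruption goes unnoticed.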

So the WSGI server can never make any assumptions, and the WSGI application always has to be the one that converts the value to the correct Unicode string. The only way that can be done while still passing through a native string is to decode as ISO-8859-1 (which is byte-preserving), allowing the application to go back to bytes and then back to Unicode in the correct encoding.
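The round trip described above can be sketched as follows (the variable names are illustrative, not from any particular WSGI implementation):

```python
# Server side: raw header bytes, whatever their true encoding, are
# decoded as ISO-8859-1 so every byte maps 1:1 to a code point in
# the \u0000-\u00FF range required by PEP 3333.
raw = "naïve ☃".encode("utf-8")           # client actually sent UTF-8
environ_value = raw.decode("iso-8859-1")  # native str, bytes preserved

# Application side: recover the original bytes, then decode with the
# encoding the application knows is correct.
recovered = environ_value.encode("iso-8859-1").decode("utf-8")
assert recovered == "naïve ☃"
```

Because ISO-8859-1 maps each of the 256 byte values to the code point of the same number, the encode/decode pair is lossless, which is exactly why the PEP can use it as the carrier encoding for native strings.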