Author aclover
Recipients aclover, claudep, grahamd, pje
Date 2012-12-16.12:03:31
WSGI's use of ISO-8859-1 for all strings originating from HTTP bytes is very much deliberate: we needed a way to preserve the original input bytes while still using unicode strings, and at the time surrogateescape was not available. The result is counter-intuitive, but at least it is finally consistent; the expectation is that most web authors will be using some kind of web framework or input-reading library that hides the unpleasant details.
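A minimal sketch of why ISO-8859-1 preserves the input: it maps every byte 0x00-0xFF to exactly one code point, so an application can recover the original bytes by re-encoding. The function names `server_decode` and `app_recover_bytes` are hypothetical stand-ins for the server and application sides, not part of any real WSGI API; the surrogateescape comparison is included only to show the alternative the message mentions.

```python
def server_decode(raw_path: bytes) -> str:
    # Hypothetical server step: the raw request bytes are exposed to the
    # application as a str decoded with ISO-8859-1 (one byte -> one code point).
    return raw_path.decode("iso-8859-1")


def app_recover_bytes(path_info: str) -> bytes:
    # Hypothetical application step: re-encoding with ISO-8859-1 yields
    # exactly the bytes the server received.
    return path_info.encode("iso-8859-1")


# UTF-8 "é" (0xC3 0xA9) followed by 0xFF, which is not valid UTF-8 at all.
raw = b"/caf\xc3\xa9/\xff"

path_info = server_decode(raw)
assert path_info == "/caf\u00c3\u00a9/\u00ff"  # looks odd, but nothing is lost
assert app_recover_bytes(path_info) == raw      # lossless round trip

# The application may then choose its own interpretation, e.g. UTF-8 with
# replacement characters for bytes that do not decode.
as_utf8 = app_recover_bytes(path_info).decode("utf-8", "replace")
assert as_utf8 == "/caf\u00e9/\ufffd"

# surrogateescape also round-trips arbitrary bytes, but produces a
# different str for the same input.
escaped = raw.decode("utf-8", "surrogateescape")
assert escaped == "/caf\u00e9/\udcff"
assert escaped.encode("utf-8", "surrogateescape") == raw
```

The design choice is that decoding never fails and never loses information; interpreting the bytes (UTF-8 or otherwise) is left to the framework or application.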

See and for the background discussion.

In any case, we cannot assume a path is UTF-8: not every URI is known to have come from an IRI, so RFC 3987 does not necessarily apply. UTF-8-with-Latin1-fallback is also undesirable in itself because it adds ambiguity; an ISO-8859-1 byte sequence that happens by coincidence to be a valid UTF-8 byte sequence will get mangled.
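To illustrate the ambiguity, here is a sketch of the kind of UTF-8-with-Latin1-fallback decoder the message argues against. `decode_with_fallback` is a hypothetical helper written for this example, and the input string is chosen only because its ISO-8859-1 encoding happens to be valid UTF-8.

```python
def decode_with_fallback(raw: bytes) -> str:
    # Hypothetical decoder: try UTF-8 first, and use ISO-8859-1 only
    # if the bytes are not valid UTF-8.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# Suppose a client meant the ISO-8859-1 text "Ã©" (two characters).
intended = "\u00c3\u00a9"
sent = intended.encode("iso-8859-1")  # b"\xc3\xa9"

# Those two bytes also form a valid UTF-8 sequence, so the fallback never
# triggers and the text silently becomes the single character "é".
decoded = decode_with_fallback(sent)
assert decoded == "\u00e9"
assert decoded != intended

# A sequence that is not valid UTF-8 does take the fallback path:
# 0xE9 starts a 3-byte UTF-8 sequence, but "t" is not a continuation byte.
assert decode_with_fallback(b"\xe9t\xe9") == "\u00e9t\u00e9"  # "été"
```

The decoder cannot tell these two cases apart from the bytes alone, which is the ambiguity described above.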
History
Date                 User     Action  Args
2012-12-16 12:03:32  aclover  set     recipients: + aclover, pje, grahamd, claudep
2012-12-16 12:03:32  aclover  set     messageid: <>
2012-12-16 12:03:32  aclover  link    issue16679 messages
2012-12-16 12:03:31  aclover  create