Author aclover
Recipients aclover, claudep, grahamd, pje
Date 2012-12-16.12:03:31
Message-id <1355659412.62.0.985557563504.issue16679@psf.upfronthosting.co.za>
Content
WSGI's usage of ISO-8859-1 for all HTTP-byte-originated strings is very much deliberate; we needed a way to preserve the original input bytes whilst still using unicode strings, and at the time surrogateescape was not available. The result is counter-intuitive but at least it is finally consistent; the expectation is that most web authors will be using some kind of web framework or input-reading library that will hide away the unpleasant details.
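
As a rough illustration of why ISO-8859-1 preserves the original input bytes, here is a minimal sketch. The helper names `wsgi_str_to_bytes` and `decode_path` are hypothetical, not part of any WSGI library, and the UTF-8 default in `decode_path` is an assumption that an application or framework would make for itself, not something WSGI guarantees.

```python
def wsgi_str_to_bytes(value: str) -> bytes:
    # A WSGI string carries one code point per original byte, so encoding
    # it as ISO-8859-1 gives back exactly the bytes that were received.
    return value.encode("iso-8859-1")


def decode_path(path_info: str, encoding: str = "utf-8") -> str:
    # Hypothetical helper: re-decode the recovered bytes using whatever
    # encoding the application believes the client used (assumed UTF-8).
    return wsgi_str_to_bytes(path_info).decode(encoding)


# Simulate what a server does with a UTF-8 encoded request path.
raw = "/caf\u00e9".encode("utf-8")      # the bytes as sent on the wire
path_info = raw.decode("iso-8859-1")    # what the application sees in environ

recovered = wsgi_str_to_bytes(path_info)
text = decode_path(path_info)
```

Because every byte value maps to exactly one ISO-8859-1 code point, the round trip is lossless whatever encoding the client actually used.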

See http://mail.python.org/pipermail/web-sig/2007-December/thread.html#3002 and http://mail.python.org/pipermail/web-sig/2010-July/thread.html#4473 for the background discussion.

In any case, we cannot assume a path is UTF-8: not every URI is known to have come from an IRI, so RFC 3987 does not necessarily apply. UTF-8-with-Latin1-fallback is also undesirable in itself because it adds ambiguity: an ISO-8859-1 byte sequence that by coincidence happens to be a valid UTF-8 byte sequence will get mangled.
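
The ambiguity can be shown with a small sketch. `decode_with_fallback` is a hypothetical function written only to demonstrate the problem, not a proposed or existing API.

```python
def decode_with_fallback(raw: bytes) -> str:
    # Hypothetical UTF-8-with-Latin1-fallback decoder (illustration only).
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# A client sends two characters, A-tilde followed by the copyright sign,
# encoded as ISO-8859-1.
intended = "\u00c3\u00a9"
raw = intended.encode("iso-8859-1")     # two bytes: 0xC3 0xA9

# The same two bytes happen to be valid UTF-8 for a single e-acute, so the
# fallback is never reached and the original text is silently changed.
decoded = decode_with_fallback(raw)

# A lone 0xE9 byte is not valid UTF-8, so it does fall back to ISO-8859-1.
fallback_case = decode_with_fallback(b"\xe9")
```

Here the decoder returns a one-character string where the sender meant two characters, and nothing in the result signals that the guess was wrong.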