Message177591
WSGI's usage of ISO-8859-1 for all HTTP-byte-originated strings is very much deliberate; we needed a way to preserve the original input bytes whilst still using unicode strings, and at the time surrogateescape was not available. The result is counter-intuitive but at least it is finally consistent; the expectation is that most web authors will be using some kind of web framework or input-reading library that will hide away the unpleasant details.
See http://mail.python.org/pipermail/web-sig/2007-December/thread.html#3002 and http://mail.python.org/pipermail/web-sig/2010-July/thread.html#4473 for the background discussion.
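To illustrate the byte-preservation point, here is a minimal sketch of the ISO-8859-1 round trip. The sample bytes and the variable names are illustrative assumptions, as is the final choice of UTF-8; nothing here is taken from a particular server or framework.

```python
# Bytes as they might arrive in an HTTP request path (assumed sample:
# the UTF-8 encoding of "/café").
raw = b"/caf\xc3\xa9"

# What a WSGI server hands to the application: every byte becomes the
# code point with the same numeric value, so no information is lost.
path_info = raw.decode("iso-8859-1")

# The application (or its framework) recovers the exact original bytes...
recovered = path_info.encode("iso-8859-1")

# ...and only then applies whatever encoding it believes is correct.
# UTF-8 is an assumption made for this example.
text = recovered.decode("utf-8")

# Any byte string at all survives the round trip unchanged.
all_bytes = bytes(range(256))
round_tripped = all_bytes.decode("iso-8859-1").encode("iso-8859-1")
```

The intermediate `path_info` string looks wrong when printed, which is the counter-intuitive part, but the original bytes are always recoverable from it.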
In any case we cannot assume a path is UTF-8 - not every URI is known to have come from an IRI, so RFC 3987 does not necessarily apply. UTF-8-with-Latin1-fallback is also undesirable in itself as it adds ambiguity - an ISO-8859-1 byte sequence that by coincidence happens to be a valid UTF-8 byte sequence will get mangled.
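A small sketch of that ambiguity follows. The function `decode_with_fallback` is hypothetical, written only to demonstrate the problem, and the sample strings are assumptions chosen to trigger it.

```python
def decode_with_fallback(raw: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, fall back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# A client that genuinely sends ISO-8859-1 text "Ã©" produces these bytes.
intended = "\u00c3\u00a9"                    # "Ã©"
latin1_bytes = intended.encode("iso-8859-1")  # b"\xc3\xa9"

# Those bytes also happen to be valid UTF-8, so the fallback never fires
# and the text is silently decoded as something else.
mangled = decode_with_fallback(latin1_bytes)

# By contrast, a lone b"\xe9" is not valid UTF-8, so the fallback applies.
unambiguous = decode_with_fallback("\u00e9".encode("iso-8859-1"))
```

The two-character input comes back as a single "é", and nothing in the result signals that a guess was made.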
Date | User | Action | Args
2012-12-16 12:03:32 | aclover | set | recipients: + aclover, pje, grahamd, claudep
2012-12-16 12:03:32 | aclover | set | messageid: <1355659412.62.0.985557563504.issue16679@psf.upfronthosting.co.za>
2012-12-16 12:03:32 | aclover | link | issue16679 messages
2012-12-16 12:03:31 | aclover | create |