Author Andrew Clover
Recipients Andrew Clover, aclover, animus, claudep, docs@python, grahamd, martin.panter, pje
Date 2016-04-21.17:55:05
> Why is only PATH_INFO encoded in such a manner, while QUERY_STRING is passed without any changes and does not require any latin-1 to utf-8 recoding?

Laziness: QUERY_STRING should be pure-ASCII, making any such transcoding a no-op.

In principle a user agent *can* submit non-ASCII characters in a query string without %-encoding them, but it's not standards-conformant and most clients don't do it (exception: apparently curl, as above), so it's not worth adding a layer of hopefully-fixing-but-potentially-mangling to this variable to support a situation that shouldn't arise for normal requests.
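To illustrate why the transcoding would be a no-op, here is a minimal sketch. The helper name `wsgi_recode` and the sample query are hypothetical, not part of wsgiref; the only assumption is that the client %-encodes non-ASCII characters as the standards require.

```python
from urllib.parse import quote


def wsgi_recode(value: str) -> str:
    # Hypothetical helper: reinterpret a latin-1-decoded WSGI string as UTF-8.
    return value.encode("latin-1").decode("utf-8")


# A conformant client %-encodes non-ASCII, so the variable stays pure ASCII.
query_string = "q=" + quote("café")  # 'q=caf%C3%A9'

# ASCII bytes mean the same thing in latin-1 and UTF-8, so nothing changes.
recoded = wsgi_recode(query_string)
print(recoded == query_string)
```

If a client did send raw non-ASCII bytes, the same helper could either fix the value or fail with a UnicodeDecodeError, which is the mangling risk mentioned above.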

PATH_INFO only requires special handling because of the sad, sad historical artefact of the CGI spec requiring it to have URL-decoding applied to it at the gateway, thus making the non-ASCII characters pop out of the percentage woodwork.
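A rough sketch of what happens to PATH_INFO, assuming a gateway that URL-decodes the path to bytes and then exposes those bytes as a latin-1-decoded str (the sample path is made up for illustration):

```python
from urllib.parse import unquote_to_bytes

# What the client sent on the request line (pure ASCII, %-encoded UTF-8).
raw_path = "/caf%C3%A9"

# The gateway applies URL-decoding, so the non-ASCII bytes appear.
path_bytes = unquote_to_bytes(raw_path)  # b'/caf\xc3\xa9'

# The bytes are then handed to the application as a latin-1-decoded str.
path_info = path_bytes.decode("latin-1")  # mojibake: '/cafÃ©'

# The application has to reverse that step to get the intended text.
recovered = path_info.encode("latin-1").decode("utf-8")  # '/café'
print(recovered)
```

The final encode/decode pair is the latin-1 to utf-8 recoding the question refers to; it is only needed because the %-escapes were already resolved before the application saw the value.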

@Graham, can you share more about how those test results were generated and displayed? The Gunicorn results are about what I would expect - the double-decoding of PATH_INFO is arguably undesirable when curl submits raw bytes, but ultimately that's an unspecified situation so I don't really care.

The output from Apache, on the other hand, is odd - something appears to have mangled the results at the reporting stage as not only is there double-decoding but also some double-backslashes. It looks like the strings have been put through ascii(repr()) or something?
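As an illustration of how doubled backslashes can come from the reporting step rather than the server, the sketch below escapes the same string twice. Applying `ascii()` twice is only a stand-in for whatever the test harness actually did, which is not known here, and the sample value is made up.

```python
# A latin-1-decoded PATH_INFO holding the UTF-8 bytes of '/café'.
path_info = "/caf\u00c3\u00a9"

# One layer of escaping: each non-ASCII character becomes a \x escape.
once = ascii(path_info)

# A second layer escapes the backslashes themselves, doubling them.
twice = ascii(once)

print(once)
print(twice)
```

Any pair of escaping steps (for example repr() followed by a logger or template that escapes again) would have a similar effect.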

Date                 User           Action  Args
2016-04-21 17:55:05  Andrew Clover  set     recipients: + Andrew Clover, pje, grahamd, aclover, docs@python, martin.panter, animus, claudep
2016-04-21 17:55:05  Andrew Clover  set     messageid: <>
2016-04-21 17:55:05  Andrew Clover  link    issue16679 messages
2016-04-21 17:55:05  Andrew Clover  create