Author pitrou
Recipients gvanrossum, janssen, jimjjewett, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date 2008-08-13.17:51:33
Message-id <1218649889.5646.13.camel@fsol>
In-reply-to <1218647121.49.0.344183013642.issue3300@psf.upfronthosting.co.za>
Content
On Wednesday, 13 August 2008 at 17:05 +0000, Bill Janssen wrote:
> I think it's worth remembering that a very large proportion of the use
> of Python's urllib.unquote() is in implementations of Web server
> frameworks of one sort or another.  We can't control what the browsers
> that talk to such frameworks produce;

Yes, we can. Browsers will use whatever charset is specified in the HTML
for the query part; as for the path part, they shouldn't produce it
themselves: they just follow a link, which should already be
percent-quoted in the HTML.
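
A minimal sketch of why the page's charset matters, assuming the `urllib.parse` API of current Python 3 and an arbitrary sample character: the same text percent-quotes to different byte sequences under different charsets, so unquoting only round-trips when the same charset is used on both sides.

```python
from urllib.parse import quote, unquote

text = "é"  # sample character, chosen only for illustration

# The same character quoted under two different page charsets.
utf8_quoted = quote(text, encoding="utf-8")
latin1_quoted = quote(text, encoding="latin-1")
print(utf8_quoted)    # %C3%A9
print(latin1_quoted)  # %E9

# Unquoting with the matching charset recovers the original text.
roundtrip_utf8 = unquote(utf8_quoted, encoding="utf-8")
roundtrip_latin1 = unquote(latin1_quoted, encoding="latin-1")

# Unquoting with the wrong charset does not; with the default
# errors="replace", the invalid byte becomes U+FFFD.
mismatched = unquote(latin1_quoted, encoding="utf-8")
print(repr(mismatched))  # '\ufffd'
```

If the server produced the HTML, it knows which charset it declared and can pass that charset when unquoting.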

(URL rewriting at the HTTP server level can make this more complicated,
since it can turn a query fragment into a path fragment or vice versa;
however, most modern frameworks remove the need for such rewriting,
since they allow flexible mapping rules to be specified at the framework
level.)

The situation in which we can't control the encoding is when getting
URLs from third-party content (e.g. some Web page which we didn't produce
ourselves, or some link in an e-mail). But in those cases there are fewer
use cases for unquoting the URL rather than using it as-is. The only time
I've wanted to unquote such a URL was to process HTTP referrers in order
to extract which search queries had led people to visit a Web site.
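
The referrer use case could be sketched roughly as follows, again assuming the `urllib.parse` API of current Python 3. The helper name `search_terms`, the query parameter `q`, the UTF-8 guess, and the example referrer URL are all hypothetical; a real search engine may use a different parameter name or charset.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms(referrer, param="q", encoding="utf-8"):
    """Return the decoded values of `param` in the referrer's query string.

    The encoding is a guess, since third-party URLs do not declare one;
    undecodable bytes are replaced rather than raising an error.
    """
    query = urlsplit(referrer).query
    fields = parse_qs(query, encoding=encoding, errors="replace")
    return fields.get(param, [])

# Hypothetical referrer from a search results page.
referrer = "http://search.example/find?q=caf%C3%A9+python&hl=fr"
terms = search_terms(referrer)
print(terms)  # ['café python']
```

Because the charset is only a guess here, using `errors="replace"` keeps the log processing from failing on referrers quoted in some other encoding.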
History
Date User Action Args
2008-08-13 17:51:37  pitrou  set     recipients: + pitrou, gvanrossum, loewis, jimjjewett, janssen, orsenthil, thomaspinckney3, mgiuca
2008-08-13 17:51:36  pitrou  link    issue3300 messages
2008-08-13 17:51:33  pitrou  create