Message 71092 - Python tracker



Author	pitrou
Recipients	gvanrossum, janssen, jimjjewett, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date	2008-08-13.17:51:33
Message-id	<1218649889.5646.13.camel@fsol>
In-reply-to	<1218647121.49.0.344183013642.issue3300@psf.upfronthosting.co.za>

Content

On Wednesday 13 August 2008 at 17:05 +0000, Bill Janssen wrote:
> I think it's worth remembering that a very large proportion of the use
> of Python's urllib.unquote() is in implementations of Web server
> frameworks of one sort or another.  We can't control what the browsers
> that talk to such frameworks produce;

Yes, we can. Browsers will use whatever charset is specified in the HTML
for the query part; as for the path part, they shouldn't produce it
themselves: they just follow a link, which should already be
percent-quoted in the HTML.
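
To illustrate why the page charset matters, here is a minimal sketch
using the urllib.parse API of current Python 3 (an assumption: the API
being discussed in this thread may have looked different at the time).
The sample string and the two charsets are chosen only for illustration.

```python
from urllib.parse import quote, unquote

text = "café"

# The same text percent-quotes differently depending on the charset
# the page (and therefore the browser) uses for the query part.
utf8_quoted = quote(text, encoding="utf-8")
latin1_quoted = quote(text, encoding="latin-1")

# Unquoting round-trips only when the matching charset is used.
utf8_roundtrip = unquote(utf8_quoted, encoding="utf-8")
latin1_roundtrip = unquote(latin1_quoted, encoding="latin-1")

# With the wrong charset, the undecodable byte is replaced
# (errors="replace" is the default for unquote).
mismatched = unquote(latin1_quoted, encoding="utf-8")

print(utf8_quoted, latin1_quoted, repr(mismatched))
```

If the server side knows which charset its own pages declare, it can
pass that charset to unquote() and get the original text back.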

(URL rewriting at the HTTP server level can make this more complicated,
since it can turn a query fragment into a path fragment or vice versa;
however, most modern frameworks reduce the need for such rewriting,
since they let you specify flexible mapping rules at the framework
level.)

The situation in which we can't control the encoding is when we get
URLs from third-party content (e.g. some Web page which we didn't produce
ourselves, or some link in an e-mail). But in those cases there are fewer
use cases for unquoting the URL rather than using it as-is. The only time
I've wanted to unquote such a URL was to process HTTP referrers in order
to extract which search queries had led people to visit a Web site.
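
The referrer processing mentioned above could look roughly like this
with current Python 3. This is only a sketch: the helper name
search_terms, the "q" parameter, the example.com URL and the UTF-8
default are all assumptions made for illustration, since a third-party
site may use any parameter name and any charset.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms(referrer, param="q", encoding="utf-8"):
    """Return the unquoted search string from a referrer URL, or None.

    Hypothetical helper: `param` and `encoding` are guesses about the
    third-party site, so undecodable bytes are replaced, not raised.
    """
    query = urlsplit(referrer).query
    values = parse_qs(query, encoding=encoding, errors="replace").get(param, [])
    return values[0] if values else None

found = search_terms("http://example.com/search?q=python+unquote%20caf%C3%A9&hl=fr")
missing = search_terms("http://example.com/page?id=3")
print(found, missing)
```

Using errors="replace" keeps the log processing going when the guess
about the charset is wrong, at the cost of a few replacement characters.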

History
Date	User	Action	Args
2008-08-13 17:51:37	pitrou	set	recipients: + pitrou, gvanrossum, loewis, jimjjewett, janssen, orsenthil, thomaspinckney3, mgiuca
2008-08-13 17:51:36	pitrou	link	issue3300 messages
2008-08-13 17:51:33	pitrou	create