Message 71091 - Python tracker


Author	gvanrossum
Recipients	gvanrossum, janssen, jimjjewett, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date	2008-08-13.17:17:19
Message-id	<ca471dc20808131017t2176cdbfwc70439529887feb6@mail.gmail.com>
In-reply-to	<1218647121.49.0.344183013642.issue3300@psf.upfronthosting.co.za>

Content

> Bill Janssen <bill.janssen@gmail.com> added the comment:
>
> Erik van der Poel at Google has now chimed in with stats on current URL
> usage:
>
> ``...the bottom line is that escaped non-utf-8 is still quite prevalent,
> enough (in my opinion) to require an implementation in Python, possibly
> even allowing for different encodings in the path and query parts (e.g.
> utf-8 path and gb2312 query).''
>
> http://lists.w3.org/Archives/Public/www-international/2008JulSep/0042.html
>
> I think it's worth remembering that a very large proportion of the use
> of Python's urllib.unquote() is in implementations of Web server
> frameworks of one sort or another.  We can't control what the browsers
> that talk to such frameworks produce; the IETF doesn't control that,
> either.  In this case, "practicality beats purity" is the clarion call
> of the browser designers, and we'd better be able to support them.

I think we're supporting these cases sufficiently by allowing developers to
override the encoding and errors values. I see no argument here against
having a default encoding of UTF-8.
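
For illustration, here is a minimal sketch of what overriding the encoding and errors values can look like. It assumes the `urllib.parse.unquote(string, encoding='utf-8', errors='replace')` signature found in current Python 3; the sample path and query strings are made up for this example and do not come from the thread.

```python
from urllib.parse import unquote

# Assumed sample inputs (hypothetical, for illustration only).
utf8_escaped_path = "/caf%C3%A9"         # "café" percent-escaped as UTF-8
gb2312_escaped_query = "q=%D6%D0%CE%C4"  # "中文" percent-escaped as GB2312

# The default encoding is UTF-8, so UTF-8 escapes decode without extra arguments.
path = unquote(utf8_escaped_path)

# A caller who knows the query uses GB2312 overrides the encoding explicitly.
query = unquote(gb2312_escaped_query, encoding="gb2312")

# With the UTF-8 default and errors="replace" (also the default), bytes that
# are not valid UTF-8 become U+FFFD instead of raising an exception.
lossy_query = unquote(gb2312_escaped_query)

# A caller who prefers a hard failure can pass errors="strict".
try:
    unquote(gb2312_escaped_query, errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

A framework that receives a UTF-8 path and a GB2312 query, as in the example Erik mentions, could decode each part with its own `encoding` argument while everyone else relies on the UTF-8 default.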

History
Date	User	Action	Args
2008-08-13 17:17:20	gvanrossum	set	recipients: + gvanrossum, loewis, jimjjewett, janssen, orsenthil, pitrou, thomaspinckney3, mgiuca
2008-08-13 17:17:20	gvanrossum	link	issue3300 messages
2008-08-13 17:17:19	gvanrossum	create