
Author thomaspinckney3
Recipients loewis, mgiuca, orsenthil, thomaspinckney3
Date 2008-07-09.15:20:47
Message-id <1215616850.04.0.316970358707.issue3300@psf.upfronthosting.co.za>
In-reply-to
Content
I mentioned this in a brief python-dev discussion earlier this 
spring, but many popular websites such as Wikipedia and Facebook do use 
UTF-8 as their character encoding scheme for the path and argument 
portion of URLs.

I know there's no RFC that says this is what should be done, but in 
order to make urllib work out-of-the-box on as many common websites as 
possible, I think defaulting to UTF-8 decoding makes a lot of sense. 

Possibly allow an optional charset argument to be passed into quote and 
unquote, but default to UTF-8 in the absence of an explicit character 
set being passed in?
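
A minimal sketch of the behaviour being asked for, written against the `encoding` parameter that `urllib.parse.quote` and `unquote` accept in current Python 3. The parameter name `encoding` (rather than "charset") and the example path are assumptions for illustration, not part of the original proposal.

```python
from urllib.parse import quote, unquote

# Hypothetical example path containing a non-ASCII character.
path = "/wiki/Zürich"

# With no explicit encoding, UTF-8 is used for percent-encoding.
utf8_encoded = quote(path)
utf8_decoded = unquote(utf8_encoded)

# A caller who knows the site uses another charset passes it explicitly.
latin1_encoded = quote(path, encoding="latin-1")
latin1_decoded = unquote(latin1_encoded, encoding="latin-1")

print(utf8_encoded)    # /wiki/Z%C3%BCrich
print(latin1_encoded)  # /wiki/Z%FCrich
```

Both round trips return the original path; only the percent-encoded bytes differ between the two encodings.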
History
Date                 User             Action  Args
2008-07-09 15:20:50  thomaspinckney3  set     spambayes_score: 0.00543251 -> 0.005432515; recipients: + thomaspinckney3, loewis, orsenthil, mgiuca
2008-07-09 15:20:50  thomaspinckney3  set     spambayes_score: 0.00543251 -> 0.00543251; messageid: <1215616850.04.0.316970358707.issue3300@psf.upfronthosting.co.za>
2008-07-09 15:20:48  thomaspinckney3  link    issue3300 messages
2008-07-09 15:20:47  thomaspinckney3  create