Author mgiuca
Recipients gvanrossum, janssen, jimjjewett, lemburg, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date 2008-08-10.07:05:44
Message-id <1218351947.75.0.785102512508.issue3300@psf.upfronthosting.co.za>
Content
Guido suggested that quote's "safe" parameter should allow any
character, not just characters in the ASCII range. I've implemented this
now. It was a lot messier than I imagined.

The problem is that in my older patches, both 's' and 'safe' are encoded
to bytes right away, and the rest of the process is just octet encoding
(matching each byte against the safe set to see whether or not to quote it).
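A minimal sketch of that bytes-first approach, for comparison. It is not the patch's code: `quote_bytes_first` and `ALWAYS_SAFE` are hypothetical names, the always-safe set is assumed to be the RFC 3986 unreserved characters, and 'safe' is assumed to be ASCII-only (the restriction the older patches had):

```python
# Assumption: the RFC 3986 "unreserved" characters are never quoted.
ALWAYS_SAFE = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        b"abcdefghijklmnopqrstuvwxyz"
                        b"0123456789_.-~")


def quote_bytes_first(s, safe="/", encoding="utf-8", errors="strict"):
    """Hypothetical sketch: encode 's' and 'safe' up front, then quote octets."""
    data = s.encode(encoding, errors)
    # Assumption: 'safe' is ASCII-only, so each safe character is one byte.
    safe_bytes = ALWAYS_SAFE | set(safe.encode("ascii"))
    # Every byte is kept as its ASCII character or written as %XX.
    return "".join(chr(b) if b in safe_bytes else "%{:02X}".format(b)
                   for b in data)


print(quote_bytes_first("a b/c"))   # → a%20b/c
print(quote_bytes_first("\u6f22"))  # → %E6%BC%A2
```

Because everything is bytes by the time the loop runs, one membership test per byte is all that is needed.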

The new implementation requires that you delay encoding both of these
until the iteration over the string, so you match each *character*
against the safe set, then encode it if it's not in 'safe'. The problem
now is that some encodings or error handlers produce bytes that fall in
the safe range. For instance, quote('\u6f22', encoding='latin-1',
errors='xmlcharrefreplace') should give "%26%2328450%3B" (which is
"&#28450;" percent-encoded). To preserve this behaviour, you then have to
check each *byte* of the encoded character against a 'safe bytes' set. I
believe that will slow down the implementation considerably.
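The expected result for the example above can be checked against the `urllib.parse.quote` that ships in current Python 3 releases (the standard library function, not the patch under discussion), which takes the same `encoding` and `errors` parameters:

```python
from urllib.parse import quote

# U+6F22 has no Latin-1 encoding, so the 'xmlcharrefreplace' handler
# substitutes the ASCII character reference "&#28450;" (0x6F22 == 28450).
print("\u6f22".encode("latin-1", "xmlcharrefreplace"))  # → b'&#28450;'

# quote() then percent-encodes '&', '#' and ';' but leaves the digits alone.
print(quote("\u6f22", encoding="latin-1", errors="xmlcharrefreplace"))
# → %26%2328450%3B
```

The digits survive unquoted because they are in the always-safe range, which is why a byte-level check is still needed after the character-level one.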

In summary, it requires two levels of matching: first characters, then
bytes. You can see how messy it made my quote implementation; I've
attached the patch (parse.py.patch8+allsafe).
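The two-level matching described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of parse.py.patch8+allsafe: `quote_any_safe` and `ALWAYS_SAFE` are hypothetical names, and the always-safe set is assumed to be the RFC 3986 unreserved characters.

```python
# Assumption: the RFC 3986 "unreserved" characters are never quoted.
ALWAYS_SAFE = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "abcdefghijklmnopqrstuvwxyz"
                        "0123456789_.-~")


def quote_any_safe(s, safe="/", encoding="utf-8", errors="strict"):
    """Hypothetical sketch: 'safe' may contain any character."""
    safe_chars = ALWAYS_SAFE | set(safe)
    # Only ASCII members of the safe set can match a single encoded byte.
    safe_bytes = {ord(c) for c in safe_chars if ord(c) < 128}
    out = []
    for ch in s:
        # Level 1: match the *character* against the safe set.
        if ch in safe_chars:
            out.append(ch)
            continue
        # Level 2: encode the character, then match each resulting *byte*,
        # since error handlers such as xmlcharrefreplace can emit safe bytes.
        for b in ch.encode(encoding, errors):
            out.append(chr(b) if b in safe_bytes else "%{:02X}".format(b))
    return "".join(out)


print(quote_any_safe("\u6f22", safe="\u6f22"))  # → 漢 (not ASCII)
print(quote_any_safe("\u6f22", encoding="latin-1",
                     errors="xmlcharrefreplace"))  # → %26%2328450%3B
```

The first call returns a non-ASCII string, i.e. exactly the invalid-URI case discussed below. Encoding one character at a time is also an assumption of this sketch: it likely accounts for much of the extra cost, and it would misbehave with stateful codecs such as UTF-16, which emit a BOM on every `encode()` call.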

I don't think it's worth the extra code bloat and performance hit just
to implement a feature whose only use is producing invalid URIs (since
URIs are supposed to contain only ASCII characters). Does anyone disagree
and want this feature in?