Author janssen
Recipients gvanrossum, janssen, jimjjewett, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date 2008-08-07.21:17:06
Message-id <1218143828.6.0.619997562476.issue3300@psf.upfronthosting.co.za>
Content
My main fear with this patch is that "unquote" will come to be seen as
unreliable, because naive software trying to parse URLs will encounter
uses of percent-encoding where the encoded octets are not in fact UTF-8
bytes.  They're just some set of bytes.  A secondary concern is that it
will silently produce wrong data: a string encoded in something other
than UTF-8 whose bytes happen to form only valid UTF-8 sequences will be
decoded, without any error, to the wrong string value.
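Both failure modes can be reproduced with Python 3's urllib.parse, whose unquote decodes percent-encoded octets as UTF-8 by default (encoding='utf-8', errors='replace'). This is a minimal illustration, not part of the patch under discussion; the sample strings are chosen only to trigger each case.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Case 1: the encoded octets are not valid UTF-8.
# "%E9" is "é" in Latin-1, but a lone 0xE9 byte is not valid UTF-8,
# so the default decode substitutes U+FFFD (errors='replace').
latin1_url_part = "%E9"
as_utf8 = unquote(latin1_url_part)                        # '\ufffd'
as_latin1 = unquote(latin1_url_part, encoding="latin-1")  # 'é'
raw_octets = unquote_to_bytes(latin1_url_part)            # b'\xe9'

# Case 2: silent mis-decoding.
# The Latin-1 string "Ã©" encodes to the bytes C3 A9, which happen to be
# valid UTF-8 (for "é"). Decoding as UTF-8 raises no error but returns
# a different string from the one that was originally encoded.
original = "Ã©"
encoded = quote(original, encoding="latin-1")  # '%C3%A9'
decoded = unquote(encoded)                     # 'é', not 'Ã©'

print(repr(as_utf8), repr(as_latin1), raw_octets)
print(encoded, repr(decoded), decoded == original)
```

Only the caller knows which encoding produced the octets, which is why unquote_to_bytes (or an explicit encoding argument) is the safe choice when the source is unknown.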

Now, I have to confess that I don't know how common these cases are
in actual URL usage.  It would be nice if there were some organization
with a large collection of URLs that could provide a test set we could
run a scanner over :-).
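A scanner of that kind could be as simple as the sketch below. The helper names classify and scan and the sample URLs are hypothetical, made up for illustration. Note that it can only detect the first problem (octets that are not valid UTF-8); the second, where non-UTF-8 text happens to be valid UTF-8, is indistinguishable from genuine UTF-8 at the byte level.

```python
from collections import Counter
from urllib.parse import unquote_to_bytes


def classify(url: str) -> str:
    """Classify a URL's percent-encoded octets (hypothetical helper)."""
    if "%" not in url:
        return "no-escapes"
    raw = unquote_to_bytes(url)  # decode %XX escapes to raw bytes, no charset assumed
    if raw.isascii():
        return "ascii-only"
    try:
        raw.decode("utf-8")  # strict decode: raises on invalid sequences
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"  # valid UTF-8, though not necessarily *intended* as UTF-8


def scan(urls):
    """Tally classifications over a collection of URLs (hypothetical helper)."""
    return Counter(classify(u) for u in urls)


# Illustrative inputs only, not real measurement data.
sample = [
    "http://example.com/plain",
    "http://example.com/a%20b",
    "http://example.com/caf%C3%A9",
    "http://example.com/caf%E9",
]
print(scan(sample))
```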

As a workaround, though, I've sent a message off to Larry Masinter to
ask about this case.  He's one of the authors of the URI spec.