Message70858
My main fear with this patch is that "unquote" will become seen as
unreliable, because naive software trying to parse URLs will encounter
uses of percent-encoding where the encoded octets are not in fact UTF-8
bytes. They're just some set of bytes. A secondary concern is that it
will silently produce invalid data: a string that was not encoded as
UTF-8, but whose bytes happen to form valid UTF-8 sequences, will be
decoded to the wrong string value.
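To make both concerns concrete, here is a minimal sketch. It assumes the
unquote(string, encoding='utf-8', errors='replace') signature found in
urllib.parse in current Python 3, which may differ from the patch under
discussion. The sample strings are made up for illustration.

```python
from urllib.parse import unquote, unquote_to_bytes

# Concern 1: the encoded octets are not UTF-8 at all.
# "%E9" is "é" in Latin-1, but a lone 0xE9 byte is invalid UTF-8.
latin1_escaped = "caf%E9"
lossy = unquote(latin1_escaped)                          # default: utf-8, errors='replace'
intended = unquote(latin1_escaped, encoding="latin-1")   # caller knows the real charset

# Concern 2: the octets are not meant as UTF-8, yet happen to be valid UTF-8.
# 0xC3 0xA9 is "Ã©" in Latin-1 and also the UTF-8 encoding of "é".
ambiguous = "%C3%A9"
as_utf8 = unquote(ambiguous)                             # decodes without complaint
as_latin1 = unquote(ambiguous, encoding="latin-1")       # a different, equally "valid" string

# Keeping the raw octets avoids guessing an encoding.
raw = unquote_to_bytes(ambiguous)

print(repr(lossy), repr(intended))
print(repr(as_utf8), repr(as_latin1), raw)
```

In the first case the caller at least sees a replacement character; in
the second nothing signals that the wrong interpretation was chosen.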
Now, I have to confess that I don't know how common these use cases are
in actual URL usage. It would be nice if there were some organization
that had a large collection of URLs, and could provide a test set we
could run a scanner over :-).
As a workaround, though, I've sent a message off to Larry Masinter to
ask about this case. He's one of the authors of the URI spec.
Date | User | Action | Args
2008-08-07 21:17:08 | janssen | set | recipients: + janssen, gvanrossum, loewis, jimjjewett, orsenthil, pitrou, thomaspinckney3, mgiuca
2008-08-07 21:17:08 | janssen | set | messageid: <1218143828.6.0.619997562476.issue3300@psf.upfronthosting.co.za>
2008-08-07 21:17:08 | janssen | link | issue3300 messages
2008-08-07 21:17:07 | janssen | create |