Message 70858 - Python tracker

Author	janssen
Recipients	gvanrossum, janssen, jimjjewett, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date	2008-08-07.21:17:06
Message-id	<1218143828.6.0.619997562476.issue3300@psf.upfronthosting.co.za>
In-reply-to

Content
My main fear with this patch is that "unquote" will come to be seen as unreliable, because naive software trying to parse URLs will encounter uses of percent-encoding where the encoded octets are not in fact UTF-8 bytes. They're just some set of bytes. A secondary concern is that it will invisibly produce invalid data, because it will decode a non-UTF-8-encoded string that happens to contain only UTF-8-valid sequences into the wrong string value.

Now, I have to confess that I don't know how common these use cases are in actual URL usage. It would be nice if there were some organization that had a large collection of URLs, and could provide a test set we could run a scanner over :-).

In the meantime, though, I've sent a message off to Larry Masinter to ask about this case. He's one of the authors of the URI spec.
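
To make the two concerns concrete, here is a minimal sketch using the urllib.parse functions as they exist in current Python 3. It assumes the released API (unquote with encoding and errors parameters, plus unquote_to_bytes), which may differ from the patch under discussion, and the sample strings are made up for illustration.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Main concern: percent-encoded octets that are not UTF-8 at all.
# "café" encoded as Latin-1 puts the single byte 0xE9 in the URL.
latin1_url = quote("café", encoding="latin-1")

# Decoding that octet as UTF-8 cannot succeed. With errors="replace"
# the byte is turned into U+FFFD, so the original octet is lost.
lossy = unquote(latin1_url, encoding="utf-8", errors="replace")

# With errors="strict" the same input raises UnicodeDecodeError instead.
try:
    unquote(latin1_url, encoding="utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# The raw octets are still available if the caller asks for bytes.
raw = unquote_to_bytes(latin1_url)

# Secondary concern: a non-UTF-8 string whose bytes happen to form a
# valid UTF-8 sequence. The Latin-1 text "Ã©" is the bytes C3 A9,
# which is also the UTF-8 encoding of "é".
ambiguous_url = quote("Ã©", encoding="latin-1")
silently_wrong = unquote(ambiguous_url, encoding="utf-8")

print(repr(lossy), strict_failed, raw, repr(silently_wrong))
```

In the second case nothing signals a problem: the result is a well-formed string, just not the one the sender encoded. A caller who knows the real encoding can pass it explicitly, or work with the output of unquote_to_bytes.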

History
Date	User	Action	Args
2008-08-07 21:17:08	janssen	set	recipients: + janssen, gvanrossum, loewis, jimjjewett, orsenthil, pitrou, thomaspinckney3, mgiuca
2008-08-07 21:17:08	janssen	set	messageid: <1218143828.6.0.619997562476.issue3300@psf.upfronthosting.co.za>
2008-08-07 21:17:08	janssen	link	issue3300 messages
2008-08-07 21:17:07	janssen	create