
Author mgiuca
Recipients gvanrossum, janssen, jimjjewett, lemburg, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date 2008-08-10.05:05:04
Message-id <1218344709.3.0.965800855029.issue3300@psf.upfronthosting.co.za>
Content
I've been thinking more about the errors="strict" default. I think this
was Guido's suggestion. I've decided I'd rather stick with errors="replace".

I changed errors="replace" to errors="strict" in patch 8, but now I'm
worried that will cause problems, specifically for unquote. Once again,
all the code in the stdlib which calls unquote doesn't provide an errors
option, so the default will be the only choice when using these other
services.

I'm concerned that there'll be lots of unhandled exceptions flying
around for URLs which aren't encoded with UTF-8, and a conscientious
programmer will not be able to protect against user errors.

Take the cgi module as an example. Typical usage is to write:
> fields = cgi.FieldStorage()
> foo = fields.getfirst("foo")

If the QUERY_STRING is "foo=w%FCt" (Latin-1), with errors='strict', you
get a UnicodeDecodeError when you call cgi.FieldStorage(). With
errors='replace', the variable foo will be "w�t". I think in general I'd
rather have '�'s in my program (representing invalid user input) than
exceptions, since this is usually a user input error, not a programming
error.
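
To make the difference concrete, here is a minimal sketch of the two behaviours. It assumes the urllib.parse.unquote signature found in current Python 3 (with "encoding" and "errors" keyword arguments), which may not match the patch under discussion exactly; the variable names are purely illustrative.

```python
from urllib.parse import unquote

# "wüt" percent-encoded as Latin-1 rather than UTF-8 (the example above).
raw = "w%FCt"

# errors="replace": the undecodable byte becomes U+FFFD and the call succeeds.
replaced = unquote(raw, encoding="utf-8", errors="replace")

# errors="strict": the same input raises UnicodeDecodeError instead.
try:
    unquote(raw, encoding="utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(replaced), strict_failed)
```

With "replace" the caller can still inspect the value and reject it; with "strict" the caller has to anticipate the exception at every call site.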

(One problem is that the only way to handle this is to catch the
UnicodeDecodeError raised by the call to FieldStorage, and then I can't
access any of the data.)

Now maybe something we can think about is propagating the "encoding" and
"errors" arguments through to a few other major functions (such as
cgi.parse_qsl, cgi.FieldStorage and urllib.parse.urlencode), but that
should be done separately from this patch.
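
As a rough illustration of what that propagation could look like, the sketch below uses urllib.parse.parse_qsl from current Python 3, which does accept "encoding" and "errors", as a stand-in for the cgi functions. The parse_fields helper and the "bar=ok" field are hypothetical, added only to show that a strict failure on one field loses the others too.

```python
from urllib.parse import parse_qsl

# One Latin-1 encoded field plus one perfectly valid field.
query = "foo=w%FCt&bar=ok"


def parse_fields(qs, encoding="utf-8", errors="replace"):
    """Hypothetical helper: pass encoding/errors through to parse_qsl."""
    return dict(parse_qsl(qs, encoding=encoding, errors=errors))


# Strict decoding: a single bad field aborts the whole parse, so even the
# valid "bar" field is unavailable to the caller.
try:
    strict_fields = parse_fields(query, errors="strict")
except UnicodeDecodeError:
    strict_fields = None

# Replace: every field is available, and the bad byte shows up as U+FFFD.
replaced_fields = parse_fields(query)

# A caller who knows the real encoding can pass it through explicitly.
latin1_fields = parse_fields(query, encoding="latin-1", errors="strict")

print(strict_fields, replaced_fields, latin1_fields)
```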