Author gvanrossum
Recipients gvanrossum, janssen, jimjjewett, lemburg, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date 2008-08-12.23:43:45
SpamBayes Score 1.05416e-12
Marked as misclassified No
Message-id <ca471dc20808121643q22abc4e5mb04b416531f28ebe@mail.gmail.com>
In-reply-to <1218554817.69.0.715071509169.issue3300@psf.upfronthosting.co.za>
Content
> Matt Giuca <matt.giuca@gmail.com> added the comment:
> By the way, what is the current status of this bug? Is anybody waiting
> on me to do anything? (Re: Patch 9)

I'll be reviewing it today or tomorrow. From looking at it briefly I
worry that the implementation is pretty slow -- a method call for each
character and a map() call sounds pretty bad.

> To recap my previous list of outstanding issues raised by the review:
>
>> Should unquote accept a bytes/bytearray as well as a str?
> Currently, does not. I think it's meaningless to do so (and how to
> handle >127 bytes, if so?)

The bytes > 127 would be translated as themselves; this follows
logically from how stuff is parsed -- %% and %FF are translated,
everything else is not. But I don't really care, I doubt there's a
need.

>> Lib/email/utils.py:
>> Should encode_rfc2231 with charset=None accept strings with non-ASCII
>> characters, and just encode them to UTF-8?
> Currently does. Suggestion to restrict to ASCII on the review tracker;
> simple fix.

I think I agree with that comment; it seems wrong to return UTF8
without setting that in the header. The alternative would be to
default charset to utf8 if there are any non-ASCII chars in the input.
I'd be okay with that too.

>> Should quote raise a TypeError if given a bytes with encoding/errors
>> arguments? (Motivation: TypeError is what you usually raise if you
>> supply too many args to a function).
> Resolved. Raises TypeError.
>
>> Lib/urllib/parse.py:
>> (As discussed above) Should quote accept safe characters outside the
>> ASCII range (thereby potentially producing invalid URIs)?
> Resolved? Implemented, but too messy and not worth it just to produce
> invalid URIs, so NOT in patch.

Agreed, safe should be ASCII chars only.

> That's only two very minor yes/no issues remaining. Please comment.

I believe patch 9 still has errors defaulting to strict for quote().
Weren't you going to change that?

Regarding using UTF-8 as the default encoding, I still think this the
right thing to do -- while the tables shown by Bill indicate that
there's still a lot of Latin-1 out there, UTF-8 is definitely gaining
on it, and I expect that Python apps, especially Py3k apps, are much
more likely to follow (and hopefully reinforce! :-) this trend than to
lag behind.
History
Date User Action Args
2008-08-12 23:43:49gvanrossumsetrecipients: + gvanrossum, lemburg, loewis, jimjjewett, janssen, orsenthil, pitrou, thomaspinckney3, mgiuca
2008-08-12 23:43:48gvanrossumlinkissue3300 messages
2008-08-12 23:43:45gvanrossumcreate