Message 71073 - Python tracker


Author	gvanrossum
Recipients	gvanrossum, janssen, jimjjewett, lemburg, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date	2008-08-12.23:43:45
Message-id	<ca471dc20808121643q22abc4e5mb04b416531f28ebe@mail.gmail.com>
In-reply-to	<1218554817.69.0.715071509169.issue3300@psf.upfronthosting.co.za>

Content

> Matt Giuca <matt.giuca@gmail.com> added the comment:
> By the way, what is the current status of this bug? Is anybody waiting
> on me to do anything? (Re: Patch 9)

I'll be reviewing it today or tomorrow. From looking at it briefly I
worry that the implementation is pretty slow -- a method call for each
character and a map() call sounds pretty bad.
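
To make the performance concern concrete, here is a minimal sketch (not the code from patch 9) contrasting a function call per character driven by map() with a precomputed 256-entry lookup table. The names _SAFE, _quote_one, quote_per_char and quote_table are hypothetical and exist only for this illustration.

```python
# Hypothetical sketch; neither function is taken from the patch under review.
_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_.-~"
)


def _quote_one(byte):
    # Invoked once per input byte: this is the per-character overhead.
    return chr(byte) if byte in _SAFE else "%{:02X}".format(byte)


def quote_per_char(data):
    # One Python-level call per byte, dispatched through map().
    return "".join(map(_quote_one, data))


# Compute every possible result once, then only index into the table.
_TABLE = [_quote_one(b) for b in range(256)]


def quote_table(data):
    return "".join([_TABLE[b] for b in data])
```

Both functions return the same string for the same input; whether the table version is meaningfully faster would have to be measured, for example with timeit.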

> To recap my previous list of outstanding issues raised by the review:
>
>> Should unquote accept a bytes/bytearray as well as a str?
> Currently, does not. I think it's meaningless to do so (and how to
> handle >127 bytes, if so?)

The bytes > 127 would be translated as themselves; this follows
logically from how stuff is parsed -- %% and %FF are translated,
everything else is not. But I don't really care, I doubt there's a
need.
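
As an illustration of "translated as themselves", the sketch below uses urllib.parse.unquote_to_bytes as it exists in current Python 3 releases, which may differ from the patch under review at the time. Percent escapes are decoded, and a raw byte above 127 in the input passes through unchanged.

```python
from urllib.parse import unquote_to_bytes

# Input mixes a raw byte > 127 (0xE9) with two percent escapes.
raw = b"caf\xe9%20%FF"

# %20 and %FF are decoded; the raw 0xE9 byte is left as it is.
decoded = unquote_to_bytes(raw)
```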

>> Lib/email/utils.py:
>> Should encode_rfc2231 with charset=None accept strings with non-ASCII
>> characters, and just encode them to UTF-8?
> Currently does. Suggestion to restrict to ASCII on the review tracker;
> simple fix.

I think I agree with that comment; it seems wrong to return UTF-8
without declaring that charset in the header. The alternative would be
to default charset to UTF-8 if there are any non-ASCII chars in the
input. I'd be okay with that too.
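
The alternative mentioned here can be sketched as a thin wrapper around email.utils.encode_rfc2231. The wrapper name encode_param is hypothetical; it is not part of the email package and only illustrates the "default to UTF-8 when non-ASCII is present" idea, so that the charset always appears in the encoded value.

```python
from email.utils import encode_rfc2231


def encode_param(value, charset=None, language=None):
    # Hypothetical helper: if the caller gave no charset but the value
    # contains non-ASCII characters, declare UTF-8 explicitly so the
    # result carries the charset it was encoded with.
    if charset is None and not value.isascii():
        charset = "utf-8"
    return encode_rfc2231(value, charset, language)


plain = encode_param("abc")    # ASCII only, no charset prefix needed
tagged = encode_param("\u00e9")  # non-ASCII, charset prefix added
```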

>> Should quote raise a TypeError if given a bytes with encoding/errors
>> arguments? (Motivation: TypeError is what you usually raise if you
>> supply too many args to a function).
> Resolved. Raises TypeError.
>
>> Lib/urllib/parse.py:
>> (As discussed above) Should quote accept safe characters outside the
>> ASCII range (thereby potentially producing invalid URIs)?
> Resolved? Implemented, but too messy and not worth it just to produce
> invalid URIs, so NOT in patch.

Agreed, safe should be ASCII chars only.
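
For reference, both resolved points can be observed in urllib.parse.quote as it behaves in current Python 3 releases (a sketch of today's behavior, not necessarily of patch 9): passing encoding or errors together with a bytes argument raises TypeError, and non-ASCII characters in safe are ignored, so the output stays a valid ASCII URI.

```python
from urllib.parse import quote

# bytes input plus an encoding argument is rejected with TypeError.
try:
    quote(b"caf\xe9", encoding="latin-1")
    raised_type_error = False
except TypeError:
    raised_type_error = True

# An ASCII safe character is left unquoted.
ascii_safe = quote("a/b c", safe="/")

# A non-ASCII safe character has no effect; it is still percent-encoded.
non_ascii_safe = quote("caf\u00e9", safe="\u00e9")
```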

> That's only two very minor yes/no issues remaining. Please comment.

I believe patch 9 still has errors defaulting to strict for quote().
Weren't you going to change that?
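
What the errors default means in practice can be seen with the quote() found in current Python 3 releases, where errors defaults to 'strict'. This sketch shows the observable difference between 'strict' and 'replace' when the chosen encoding cannot represent a character; it does not describe the state of patch 9.

```python
from urllib.parse import quote

# With errors='strict' (the default), an unencodable character raises.
try:
    quote("caf\u00e9", encoding="ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With errors='replace', the character becomes '?', which is then quoted.
replaced = quote("caf\u00e9", encoding="ascii", errors="replace")
```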

Regarding using UTF-8 as the default encoding, I still think this is the
right thing to do -- while the tables shown by Bill indicate that
there's still a lot of Latin-1 out there, UTF-8 is definitely gaining
on it, and I expect that Python apps, especially Py3k apps, are much
more likely to follow (and hopefully reinforce! :-) this trend than to
lag behind.
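
The practical effect of the UTF-8 default can be illustrated with urllib.parse as shipped in current Python 3 releases: the same character quotes to different escapes under UTF-8 and Latin-1, and a Latin-1 escape decoded with the UTF-8 default yields the replacement character rather than the original text.

```python
from urllib.parse import quote, unquote

# Default encoding is UTF-8: one character becomes two escaped bytes.
utf8_quoted = quote("\u00e9")

# Latin-1 must be requested explicitly: one escaped byte.
latin1_quoted = quote("\u00e9", encoding="latin-1")

# Round trips work when the decoder uses the matching encoding.
utf8_round_trip = unquote(utf8_quoted)
latin1_round_trip = unquote(latin1_quoted, encoding="latin-1")

# Decoding a Latin-1 escape as UTF-8 gives U+FFFD (errors='replace').
mismatched = unquote(latin1_quoted)
```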

History
Date	User	Action	Args
2008-08-12 23:43:49	gvanrossum	set	recipients: + gvanrossum, lemburg, loewis, jimjjewett, janssen, orsenthil, pitrou, thomaspinckney3, mgiuca
2008-08-12 23:43:48	gvanrossum	link	issue3300 messages
2008-08-12 23:43:45	gvanrossum	create