Message 70872 - Python tracker


Author	janssen
Recipients	gvanrossum, janssen, jimjjewett, lemburg, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date	2008-08-08.00:18:54
Message-id	<4b3e516a0808071718p1621b455j86933e2f1a56f144@mail.gmail.com>
In-reply-to	<ca471dc20808071623v74ca2f35m947484f381f2a3fe@mail.gmail.com>

Content
On Thu, Aug 7, 2008 at 4:23 PM, Guido van Rossum <report@bugs.python.org> wrote:

>
> >> However I fear that this middle ground will in practice cause:
> >>
> >> (a) more in-the-field failures, since devs are notorious for testing
> >> with ASCII only; and
> >
> > Returning bytes deals with this problem.
>
> In an unpleasant way. We might as well consider changing all APIs that
> deal with URLs to insist on bytes.
>

That seems a bit over-the-top.  Most URL operations *are* about strings, and
most of the APIs should deal with strings; we're talking about the return
value of an operation specifically designed to extract binary data from the
one place where it's allowed to occur.  That is a vastly smaller change than
"changing all APIs that deal with URLs".

By the way, I see that the email package dodges this by decoding the bytes
to strings using the codec "raw-unicode-escape".  In other words, the byte
sequences are carried in the outward form of a string.  I'd be OK with that.
That is, make the default codec for "unquote" be "raw-unicode-escape".  All
the bytes will come through unscathed, and people who are naively expecting
ASCII strings will still receive them, so their code won't break.  This
actually seems to be closest to the current usage, so I'm going to change my
patch to do that.
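
To make the proposal concrete, here is a rough sketch of what "bytes in the outward form of a string" looks like.  It uses urllib.parse as it exists in current Python 3, which is an assumption on my part and not the patch under discussion; the sample URL fragments are made up for illustration.

```python
from urllib.parse import unquote, unquote_to_bytes

# Hypothetical input: UTF-8 for "e-acute" followed by a stray 0xFF byte.
quoted = "caf%C3%A9%FF"

# The raw bytes that the percent-escapes stand for.
raw = unquote_to_bytes(quoted)

# Decoding with raw-unicode-escape maps each of these bytes to the
# code point with the same numeric value, so nothing is lost.
as_text = raw.decode("raw-unicode-escape")

# Encoding again recovers the original bytes for callers who want them.
round_tripped = as_text.encode("raw-unicode-escape")

# unquote() accepts the codec by name, giving the same string directly.
via_unquote = unquote(quoted, encoding="raw-unicode-escape")

# Pure-ASCII input behaves the way naive callers expect.
plain = unquote("hello%20world", encoding="raw-unicode-escape")
```

One caveat with this sketch: a decoded byte sequence that happens to contain a literal backslash-u escape is interpreted by the codec rather than passed through, so the mapping is byte-for-byte only when no such sequence is present.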

