On Thu, Aug 7, 2008 at 4:23 PM, Guido van Rossum <report@bugs.python.org> wrote:

>>> However I fear that this middle ground will in practice cause:
>>>
>>> (a) more in-the-field failures, since devs are notorious for testing
>>> with ASCII only; and
>>
>> Returning bytes deals with this problem.
>
> In an unpleasant way. We might as well consider changing all APIs that
> deal with URLs to insist on bytes.

That seems a bit over the top. Most URL operations *are* about strings, and most of the APIs should deal with strings; we're talking about the return value of one operation, specifically designed to extract binary data from the one place where it's allowed to occur (the percent-escapes). That is a vastly smaller change than "changing all APIs that deal with URLs".
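For illustration, here is roughly how that split looks in Python 3's "urllib.parse" as it exists today (which postdates this message, so this is not the patch under discussion): parsing the URL stays entirely in str, and only the percent-decoding step, here "unquote_to_bytes", hands back raw octets. The URL and its Latin-1 encoded path are made-up examples.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit

# Hypothetical URL whose path carries Latin-1 octets (0xE9 is "e acute"
# in Latin-1), which are not valid UTF-8.
url = "http://example.com/r%E9sum%E9?q=1"

# Ordinary URL operations work on str and leave the escapes intact.
parts = urlsplit(url)
print(repr(parts.path))  # '/r%E9sum%E9'

# Percent-decoding is the one place where binary data comes out.
raw = unquote_to_bytes(parts.path)
print(raw)  # b'/r\xe9sum\xe9'

# quote() accepts bytes, so the octets round-trip back into a str path.
print(quote(raw))  # /r%E9sum%E9
```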

By the way, I see that the email package dodges this by decoding the bytes into strings with the "raw-unicode-escape" codec. In other words, the byte sequences are carried in the outward form of a string. I'd be OK with that; that is, make "raw-unicode-escape" the default codec for "unquote". All the bytes will come through unscathed, and people who naively expect ASCII strings will still receive them, so their code won't break. This actually seems closest to current usage, so I'm going to change my patch to do that.
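A sketch of what that default would mean, again using today's Python 3 "urllib.parse.unquote", whose "encoding" parameter defaults to 'utf-8' with errors='replace'. Passing encoding="raw-unicode-escape" explicitly stands in for the proposed default; it is an illustration under that assumption, not the patch itself. One caveat worth knowing: this codec maps each byte to the code point of the same value *except* that it still expands backslash-u escapes, so "latin-1" is the codec that is strictly one-to-one over all 256 byte values.

```python
from urllib.parse import unquote

escaped = "/r%E9sum%E9/%41BC"

# Today's default: the octets are decoded as UTF-8 with errors="replace",
# so each stray 0xE9 byte becomes U+FFFD and the original byte is lost.
print(unquote(escaped) == "/r\ufffdsum\ufffd/ABC")  # True

# The proposal: raw-unicode-escape maps byte 0xE9 to code point U+00E9.
as_raw = unquote(escaped, encoding="raw-unicode-escape")
print(ascii(as_raw))  # '/r\xe9sum\xe9/ABC'

# Plain ASCII input still comes back as the same ASCII str.
print(unquote("/a%20b", encoding="raw-unicode-escape"))  # /a b

# Re-encoding with the same codec recovers the original octets.
print(as_raw.encode("raw-unicode-escape"))  # b'/r\xe9sum\xe9/ABC'

# Caveat: an escaped backslash followed by "u0041" is expanded to "A",
# whereas latin-1 preserves all six characters.
print(unquote("%5Cu0041", encoding="raw-unicode-escape"))  # A
print(unquote("%5Cu0041", encoding="latin-1"))  # \u0041
```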