Author mgiuca
Recipients BreamoreBoy, adamnelson, ajaksu2, collinwinter, eric.araujo, ezio.melotti, mastrodomenico, mgiuca, nagle, orsenthil, pitrou, vak, varmaa, vstinner
Date 2010-07-19.12:53:25
Message-id <1279544007.96.0.781099148111.issue1712522@psf.upfronthosting.co.za>
In-reply-to
Content
> I think everyone assumed that the parameter should be a "str" object
> and nothing else. Apparently some people used it accidentally with
> some unicode strings and it "worked" until these strings contained
> non-ASCII characters.

I don't consider use of Unicode strings in Python 2.7 to be "accidental". In my experience with Python 2, pretty much everything already works with Unicode strings, and it's best practice to use them.

Now one of the major goals of Python 2.6/2.7 is to allow the writing of code which ports smoothly to Python 3. Unicode support is a major issue here. To quote "What's new in Python 3" (http://docs.python.org/py3k/whatsnew/3.0.html):
"To be prepared in Python 2.x, start using unicode for all unencoded text, and str for binary or encoded data only. Then the 2to3 tool will do most of the work for you."
Having functions in Python 2.7 which don't accept Unicode (or worse, raise random exceptions) runs against best practices for moving to Python 3.

> If we were following you, we would add "encoding" and "errors" arguments
> to any str-accepting 2.x function, so that it can also accept unicode
> strings. That's certainly not a reasonable solution.

No, that's certainly not necessary. You don't need an "encoding" or "errors" argument on any given function in order to support unicode. In fact, most code written to work with strings naturally works with Unicode because unicode strings support the same basic operations.

The need for "encoding" and "errors" arguments, and in fact the need to deal with string encoding at all in urllib.quote, is due to the special nature of URLs. If URLs had a syntax like %uXXXX, there would be no need to encode Unicode strings (as in UTF-8) at all. However, because the RFC specifies that Unicode strings are to be encoded into a byte sequence *using an unspecified encoding*, it is necessary, for this specific function, to ask the programmer which encoding to use.
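
To illustrate why the encoding has to be chosen by the caller, here is a minimal sketch. It assumes Python 3, where the function lives at urllib.parse.quote and already takes "encoding" and "errors" keyword arguments; it is not the Python 2.7 patch under discussion, and the sample string is made up for the example.

```python
from urllib.parse import quote

# Hypothetical sample text containing one non-ASCII character ("café").
text = "caf\u00e9"

# The same character percent-encodes differently depending on the
# byte encoding chosen before quoting.
utf8_quoted = quote(text, encoding="utf-8")
latin1_quoted = quote(text, encoding="latin-1")

# "errors" controls what happens when the chosen encoding cannot
# represent a character; "replace" substitutes "?", which is then quoted.
replaced = quote(text, encoding="ascii", errors="replace")

print(utf8_quoted)    # caf%C3%A9
print(latin1_quoted)  # caf%E9
print(replaced)       # caf%3F
```

The two results for the same input are both legitimate percent-encodings, which is why the function cannot pick one silently without the risk of surprising the caller.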

Thus I assure you, this is not just one random function I have picked to add these arguments to. This is the only one (that I know of) that requires them to support Unicode.

> The original issue is against robotparser, and clearly states a bug
> (robotparser doesn't work in some cases).

I don't know why this keeps coming back to robotparser. The original bug was not against robotparser; it is called "quote throws exception on Unicode URL" and that is the bug. Robotparser was just one demonstrative piece of code which failed because of it.

Having said that, I don't expect to continue this argument. If you (the Python developers) decide that it's too late to accept this, then I won't object to reverting it.
History
Date User Action Args
2010-07-19 12:53:28 mgiuca set recipients: + mgiuca, collinwinter, varmaa, nagle, orsenthil, pitrou, vstinner, ajaksu2, ezio.melotti, eric.araujo, mastrodomenico, vak, adamnelson, BreamoreBoy
2010-07-19 12:53:27 mgiuca set messageid: <1279544007.96.0.781099148111.issue1712522@psf.upfronthosting.co.za>
2010-07-19 12:53:26 mgiuca link issue1712522 messages
2010-07-19 12:53:25 mgiuca create