Message 75926 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	ezio.melotti
Recipients	ezio.melotti, mrabarnett
Date	2008-11-16.03:50:56
SpamBayes Score	1.7097435e-13
Marked as misclassified	No
Message-id	<1226807458.66.0.687569346274.issue4328@psf.upfronthosting.co.za>
In-reply-to

Content

In Python 2.x, when an operation mixes unicode and normal (byte) strings,
the byte string is usually coerced to unicode using the default encoding.
If the codec cannot decode the string, a UnicodeDecodeError is raised.
For example:
>>> 'à' + u'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)
The same error is raised with u'%s' % 'à'.
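
The coercion rule described above can be emulated in Python 3 as a minimal sketch, assuming bytes stands in for the Python 2 str type and str for Python 2 unicode. The helpers py2_coerce, py2_concat and py2_format are hypothetical names, not CPython internals; 'ascii' is the Python 2 default encoding.

```python
# Sketch, written for Python 3: bytes plays the role of the Python 2 str
# type and str plays the role of Python 2 unicode. Helper names are
# hypothetical; this is not CPython's implementation.
DEFAULT_ENCODING = "ascii"  # what sys.getdefaultencoding() returns in Python 2


def py2_coerce(obj, encoding=DEFAULT_ENCODING):
    """Decode a byte string with the default codec; pass text through."""
    if isinstance(obj, bytes):
        # Under 'ascii', any byte >= 0x80 raises UnicodeDecodeError.
        return obj.decode(encoding)
    return obj


def py2_concat(a, b):
    """Emulate a + b: coerce only when one operand is unicode."""
    if isinstance(a, str) or isinstance(b, str):
        return py2_coerce(a) + py2_coerce(b)
    return a + b


def py2_format(fmt, arg):
    """Emulate u'%s' % arg with a unicode format string."""
    return fmt % py2_coerce(arg)


print(py2_concat(b"f", "oo"))  # foo
try:
    # 0x85 is how 'à' arrives from a cp850 console, as in the traceback.
    py2_concat(b"\x85", "foo")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)
```

Both '+' and '%' go through the same py2_coerce call, which is why they fail with the same error.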

I think that 'à' in u'foo' should behave the same way (i.e. try to
decode the byte string and raise a UnicodeDecodeError if that fails).
This is probably the most coherent and backward-compatible solution, at
least in Python 2.x. In Python 2.x, normal and unicode strings are often
mixed, so making 'f' in u'foo' raise a TypeError would probably break a
lot of code.
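
The proposed behavior for 'in' can be sketched the same way, again emulated in Python 3 with bytes as the Python 2 str type. py2_contains and py2_coerce are hypothetical names; the point is only that 'in' reuses the coercion that '+' and '%' already apply.

```python
# Sketch of the proposed Python 2 semantics for `in`, emulated in Python 3
# (bytes = Py2 str, str = Py2 unicode). Names are hypothetical.
DEFAULT_ENCODING = "ascii"


def py2_coerce(obj, encoding=DEFAULT_ENCODING):
    """Decode a byte string with the default codec; pass text through."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    return obj


def py2_contains(sub, text):
    """Proposed `sub in text`: coerce like + and % do, so an undecodable
    byte string raises UnicodeDecodeError rather than TypeError."""
    if isinstance(sub, str) or isinstance(text, str):
        return py2_coerce(sub) in py2_coerce(text)
    return sub in text


print(py2_contains(b"f", "foo"))  # True
try:
    py2_contains(b"\x85", "foo")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc)
```

Under this rule, ASCII-only byte strings keep working with unicode containers, which is the backward-compatibility argument above.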

In Python 3.x a TypeError could make sense: strings are unicode by
default and you are not supposed to mix byte strings and unicode
strings, so we may require explicit decoding.
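
Python 3 as released takes the explicit-decoding route for 'in' and '+': mixing bytes and str raises TypeError. A minimal check, run under Python 3; the helper explicit_contains and the choice of UTF-8 are illustrative assumptions, not a standard API.

```python
# Python 3 behavior: mixing bytes and str in `in` or `+` raises TypeError,
# so the caller must decode explicitly. The encoding is an assumption.
def explicit_contains(sub: bytes, text: str, encoding: str = "utf-8") -> bool:
    """Decode sub with an explicitly chosen codec, then test membership."""
    return sub.decode(encoding) in text


for label, op in [("in", lambda: b"f" in "foo"), ("+", lambda: b"f" + "oo")]:
    try:
        op()
    except TypeError as exc:
        print(label, "-> TypeError:", exc)

print(explicit_contains("à".encode("utf-8"), "voilà"))  # True
```

Making the caller name the codec removes the dependence on a process-wide default encoding.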

Whatever we choose, the behavior should be consistent across all
operations: if we decide to raise a TypeError for 'in', it should also
be raised for '+' and '%' (and possibly others).
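
To illustrate the consistency requirement, here is a sketch, assuming the TypeError policy were chosen, that routes '+', '%' and 'in' through one shared check. The helpers _kind, _require_same_kind, strict_concat, strict_format and strict_contains are hypothetical and only model the policy in Python 3.

```python
# Sketch of the alternative policy applied uniformly: every operation that
# mixes bytes and str raises TypeError. All names here are hypothetical.
def _kind(obj):
    """Classify an operand as 'bytes', 'str', or None (not a string)."""
    if isinstance(obj, bytes):
        return "bytes"
    if isinstance(obj, str):
        return "str"
    return None


def _require_same_kind(a, b, op):
    """Raise TypeError when one operand is bytes and the other is str."""
    ka, kb = _kind(a), _kind(b)
    if ka and kb and ka != kb:
        raise TypeError(f"{op}: cannot mix {ka} and {kb}; decode explicitly")


def strict_concat(a, b):
    _require_same_kind(a, b, "+")
    return a + b


def strict_format(fmt, arg):
    _require_same_kind(fmt, arg, "%")
    return fmt % arg


def strict_contains(sub, text):
    _require_same_kind(sub, text, "in")
    return sub in text


for op, args in [(strict_concat, (b"f", "oo")),
                 (strict_format, ("%s", b"f")),
                 (strict_contains, (b"f", "foo"))]:
    try:
        op(*args)
    except TypeError as exc:
        print(exc)
```

Because all three operations share one check, changing the policy in one place changes it everywhere, which is the consistency the message asks for. Non-string arguments such as integers pass through unchecked.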

History
Date	User	Action	Args
2008-11-16 03:50:58	ezio.melotti	set	recipients: + ezio.melotti, mrabarnett
2008-11-16 03:50:58	ezio.melotti	set	messageid: <1226807458.66.0.687569346274.issue4328@psf.upfronthosting.co.za>
2008-11-16 03:50:58	ezio.melotti	link	issue4328 messages
2008-11-16 03:50:56	ezio.melotti	create