Author ezio.melotti
Recipients ezio.melotti, mrabarnett
Date 2008-11-16.03:50:56
Message-id <1226807458.66.0.687569346274.issue4328@psf.upfronthosting.co.za>
In-reply-to
Content
Usually, when an operation mixes unicode and normal (byte) strings,
the latter are coerced to unicode using the default encoding. If the
codec cannot decode the string, a UnicodeDecodeError is raised. E.g.:
>>> 'à' + u'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)
The same error is raised with u'%s' % 'à'.

I think that 'à' in u'foo' should behave in the same way (i.e. try to
decode the string and possibly raise a UnicodeDecodeError). This is
probably the most coherent and backward-compatible solution, at least in
Python 2.x. In Python 2.x normal and unicode strings are often mixed, and
having 'f' in u'foo' raise a TypeError will probably break a lot of
code.
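
A minimal sketch of the semantics proposed above, written so that it runs
on Python 3. The helper name contains_with_coercion and its encoding
parameter are hypothetical, introduced only for illustration; they are
not part of any Python API and this is not how the interpreter
implements 'in'.

```python
# Hypothetical helper: emulates the proposed Python 2.x behavior of
# `byte_string in unicode_string` by decoding the left operand first.
def contains_with_coercion(needle, haystack, encoding="ascii"):
    """Return `needle in haystack`, decoding a byte-string needle first."""
    if isinstance(needle, bytes):
        # May raise UnicodeDecodeError, mirroring what '+' and '%' do in 2.x.
        needle = needle.decode(encoding)
    return needle in haystack


# An ASCII-only byte string is coerced silently and the test succeeds.
ascii_result = contains_with_coercion(b"f", "foo")

# A non-ASCII byte (0x85, as in the traceback above) cannot be decoded.
try:
    contains_with_coercion(b"\x85", "foo")
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
```

With this behavior, existing code that mixes ASCII-only byte strings with
unicode strings keeps working, and only undecodable input fails.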

In Python 3.x raising a TypeError could make sense: strings are unicode
by default and you are not supposed to mix byte strings and unicode
strings, so we may require an explicit decoding.
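
For comparison, a short sketch of how a current Python 3 interpreter
treats the same mix: both 'in' and '+' raise a TypeError, and an explicit
decode is needed. The helper raises_type_error is hypothetical and exists
only to keep the example compact.

```python
text = "foo"
raw = b"f"


def raises_type_error(operation):
    """Hypothetical helper: report whether calling `operation` raises TypeError."""
    try:
        operation()
    except TypeError:
        return True
    return False


# Mixing bytes and str is rejected for both operators.
in_raises = raises_type_error(lambda: raw in text)
concat_raises = raises_type_error(lambda: raw + text)

# Decoding explicitly makes the containment test well defined.
explicit_result = raw.decode("ascii") in text
```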

The behavior should be consistent across all operations: if we decide
to raise a TypeError with 'in', it should be raised with '+' and '%' (and
possibly others) as well.
History
Date                 User          Action  Args
2008-11-16 03:50:58  ezio.melotti  set     recipients: + ezio.melotti, mrabarnett
2008-11-16 03:50:58  ezio.melotti  set     messageid: <1226807458.66.0.687569346274.issue4328@psf.upfronthosting.co.za>
2008-11-16 03:50:58  ezio.melotti  link    issue4328 messages
2008-11-16 03:50:56  ezio.melotti  create