classification
Title: "à" in u"foo" raises a misleading error
Type: enhancement
Stage: resolved
Components: Unicode
Versions: Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Misleading exception from unicode.__contains__ (issue 1680159)
Assigned To:
Nosy List: ezio.melotti, loewis, mrabarnett, r.david.murray, terry.reedy
Priority: normal
Keywords:

Created on 2008-11-15 09:27 by ezio.melotti, last changed 2009-12-30 15:33 by r.david.murray. This issue is now closed.

Messages (6)
msg75907 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2008-11-15 09:27
With Python 2.x:
>>> 'à' in u'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'in <string>' requires string as left operand
>>> 'à' in u'xàx'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'in <string>' requires string as left operand

The error claims that "'in <string>' requires string as left operand"
when actually the left operand *is* a string.

With Python 2.6 and from __future__ import unicode_literals:
>>> print(b'\x85')
à
>>> b'\x85' in 'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand

With Python 3.x the error is slightly different:
TypeError: 'in <string>' requires string as left operand, not bytes

but then it works with:
>>> b'f' in 'foo'
True

This problem seems somehow related to the implicit decoding of 'à'. I
guess that 'à' in u'foo' should raise a UnicodeDecodeError ('xxx' codec
can't decode byte 0x85 ...), not a TypeError.
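The byte 0x85 from the sessions above can be looked at in isolation. The following is a minimal sketch, written for Python 3 and assuming the console used the cp437 code page (an assumption; the report does not name the encoding). It shows why the byte prints as à yet cannot be decoded by the ASCII default codec:

```python
# Assumption: the console encoding is cp437; the report does not say which
# encoding was actually in use.
raw = b'\x85'

# In cp437 the byte 0x85 maps to U+00E0 (LATIN SMALL LETTER A WITH GRAVE).
as_console_text = raw.decode('cp437')

# The ASCII codec rejects every byte >= 0x80. This is the failure the report
# expects to surface as a UnicodeDecodeError instead of a TypeError.
message = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)

print(repr(as_console_text))
print(message)
```

The printed message has the same "'xxx' codec can't decode byte 0x85" shape that the report asks for.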
msg75915 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2008-11-15 17:37
The left operand is a bytestring and the right operand is a unicode
string, so it makes sense that it raises an exception, although it would
be clearer if it said "'in <string>' requires unicode string as left
operand".

I agree that if it's going to do implicit decoding so that it'll accept
'f' in u'foo' then it should probably raise a UnicodeDecodeError when
that fails.

If it's reporting a /TypeError/ then it should also reject 'f' in u'foo'.
msg75926 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2008-11-16 03:50
Usually, when you do operations involving unicode and normal strings,
the latter are coerced to unicode using the default encoding. If the
codec is not able to decode the string, a UnicodeDecodeError is raised. E.g.:
>>> 'à' + u'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)
The same error is raised with u'%s' % 'à'.

I think that 'à' in u'foo' should behave in the same way (i.e. try to
decode the string and possibly raise a UnicodeDecodeError). This is
probably the most coherent and backward-compatible solution, at least in
Python 2.x. In Python 2.x normal and unicode strings are often mixed, and
making 'f' in u'foo' raise a TypeError would probably break a lot of
code.

In Python 3.x raising a TypeError could make sense: strings are unicode by
default and you are not supposed to mix byte strings and unicode strings,
so we may require an explicit decoding.

The behavior should be consistent for all the operations: if we decide
to raise a TypeError with 'in', it should be raised with '+' and '%' (and
possibly others) as well.
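The consistency argument can be illustrated with a small model of the Python 2 coercion rule. This is a sketch for Python 3, not the interpreter's actual implementation: the helpers coerce, contains, concat and interpolate are hypothetical names introduced here, and the 'ascii' default stands in for the Python 2 default encoding.

```python
def coerce(value, encoding='ascii'):
    """Hypothetical helper: decode byte strings as Python 2 did implicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)  # may raise UnicodeDecodeError
    return value


def contains(left, right):
    # Models: left in right
    return coerce(left) in coerce(right)


def concat(left, right):
    # Models: left + right
    return coerce(left) + coerce(right)


def interpolate(template, arg):
    # Models: template % arg, for a single non-tuple argument
    return coerce(template) % coerce(arg)


# With one shared coercion step, all three operations fail the same way
# for a byte that the default codec cannot decode.
calls = {
    'in': lambda: contains(b'\x85', 'foo'),
    '+': lambda: concat(b'\x85', 'foo'),
    '%': lambda: interpolate('%s', b'\x85'),
}
results = {}
for name, call in calls.items():
    try:
        call()
        results[name] = 'ok'
    except UnicodeDecodeError:
        results[name] = 'UnicodeDecodeError'

print(results)
print(contains(b'f', 'foo'))  # pure-ASCII bytes still work
```

Under this model the ASCII-only case 'f' in u'foo' keeps working, and only undecodable input raises.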
msg76217 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2008-11-21 23:00
| but then it works with:
| >>> b'f' in 'foo'
| True

Not True in 3.0rc3.  Same message as you quoted:
 'in <string>' requires string as left operand, not bytes

bytes + string fails too.
msg77507 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2008-12-10 08:48
No patch has been proposed yet, so un-targeting for bugfix branches.
msg97035 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2009-12-30 15:33
Actually, I've already fixed this for 2.7 (issue 1680159) by letting the
UnicodeDecodeError propagate upward.  I don't think making 'f' in u'foo'
an error would be a good idea.

Unless I'm mistaken the py3 behavior is correct.  If someone has a case
where the py3 message or behavior is incorrect they can reopen (or open
a new issue, since the cause is likely to be different if there is a
problem).

(Actually, the message you get when you do 'o' in b'foo' is...not
obvious; but as I said I think that's a different issue from this one.)
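The Python 3 behavior discussed in this thread can be checked with a short script. This is a minimal sketch: error_type is a hypothetical helper introduced here, and the exact wording of each message is left out because it varies between Python 3 releases.

```python
def error_type(operation):
    """Hypothetical helper: return the exception class name, or None."""
    try:
        operation()
    except Exception as exc:
        return type(exc).__name__
    return None


# Python 3 never decodes implicitly, so mixing bytes and str is rejected
# in either direction.
bytes_in_str = error_type(lambda: b'f' in 'foo')
bytes_plus_str = error_type(lambda: b'f' + 'foo')
str_in_bytes = error_type(lambda: 'o' in b'foo')

# An explicit decode makes the intent unambiguous and works.
explicit = b'f'.decode('ascii') in 'foo'

print(bytes_in_str, bytes_plus_str, str_in_bytes, explicit)
```

All three mixed cases raise TypeError, while the explicitly decoded membership test succeeds.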
History
Date                 User            Action  Args
2009-12-30 15:33:11  r.david.murray  set     status: open -> closed; superseder: Misleading exception from unicode.__contains__; versions: - Python 3.2; nosy: + r.david.murray; messages: + msg97035; resolution: duplicate; stage: needs patch -> resolved
2009-12-30 14:52:06  ezio.melotti    set     priority: normal; stage: needs patch; type: enhancement; versions: + Python 2.7, Python 3.2, - Python 2.6, Python 3.0
2008-12-10 08:48:47  loewis          set     nosy: + loewis; messages: + msg77507; versions: - Python 2.5, Python 2.4, Python 2.5.3
2008-11-21 23:00:40  terry.reedy     set     nosy: + terry.reedy; messages: + msg76217
2008-11-16 03:50:58  ezio.melotti    set     messages: + msg75926
2008-11-15 17:37:45  mrabarnett      set     nosy: + mrabarnett; messages: + msg75915
2008-11-15 09:27:23  ezio.melotti    create