Issue 4328: "à" in u"foo" raises a misleading error


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/48578

classification

Title:	"à" in u"foo" raises a misleading error
Type:	enhancement	Stage:	resolved
Components:	Unicode	Versions:	Python 2.7

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:	Misleading exception from unicode.__contains__ View: 1680159
Assigned To:		Nosy List:	ezio.melotti, loewis, mrabarnett, r.david.murray, terry.reedy
Priority:	normal	Keywords:

Created on 2008-11-15 09:27 by ezio.melotti, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg75907 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2008-11-15 09:27
With Python 2.x:

>>> 'à' in u'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'in <string>' requires string as left operand
>>> 'à' in u'xàx'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'in <string>' requires string as left operand

The error claims that "'in <string>' requires string as left operand" when actually the left operand is a string.

With Python 2.6 with unicode_literals:

>>> print(b'\x85')
à
>>> b'\x85' in 'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand

With Python 3.x the error is slightly different:

TypeError: 'in <string>' requires string as left operand, not bytes

but then it works with:

>>> b'f' in 'foo'
True

This problem seems somehow related to the implicit decoding of 'à'. I guess that 'à' in u'foo' should raise a UnicodeDecodeError ('xxx' codec can't decode byte 0x85 ...), not a TypeError.
msg75915 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2008-11-15 17:37
The left operand is a bytestring and the right operand is a unicode string, so it makes sense that it raises an exception, although it would be clearer if it said "'in <string>' requires unicode string as left operand". I agree that if it's going to do implicit decoding so that it'll accept 'f' in u'foo' then it should probably raise a UnicodeDecodeError when that fails. If it's reporting a /TypeError/ then it should also reject 'f' in u'foo'.
msg75926 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2008-11-16 03:50
Usually, when you do operations involving unicode and normal strings, the latter are coerced to unicode using the default encoding. If the codec is not able to decode the string, a UnicodeDecodeError is raised. E.g.:

>>> 'à' + u'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)

The same error is raised with u'%s' % 'à'. I think that 'à' in u'foo' should behave in the same way (i.e. try to decode the string and possibly raise a UnicodeDecodeError). This is probably the most coherent and backward-compatible solution, at least in Python 2.x.

In Python 2.x normal and unicode strings are often mixed, and having 'f' in u'foo' raise a TypeError would probably break a lot of code. In Python 3.x it could make sense: the strings are unicode by default and you are not supposed to mix byte strings and unicode strings, so we may require an explicit decoding.

The behavior should be consistent for all the operations: if we decide to raise a TypeError with 'in', it should be raised with '+' and '%' (and possibly others) as well.
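The coercion described above can be sketched in Python 3, where the decoding has to be written out explicitly. This is only an illustration of the proposed behavior, not the actual CPython code: contains_decoded is a hypothetical helper, and "ascii" is assumed as the default encoding because that is the codec named in the traceback above.

```python
def contains_decoded(needle: bytes, haystack: str, encoding: str = "ascii") -> bool:
    """Hypothetical helper: decode the byte string first, then test membership.

    A byte string that cannot be decoded raises UnicodeDecodeError,
    which is the error the report asks for instead of a TypeError.
    """
    return needle.decode(encoding) in haystack


# A decodable byte string behaves like an ordinary membership test.
print(contains_decoded(b"f", "foo"))

# An undecodable byte (0x85 is outside the ASCII range) fails in the decode step.
try:
    contains_decoded(b"\x85", "foo")
except UnicodeDecodeError as exc:
    print(type(exc).__name__, exc)
```

With this shape, the failure points at the decoding step rather than at the operand type, which is the distinction the message above is asking for.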
msg76217 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2008-11-21 23:00
| but then it works with:
| >>> b'f' in 'foo'
| True

Not True in 3.0rc3. Same message as you quoted:

'in <string>' requires string as left operand, not bytes

bytes + string fails too.
msg77507 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2008-12-10 08:48
No patch has been proposed yet, so un-targeting for bugfix branches.
msg97035 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2009-12-30 15:33
Actually, I've already fixed this for 2.7 (issue 1680159) by letting the UnicodeDecodeError propagate upward. I don't think making 'f' in u'foo' an error would be a good idea. Unless I'm mistaken the py3 behavior is correct. If someone has a case where the py3 message or behavior is incorrect they can reopen (or open a new issue, since the cause is likely to be different if there is a problem). (Actually, the message you get when you do 'o' in b'foo' is...not obvious; but as I said I think that's a different issue from this one.)
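For readers who want to check the Python 3 behavior mentioned above, here is a minimal sketch. try_contains is a hypothetical helper introduced only for this illustration; it returns the result of the membership test, or the TypeError instance when the operand types cannot be mixed. The exact message wording may differ between Python 3 releases, so the code prints it rather than assuming it.

```python
def try_contains(needle, haystack):
    """Hypothetical helper: evaluate `needle in haystack`.

    Returns the boolean result, or the TypeError instance raised
    when str and bytes operands are mixed.
    """
    try:
        return needle in haystack
    except TypeError as exc:
        return exc


# Same-type operands work as usual.
print(try_contains("f", "foo"))
print(try_contains(b"f", b"foo"))

# Mixed operands raise TypeError in both directions; the messages differ.
print(repr(try_contains(b"f", "foo")))
print(repr(try_contains("o", b"foo")))
```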

History
Date	User	Action	Args
2022-04-11 14:56:41	admin	set	github: 48578
2009-12-30 15:33:11	r.david.murray	set	status: open -> closed superseder: Misleading exception from unicode.__contains__ versions: - Python 3.2 nosy: + r.david.murray messages: + msg97035 resolution: duplicate stage: needs patch -> resolved
2009-12-30 14:52:06	ezio.melotti	set	priority: normal stage: needs patch type: enhancement versions: + Python 2.7, Python 3.2, - Python 2.6, Python 3.0
2008-12-10 08:48:47	loewis	set	nosy: + loewis messages: + msg77507 versions: - Python 2.5, Python 2.4, Python 2.5.3
2008-11-21 23:00:40	terry.reedy	set	nosy: + terry.reedy messages: + msg76217
2008-11-16 03:50:58	ezio.melotti	set	messages: + msg75926
2008-11-15 17:37:45	mrabarnett	set	nosy: + mrabarnett messages: + msg75915
2008-11-15 09:27:23	ezio.melotti	create