Issue1772788
Created on 2007-08-12 22:54 by laukpe, last changed 2010-10-25 21:45 by georg.brandl. This issue is now closed.
| Messages (7) | |||
|---|---|---|---|
| msg32630 - (view) | Author: Pekka Laukkanen (laukpe) * | Date: 2007-08-12 22:54 | |
A test of the form "chr(x) in <string>" raises a TypeError if "x" is in the range 128-255 (i.e. non-ASCII) and the string is unicode. This happens even if the unicode string contains only ASCII data, as the example below demonstrates.

    Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
    [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> chr(127) in 'hello'
    False
    >>> chr(128) in 'hello'
    False
    >>> chr(127) in u'hi'
    False
    >>> chr(128) in u'hi'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'in <string>' requires string as left operand

This can cause nasty, hard-to-debug bugs in code using the "in <string>" form if, e.g., user-provided data is converted to unicode internally. Most other string operations work nicely between normal and unicode strings, and I'd say simply returning False in this situation would be OK too. Issuing a warning like the one below might also be a good idea.

    >>> chr(128) == u''
    __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

Finally, the error message is somewhat misleading, since the left operand is definitely a string.

    >>> type(chr(128))
    <type 'str'>

A real-life example of code where this problem exists is telnetlib. I'll submit a separate bug about it, as that problem can obviously be fixed in the library itself. |
|||
| msg32631 - (view) | Author: Fredrik Lundh (effbot) * | Date: 2007-08-21 08:48 | |
"Most other string operations work nicely between normal and unicode strings"

Nope. You *always* get errors if you mix Unicode with NON-ASCII data (unless you've messed up the system's default encoding, which is a bad thing to do if you care about portability). Some examples:

    >>> chr(128) + u"foo"
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
    >>> u"foo".find(chr(128))
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

etc. If there's a bug here, it's that you get a TypeError instead of a ValueError subclass. |
|||
| msg32632 - (view) | Author: Pekka Laukkanen (laukpe) * | Date: 2007-08-21 14:03 | |
Fredrik, you are obviously correct that most operations between normal and unicode strings don't work if the normal string contains non-ascii data. I still do think that a UnicodeWarning like you get from "chr(128) == u'foo'" would be nicer than an exception and prevent problems like the one in telnetlib [1]. If an exception is raised I don't care too much about its type but a better message would make debugging possible problems easier. [1] https://sourceforge.net/tracker/index.php?func=detail&aid=1772794&group_id=5470&atid=105470 |
|||
| msg85040 - (view) | Author: Jack Diederich (jackdied) * | Date: 2009-04-01 16:22 | |
assigning all open telnetlib items to myself |
|||
| msg119568 - (view) | Author: Georg Brandl (georg.brandl) * | Date: 2010-10-25 17:56 | |
I don't think we'll do anything about this message in 2.x -- in 3.x you get a clear TypeError anyway if you mix str and bytes. |
|||
| msg119570 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2010-10-25 19:16 | |
I'm not sure that I'd consider:

    >>> 'abc' in b'abcde'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Type str doesn't support the buffer API

a clear error message :) It certainly isn't as bad as the 2.x message, though. |
|||
| msg119581 - (view) | Author: Georg Brandl (georg.brandl) * | Date: 2010-10-25 21:45 | |
Ah. I tried the other combination :) |
|||
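
Editorial note: the following is a minimal Python 3 sketch of the behaviour discussed in the thread, not code from any of the participants. It shows that mixing `str` and `bytes` with `in` raises a TypeError in both directions, that a failed ASCII decode is reported as a `UnicodeDecodeError` (a `ValueError` subclass, as Fredrik's message suggests it should be), and one possible way to get the "simply return False" semantics the reporter asked for. The helper name `contains_text` and its `encoding` parameter are hypothetical and exist only for illustration.

```python
# Python 3 sketch. `contains_text` is a hypothetical helper, not a stdlib API.

def contains_text(needle, haystack, encoding="ascii"):
    """Return True if `needle` occurs in the text `haystack`.

    A bytes needle is decoded explicitly first. If it cannot be decoded,
    it is treated as "not contained" instead of raising, which mirrors
    the behaviour proposed in the original report.
    """
    if isinstance(needle, (bytes, bytearray)):
        try:
            needle = bytes(needle).decode(encoding)
        except UnicodeDecodeError:
            return False
    return needle in haystack


# Mixing the two types directly raises TypeError in either direction.
for left, right in ((b"\x80", "hi"), ("abc", b"abcde")):
    try:
        left in right
    except TypeError as exc:
        print(type(exc).__name__, "-", exc)

# An explicit ASCII decode of a non-ASCII byte fails with
# UnicodeDecodeError, which is a subclass of ValueError.
try:
    b"\x80".decode("ascii")
except UnicodeDecodeError as exc:
    print(isinstance(exc, ValueError))  # True

print(contains_text(b"\x7f", "hi"))  # False
print(contains_text(b"\x80", "hi"))  # False
print(contains_text(b"h", "hi"))     # True
```

Swallowing the decode error is a design choice that trades diagnosability for convenience; code that needs to detect bad input would likely prefer to let the `UnicodeDecodeError` propagate.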
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2010-10-25 21:45:23 | georg.brandl | set | messages: + msg119581 |
| 2010-10-25 19:16:40 | r.david.murray | set | nosy: + r.david.murray messages: + msg119570 |
| 2010-10-25 17:56:04 | georg.brandl | set | status: open -> closed nosy: + georg.brandl messages: + msg119568 resolution: out of date |
| 2009-04-01 16:22:08 | jackdied | set | nosy: + jackdied messages: + msg85040 |
| 2007-08-12 22:54:08 | laukpe | create | |
