Issue 4314: isalpha bug - Python tracker

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/48564

classification

Title:	isalpha bug
Type:	behavior	Stage:
Components:	Unicode	Versions:	Python 2.5

process

Status:	closed	Resolution:	works for me
Dependencies:		Superseder:
Assigned To:		Nosy List:	ZooKeeper, lemburg, vstinner
Priority:	normal	Keywords:

Created on 2008-11-13 14:39 by ZooKeeper, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg75820 - (view)	Author: ZooKeeper (ZooKeeper)	Date: 2008-11-13 14:39
This may be a little tricky to recreate, but here it is:

    >>> q = u'абвгде'
    >>> q.isalpha()
    True
    >>> foo = u'ч'
    >>> foo.isalpha()
    False

So the Russian characters u'ч' and u'ё', as well as a bunch of others, are not recognized by isalpha() as alphabetic characters, which as a matter of fact they are. This applies to both the capital and lowercase versions of the letters.

http://en.wikipedia.org/wiki/%D0%81
http://en.wikipedia.org/wiki/Che_(Cyrillic)

Using: Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
msg75821 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-11-13 14:46
Are you sure that you've used the right source code encoding for writing these characters?

Note that the Unicode .isalpha() method relies entirely on what the Unicode database provides as code point information. If a character is not marked as alphabetic (i.e. is not in one of the categories 'Ll', 'Lu', 'Lt', 'Lo' or 'Lm'), it will return False.
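As a rough illustration of the category rule described above, the following Python 3 sketch looks up the general category of a few code points with the standard unicodedata module and compares the result with str.isalpha(). The helper name is_letter_by_category is made up for this example and is not part of Python.

```python
import unicodedata

# General categories named in the message above as counting as letters.
LETTER_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm"}


def is_letter_by_category(ch):
    """Hypothetical helper: True if the single character ch is in a letter category."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


# The four code points discussed in this issue: che and yo, lowercase and capital.
for ch in "\u0447\u0451\u0427\u0401":
    # Only code points are printed, so the output does not depend on the console encoding.
    print("U+%04X category=%s by_category=%s isalpha=%s" % (
        ord(ch), unicodedata.category(ch), is_letter_by_category(ch), ch.isalpha()))
```

If the characters really are U+0447 and U+0451, both checks should agree and report them as letters; a False result suggests that a different code point reached the interpreter.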
msg75822 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-11-13 14:48
FWIW: I get the following in Python 2.5:

    >>> print u'\u0401'
    Ё
    >>> print u'\u0451'
    ё
    >>> print u'\u0401'.isalpha()
    True
    >>> print u'\u0451'.isalpha()
    True
msg75823 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-11-13 14:49
... and for the other character:

    >>> print u'\u0427'
    Ч
    >>> print u'\u0447'
    ч
    >>> print u'\u0427'.isalpha()
    True
    >>> print u'\u0447'.isalpha()
    True

Looks fine.
msg75824 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-11-13 14:52
Results on Linux:

With Python 2.7 trunk:

    >>> print(', '.join('%s:%s' % (c, c.isalpha()) for c in u'абвгдеч'))
    а:True, б:True, в:True, г:True, д:True, е:True, ч:True

With Python 2.5.1:

    >>> print(', '.join('%s:%s' % (c, c.isalpha()) for c in u'абвгдеч'))
    а:True, б:True, в:True, г:True, д:True, е:True, ч:True

With Python 3.0 trunk:

    >>> print(', '.join('%s:%s' % (c, c.isalpha()) for c in 'абвгдеч'))
    а:True, б:True, в:True, г:True, д:True, е:True, ч:True

Are you sure that you really typed the character "ч"? Can you retry using unichr(0x447).isalpha()?

Test with Python3:

    >>> print(' - '.join((r"\u%04x" % x) for x in range(0x400, 0x4ff+1) if not chr(x).isalpha()))
    \u0482 - \u0483 - \u0484 - \u0485 - \u0486 - \u0487 - \u0488 - \u0489

Which means that Python thinks that all Unicode characters in the range U+0400..U+04FF are letters, except the range U+0482..U+0489 (thousands sign ҂ to millions sign ҉).
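The range scan above can be extended to show why each exception is excluded. This is a minimal Python 3 sketch, assuming the interpreter's bundled Unicode database; non_alpha_in_range is a name invented for this example.

```python
import unicodedata


def non_alpha_in_range(start, stop):
    """Return (code point, general category) pairs for every character
    in start..stop (inclusive) for which str.isalpha() is False."""
    return [(cp, unicodedata.category(chr(cp)))
            for cp in range(start, stop + 1)
            if not chr(cp).isalpha()]


# Scan the Cyrillic block and report the category of each non-letter.
for cp, category in non_alpha_in_range(0x400, 0x4FF):
    print("U+%04X category=%s" % (cp, category))
```

The reported categories should all be symbol or mark categories rather than letter categories, which matches the rule that only 'L*' categories count as alphabetic.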
msg75826 - (view)	Author: ZooKeeper (ZooKeeper)	Date: 2008-11-13 15:55
I'll investigate further shortly, but for now here is the test replicated:

    >>> print u'\u0451'
    ¸
    >>> print u'\u0427'
    ×

Something must be going on here. Running Win XP.
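One possible explanation, which is an assumption and was not confirmed anywhere in this thread, is an encoding mismatch: text stored or displayed as Windows-1251 (Cyrillic) bytes but interpreted as Latin-1. The Python 3 sketch below simulates that round trip; the misread helper and the choice of the two codecs are hypothetical, picked only because they reproduce the symptoms reported here.

```python
def misread(text, actual="cp1251", assumed="latin-1"):
    """Hypothetical helper: encode text with one codec, then decode the
    resulting bytes with another, as a misconfigured editor or console might."""
    return text.encode(actual).decode(assumed)


# Lowercase a, be, ve, ghe, de, ie, che, yo, followed by capital che and yo.
for ch in "\u0430\u0431\u0432\u0433\u0434\u0435\u0447\u0451\u0427\u0401":
    wrong = misread(ch)
    # Print code points only, so the output is independent of the console encoding.
    print("U+%04X read as U+%04X isalpha=%s" % (ord(ch), ord(wrong), wrong.isalpha()))
```

Under this assumption, u'\u0451' comes out as ¸ and u'\u0427' as ×, as in the output above, while the first six letters turn into accented Latin letters that still pass isalpha(), and ч turns into the division sign, which does not.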
msg75827 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-11-13 15:57
    $ python
    Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
    >>> print u'\u0451'
    ё
    >>> print u'\u0427'
    Ч

@ZooKeeper: Try Python 2.6; I guess that your bug is already fixed.

History
Date	User	Action	Args
2022-04-11 14:56:41	admin	set	github: 48564
2008-11-13 15:57:45	vstinner	set	messages: + msg75827
2008-11-13 15:55:27	ZooKeeper	set	messages: + msg75826
2008-11-13 14:52:32	vstinner	set	nosy: + vstinner messages: + msg75824
2008-11-13 14:49:53	lemburg	set	status: open -> closed resolution: works for me messages: + msg75823
2008-11-13 14:48:18	lemburg	set	messages: + msg75822
2008-11-13 14:46:08	lemburg	set	nosy: + lemburg messages: + msg75821 components: - Extension Modules
2008-11-13 14:39:37	ZooKeeper	create