Message 122885 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	belopolsky
Recipients	belopolsky, docs@python
Date	2010-11-30.05:46:44
SpamBayes Score	1.8290539e-08
Marked as misclassified	No
Message-id	<1291096006.2.0.0136231958849.issue10587@psf.upfronthosting.co.za>
In-reply-to

Content

On Mon, Nov 29, 2010 at 4:13 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
>> - How specific should library reference manual be in defining methods
>> affected by UCD such as str.upper()?
>
> It should specify what this actually does in Unicode terminology
> (probably in addition to a layman's rephrase of that)
>

http://mail.python.org/pipermail/python-dev/2010-November/106155.html

Some of these clarifications may actually lead to the conclusion that the current behavior is wrong.  For example, Unicode defines the Alphabetic property as Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic:

http://www.unicode.org/reports/tr44/tr44-6.html#Alphabetic
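The UAX #44 definition can be written as a small predicate. This is only a sketch: `unicodedata` exposes the general category but not Other_Alphabetic, so the hypothetical `other_alphabetic` parameter (a set of code points) has to be supplied by the caller, e.g. loaded from PropList.txt.

```python
import unicodedata

# General categories included in the Unicode Alphabetic property (UAX #44).
ALPHABETIC_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def is_unicode_alphabetic(ch, other_alphabetic=frozenset()):
    """Approximate the Unicode Alphabetic property for one character.

    other_alphabetic is a hypothetical caller-supplied set of code points
    with the Other_Alphabetic property; unicodedata does not provide it.
    """
    return (unicodedata.category(ch) in ALPHABETIC_CATEGORIES
            or ord(ch) in other_alphabetic)
```

With this definition `is_unicode_alphabetic('Ⅴ')` is True, whereas `'Ⅴ'.isalpha()` is False, because Nl is in the Unicode set but not in the categories `str.isalpha()` checks.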

However, str.isalpha() is defined as just Lu + Ll + Lt + Lm + Lo, so letter numbers (Nl) are not considered alphabetic.  For example,

>>> import unicodedata as ud
>>> ud.category('Ⅴ')
'Nl'
>>> 'Ⅴ'.isalpha()
False
>>> ud.name('Ⅴ')
'ROMAN NUMERAL FIVE'
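ROMAN NUMERAL FIVE is not an isolated case. A quick scan (assuming a Python 3 interpreter with a wide `sys.maxunicode`) can list every Nl code point in the bundled `unicodedata` and check `str.isalpha()` on each. The counts depend on `unicodedata.unidata_version`, so no specific numbers are given here.

```python
import sys
import unicodedata

# Every code point whose general category is Nl (Number, letter).
nl_chars = [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Nl"]

# Nl belongs to Unicode's Alphabetic property but not to str.isalpha().
not_alpha = [ch for ch in nl_chars if not ch.isalpha()]

print("UCD version", unicodedata.unidata_version)
print(len(nl_chars), "Nl characters;", len(not_alpha), "rejected by isalpha()")
for ch in nl_chars[:5]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```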

As far as I can tell, the source of the Other_Alphabetic property data,
http://unicode.org/Public/UNIDATA/PropList.txt, is not even included in the unicodedata module, and neither is SpecialCasing.txt, which is necessary for implementing a compliant case mapping algorithm.
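A minimal sketch of what loading these two files could involve, assuming the UCD text format described in UAX #44 (semicolon-separated fields, `#` comments, `XXXX..YYYY` ranges). `parse_proplist`, `parse_special_upper` and `full_upper` are hypothetical helpers, not part of `unicodedata`, and the sample lines are written in the file format for illustration rather than read from the real files. Conditional SpecialCasing entries (such as Final_Sigma or language-specific rules) are skipped, so this is not a compliant case mapping.

```python
# Sample lines in PropList.txt format: code point or range; property # comment
SAMPLE_PROPLIST = """\
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0345          ; Other_Alphabetic # Mn       COMBINING GREEK YPOGEGRAMMENI
05B0..05BD    ; Other_Alphabetic # Mn  [14] HEBREW POINT SHEVA..HEBREW POINT METEG
""".splitlines()

# Sample lines in SpecialCasing.txt format: code; lower; title; upper; (condition;)? # comment
SAMPLE_SPECIAL = """\
00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
FB00; FB00; 0046 0066; 0046 0046; # LATIN SMALL LIGATURE FF
03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
""".splitlines()


def parse_proplist(lines, prop):
    """Return the set of code points that carry `prop`."""
    points = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cps, name = (field.strip() for field in data.split(";"))
        if name != prop:
            continue
        first, _, last = cps.partition("..")  # single code point or range
        points.update(range(int(first, 16), int(last or first, 16) + 1))
    return points


def parse_special_upper(lines):
    """Map code point -> full uppercase string, unconditional entries only."""
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        # The trailing ';' leaves an empty last field; a non-empty fifth
        # field is a condition list, which this sketch does not handle.
        if len(fields) > 4 and fields[4]:
            continue
        upper = fields[3].split()
        table[int(fields[0], 16)] = "".join(chr(int(cp, 16)) for cp in upper)
    return table


def full_upper(s, special):
    # Characters without an entry fall back to ch.upper().
    return "".join(special.get(ord(ch), ch.upper()) for ch in s)


other_alphabetic = parse_proplist(SAMPLE_PROPLIST, "Other_Alphabetic")
special_upper = parse_special_upper(SAMPLE_SPECIAL)
print(len(other_alphabetic), full_upper("straße", special_upper))
```

The one-to-many mapping (ß to "SS") is exactly what a simple per-character case mapping cannot express, which is why SpecialCasing.txt is needed for full case conversion.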

History
Date	User	Action	Args
2010-11-30 05:46:46	belopolsky	set	recipients: + belopolsky, docs@python
2010-11-30 05:46:46	belopolsky	set	messageid: <1291096006.2.0.0136231958849.issue10587@psf.upfronthosting.co.za>
2010-11-30 05:46:44	belopolsky	link	issue10587 messages
2010-11-30 05:46:44	belopolsky	create