
Author lemburg
Recipients belopolsky, ezio.melotti, lemburg, loewis
Date 2010-11-29.20:42:30
Message-id <4CF41035.1010205@egenix.com>
In-reply-to <1291061459.3.0.34838576769.issue10575@psf.upfronthosting.co.za>
Content
Martin v. Löwis wrote:
> 
> Martin v. Löwis <martin@v.loewis.de> added the comment:
> 
> This is not a bug, see
> 
> http://www.unicode.org/reports/tr44/#Numeric_Value
> 
> Characters have a Numeric_Type property of either null, Decimal, Digit, or Numeric. For non-Unihan characters, this is denoted by filling out either no column, or (6, 7, and 8), or (7 and 8), or (8), respectively, as implemented by makeunicodedata.py. Unihan characters have only null or Numeric as their Numeric_Type property, never Decimal nor Digit, see
> 
>  http://www.unicode.org/reports/tr44/#Numeric_Type_Han
> 
> Therefore, it is correct that digit() raises a ValueError for U+4e09.

You're right. I guess this is a bug in the UCD or TR44/TR38 itself.

It looks like the numeric properties are not separated in the
Unihan database in the same way they are for the standard UCD.

Unihan separates based on usage context, whereas UCS takes
a parsing approach.
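
For reference, the behaviour discussed above can be checked from Python with a short sketch like the one below. The helper numeric_type() is hypothetical (it is not part of unicodedata); it only infers the Numeric_Type from which of decimal(), digit() and numeric() succeed, assuming the module follows the column layout Martin describes. The sample characters are my own choice.

```python
import unicodedata


def numeric_type(ch):
    """Guess a character's Numeric_Type from which lookups succeed.

    Hypothetical helper: it relies on decimal() working only for
    Decimal, digit() for Decimal and Digit, and numeric() for all three.
    """
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return None


# DIGIT FIVE, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF,
# the Unihan character U+4E09, and a plain letter.
for ch in ("5", "\u00b2", "\u00bd", "\u4e09", "a"):
    print(f"U+{ord(ch):04X}", numeric_type(ch), unicodedata.numeric(ch, None))

# digit() without a default raises ValueError for U+4E09,
# because Unihan characters are never Decimal or Digit.
try:
    unicodedata.digit("\u4e09")
except ValueError as exc:
    print("digit() failed:", exc)
```

U+4E09 is reported as Numeric only, so numeric() returns a value while digit() and decimal() fail, which matches the reading of TR44 quoted above.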
History
Date                 User     Action  Args
2010-11-29 20:42:32  lemburg  set     recipients: + lemburg, loewis, belopolsky, ezio.melotti
2010-11-29 20:42:30  lemburg  link    issue10575 messages
2010-11-29 20:42:30  lemburg  create