Message122867
Martin v. Löwis wrote:
>
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
> This is not a bug, see
>
> http://www.unicode.org/reports/tr44/#Numeric_Value
>
> Characters have a Numeric_Type property of either null, Decimal, Digit, or Numeric. For non-Unihan characters, this is denoted by filling out either no column, or (6, 7, and 8), or (7 and 8), or (8), respectively, as implemented by makeunicodedata.py. Unihan characters have only null or Numeric as their Numeric_Type property, never Decimal or Digit; see
>
> http://www.unicode.org/reports/tr44/#Numeric_Type_Han
>
> Therefore, it is correct that digit() raises a ValueError for U+4e09.
You're right. I guess this is a bug in the UCD or TR44/TR38 itself.
It looks like the numeric properties are not separated in the
Unihan database in the same way they are for the standard UCD.
Unihan separates based on usage context, whereas UCS takes
a parsing approach.
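
For illustration, the distinction between the three numeric properties can be checked from Python with the standard unicodedata module. This is only a minimal sketch: the helper name numeric_properties is hypothetical (it is not part of unicodedata or makeunicodedata.py), and it assumes a Python 3 build whose database includes the Unihan numeric values.

```python
import unicodedata


def numeric_properties(ch):
    """Return (decimal, digit, numeric) for ch.

    Hypothetical helper for illustration: a slot is None when the
    corresponding unicodedata lookup raises ValueError.
    """
    def lookup(func):
        try:
            return func(ch)
        except ValueError:
            return None

    return (
        lookup(unicodedata.decimal),
        lookup(unicodedata.digit),
        lookup(unicodedata.numeric),
    )


# One sample per Numeric_Type: Decimal, Digit, Numeric, and a Unihan Numeric.
for ch in ("5", "\u00b2", "\u00bd", "\u4e09"):
    print(f"U+{ord(ch):04X}", numeric_properties(ch))
```

Under those assumptions, U+4E09 should have only a numeric value, with no decimal or digit value, which matches the behaviour described in the quoted comment.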

Date | User | Action | Args
2010-11-29 20:42:32 | lemburg | set | recipients: + lemburg, loewis, belopolsky, ezio.melotti
2010-11-29 20:42:30 | lemburg | link | issue10575 messages
2010-11-29 20:42:30 | lemburg | create |