Message336453
I think that analysis is wrong. The Wikipedia page describes the meaning of the Unicode Decimal/Digit/Numeric properties:
https://en.wikipedia.org/wiki/Unicode_character_property#Numeric_values_and_types
and the characters you show aren't appropriate for converting to ints:
py> import unicodedata
py> for c in '一二三四五':
...     print(unicodedata.name(c))
...
CJK UNIFIED IDEOGRAPH-4E00
CJK UNIFIED IDEOGRAPH-4E8C
CJK UNIFIED IDEOGRAPH-4E09
CJK UNIFIED IDEOGRAPH-56DB
CJK UNIFIED IDEOGRAPH-4E94
The first one, for example, is translated as "one; a, an; alone"; it is better read as the *word* one rather than the numeral 1. (Disclaimer: I am not a Chinese speaker and I welcome correction from an expert.)
Likewise U+4E8C, translated as "two; twice".
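To make the distinction between the three properties concrete, here is a minimal sketch using the lookups in the unicodedata module. The helper name numeric_properties is my own invention for illustration, not a stdlib function; it returns None for any property the character does not have.

```python
import unicodedata

def numeric_properties(ch):
    """Return (decimal, digit, numeric) for ch, with None where undefined.

    Hypothetical helper for illustration only.
    """
    return (
        unicodedata.decimal(ch, None),   # Decimal property
        unicodedata.digit(ch, None),     # Digit property
        unicodedata.numeric(ch, None),   # Numeric property
    )

# U+4E00 has a Numeric value but no Decimal or Digit value.
print(numeric_properties("\u4e00"))                 # (None, None, 1.0)

# A true decimal digit has all three.
print(numeric_properties("\N{BENGALI DIGIT ONE}"))  # (1, 1, 1.0)
```

If I read the properties correctly, the ideograph carries only a Numeric value, so it has no decimal digit value that int() could use.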
The blog post is factually wrong when it claims:
"str.isdigit only returns True for what I said before, strings containing solely the digits 0-9."
py> s = "\N{BENGALI DIGIT ONE}\N{BENGALI DIGIT TWO}"
py> s.isdigit()
True
py> int(s)
12
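The same comparison can be made across the three str predicates. This is only a sketch, and classify is a hypothetical helper name of mine; it reports isdecimal(), isdigit(), isnumeric() and the result of int(), with None standing in for a ValueError.

```python
def classify(s):
    """Return (isdecimal, isdigit, isnumeric, int value or None) for s.

    Hypothetical helper for illustration only.
    """
    try:
        value = int(s)
    except ValueError:
        value = None  # int() rejected the string
    return (s.isdecimal(), s.isdigit(), s.isnumeric(), value)

# Non-ASCII decimal digits: accepted by all three predicates and by int().
print(classify("\N{BENGALI DIGIT ONE}\N{BENGALI DIGIT TWO}"))
# (True, True, True, 12)

# A superscript is a digit but not a decimal digit, so int() rejects it.
print(classify("\N{SUPERSCRIPT TWO}"))
# (False, True, True, None)

# The CJK ideographs are numeric only.
print(classify("\u4e00\u4e8c"))
# (False, False, True, None)
```

In these samples int() succeeds exactly where isdecimal() is true, which suggests isdecimal() rather than isdigit() or isnumeric() is the predicate to check before converting.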
So I think that there's nothing to do here (unless it is perhaps to add a FAQ about it, or improve the docs).
Date | User | Action | Args
2019-02-24 10:05:25 | steven.daprano | set | recipients: + steven.daprano, StyXman
2019-02-24 10:05:25 | steven.daprano | set | messageid: <1551002725.83.0.385571425891.issue36100@roundup.psfhosted.org>
2019-02-24 10:05:25 | steven.daprano | link | issue36100 messages
2019-02-24 10:05:25 | steven.daprano | create |