Message 336453 - Python tracker


Author	steven.daprano
Recipients	StyXman, steven.daprano
Date	2019-02-24.10:05:25
Message-id	<1551002725.83.0.385571425891.issue36100@roundup.psfhosted.org>
In-reply-to

Content

I think that analysis is wrong. The Wikipedia page describes the meaning of the Unicode Decimal/Digit/Numeric properties:

https://en.wikipedia.org/wiki/Unicode_character_property#Numeric_values_and_types

and the characters you show aren't appropriate for converting to ints:

py> import unicodedata
py> for c in '一二三四五':
...     print(unicodedata.name(c))
...
CJK UNIFIED IDEOGRAPH-4E00
CJK UNIFIED IDEOGRAPH-4E8C
CJK UNIFIED IDEOGRAPH-4E09
CJK UNIFIED IDEOGRAPH-56DB
CJK UNIFIED IDEOGRAPH-4E94

The first one, for example, is translated as "one; a, an; alone"; it is better read as the *word* one rather than the numeral 1. (Disclaimer: I am not a Chinese speaker and I welcome correction from an expert.)

Likewise U+4E8C, translated as "two; twice".
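
To make the distinction concrete, here is a small sketch (mine, not taken from the blog post) that reports how Python classifies each of those characters. The helper name `classify` is made up for illustration; the calls it uses are the standard `str.isdecimal`, `str.isdigit`, `str.isnumeric` and `unicodedata.numeric`. As far as I can tell, these ideographs carry a Numeric value but are neither Decimal nor Digit:

```python
import unicodedata

def classify(ch):
    """Return (isdecimal, isdigit, isnumeric, numeric value or None) for one character.

    Hypothetical helper, for illustration only.
    """
    return (
        ch.isdecimal(),
        ch.isdigit(),
        ch.isnumeric(),
        unicodedata.numeric(ch, None),  # None if the character has no numeric value
    )

for ch in '一二三四五':
    print(ch, classify(ch))
```

Each line should show isdecimal and isdigit as False and isnumeric as True, which matches the idea that int() only accepts characters with the Decimal property.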

The blog post is factually wrong when it claims:

"str.isdigit only returns True for what I said before, strings containing solely the digits 0-9."

py> s = "\N{BENGALI DIGIT ONE}\N{BENGALI DIGIT TWO}"
py> s.isdigit()
True
py> int(s)
12
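
A rough way to compare the three predicates with what int() actually accepts is sketched below. The helper `int_accepts` and the sample table are my own, chosen for illustration; the point is only that isdigit() being True does not guarantee int() will succeed (superscript two is a Digit but not a Decimal), while Bengali decimal digits work fine:

```python
def int_accepts(s):
    """Return True if int(s) succeeds, False if it raises ValueError.

    Hypothetical helper, for illustration only.
    """
    try:
        int(s)
    except ValueError:
        return False
    return True

# Sample strings picked for illustration.
samples = {
    "ascii": "12",
    "bengali": "\N{BENGALI DIGIT ONE}\N{BENGALI DIGIT TWO}",
    "superscript": "\N{SUPERSCRIPT TWO}",
    "cjk": "\u4e00",  # CJK UNIFIED IDEOGRAPH-4E00
}

for label, s in samples.items():
    print(label, s.isdecimal(), s.isdigit(), s.isnumeric(), int_accepts(s))
```

In this sketch, the strings int() accepts are exactly the ones for which isdecimal() is True.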

So I think there's nothing to do here, except perhaps to add a FAQ entry about it or improve the docs.

History
Date	User	Action	Args
2019-02-24 10:05:25	steven.daprano	set	recipients: + steven.daprano, StyXman
2019-02-24 10:05:25	steven.daprano	set	messageid: <1551002725.83.0.385571425891.issue36100@roundup.psfhosted.org>
2019-02-24 10:05:25	steven.daprano	link	issue36100 messages
2019-02-24 10:05:25	steven.daprano	create