classification
Title: Document the differences between str.isdigit, isdecimal and isnumeric
Type: enhancement Stage:
Components: Documentation Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: StyXman, docs@python, mark.dickinson, steven.daprano, xtreak
Priority: normal Keywords:

Created on 2019-02-24 09:02 by StyXman, last changed 2019-02-24 13:32 by steven.daprano.

Messages (12)
msg336451 - Author: Marcos Dione (StyXman) * Date: 2019-02-24 09:02
Following https://blog.lerner.co.il/pythons-str-isdigit-vs-str-isnumeric/, we have this:

Python 3.8.0a1+ (heads/master:001fee14e0, Feb 20 2019, 08:28:02)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '一二三四五'.isnumeric()
True

>>> int('一二三四五')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '一二三四五'

>>> float('一二三四五')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '一二三四五'

I think Reuven is right: these should be accepted as input. I just wonder whether we should do the same for, for instance, Roman numerals...
msg336453 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-02-24 10:05
I think that analysis is wrong. The Wikipedia page describes the meaning of the Unicode Decimal/Digit/Numeric properties:

https://en.wikipedia.org/wiki/Unicode_character_property#Numeric_values_and_types

and the characters you show aren't appropriate for converting to ints:

py> import unicodedata
py> for c in '一二三四五':
...     print(unicodedata.name(c))
...
CJK UNIFIED IDEOGRAPH-4E00
CJK UNIFIED IDEOGRAPH-4E8C
CJK UNIFIED IDEOGRAPH-4E09
CJK UNIFIED IDEOGRAPH-56DB
CJK UNIFIED IDEOGRAPH-4E94

The first one, for example, is translated as "one; a, an; alone"; it is better read as the *word* one rather than the numeral 1. (Disclaimer: I am not a Chinese speaker and I welcome correction from an expert.)

Likewise U+4E8C, translated as "two; twice".

The blog post is factually wrong when it claims:

"str.isdigit only returns True for what I said before, strings containing solely the digits 0-9."

py> s = "\N{BENGALI DIGIT ONE}\N{BENGALI DIGIT TWO}"
py> s.isdigit()
True
py> int(s)
12

So I think that there's nothing to do here (unless it is perhaps to add a FAQ about it, or improve the docs).
msg336454 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-02-24 10:14
[Steven posted his answer while I was composing mine; posting mine anyway ...]

I don't think this would make sense. There are lots of characters that can't be interpreted as a decimal digit but for which `isnumeric` nevertheless gives True.

>>> import unicodedata
>>> s = "㉓⅗⒘Ⅻ"
>>> for c in s: print(unicodedata.name(c))
...
CIRCLED NUMBER TWENTY THREE
VULGAR FRACTION THREE FIFTHS
NUMBER SEVENTEEN FULL STOP
ROMAN NUMERAL TWELVE
>>> s.isnumeric()
True

What value would you expect `int(s)` to have in this situation?

Note that `int` and `float` already accept non-ASCII digits:

>>> s = "١٢٣٤٥٦٧٨٩"
>>> int(s)
123456789
>>> float(s)
123456789.0
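
The per-character values behind these results can be inspected with the unicodedata module. A minimal sketch, in which the helper name describe is hypothetical and None marks a value that Unicode does not define for the character:

```python
import unicodedata


def describe(ch):
    """Return the (decimal, digit, numeric) values of ch, with None where undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


samples = [
    "\N{ARABIC-INDIC DIGIT THREE}",
    "\N{SUPERSCRIPT TWO}",
    "\N{VULGAR FRACTION THREE FIFTHS}",
    "\N{ROMAN NUMERAL TWELVE}",
]
for ch in samples:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), describe(ch))
```

Of these four characters, only the first has a decimal value, and it is the only one that int() accepts.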
msg336455 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-02-24 10:24
> What value would you expect `int(s)` to have in this situation?

Actually, I guess that question was too easy. The value for `int(s)` should *obviously* be 23 * 1000 + (3/5) * 100 + 17 * 10 + 12 = 23242. I should have used ⅐ instead of ⅗.

Anyway, agreed with Steven that no change should be made here.
msg336456 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-02-24 10:31
I'm not a Unicode expert, but while searching I found a note, added in issue10610, saying that int() supports characters of the 'Nd' category. So to check whether a string can be converted to an integer with int(), I should be using str.isdecimal() instead of str.isnumeric()?

https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex

> The numeric literals accepted include the digits 0 to 9 or any Unicode equivalent (code points with the Nd property). See http://www.unicode.org/Public/10.0.0/ucd/extracted/DerivedNumericType.txt for a complete list of code points with the Nd property.


>>> import unicodedata
>>> [unicodedata.category(c) for c in '一二三四五']
['Lo', 'Lo', 'Lo', 'Lo', 'Lo']
>>> [unicodedata.category(c) for c in '\N{BENGALI DIGIT ONE}\N{BENGALI DIGIT TWO}']
['Nd', 'Nd']
msg336459 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-02-24 10:57
> So to check whether a string can be converted to an integer with int(), I should be using str.isdecimal() instead of str.isnumeric()?

Yes, I think that's correct. The characters matched by `str.isdecimal` are a subset of those matched by `str.isdigit`, which in turn are a subset of those matched by `str.isnumeric`. `int` and `float` required general category Nd, which corresponds to `str.isdigit`.
msg336460 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-02-24 10:58
> which corresponds to `str.isdigit`.

Gah! That should have said:

> which corresponds to `str.isdecimal`.

Sorry.
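
The subset relationship between the three methods can be checked directly by scanning every code point. A rough sketch, assuming CPython's str methods; the exact counts depend on the Unicode version bundled with the interpreter, so none are shown:

```python
import sys

# Collect every character accepted by each of the three predicates.
decimal, digit, numeric = set(), set(), set()
for cp in range(sys.maxunicode + 1):
    ch = chr(cp)
    if ch.isdecimal():
        decimal.add(ch)
    if ch.isdigit():
        digit.add(ch)
    if ch.isnumeric():
        numeric.add(ch)

print("isdecimal:", len(decimal), "isdigit:", len(digit), "isnumeric:", len(numeric))

# True when isdecimal characters are a subset of isdigit characters,
# which in turn are a subset of isnumeric characters.
print(decimal <= digit <= numeric)

# Characters that separate the sets, e.g. SUPERSCRIPT TWO is a digit but not
# a decimal, and VULGAR FRACTION THREE FIFTHS is numeric but not a digit.
print("\N{SUPERSCRIPT TWO}" in digit - decimal)
print("\N{VULGAR FRACTION THREE FIFTHS}" in numeric - digit)
```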
msg336461 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-02-24 11:07
> `int` and `float` required general category Nd, which corresponds to `str.isdigit`.

Sorry, did you mean str.isdecimal? There are characters for which isdigit returns True and isdecimal returns False.

>>> '\u00B2'.isdecimal()
False
>>> '\u00B2'.isdigit()
True
>>> import unicodedata
>>> unicodedata.category('\u00B2')
'No'
>>> int('\u00B2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '²'

Is this worth an FAQ, or an addition to the existing note on int (which specifies that characters should belong to the 'Nd' category) saying that str.isdecimal should return True?
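
To see how the three predicates line up with what int() actually accepts, here is a small sketch; int_accepts is a hypothetical helper, not an existing API:

```python
def int_accepts(text):
    """Return True if int(text) succeeds in base 10, False if it raises ValueError."""
    try:
        int(text)
    except ValueError:
        return False
    return True


samples = [
    "123",                 # ASCII digits
    "\u0661\u0662\u0663",  # ARABIC-INDIC DIGIT ONE, TWO, THREE
    "\u00b2",              # SUPERSCRIPT TWO
    "\u4e00\u4e8c",        # CJK ideographs for one and two
    " -12 ",               # sign and surrounding whitespace
    "",                    # empty string
]
for s in samples:
    print(repr(s), s.isdecimal(), s.isdigit(), s.isnumeric(), int_accepts(s))
```

For a non-empty string, isdecimal() returning True is enough for int() to succeed, but not the other way round: int() also accepts surrounding whitespace and a leading sign, for which all three predicates return False.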
msg336462 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-02-24 11:13
On Sun, Feb 24, 2019 at 11:07:41AM +0000, Karthikeyan Singaravelan wrote:

> Is this worth an FAQ, or an addition to the existing note on int (which
> specifies that characters should belong to the 'Nd' category) saying
> that str.isdecimal should return True?

Yes, I think that there should be a FAQ about the differences between
isdigit, isdecimal and isnumeric, pointing to the relevant Unicode
documentation. I would also like to see a briefer note added to the
docstring of each of these string methods.
msg336464 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-02-24 11:44
Agreed. Although the behavior of str.isnumeric is correct for a user who knows Unicode internals, its name makes it easy for a general user to reach for it when trying to determine whether a string can be passed to int(). I am not sure how this can be explained in simpler terms, but it would be good to clarify it in the docs to avoid confusion.

There seems to have been a thread [0] in the past about the confusion caused by having multiple ways to check whether a Unicode string is a number. It is even more confusing on Python 2, where strings are not Unicode by default.

$ python2.7
Python 2.7.14 (default, Mar 12 2018, 13:54:56)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> '\u00B2'.isdigit()
False
>>> u'\u00B2'.isdigit()
True

[0] https://mail.python.org/pipermail/python-list/2012-May/624340.html
msg336466 - Author: Marcos Dione (StyXman) * Date: 2019-02-24 12:39
Thanks for all the examples, I'm convinced.
msg336467 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-02-24 13:32
I'm re-opening the ticket with a change of subject, because I think this should be treated as a documentation enhancement:

- improve the docstrings for str.isdigit, isnumeric and isdecimal to make it clear what each does (e.g. what counts as a digit);

- possibly improve the documentation for int and float in the same way, although the existing comment may be sufficient:

https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex

- add a FAQ summarizing the situation.

I don't think we need to worry about backporting the docs to Python 2, but if others disagree, I won't object.
History
Date User Action Args
2019-02-24 13:32:58 steven.daprano set status: closed -> open
    resolution: not a bug ->
    assignee: docs@python
    stage: resolved ->
    title: int() and float() should accept any isnumeric() digit -> Document the differences between str.isdigit, isdecimal and isnumeric
    nosy: + docs@python
    versions: - Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8
    messages: + msg336467
    components: + Documentation, - Library (Lib)
    type: behavior -> enhancement
2019-02-24 12:39:23 StyXman set status: open -> closed
    versions: + Python 3.4, Python 3.5, Python 3.6
    messages: + msg336466
    resolution: not a bug
    stage: resolved
2019-02-24 11:44:36 xtreak set messages: + msg336464
    versions: - Python 3.4, Python 3.5, Python 3.6
2019-02-24 11:13:40 steven.daprano set messages: + msg336462
2019-02-24 11:07:41 xtreak set messages: + msg336461
2019-02-24 10:58:35 mark.dickinson set messages: + msg336460
2019-02-24 10:57:40 mark.dickinson set messages: + msg336459
2019-02-24 10:31:50 xtreak set nosy: + xtreak
    messages: + msg336456
2019-02-24 10:24:17 mark.dickinson set messages: + msg336455
2019-02-24 10:14:50 mark.dickinson set nosy: + mark.dickinson
    messages: + msg336454
2019-02-24 10:05:25 steven.daprano set nosy: + steven.daprano
    messages: + msg336453
2019-02-24 09:02:22 StyXman create