classification
Title: unicode.isdecimal bug in online Python 2 documentation
Type: behavior Stage: patch review
Components: Documentation Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, pewscorner, zheng
Priority: normal Keywords: easy, patch

Created on 2019-03-24 14:47 by pewscorner, last changed 2019-04-10 07:31 by zheng.

Pull Requests
URL       Status  Linked
PR 12757  open    zheng, 2019-04-10 07:27
Messages (2)
msg338736 - Author: PEW's Corner (pewscorner) * Date: 2019-03-24 14:47
The online Python 2 documentation for unicode.isdecimal (https://docs.python.org/2/library/stdtypes.html#unicode.isdecimal) incorrectly states:

"Decimal characters include digit characters".

This is wrong (decimal characters are actually a subset of digit characters), and u'\u00b3' (SUPERSCRIPT THREE) is an example of a character that is a digit but not a decimal.
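A quick way to check the claim is sketched below in Python 3, where `str` plays the role of Python 2's `unicode`. SUPERSCRIPT THREE reports as a digit but not as a decimal, and the `unicodedata` module gives it a digit value but no decimal value.

```python
import unicodedata

ch = "\u00b3"  # SUPERSCRIPT THREE

print(unicodedata.name(ch))   # SUPERSCRIPT THREE
print(ch.isdigit())           # True: it has a digit value
print(ch.isdecimal())         # False: it has no decimal value
print(unicodedata.digit(ch))  # 3

# unicodedata.decimal() raises ValueError when no decimal value exists,
# unless a default is supplied as the second argument.
print(unicodedata.decimal(ch, None))  # None

# An ordinary ASCII digit is both a decimal and a digit.
print("3".isdecimal(), "3".isdigit())  # True True
```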

Issue 26483 (https://bugs.python.org/issue26483) corrected the same bug in the Python 3 documentation, and a similar correction should be applied to the Python 2 documentation.
msg339832 - Author: zheng (zheng) * Date: 2019-04-10 07:31
I propose we copy over the exact changes made to the Python 3 documentation.

I looked through the code mentioned in the other thread, namely `Objects/unicodeobject.c` and `Tools/unicode/makeunicodedata.py`. The implementation is identical between Python 2 and Python 3; the only difference appears to be the Unicode version used.

    # decimal digit, integer digit
    decimal = 0
    if record[6]:
        flags |= DECIMAL_MASK
        decimal = int(record[6])
    digit = 0
    if record[7]:
        flags |= DIGIT_MASK
        digit = int(record[7])
    if record[8]:
        flags |= NUMERIC_MASK
        numeric.setdefault(record[8], []).append(char)
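To see what this excerpt does with real data, here is a minimal standalone sketch that applies the same three checks to two records written in UnicodeData.txt format, modeled on the entries for U+0033 and U+00B3. The mask values and the `numeric_flags` helper are hypothetical, chosen only for illustration; the real constants in `makeunicodedata.py` may differ. Field 6 holds the decimal value, field 7 the digit value, and field 8 the numeric value, so a character with an empty field 6 but a filled field 7 ends up flagged as a digit without being flagged as a decimal.

```python
# Illustrative bit values only; the real constants in makeunicodedata.py may differ.
DECIMAL_MASK = 0x01
DIGIT_MASK = 0x02
NUMERIC_MASK = 0x04

# Two records in UnicodeData.txt format (semicolon-separated fields).
# Field 6 = decimal value, field 7 = digit value, field 8 = numeric value.
RECORDS = [
    "0033;DIGIT THREE;Nd;0;EN;;3;3;3;N;;;;;",
    "00B3;SUPERSCRIPT THREE;No;0;EN;<super> 0033;;3;3;N;SUPERSCRIPT DIGIT THREE;;;;",
]


def numeric_flags(line):
    """Return (name, flags, decimal, digit) using the excerpt's field checks."""
    record = line.split(";")
    flags = 0
    decimal = 0
    if record[6]:
        flags |= DECIMAL_MASK
        decimal = int(record[6])
    digit = 0
    if record[7]:
        flags |= DIGIT_MASK
        digit = int(record[7])
    if record[8]:
        flags |= NUMERIC_MASK
    return record[1], flags, decimal, digit


for line in RECORDS:
    name, flags, decimal, digit = numeric_flags(line)
    print(name,
          "decimal" if flags & DECIMAL_MASK else "-",
          "digit" if flags & DIGIT_MASK else "-",
          "numeric" if flags & NUMERIC_MASK else "-")
```

Running it prints `DIGIT THREE decimal digit numeric` followed by `SUPERSCRIPT THREE - digit numeric`, which matches the behavior described in the first message.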

Another form of validation I did was to enumerate all the digits and decimals and compare them between versions. The general change appears to be that Python 3 introduces a number of new Unicode characters. The exception is NEW TAI LUE THAM DIGIT ONE, which gets recategorized as a digit (that is, a digit but no longer a decimal).

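Python 2, compiled with UCS4:
    import unicodedata
    # xrange(0x110000) covers every code point up to and including U+10FFFF
    for cp in xrange(0x110000):
        u = unichr(cp)
        if u.isdigit():
            print(unicodedata.name(u))

Python 3:
    import unicodedata
    for cp in range(0x110000):
        u = chr(cp)
        if u.isdigit():
            print(unicodedata.name(u))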
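The same enumeration can be extended to check the relationship the issue is about. The sketch below is Python 3 only, since it relies on `chr` and `str.isdecimal`. It collects every decimal and every digit, confirms that the first set is contained in the second, and lists a few digit-only characters. It also prints `unicodedata.unidata_version`, which records the Unicode version the interpreter uses; that is the variable that matters when comparing output across Python versions.

```python
import sys
import unicodedata

decimals = set()
digits = set()
# sys.maxunicode is 0x10FFFF on Python 3.3+, so this visits every code point.
for cp in range(sys.maxunicode + 1):
    c = chr(cp)
    if c.isdecimal():
        decimals.add(c)
    if c.isdigit():
        digits.add(c)

# True if decimal characters are a subset of digit characters.
print("decimals <= digits:", decimals <= digits)

# Characters that are digits but not decimals (U+00B3 is among them).
digit_only = sorted(digits - decimals)
print("Unicode", unicodedata.unidata_version,
      "decimals:", len(decimals), "digits:", len(digits),
      "digit-only:", len(digit_only))
for c in digit_only[:5]:
    print("U+%04X %s" % (ord(c), unicodedata.name(c, "?")))
```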
History
Date                 User              Action  Args
2019-04-10 07:31:29  zheng             set     nosy: + zheng
                                               messages: + msg339832
2019-04-10 07:27:12  zheng             set     keywords: + patch
                                               stage: needs patch -> patch review
                                               pull_requests: + pull_request12684
2019-03-24 15:15:10  serhiy.storchaka  set     keywords: + easy
                                               stage: needs patch
2019-03-24 14:47:13  pewscorner        create