Title: unicode.isdecimal bug in online Python 2 documentation
Type: behavior
Stage: resolved
Components: Documentation
Versions: Python 2.7
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: docs@python, pewscorner, zach.ware, zheng
Priority: normal
Keywords: easy, patch

Created on 2019-03-24 14:47 by pewscorner, last changed 2020-01-19 19:00 by zach.ware. This issue is now closed.

Pull Requests
PR 12757 (status: closed), linked by zheng on 2019-04-10 07:27
Messages (3)
msg338736 - Author: PEW's Corner (pewscorner) * Date: 2019-03-24 14:47
The online Python 2 documentation for unicode.isdecimal incorrectly states:

"Decimal characters include digit characters".

This is wrong (decimal characters are actually a subset of digit characters), and u'\u00b3' is an example of a character that is a digit but not a decimal.
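
The subset relationship can be checked directly. The following is a minimal sketch written for Python 3, where the same methods live on `str`; the helper name `classify` and the choice of sample characters are illustrative and not part of the report.

```python
# Illustrative helper (not from the report): classify a single character.
def classify(ch):
    """Return (isdecimal, isdigit, isnumeric) for a one-character string."""
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric())

samples = [
    ("5", "DIGIT FIVE"),                   # decimal, digit and numeric
    ("\u00b3", "SUPERSCRIPT THREE"),       # digit and numeric, but not decimal
    ("\u00bd", "VULGAR FRACTION ONE HALF"),  # numeric only
]

for ch, name in samples:
    print(name, classify(ch))
```

Each category contains the previous one: every decimal character is a digit, and every digit is numeric, but not the other way round.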

Issue 26483 corrected the same bug in the Python 3 documentation, and a similar correction should be applied to the Python 2 documentation.
msg339832 - Author: zheng (zheng) * Date: 2019-04-10 07:31
I propose we copy over the exact changes made to the Python 3 documentation.

I looked through the code mentioned in the other thread, namely `Objects/unicodeobject.c` and `Tools/unicode/`. The implementation is identical between Python 2 and Python 3; the only difference appears to be the Unicode version used.

    # decimal digit, integer digit
    decimal = 0
    if record[6]:
        flags |= DECIMAL_MASK
        decimal = int(record[6])
    digit = 0
    if record[7]:
        flags |= DIGIT_MASK
        digit = int(record[7])
    if record[8]:
        flags |= NUMERIC_MASK
        numeric.setdefault(record[8], []).append(char)
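
The three record fields read above appear to correspond to the decimal, digit and numeric value columns of UnicodeData.txt. As a rough sketch, the same per-character values can be inspected from Python 3 through the standard `unicodedata` module; the helper name `numeric_fields` is hypothetical and only for illustration.

```python
import unicodedata

# Hypothetical helper: look up the three numeric properties of a character.
def numeric_fields(ch):
    """Return (decimal, digit, numeric) for ch, with None for an empty field."""
    return (
        unicodedata.decimal(ch, None),  # set only for decimal characters
        unicodedata.digit(ch, None),    # set for all digit characters
        unicodedata.numeric(ch, None),  # set for all numeric characters
    )

for ch in ("7", "\u00b3", "\u00bd"):
    print("U+%04X" % ord(ch), numeric_fields(ch))
```

A character with an empty decimal field but a non-empty digit field, such as U+00B3, is the case the documentation wording got wrong.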

Another form of validation I did was to enumerate all the digits and decimals and compare them between versions. The general change appears to be that Python 3 introduces a number of new Unicode characters. The exception is NEW TAI LUE THAM DIGIT ONE, which gets recategorized as a digit.

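Python 2, compiled with UCS4:

    for u in map(unichr, xrange(0x110000)):  # upper bound 0x110000 so U+10FFFF is included
        if unicode.isdigit(u):
            print(repr(u))  # assumed body; the original snippet ends at the if

Python 3:

    for u in map(chr, range(0x110000)):  # upper bound 0x110000 so U+10FFFF is included
        if str.isdigit(u):
            print(repr(u))  # assumed body; the original snippet ends at the if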
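
One way to make such a comparison repeatable is to collect the code points into lists that can be saved and diffed between interpreters. The sketch below targets Python 3 only; the function name `digit_report` and the output format are assumptions made for illustration, not what was actually run for this issue.

```python
import sys
import unicodedata

# Hypothetical helper: split digit characters into decimals and digit-only.
def digit_report():
    """Return (decimals, digit_only) as lists of code points."""
    decimals, digit_only = [], []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if ch.isdecimal():
            decimals.append(cp)
        elif ch.isdigit():
            digit_only.append(cp)  # digit, but not decimal
    return decimals, digit_only

if __name__ == "__main__":
    decimals, digit_only = digit_report()
    print("Unicode database version:", unicodedata.unidata_version)
    print("decimal characters:", len(decimals))
    print("digit-only characters:", len(digit_only))
    # The character singled out in the message above.
    ch = "\u19da"
    print(unicodedata.name(ch, "<unnamed>"), ch.isdecimal(), ch.isdigit())
```

Printing the Unicode database version alongside the counts makes it easier to attribute any difference to the data rather than to the implementation.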
msg360267 - Author: Zachary Ware (zach.ware) * (Python committer) Date: 2020-01-19 19:00
As Python 2.7 has reached EOL and the branch is closed to regular maintenance, I'm closing the issue.  Thanks for the report and patch anyway!
Date                 User              Action  Args
2020-01-19 19:00:52  zach.ware         set     status: open -> closed; nosy: + zach.ware; messages: + msg360267; resolution: out of date; stage: patch review -> resolved
2019-04-10 07:31:29  zheng             set     nosy: + zheng; messages: + msg339832
2019-04-10 07:27:12  zheng             set     keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request12684
2019-03-24 15:15:10  serhiy.storchaka  set     keywords: + easy; stage: needs patch
2019-03-24 14:47:13  pewscorner        create