classification
Title: docs unclear on difference between str.isdigit() and str.isdecimal()
Type: enhancement Stage: resolved
Components: Documentation, Unicode Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: martin.panter Nosy List: Anna Koroliuk, docs@python, ethan.furman, ezio.melotti, martin.panter, mdk, python-dev, serhiy.storchaka, terry.reedy, vstinner
Priority: normal Keywords: patch

Created on 2016-03-04 20:04 by ethan.furman, last changed 2016-12-11 04:45 by martin.panter. This issue is now closed.

Files
File name Uploaded Description
command_comparison.txt Anna Koroliuk, 2016-03-12 14:50
issue26483.diff mdk, 2016-11-26 14:59
issue26483.diff mdk, 2016-12-07 20:48
Messages (13)
msg261195 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2016-03-04 20:04
The docs use different explanations for what constitutes a decimal versus a digit character; consequently I can't tell whether they are the same or different, and if different, what the differences are.

-----------------------------------------------

https://docs.python.org/3/library/stdtypes.html?#str.isdecimal

Return true if all characters in the string are decimal characters and there is at least one character, false otherwise. Decimal characters are those from general category “Nd”. This category includes digit characters, and all characters that can be used to form decimal-radix numbers, e.g. U+0660, ARABIC-INDIC DIGIT ZERO.

-----------------------------------------------

https://docs.python.org/3/library/stdtypes.html?#str.isdigit

Return true if all characters in the string are digits and there is at least one character, false otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.
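
As a quick illustration of the two descriptions quoted above, the following sketch prints the general category and the result of both methods for three sample characters. The sample characters and the helper name describe() are chosen for illustration only; of the three, only U+0660 appears in the documentation text.

```python
import unicodedata

# Sample characters chosen for illustration: an ASCII digit,
# ARABIC-INDIC DIGIT ZERO (U+0660), and SUPERSCRIPT TWO (U+00B2).
samples = ["5", "\u0660", "\u00b2"]


def describe(ch):
    """Return (general category, isdecimal(), isdigit()) for one character."""
    return unicodedata.category(ch), ch.isdecimal(), ch.isdigit()


for ch in samples:
    print(f"U+{ord(ch):04X}", *describe(ch))
```

The superscript is the interesting case: it is in category "No" rather than "Nd", so isdecimal() rejects it while isdigit() accepts it.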
msg261197 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-04 20:20
>>> import sys
>>> chars = ''.join(map(chr, range(sys.maxunicode+1)))
>>> digits = ''.join(filter(str.isdigit, chars))
>>> digits
'0123456789²³¹٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯෦෧෨෩෪෫෬෭෮෯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙፩፪፫፬፭፮፯፰፱០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᧚᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉᪐᪑᪒᪓᪔᪕᪖᪗᪘᪙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉①②③④⑤⑥⑦⑧⑨⑴⑵⑶⑷⑸⑹⑺⑻⑼⒈⒉⒊⒋⒌⒍⒎⒏⒐⓪⓵⓶⓷⓸⓹⓺⓻⓼⓽⓿❶❷❸❹❺❻❼❽❾➀➁➂➃➄➅➆➇➈➊➋➌➍➎➏➐➑➒꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꧐꧑꧒꧓꧔꧕꧖꧗꧘꧙꧰꧱꧲꧳꧴꧵꧶꧷꧸꧹꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙꯰꯱꯲꯳꯴꯵꯶꯷꯸꯹0123456789𐒠𐒡𐒢𐒣𐒤𐒥𐒦𐒧𐒨𐒩𐩀𐩁𐩂𐩃𐹠𐹡𐹢𐹣𐹤𐹥𐹦𐹧𐹨𑁒𑁓𑁔𑁕𑁖𑁗𑁘𑁙𑁚𑁦𑁧𑁨𑁩𑁪𑁫𑁬𑁭𑁮𑁯𑃰𑃱𑃲𑃳𑃴𑃵𑃶𑃷𑃸𑃹𑄶𑄷𑄸𑄹𑄺𑄻𑄼𑄽𑄾𑄿𑇐𑇑𑇒𑇓𑇔𑇕𑇖𑇗𑇘𑇙𑋰𑋱𑋲𑋳𑋴𑋵𑋶𑋷𑋸𑋹𑓐𑓑𑓒𑓓𑓔𑓕𑓖𑓗𑓘𑓙𑙐𑙑𑙒𑙓𑙔𑙕𑙖𑙗𑙘𑙙𑛀𑛁𑛂𑛃𑛄𑛅𑛆𑛇𑛈𑛉𑜰𑜱𑜲𑜳𑜴𑜵𑜶𑜷𑜸𑜹𑣠𑣡𑣢𑣣𑣤𑣥𑣦𑣧𑣨𑣩𖩠𖩡𖩢𖩣𖩤𖩥𖩦𖩧𖩨𖩩𖭐𖭑𖭒𖭓𖭔𖭕𖭖𖭗𖭘𖭙𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡𝟢𝟣𝟤𝟥𝟦𝟧𝟨𝟩𝟪𝟫𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿🄀🄁🄂🄃🄄🄅🄆🄇🄈🄉🄊'
>>> decimals = ''.join(filter(str.isdecimal, chars))
>>> decimals
'0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯෦෧෨෩෪෫෬෭෮෯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉᪐᪑᪒᪓᪔᪕᪖᪗᪘᪙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꧐꧑꧒꧓꧔꧕꧖꧗꧘꧙꧰꧱꧲꧳꧴꧵꧶꧷꧸꧹꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙꯰꯱꯲꯳꯴꯵꯶꯷꯸꯹0123456789𐒠𐒡𐒢𐒣𐒤𐒥𐒦𐒧𐒨𐒩𑁦𑁧𑁨𑁩𑁪𑁫𑁬𑁭𑁮𑁯𑃰𑃱𑃲𑃳𑃴𑃵𑃶𑃷𑃸𑃹𑄶𑄷𑄸𑄹𑄺𑄻𑄼𑄽𑄾𑄿𑇐𑇑𑇒𑇓𑇔𑇕𑇖𑇗𑇘𑇙𑋰𑋱𑋲𑋳𑋴𑋵𑋶𑋷𑋸𑋹𑓐𑓑𑓒𑓓𑓔𑓕𑓖𑓗𑓘𑓙𑙐𑙑𑙒𑙓𑙔𑙕𑙖𑙗𑙘𑙙𑛀𑛁𑛂𑛃𑛄𑛅𑛆𑛇𑛈𑛉𑜰𑜱𑜲𑜳𑜴𑜵𑜶𑜷𑜸𑜹𑣠𑣡𑣢𑣣𑣤𑣥𑣦𑣧𑣨𑣩𖩠𖩡𖩢𖩣𖩤𖩥𖩦𖩧𖩨𖩩𖭐𖭑𖭒𖭓𖭔𖭕𖭖𖭗𖭘𖭙𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡𝟢𝟣𝟤𝟥𝟦𝟧𝟨𝟩𝟪𝟫𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿'
>>> ''.join(sorted(set(decimals) - set(digits)))
''
>>> ''.join(sorted(set(digits) - set(decimals)))
'²³¹፩፪፫፬፭፮፯፰፱᧚⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉①②③④⑤⑥⑦⑧⑨⑴⑵⑶⑷⑸⑹⑺⑻⑼⒈⒉⒊⒋⒌⒍⒎⒏⒐⓪⓵⓶⓷⓸⓹⓺⓻⓼⓽⓿❶❷❸❹❺❻❼❽❾➀➁➂➃➄➅➆➇➈➊➋➌➍➎➏➐➑➒𐩀𐩁𐩂𐩃𐹠𐹡𐹢𐹣𐹤𐹥𐹦𐹧𐹨𑁒𑁓𑁔𑁕𑁖𑁗𑁘𑁙𑁚🄀🄁🄂🄃🄄🄅🄆🄇🄈🄉🄊'
msg261198 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2016-03-04 20:35
I like those code snippets!  Thanks, Serhiy!

Just to make sure I have understood correctly:  every decimal char is also a digit char, but some digit chars are not decimal chars.
msg261199 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-04 21:02
Yes. For details you need to read The Unicode Standard.

And every decimal character is accepted by the int() constructor, but non-decimal digits are not.

>>> for d in decimals: x = int(d)
... 
>>> for d in set(digits) - set(decimals):
...     try:
...         int(d)
...     except ValueError:
...         pass
...     else:
...         raise AssertionError
...
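
A practical consequence of the loops above is that isdecimal(), not isdigit(), is the safer guard before calling int(). The helper below is a minimal sketch of that idea; the name parse_decimal() is hypothetical and not part of any library.

```python
def parse_decimal(text):
    """Return int(text) if text consists only of decimal characters, else None."""
    # isdecimal() is False for the empty string, so "" also yields None.
    if text.isdecimal():
        return int(text)
    return None


print(parse_decimal("42"))
print(parse_decimal("\u0664\u0662"))  # ARABIC-INDIC DIGIT FOUR, DIGIT TWO
print(parse_decimal("4\u00b2"))       # contains SUPERSCRIPT TWO
```

Guarding with isdigit() instead would let the last string through, and int() would then raise ValueError.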
msg261275 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-03-06 23:21
The documentation could certainly be clarified to say that all decimals, as determined by isdecimal(), are also digits as determined by isdigit(). IMO the current documentation is confusing or wrong to say that decimals ‘include digit characters’; it is actually the other way around.

I think it could also be clearer about how ‘general category “Nd” ’ and ‘the property value Numeric_Type=Digit or Numeric_Type=Decimal’ are related (if they are indeed related).
msg261279 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-07 05:57
AFAIK ‘general category “Nd” ’ is the same as ‘the property value Numeric_Type=Decimal’.
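
One way to check this claim against the Unicode data bundled with a given interpreter is to scan every code point and collect those where str.isdecimal() and general category "Nd" disagree. This is only a sketch: the function name nd_mismatches() is made up for the example, and the result depends on the Unicode version the interpreter was built with.

```python
import sys
import unicodedata


def nd_mismatches():
    """Return code points where str.isdecimal() and category 'Nd' disagree."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if chr(cp).isdecimal() != (unicodedata.category(chr(cp)) == "Nd")
    ]


mismatches = nd_mismatches()
# Show how many disagreements were found and the first few, if any.
print(len(mismatches), [hex(cp) for cp in mismatches[:10]])
```

An empty list would support the statement for that interpreter's database; it says nothing about other Unicode versions.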

Yet another related predicate is str.isnumeric().

https://docs.python.org/3/library/stdtypes.html?#str.isnumeric

Return true if all characters in the string are numeric characters, and there is at least one character, false otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.

>>> numerics = ''.join(filter(str.isnumeric, chars))
>>> ''.join(sorted(set(digits) - set(numerics)))
''
>>> ''.join(sorted(set(numerics) - set(digits)))
'¼½¾৴৵৶৷৸৹୲୳୴୵୶୷௰௱௲౸౹౺౻౼౽౾൰൱൲൳൴൵༪༫༬༭༮༯༰༱༲༳፲፳፴፵፶፷፸፹፺፻፼ᛮᛯᛰ៰៱៲៳៴៵៶៷៸៹⅐⅑⅒⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞⅟ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾⅿↀↁↂↅↆↇↈ↉⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳⑽⑾⑿⒀⒁⒂⒃⒄⒅⒆⒇⒑⒒⒓⒔⒕⒖⒗⒘⒙⒚⒛⓫⓬⓭⓮⓯⓰⓱⓲⓳⓴⓾❿➉➓⳽〇〡〢〣〤〥〦〧〨〩〸〹〺㆒㆓㆔㆕㈠㈡㈢㈣㈤㈥㈦㈧㈨㈩㉈㉉㉊㉋㉌㉍㉎㉏㉑㉒㉓㉔㉕㉖㉗㉘㉙㉚㉛㉜㉝㉞㉟㊀㊁㊂㊃㊄㊅㊆㊇㊈㊉㊱㊲㊳㊴㊵㊶㊷㊸㊹㊺㊻㊼㊽㊾㊿㐅㒃㠪㭍一七万三九二五亖亿什仟仨伍佰億兆兩八六十千卄卅卌叁参參叄四壱壹幺廾廿弌弍弎弐拾捌柒漆玖百肆萬貮貳贰阡陆陌陸零ꛦꛧꛨꛩꛪꛫꛬꛭꛮꛯ꠰꠱꠲꠳꠴꠵參拾兩零六陸什𐄇𐄈𐄉𐄊𐄋𐄌𐄍𐄎𐄏𐄐𐄑𐄒𐄓𐄔𐄕𐄖𐄗𐄘𐄙𐄚𐄛𐄜𐄝𐄞𐄟𐄠𐄡𐄢𐄣𐄤𐄥𐄦𐄧𐄨𐄩𐄪𐄫𐄬𐄭𐄮𐄯𐄰𐄱𐄲𐄳𐅀𐅁𐅂𐅃𐅄𐅅𐅆𐅇𐅈𐅉𐅊𐅋𐅌𐅍𐅎𐅏𐅐𐅑𐅒𐅓𐅔𐅕𐅖𐅗𐅘𐅙𐅚𐅛𐅜𐅝𐅞𐅟𐅠𐅡𐅢𐅣𐅤𐅥𐅦𐅧𐅨𐅩𐅪𐅫𐅬𐅭𐅮𐅯𐅰𐅱𐅲𐅳𐅴𐅵𐅶𐅷𐅸𐆊𐆋𐋡𐋢𐋣𐋤𐋥𐋦𐋧𐋨𐋩𐋪𐋫𐋬𐋭𐋮𐋯𐋰𐋱𐋲𐋳𐋴𐋵𐋶𐋷𐋸𐋹𐋺𐋻𐌠𐌡𐌢𐌣𐍁𐍊𐏑𐏒𐏓𐏔𐏕𐡘𐡙𐡚𐡛𐡜𐡝𐡞𐡟𐡹𐡺𐡻𐡼𐡽𐡾𐡿𐢧𐢨𐢩𐢪𐢫𐢬𐢭𐢮𐢯𐣻𐣼𐣽𐣾𐣿𐤖𐤗𐤘𐤙𐤚𐤛𐦼𐦽𐧀𐧁𐧂𐧃𐧄𐧅𐧆𐧇𐧈𐧉𐧊𐧋𐧌𐧍𐧎𐧏𐧒𐧓𐧔𐧕𐧖𐧗𐧘𐧙𐧚𐧛𐧜𐧝𐧞𐧟𐧠𐧡𐧢𐧣𐧤𐧥𐧦𐧧𐧨𐧩𐧪𐧫𐧬𐧭𐧮𐧯𐧰𐧱𐧲𐧳𐧴𐧵𐧶𐧷𐧸𐧹𐧺𐧻𐧼𐧽𐧾𐧿𐩄𐩅𐩆𐩇𐩽𐩾𐪝𐪞𐪟𐫫𐫬𐫭𐫮𐫯𐭘𐭙𐭚𐭛𐭜𐭝𐭞𐭟𐭸𐭹𐭺𐭻𐭼𐭽𐭾𐭿𐮩𐮪𐮫𐮬𐮭𐮮𐮯𐳺𐳻𐳼𐳽𐳾𐳿𐹩𐹪𐹫𐹬𐹭𐹮𐹯𐹰𐹱𐹲𐹳𐹴𐹵𐹶𐹷𐹸𐹹𐹺𐹻𐹼𐹽𐹾𑁛𑁜𑁝𑁞𑁟𑁠𑁡𑁢𑁣𑁤𑁥𑇡𑇢𑇣𑇤𑇥𑇦𑇧𑇨𑇩𑇪𑇫𑇬𑇭𑇮𑇯𑇰𑇱𑇲𑇳𑇴𑜺𑜻𑣪𑣫𑣬𑣭𑣮𑣯𑣰𑣱𑣲𒐀𒐁𒐂𒐃𒐄𒐅𒐆𒐇𒐈𒐉𒐊𒐋𒐌𒐍𒐎𒐏𒐐𒐑𒐒𒐓𒐔𒐕𒐖𒐗𒐘𒐙𒐚𒐛𒐜𒐝𒐞𒐟𒐠𒐡𒐢𒐣𒐤𒐥𒐦𒐧𒐨𒐩𒐪𒐫𒐬𒐭𒐮𒐯𒐰𒐱𒐲𒐳𒐴𒐵𒐶𒐷𒐸𒐹𒐺𒐻𒐼𒐽𒐾𒐿𒑀𒑁𒑂𒑃𒑄𒑅𒑆𒑇𒑈𒑉𒑊𒑋𒑌𒑍𒑎𒑏𒑐𒑑𒑒𒑓𒑔𒑕𒑖𒑗𒑘𒑙𒑚𒑛𒑜𒑝𒑞𒑟𒑠𒑡𒑢𒑣𒑤𒑥𒑦𒑧𒑨𒑩𒑪𒑫𒑬𒑭𒑮𖭛𖭜𖭝𖭞𖭟𖭠𖭡𝍠𝍡𝍢𝍣𝍤𝍥𝍦𝍧𝍨𝍩𝍪𝍫𝍬𝍭𝍮𝍯𝍰𝍱𞣇𞣈𞣉𞣊𞣋𞣌𞣍𞣎𞣏🄋🄌𠀁𠁤𠃢𠄡𠤪𠦃𠦌𠦜𠫪𠫽𠬙𢎐𢦘𣬛𦉭廾'

decimals is a subset of digits, and digits is a subset of numerics.
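
Because the three sets are nested, a character can be labelled by the narrowest predicate that accepts it. The following is a small sketch of that idea; classify() is a hypothetical helper and the sample characters are chosen for illustration.

```python
def classify(ch):
    """Return the narrowest of 'decimal', 'digit', 'numeric' that applies, else 'none'."""
    # Test from the smallest set to the largest, relying on
    # decimals being a subset of digits and digits a subset of numerics.
    if ch.isdecimal():
        return "decimal"
    if ch.isdigit():
        return "digit"
    if ch.isnumeric():
        return "numeric"
    return "none"


# ASCII digit, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF, a letter
for ch in ["7", "\u00b2", "\u00bd", "x"]:
    print(repr(ch), classify(ch))
```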
msg261653 - (view) Author: Anna Koroliuk (Anna Koroliuk) * Date: 2016-03-12 14:50
Hi, all!

At the Helsinki Python sprint, with the kind help of Ezio, I found two things.

1) The following code produces the output in the attached file. Here I show only some interesting cases where isdigit() and isdecimal() give different results.

>>> import unicodedata
>>> for c in map(chr, range(0x10FFFF)):
...     if unicodedata.digit(c, None) is not None:
...         print(c, c.isdigit(), c.isdecimal())
...

0 True True
1 True True
2 True True
² True False
³ True False
¹ True False
፩ True False
፪ True False
፫ True False
፬ True False
① True False
② True False
③ True False

So these are different methods, although for the ordinary digits 0-9 (as opposed to superscripts and the like) they give the same results. The full output is attached as command_comparison.txt.

2) Both isdigit() and isdecimal() can be traced down to a test of the character's flags against a mask, but the masks are different. For isdigit() it is DIGIT_MASK = 0x04 and for isdecimal() it is DECIMAL_MASK = 0x02.

Here is how each method is traced to its mask.

A) isdecimal()

./Objects/unicodeobject.c:    {"isdecimal", (PyCFunction) unicode_isdecimal, METH_NOARGS, isdecimal__doc__},

./Objects/unicodeobject.c:
static PyObject*
unicode_isdecimal(PyObject *self)
....
    if (length == 1)
        return PyBool_FromLong(
            Py_UNICODE_ISDECIMAL(PyUnicode_READ(kind, data, 0)));

./Include/unicodeobject.h:#define Py_UNICODE_ISDECIMAL(ch) _PyUnicode_IsDecimalDigit(ch)

./Objects/unicodectype.c:
int _PyUnicode_IsDecimalDigit(Py_UCS4 ch)
{
    if (_PyUnicode_ToDecimalDigit(ch) < 0)
        return 0;
    return 1;
}

int _PyUnicode_ToDecimalDigit(Py_UCS4 ch)
{
    const _PyUnicode_TypeRecord *ctype = gettyperecord(ch);

    return (ctype->flags & DECIMAL_MASK) ? ctype->decimal : -1;
}
./Objects/unicodectype.c:#define DECIMAL_MASK 0x02

B) isdigit()

./Objects/unicodeobject.c:    {"isdigit", (PyCFunction) unicode_isdigit, METH_NOARGS, isdigit__doc__},

./Objects/unicodeobject.c: static PyObject*
unicode_isdigit(PyObject *self)
...
    if (length == 1) {
        const Py_UCS4 ch = PyUnicode_READ(kind, data, 0);
        return PyBool_FromLong(Py_UNICODE_ISDIGIT(ch));
    }

./Include/unicodeobject.h:#define Py_UNICODE_ISDIGIT(ch) _PyUnicode_IsDigit(ch)

./Objects/unicodectype.c: int _PyUnicode_IsDigit(Py_UCS4 ch)
{
    if (_PyUnicode_ToDigit(ch) < 0)
        return 0;
    return 1;
}

int _PyUnicode_ToDigit(Py_UCS4 ch)
{
    const _PyUnicode_TypeRecord *ctype = gettyperecord(ch);

    return (ctype->flags & DIGIT_MASK) ? ctype->digit : -1;
}

./Tools/unicode/makeunicodedata.py:DIGIT_MASK = 0x04
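
The two C call chains above have the same shape: look up a value for the character, return -1 if the corresponding flag is not set, and treat any non-negative value as "yes". A rough Python model of that shape is sketched below. It uses the unicodedata module as a stand-in for the C type record, which assumes both are generated from the same Unicode data; the Python function names are made up to mirror the C ones.

```python
import unicodedata


def to_decimal_digit(ch):
    """Model of _PyUnicode_ToDecimalDigit: the decimal value, or -1 if none."""
    return unicodedata.decimal(ch, -1)


def to_digit(ch):
    """Model of _PyUnicode_ToDigit: the digit value, or -1 if none."""
    return unicodedata.digit(ch, -1)


def is_decimal_digit(ch):
    """Model of _PyUnicode_IsDecimalDigit."""
    return to_decimal_digit(ch) >= 0


def is_digit(ch):
    """Model of _PyUnicode_IsDigit."""
    return to_digit(ch) >= 0


for ch in ["7", "\u00b2"]:  # ASCII digit, SUPERSCRIPT TWO
    print(repr(ch), to_decimal_digit(ch), to_digit(ch))
```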

BR,
Anna
msg261663 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2016-03-12 21:31
To dig further, the DIGIT_MASK and DECIMAL_MASK used in `unicodeobject.c` come from `unicodectype.c`, and they match the values in `unicodetype_db.h`, which is generated by `Tools/unicode/makeunicodedata.py`. That script builds the flags this way:

    # decimal digit, integer digit
    decimal = 0
    if record[6]:
        flags |= DECIMAL_MASK
        decimal = int(record[6])
    digit = 0
    if record[7]:
        flags |= DIGIT_MASK
        digit = int(record[7])
    if record[8]:
        flags |= NUMERIC_MASK
        numeric.setdefault(record[8], []).append(char)

Those "record"s are documented in ftp://unicode.org/Public/3.2-Update/UnicodeData-3.2.0.html in which fields 6, 7, and 8 are:

 - 6	Decimal digit value	N	This is a numeric field. If the character has the decimal digit property, as specified in Chapter 4 of the Unicode Standard, the value of that digit is represented with an integer value in this field

 - 7	Digit value	N	This is a numeric field. If the character represents a digit, not necessarily a decimal digit, the value is here. This covers digits which do not form decimal radix forms, such as the compatibility superscript digits

 - 8	Numeric value	N	This is a numeric field. If the character has the numeric property, as specified in Chapter 4 of the Unicode Standard, the value of that character is represented with an integer or rational number in this field. This includes fractions as, e.g., "1/5" for U+2155 VULGAR FRACTION ONE FIFTH. Also included are numerical values for compatibility characters such as circled numbers.
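
To make the role of fields 6, 7 and 8 concrete, here is a small sketch that applies the same three tests to semicolon-separated records. The sample lines are written out by hand in the UnicodeData.txt layout and should be checked against the real file. DECIMAL_MASK and DIGIT_MASK use the values quoted earlier in this issue, while the NUMERIC_MASK value and the function name numeric_flags() are assumptions made for this example.

```python
DECIMAL_MASK = 0x02
DIGIT_MASK = 0x04
NUMERIC_MASK = 0x800  # assumed value, for this sketch only

# Hand-written sample records in UnicodeData.txt layout (verify against the real file).
SAMPLE_LINES = [
    "0032;DIGIT TWO;Nd;0;EN;;2;2;2;N;;;;;",
    "00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;",
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;",
]


def numeric_flags(line):
    """Return the decimal/digit/numeric flag bits for one record."""
    record = line.split(";")
    flags = 0
    if record[6]:  # decimal digit value
        flags |= DECIMAL_MASK
    if record[7]:  # digit value
        flags |= DIGIT_MASK
    if record[8]:  # numeric value
        flags |= NUMERIC_MASK
    return flags


for line in SAMPLE_LINES:
    print(line.split(";")[1], hex(numeric_flags(line)))
```

In these samples a filled field 6 comes with filled fields 7 and 8, and a filled field 7 with a filled field 8, which is the nesting seen in the sets computed earlier in the thread.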

This is very close to the current documentation. Yet the documentation is misleading in saying "This category includes digit characters" in the "isdecimal" entry.

Possible rewriting:

isdecimal: Return true if all characters in the string are decimal characters and there is at least one character, false otherwise. Decimal characters are those that can be used to form decimal-radix numbers, e.g. U+0660, ARABIC-INDIC DIGIT ZERO. Formally a decimal character is a character in the Unicode General Category "Nd".

isdigit: Return true if all characters in the string are digits and there is at least one character, false otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which do not form decimal radix forms. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.

I don't think we can refactor more than this without rewriting documentation for isnumeric which mentions the Unicode standard the same way.
msg281777 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2016-11-26 14:59
Proposing a patch.
msg282130 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-12-01 03:12
“digits which do not form decimal radix forms”

I see you have taken this from a Unicode document, but “forming a form” seems a long way of saying very little. The difference seems a bit vague, but I gather that digits not in the Unicode “decimal digit” category are often (always?) still decimal digits, but primarily used for a symbolic or typographical meaning more than in a plain number, e.g. superscripts, subscripts and other fonts, added circles and other decorations.
msg282153 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2016-12-01 09:28
“digits which do not form decimal radix forms”

> “forming a form” seems a long way of saying very little. The difference seems a bit vague

> I gather that digits not in the Unicode “decimal digit” category are often (always?) still decimal digits

I expected them not to be, but they often do represent a base 10 value:

>>> import sys
>>> import unicodedata
>>> chars = ''.join(map(chr, range(sys.maxunicode+1)))
>>> decimals = ''.join(filter(str.isdecimal, chars))
>>> digits = ''.join(filter(str.isdigit, chars))
>>> non_decimal_digits = set(digits) - set(decimals)
>>> from collections import Counter
>>> Counter([unicodedata.digit(char) for char in non_decimal_digits])
Counter({1: 15, 2: 14, 3: 14, 4: 14, 5: 13, 6: 13, 7: 13, 8: 13, 9: 13, 0: 6})

But note that the values in the range [1, 4] each have at least one more entry than the values 5 to 9. These extra entries are the [Kharosthi](https://en.wikipedia.org/wiki/Kharosthi) numbers, which do not use base 10 but a notation reminiscent of Roman numerals.

So clearly, not all digits are a notation for a base 10 value.
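
The Kharosthi case can be inspected directly. The sketch below assumes the four Kharosthi digits occupy U+10A40 to U+10A43 and lists, for each, its Unicode name, its digit value, and the results of isdigit() and isdecimal(). The function name kharosthi_digits() is made up for the example.

```python
import unicodedata


def kharosthi_digits():
    """Return (name, digit value, isdigit, isdecimal) for U+10A40..U+10A43."""
    rows = []
    for cp in range(0x10A40, 0x10A44):
        ch = chr(cp)
        rows.append(
            (unicodedata.name(ch, "?"), unicodedata.digit(ch, None),
             ch.isdigit(), ch.isdecimal())
        )
    return rows


for row in kharosthi_digits():
    print(*row)
```

These characters have digit values but are not decimal characters, so they count for isdigit() and not for isdecimal().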
 
> but primarily used for a symbolic or typographical meaning more than in a plain number, e.g. superscripts, subscripts and other fonts, added circles and other decorations.

Which also can't be used to form a base 10 number.

So here is another proposal for isdecimal, probably more human friendly:

    Return true if all characters in the string are decimal
    characters and there is at least one character, false
    otherwise. Decimal characters are those that can be used to form
    numbers in base 10, e.g. U+0660, ARABIC-INDIC DIGIT
    ZERO. Formally a decimal character is a character in the Unicode
    General Category "Nd".

And here is another proposal for isdigit, probably friendlier too:

    Return true if all characters in the string are digits and there is at least one
    character, false otherwise.  Digits include decimal characters and digits that need
    special handling, such as the compatibility superscript digits.
    This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers.
    Formally, a digit is a character that has the property value
    Numeric_Type=Digit or Numeric_Type=Decimal.
msg282827 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-12-10 05:04
I’m okay with this version unless anyone has any more improvements.
msg282900 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-12-11 04:10
New changeset c15f122617d5 by Martin Panter in branch '3.5':
Issue #26483: Clarify str.isdecimal() and isdigit()
https://hg.python.org/cpython/rev/c15f122617d5

New changeset bc7fc85beed1 by Martin Panter in branch '3.6':
Issues #28916, #26483: Merge stdtypes.rst from 3.5
https://hg.python.org/cpython/rev/bc7fc85beed1

New changeset b11850871300 by Martin Panter in branch 'default':
Issues #28916, #26483: Merge stdtypes.rst from 3.6
https://hg.python.org/cpython/rev/b11850871300
History
Date User Action Args
2016-12-11 04:45:00 martin.panter set status: open -> closed; resolution: fixed; stage: commit review -> resolved
2016-12-11 04:10:48 python-dev set nosy: + python-dev; messages: + msg282900
2016-12-10 07:50:11 serhiy.storchaka set assignee: docs@python -> martin.panter
2016-12-10 05:04:48 martin.panter set messages: + msg282827; stage: patch review -> commit review
2016-12-07 20:48:43 mdk set files: + issue26483.diff
2016-12-01 09:28:49 mdk set messages: + msg282153
2016-12-01 03:12:44 martin.panter set stage: needs patch -> patch review; messages: + msg282130; versions: + Python 3.7
2016-11-26 14:59:44 mdk set files: + issue26483.diff; keywords: + patch; messages: + msg281777
2016-03-12 21:31:15 mdk set nosy: + mdk; messages: + msg261663
2016-03-12 14:50:06 Anna Koroliuk set files: + command_comparison.txt; nosy: + Anna Koroliuk; messages: + msg261653
2016-03-11 21:43:51 terry.reedy set nosy: + terry.reedy; versions: - Python 3.4
2016-03-07 05:57:19 serhiy.storchaka set messages: + msg261279
2016-03-06 23:21:06 martin.panter set nosy: + martin.panter; messages: + msg261275
2016-03-04 21:02:07 serhiy.storchaka set messages: + msg261199
2016-03-04 20:35:29 ethan.furman set messages: + msg261198
2016-03-04 20:20:11 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg261197
2016-03-04 20:19:34 ezio.melotti set nosy: + ezio.melotti, vstinner, docs@python; versions: + Python 3.6; assignee: docs@python; components: + Documentation, Unicode; type: enhancement; stage: needs patch
2016-03-04 20:17:54 serhiy.storchaka set title: docs unclear on difference between isdigt() and isdecimal() -> docs unclear on difference between str.isdigit() and str.isdecimal()
2016-03-04 20:04:57 ethan.furman create