Author belopolsky
Recipients belopolsky, eric.smith, ezio.melotti, lemburg, mark.dickinson, skrah, vstinner
Date 2010-11-29.06:23:06
After a bit of svn archeology, it does appear that support for Arabic-Indic digits was deliberate, at least in the sense that the feature was tested when the code was first committed. See r15000.

The test has migrated from file to file over the last 10 years, but it is still present in the test suite:

        self.assertEqual(float(b"  \u0663.\u0661\u0664  ".decode('raw-unicode-escape')), 3.14)

(It should probably now be rewritten using a string literal.)
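A sketch of the same check rewritten with an ordinary string literal might look like this. The class and method names are illustrative only, not the ones used in CPython's test suite.

```python
import unittest


class ArabicIndicFloatTest(unittest.TestCase):  # hypothetical test class name
    def test_arabic_indic_digits(self):
        # U+0663, U+0661 and U+0664 are ARABIC-INDIC DIGIT THREE, ONE and FOUR.
        # float() strips the surrounding whitespace and accepts any Unicode
        # decimal digit, so this string parses as 3.14.
        self.assertEqual(float("  \u0663.\u0661\u0664  "), 3.14)
```

In Python 3 the `\u` escapes in a `str` literal produce the Arabic-Indic digits directly, so the round trip through a bytes literal and the `raw-unicode-escape` codec is no longer needed. The test can be run with the standard `unittest` runner.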

I am now attaching the patch (issue10557.diff) that fixes the bug without sacrificing non-ASCII digit support.

If this approach is well received, I would like to replace all calls to PyUnicode_EncodeDecimal() with calls to the new _PyUnicode_EncodeDecimalUTF8() and deprecate the Latin-1-oriented PyUnicode_EncodeDecimal().
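To make the difference concrete, here is a rough Python model of what a UTF-8-oriented encoder could do. It assumes the new function keeps the behavior described for PyUnicode_EncodeDecimal() (whitespace becomes an ASCII space, every decimal digit becomes the corresponding ASCII digit) and changes only how other characters are emitted: as UTF-8 rather than being limited to Latin-1. The name `encode_decimal_utf8` is hypothetical; this is not the C code in issue10557.diff.

```python
import unicodedata


def encode_decimal_utf8(s: str) -> bytes:
    """Hypothetical model of a UTF-8-oriented decimal encoder.

    Whitespace -> b" ", Unicode decimal digits -> ASCII digits,
    anything else -> its UTF-8 encoding (instead of Latin-1 only).
    """
    out = bytearray()
    for ch in s:
        if ch.isspace():
            out += b" "
            continue
        value = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        if value is not None:
            out += str(value).encode("ascii")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


print(encode_decimal_utf8("  \u0663.\u0661\u0664  "))  # b'  3.14  '
```

The ASCII result can then be handed to a parser that only understands 0-9, which is how non-ASCII digit support survives without the Latin-1 restriction.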

Looking ahead, I note that, starting with Unicode 6.0.0, the Unicode Consortium promises that:

Characters with the property value Numeric_Type=de (Decimal) only occur in contiguous ranges of 10 characters, with ascending numeric values from 0 to 9 (Numeric_Value=0..9).

This makes it very easy to check that a numeric string does not contain a mix of digits from different scripts.
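A minimal sketch of such a check, relying on the guarantee quoted above: for any decimal digit, subtracting its numeric value from its code point gives the code point of the ZERO that starts its range of ten, so a string is unmixed when all of its digits share one ZERO. The function name is hypothetical, and "range" is used as a stand-in for "script".

```python
import unicodedata


def uses_single_digit_script(s: str) -> bool:
    """Return True if every decimal digit in s comes from the same 0-9 range.

    Assumes the Unicode 6.0.0+ stability policy: Numeric_Type=De characters
    occur in contiguous runs of ten ordered 0..9, so ord(ch) - value is the
    code point of that run's ZERO.
    """
    zeros = set()
    for ch in s:
        value = unicodedata.decimal(ch, None)  # None for non-digits like "."
        if value is not None:
            zeros.add(ord(ch) - value)
    return len(zeros) <= 1


print(uses_single_digit_script("\u0663.\u0661\u0664"))  # True
print(uses_single_digit_script("3.\u0661\u0664"))       # False (ASCII mixed with Arabic-Indic)
```

Python's `unicodedata` reflects the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), so the guarantee only applies when that version is 6.0.0 or later.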

I still believe that a proper API should require an explicit choice of language or locale before accepting digits other than 0-9, just as int() does not accept hexadecimal digits without an explicit base >= 16. But that would be the subject of a separate feature request.
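Python 3's int() shows the asymmetry: it refuses hexadecimal digits unless the caller opts in with a base of at least 16, yet it accepts Arabic-Indic decimal digits with no opt-in at all. The helper `rejects()` is hypothetical and exists only to make each outcome printable.

```python
def rejects(text, *args):
    """Return True if int(text, *args) raises ValueError (hypothetical helper)."""
    try:
        int(text, *args)
    except ValueError:
        return True
    return False


# Hexadecimal digits need an explicit base >= 16 ...
print(rejects("ff"))                   # True
print(int("ff", 16))                   # 255

# ... but non-ASCII decimal digits are accepted without any explicit choice.
print(rejects("\u0663\u0661\u0664"))   # False
print(int("\u0663\u0661\u0664"))       # 314
```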

Date                 User        Action  Args
2010-11-29 06:23:09  belopolsky  set     recipients: + belopolsky, lemburg, mark.dickinson, vstinner, eric.smith, ezio.melotti, skrah
2010-11-29 06:23:09  belopolsky  set     messageid: <>
2010-11-29 06:23:07  belopolsky  link    issue10557 messages
2010-11-29 06:23:06  belopolsky  create