Author belopolsky
Recipients belopolsky, eric.smith, ezio.melotti, lemburg, mark.dickinson, skrah, vstinner
Date 2010-11-29.06:23:06
Message-id <1291011789.32.0.244648538312.issue10557@psf.upfronthosting.co.za>
In-reply-to
Content
After a bit of svn archeology, it does appear that support for Arabic-Indic digits was deliberate, at least in the sense that the feature was tested for when the code was first committed. See r15000.

The test migrated from file to file over the last 10 years, but it is still present in test_float.py:

        self.assertEqual(float(b"  \u0663.\u0661\u0664  ".decode('raw-unicode-escape')), 3.14)

(It should probably now be rewritten using a string literal.)
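
A minimal sketch of what such a rewrite could look like, assuming Python 3, where \u escapes are resolved by the str literal itself. The variable names are illustrative and not taken from test_float.py:

```python
# Arabic-Indic digits THREE, ONE, FOUR around an ASCII decimal point,
# padded with spaces as in the original test.
text = "  \u0663.\u0661\u0664  "

# float() accepts Unicode decimal digits and surrounding whitespace,
# so no bytes literal or raw-unicode-escape decoding is needed.
value = float(text)
assert value == 3.14
```

Inside the test case this would presumably become a single assertEqual call on the same literal.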

I am now attaching the patch (issue10557.diff) that fixes the bug without sacrificing non-ASCII digit support.

If this approach is well-received, I would like to replace all calls to PyUnicode_EncodeDecimal() with calls to the new _PyUnicode_EncodeDecimalUTF8() and deprecate the Latin-1-oriented PyUnicode_EncodeDecimal().

For the future, I note that starting with Unicode 6.0.0, the Unicode Consortium promises that

"""
Characters with the property value Numeric_Type=de (Decimal) only occur in contiguous ranges of 10 characters, with ascending numeric values from 0 to 9 (Numeric_Value=0..9).
"""

This makes it very easy to check that a numeric string does not contain a mix of digits from different scripts.
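
One way such a check might look at the Python level is sketched below. It relies on the contiguity guarantee quoted above: subtracting a digit's decimal value from its code point gives the code point of the zero of its block, so digits from one block all map to the same zero. The function names digit_block_starts and has_mixed_digits are hypothetical and are not part of the attached patch:

```python
import unicodedata


def digit_block_starts(text):
    """Return the code points of the 'zero' digit of every digit block used in text."""
    starts = set()
    for ch in text:
        # decimal() returns the default (None) for anything that is not
        # a Numeric_Type=Decimal character, e.g. '.', spaces, letters.
        value = unicodedata.decimal(ch, None)
        if value is not None:
            # Blocks are contiguous and ascending, so this is the block's zero.
            starts.add(ord(ch) - value)
    return starts


def has_mixed_digits(text):
    """True if text contains decimal digits from more than one block."""
    return len(digit_block_starts(text)) > 1
```

For example, has_mixed_digits("3.\u0661\u0664") is true because ASCII and Arabic-Indic digits appear together, while a string using only one kind of digit passes.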

I still believe that a proper API should require an explicit choice of language or locale before allowing digits other than 0-9, just as int() does not accept hexadecimal digits without an explicit choice of base >= 16.  But that would be the subject of a separate feature request.
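
The int() behaviour used in the analogy can be illustrated as follows. This only shows existing behaviour; no locale-aware counterpart for float() exists or is proposed here:

```python
# Without an explicit base, int() assumes base 10 and rejects hex digits.
try:
    int("ff")
except ValueError:
    rejected = True
else:
    rejected = False

# With an explicit base of 16 the same string is accepted.
explicit = int("ff", 16)

assert rejected
assert explicit == 255
```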