
Author belopolsky
Recipients belopolsky, eric.smith, ezio.melotti, lemburg, mark.dickinson, skrah, vstinner
Date 2010-11-29.17:55:50
Message-id <1291053358.57.0.165550567454.issue10581@psf.upfronthosting.co.za>
Content
I am opening a new report to continue work on the issues raised in #10557 that are either feature requests or documentation bugs.

The rest is my reply to the relevant portions of Marc's comment at msg122785.

On Mon, Nov 29, 2010 at 4:41 AM, Marc-Andre Lemburg <report@bugs.python.org> wrote:
..
> Alexander Belopolsky wrote:
>>
>> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
>>
>> After a bit of svn archeology, it does appear that Arabic-Indic
>> digits' support was deliberate at least in the sense that the
>> feature was tested for when the code was first committed. See r15000.
>
> As I mentioned on python-dev (http://mail.python.org/pipermail/python-dev/2010-November/106077.html)
> this support was added intentionally.
>
>> The test migrated from file to file over the last 10 years, but it
>> is still present in test_float.py:
>>
>>         self.assertEqual(float(b"  \u0663.\u0661\u0664  ".decode('raw-unicode-escape')), 3.14)
>>
>> (It should probably be now rewritten using a string literal.)
>>
..
>> For the future, I note that starting with Unicode 6.0.0,
>> the Unicode Consortium promises that
>>
>> """
>> Characters with the property value Numeric_Type=de (Decimal) only
>> occur in contiguous ranges of 10 characters, with ascending numeric
>> values from 0 to 9 (Numeric_Value=0..9).
>> """
>>
>> This makes it very easy to check a numeric string does not contain
>> a mix of digits from different scripts.
>
> I'm not sure why you'd want to check for such ranges.
>

To disallow a mix of, say, Arabic-Indic and Bengali digits.  Such combinations cannot be defended as possibly valid numbers in any script.
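A minimal sketch of such a check, assuming the Unicode 6.0.0 guarantee quoted above holds for the Unicode database shipped with the interpreter. The helper names digit_block_starts and has_mixed_digits are hypothetical and not part of any existing API. Because each decimal range is contiguous and ascending, the zero digit of a character's range is at ord(ch) minus its decimal value, so two digits belong to the same range exactly when that difference matches.

```python
import unicodedata


def digit_block_starts(text):
    """Return the code points of the zero digits of every decimal range used in text.

    Hypothetical helper: relies on the promise that Numeric_Type=Decimal
    characters occur only in contiguous ascending ranges of 10.
    """
    starts = set()
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-decimal characters
        if value is not None:
            # The zero digit of this character's range sits at ord(ch) - value.
            starts.add(ord(ch) - value)
    return starts


def has_mixed_digits(text):
    """True if text contains decimal digits from more than one range."""
    return len(digit_block_starts(text)) > 1
```

With this sketch, a string of Arabic-Indic digits alone passes, while one Arabic-Indic digit next to a Bengali digit (or next to an ASCII digit) is flagged as mixed.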

>> I still believe that proper API should require explicit choice of
>> language or locale before allowing digits other than 0-9 just as
>> int() would not accept hexadecimal digits without explicit choice of
>> base >= 16.  But this would be a subject of a feature request.
>
> Since when do we require a locale or language to be specified when
> using Unicode ?
>

This is a valid question.  I may be in the minority, but I find it convenient to use int(), float(), etc. for data validation.  If my program gets a CSV file with Arabic-Indic digits, I want to fire the guy who prepared it before it causes real issues. :-)  I may be too strict, but I don't think anyone would want to see columns with a mix of Bengali and Devanagari numerals.
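For illustration, here is one way a program that wants this stricter validation could get it today without any change to int(). This is only a sketch: strict_int is a hypothetical name, and the choice to allow surrounding whitespace and an optional sign is an assumption, not something proposed in this issue.

```python
import re

# Optional sign followed by one or more ASCII digits, nothing else.
_ASCII_INT = re.compile(r'[+-]?[0-9]+')


def strict_int(text):
    """Parse a base-10 integer, rejecting any digit outside 0-9.

    Hypothetical helper: surrounding whitespace is stripped first,
    mirroring what int() itself tolerates.
    """
    stripped = text.strip()
    if _ASCII_INT.fullmatch(stripped) is None:
        raise ValueError('not an ASCII decimal integer: %r' % text)
    return int(stripped)
```

By contrast, the built-in int() accepts '\u0663\u0661\u0664' and returns 314, which is the behavior under discussion.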

On the other hand, there is a certain convenience in promiscuous parsers, but this is not what I expect from int() and friends.  int('0xFF') requires me to specify the base even though 0xFF is a perfectly valid notation.
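The int('0xFF') behavior referred to here can be checked with a small sketch. The function name parse_attempts is hypothetical; it only calls the built-in int() with different base arguments and records which calls succeed.

```python
def parse_attempts(text):
    """Try int(text, base) for a few bases; None marks a ValueError."""
    results = {}
    for label, base in (('default', 10), ('base 16', 16), ('base 0', 0)):
        try:
            results[label] = int(text, base)
        except ValueError:
            results[label] = None
    return results


# The prefix is rejected in base 10 and accepted once a base is chosen
# explicitly (16) or inferred from the prefix on request (0).
print(parse_attempts('0xFF'))
```

The explicit base argument is the opt-in that the comparison with non-ASCII digits is about.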

There are pros and cons in any approach.  Let's figure out what is better before we fix the documentation.

> The codecs, Unicode methods and other Unicode support features
> happily work with all kinds of languages, mixed or not, without any
> such specification.

In my view, int() and friends are only marginally related to Unicode, and the design of the Unicode methods is not directly relevant to their behavior.  If we were designing str.todigits(), I would by all means argue that it must be consistent with str.isdigit().  For numeric data, however, I think we should follow the logic that rejected int('0xFF').
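As a side note on how the existing string predicates relate to int(), the sketch below compares them for a few characters. The helper name compare is hypothetical. It suggests that int() already tracks str.isdecimal() rather than str.isdigit(): a superscript two counts as a digit for isdigit() but is not accepted by int().

```python
def compare(ch):
    """Return (isdigit, isdecimal, int value or None) for one character."""
    try:
        parsed = int(ch)
    except ValueError:
        parsed = None
    return ch.isdigit(), ch.isdecimal(), parsed


for ch in ('7', '\u0663', '\u00b2'):  # ASCII 7, Arabic-Indic 3, superscript 2
    print(ascii(ch), compare(ch))
```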

This is my opinion.  We can consider allowing int('0xFF') as well.  Let's discuss.