Author lemburg
Recipients eric.smith, ezio.melotti, lemburg, loewis, mark.dickinson
Date 2009-08-03.18:43:48
Message-id <4A772FE3.60106@egenix.com>
In-reply-to <1249317285.35.0.709481915004.issue6632@psf.upfronthosting.co.za>
Content
Ezio Melotti wrote:
> 
> New submission from Ezio Melotti <ezio.melotti@gmail.com>:
> 
> The decimal codec only handles characters in the Nd (Number, decimal)
> Unicode category and whitespaces [a]. It is used by int(), float(),
> complex() and indirectly by Decimal(), Fraction() and possibly others.
> This works well only for plain digits (e.g. int(u'123')) but it
> doesn't work for all the other characters used to represent numbers, like:
> [...]
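
For reference, the behaviour described in the quoted report can be checked with a short script. This is only a sketch, assuming Python 3 (where str is Unicode, so no u'' prefix is needed); the helper name describe() is made up for illustration and is not part of any library.

```python
import unicodedata

def describe(ch):
    """Return (Unicode category, int() result or None) for one character.

    Hypothetical helper, for illustration only.
    """
    try:
        value = int(ch)
    except ValueError:
        value = None
    return unicodedata.category(ch), value

arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGIT ONE, TWO, THREE (category Nd)
roman_eight = "\u2167"               # ROMAN NUMERAL EIGHT (category Nl, not Nd)

print(int(arabic_indic))                 # Nd digits are accepted by int()
print(describe("\u0661"))                # category and converted value
print(describe(roman_eight))             # int() rejects it, so value is None
print(unicodedata.numeric(roman_eight))  # the numeric value is still in the database
```

The Roman numeral carries a numeric value in the Unicode character database, but int() refuses it because it is not in the Nd category.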

In general, Python has always stuck to the Unicode standard
for these things (as well as others like casing, etc.).

If the Unicode standard adopts a scheme for dealing with these
issues, we should include support for it.

Implementing something non-standard now, and then breaking that
support later on in order to implement the actual standard,
is not a good idea.

There is work underway to define a standard for locale-specific
formatting of numbers, dates, etc.:

    http://cldr.unicode.org/

Here's the TR with the data format specification:

    http://www.unicode.org/reports/tr35/tr35-12.html

I'm sure that the information gathered in that project will
sooner or later be folded back into the standard Unicode character
database. Once that's done, we can use that information to
determine, for example, which characters make up a sign, a decimal
point, etc.
History
Date User Action Args
2009-08-03 18:43:50 lemburg set recipients: + lemburg, loewis, mark.dickinson, eric.smith, ezio.melotti
2009-08-03 18:43:48 lemburg link issue6632 messages
2009-08-03 18:43:48 lemburg create