This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author lemburg
Recipients belopolsky, cvrebert, eric.araujo, eric.smith, ezio.melotti, lemburg, mark.dickinson, skrah, vstinner
Date 2013-06-12.07:05:07
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <51B81D9D.1040005@egenix.com>
In-reply-to <1371015137.36.0.535918556898.issue10581@psf.upfronthosting.co.za>
Content
On 12.06.2013 07:32, Alexander Belopolsky wrote:
> 
> Alexander Belopolsky added the comment:
> 
> It looks like we a approaching consensus on some points:
> 
> 1. Mixed script numerals should be disallowed.
> 2. '\N{MINUS SIGN}' should be accepted as an alternative to '\N{HYPHEN-MINUS}'
> 
> Open question: should we accept fullwidth + and -, sub/superscript variants etc.?  I believe rather than debating variant codepoints one by one, we should consider applying NFKC (compatibility) normalization to unicode strings to be interpreted as numbers.  This would allow parsing strings like this:
> 
>>>> float(normalize('NFKC', '\N{FULLWIDTH HYPHEN-MINUS}\N{DIGIT ONE FULL STOP}\N{FULLWIDTH DIGIT TWO}'))
> -1.2

While it would solve these cases, I think that would cause a
significant performance hit.

Perhaps we could do this in two phases:
1. detect whether the string uses non-ASCII digits and symbols
2. if it does, apply normalization and then use the decimal codec
History
Date User Action Args
2013-06-12 07:05:08lemburgsetrecipients: + lemburg, mark.dickinson, belopolsky, vstinner, eric.smith, ezio.melotti, eric.araujo, cvrebert, skrah
2013-06-12 07:05:08lemburglinkissue10581 messages
2013-06-12 07:05:07lemburgcreate