Message 191014 - Python tracker


Author	lemburg
Recipients	belopolsky, cvrebert, eric.araujo, eric.smith, ezio.melotti, lemburg, mark.dickinson, skrah, vstinner
Date	2013-06-12.07:05:07
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<51B81D9D.1040005@egenix.com>
In-reply-to	<1371015137.36.0.535918556898.issue10581@psf.upfronthosting.co.za>

Content
On 12.06.2013 07:32, Alexander Belopolsky wrote:
> 
> Alexander Belopolsky added the comment:
> 
> It looks like we are approaching consensus on some points:
> 
> 1. Mixed script numerals should be disallowed.
> 2. '\N{MINUS SIGN}' should be accepted as an alternative to '\N{HYPHEN-MINUS}'
> 
> Open question: should we accept fullwidth + and -, sub/superscript variants etc.?  I believe rather than debating variant codepoints one by one, we should consider applying NFKC (compatibility) normalization to unicode strings to be interpreted as numbers.  This would allow parsing strings like this:
> 
> >>> float(normalize('NFKC', '\N{FULLWIDTH HYPHEN-MINUS}\N{DIGIT ONE FULL STOP}\N{FULLWIDTH DIGIT TWO}'))
> -1.2

While that would solve these cases, I think it would cause a
significant performance hit.

Perhaps we could do this in two phases:
1. detect whether the string uses non-ASCII digits and symbols
2. if it does, apply normalization and then use the decimal codec
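
A rough sketch of the two phases in pure Python, for illustration only. The
helper name parse_float is hypothetical, and the sketch assumes that
str.isascii() (Python 3.7+) is an acceptable stand-in for the detection step
and that the built-in float() does the decimal digit conversion after
normalization. A real implementation would presumably live in the C-level
parsing code rather than in Python:

```python
import unicodedata


def parse_float(text):
    """Hypothetical helper: normalize only when the input is not pure ASCII."""
    # Phase 1: detect non-ASCII digits and symbols. Pure-ASCII input takes
    # the fast path and never pays for normalization.
    if text.isascii():
        return float(text)
    # Phase 2: apply NFKC (compatibility) normalization, then let float()
    # convert the remaining decimal digits.
    return float(unicodedata.normalize('NFKC', text))


print(parse_float('-1.2'))
print(parse_float('\N{FULLWIDTH HYPHEN-MINUS}\N{DIGIT ONE FULL STOP}\N{FULLWIDTH DIGIT TWO}'))
```

Both calls should give -1.2. Only the second one goes through normalize(), so
the cost is limited to strings that actually contain non-ASCII characters.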

History
Date	User	Action	Args
2013-06-12 07:05:08	lemburg	set	recipients: + lemburg, mark.dickinson, belopolsky, vstinner, eric.smith, ezio.melotti, eric.araujo, cvrebert, skrah
2013-06-12 07:05:08	lemburg	link	issue10581 messages
2013-06-12 07:05:07	lemburg	create