Message191014
On 12.06.2013 07:32, Alexander Belopolsky wrote:
>
> Alexander Belopolsky added the comment:
>
> It looks like we are approaching consensus on some points:
>
> 1. Mixed script numerals should be disallowed.
> 2. '\N{MINUS SIGN}' should be accepted as an alternative to '\N{HYPHEN-MINUS}'
>
> Open question: should we accept fullwidth + and -, sub/superscript variants etc.? I believe rather than debating variant codepoints one by one, we should consider applying NFKC (compatibility) normalization to unicode strings to be interpreted as numbers. This would allow parsing strings like this:
>
> >>> float(normalize('NFKC', '\N{FULLWIDTH HYPHEN-MINUS}\N{DIGIT ONE FULL STOP}\N{FULLWIDTH DIGIT TWO}'))
> -1.2
While it would solve these cases, I think that would cause a
significant performance hit.
Perhaps we could do this in two phases:
1. detect whether the string uses non-ASCII digits and symbols
2. if it does, apply normalization and then use the decimal codec
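
A minimal sketch of the two-phase idea in pure Python, to make the proposal concrete. This is not the CPython implementation: the helper name parse_float is hypothetical, str.isascii() (Python 3.7+) stands in for phase 1, and the built-in float() stands in for the decimal codec. NFKC leaves '\N{MINUS SIGN}' unchanged, so the sketch maps it to '-' explicitly, on the assumption that point 2 above is adopted. It makes no attempt to reject mixed-script numerals.

```python
import unicodedata


def parse_float(text):
    """Hypothetical helper: parse a float, normalizing only when needed."""
    # Phase 1: pure-ASCII input takes the fast path and is never normalized.
    if text.isascii():
        return float(text)
    # Phase 2: fold compatibility variants (fullwidth signs and digits,
    # DIGIT ONE FULL STOP, ...) into their plain equivalents.
    normalized = unicodedata.normalize("NFKC", text)
    # Assumption: MINUS SIGN is accepted as an alternative to HYPHEN-MINUS.
    normalized = normalized.replace("\N{MINUS SIGN}", "-")
    return float(normalized)
```

With this arrangement the cost of normalization is paid only by strings that contain non-ASCII characters; for example, parse_float('\N{FULLWIDTH HYPHEN-MINUS}\N{DIGIT ONE FULL STOP}\N{FULLWIDTH DIGIT TWO}') goes through phase 2, while parse_float('1.5') does not.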
Date | User | Action | Args
2013-06-12 07:05:08 | lemburg | set | recipients: + lemburg, mark.dickinson, belopolsky, vstinner, eric.smith, ezio.melotti, eric.araujo, cvrebert, skrah
2013-06-12 07:05:08 | lemburg | link | issue10581 messages
2013-06-12 07:05:07 | lemburg | create |