
Author belopolsky
Recipients belopolsky, eric.smith, ezio.melotti, ggenellina, lemburg, loewis, lukasz.langa, mark.dickinson, pitrou, rhettinger, skrah, tchrist, terry.reedy
Date 2013-06-10.01:55:57
Message-id <1370829358.56.0.512052871642.issue6632@psf.upfronthosting.co.za>
In-reply-to
Content
As a design principle, "accept what's unambiguous in any locale" is reasonable, but it is hard to apply consistently.  I agree that the status quo is hard to defend.  After a long discussion, it was agreed that fullwidth digits should be accepted, so float(u'１２３') is now valid, but float('＋１２３'), float('－１２３') and float('１２⒊') are not. The last example is

>>> '\N{FULLWIDTH DIGIT ONE}\N{FULLWIDTH DIGIT TWO}\N{DIGIT THREE FULL STOP}'
'１２⒊'

All these variations can be neatly addressed by applying NFKC or NFKD normalization to Unicode data before conversion:

>>> float(unicodedata.normalize('NFKD', '＋１２３'))
123.0
>>> float(unicodedata.normalize('NFKD', '－１２３'))
-123.0
>>> float(unicodedata.normalize('NFKC', '１２⒊'))
123.0

This would even allow parsing fullwidth hexadecimal numbers:

>>> float.fromhex(unicodedata.normalize('NFKC', '０ｘ⒈７ｐ３'))
11.5
>>> int(unicodedata.normalize('NFKC', '７Ｆ'), 16)
127

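but would not help with the MINUS SIGN, because U+2212 has no compatibility decomposition and so passes through both NFKC and NFKD unchanged.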
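
To make the normalize-then-convert idea concrete, here is a minimal sketch. The helper name normalized_float is hypothetical and not part of the stdlib; it only wraps the unicodedata.normalize() and float() calls shown above, and it demonstrates that MINUS SIGN is still rejected after normalization.

```python
import unicodedata


def normalized_float(text):
    """Hypothetical helper: apply NFKC, then defer to the builtin float()."""
    return float(unicodedata.normalize('NFKC', text))


# A fullwidth sign and fullwidth digits fold to their ASCII forms under NFKC.
fullwidth = ('\N{FULLWIDTH PLUS SIGN}'
             '\N{FULLWIDTH DIGIT ONE}\N{FULLWIDTH DIGIT TWO}')
print(normalized_float(fullwidth))  # 12.0

# MINUS SIGN survives normalization, so float() still raises ValueError.
try:
    normalized_float('\N{MINUS SIGN}12')
except ValueError as exc:
    print('rejected:', exc)
```

Whether such normalization belongs inside float() itself or in caller code is exactly the kind of question a broader proposal would have to settle.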

Allowing '\N{MINUS SIGN}' is particularly attractive because arguably Unicode text should prefer it to the ambiguous '\N{HYPHEN-MINUS}', but by the same token fractions.Fraction() should accept '\N{FRACTION SLASH}' in addition to the legacy '\N{SOLIDUS}'.
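
For illustration only, a caller can get both behaviours today by translating these characters after normalization. The mapping table and the to_ascii_numeric name below are assumptions made for this sketch, not an existing or proposed stdlib API.

```python
import unicodedata
from fractions import Fraction

# Assumed mapping for this sketch: Unicode-preferred characters to the
# ASCII forms that float() and Fraction() currently accept.
_ASCII_EQUIVALENTS = str.maketrans({
    '\N{MINUS SIGN}': '\N{HYPHEN-MINUS}',
    '\N{FRACTION SLASH}': '\N{SOLIDUS}',
})


def to_ascii_numeric(text):
    """Hypothetical helper: NFKC-normalize, then map MINUS SIGN and
    FRACTION SLASH to their ASCII counterparts."""
    return unicodedata.normalize('NFKC', text).translate(_ASCII_EQUIVALENTS)


print(float(to_ascii_numeric('\N{MINUS SIGN}1.5')))        # -1.5
print(Fraction(to_ascii_numeric('3\N{FRACTION SLASH}4')))  # 3/4
```

A hand-written table like this is precisely the case-by-case handling that a stdlib-wide policy would replace.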

Overall, I think this situation calls for a PEP-size proposal and discussion about handling Unicode numerical data throughout the stdlib rather than a case-by-case discussion of the various quirks in the current version.
History
Date User Action Args
2013-06-10 01:55:58 belopolsky set recipients: + belopolsky, lemburg, loewis, rhettinger, terry.reedy, mark.dickinson, ggenellina, pitrou, eric.smith, ezio.melotti, skrah, lukasz.langa, tchrist
2013-06-10 01:55:58 belopolsky set messageid: <1370829358.56.0.512052871642.issue6632@psf.upfronthosting.co.za>
2013-06-10 01:55:58 belopolsky link issue6632 messages
2013-06-10 01:55:57 belopolsky create