classification
Title: error in unicodedata.numeric(u"\u2187") and 2188
Type: behavior Stage:
Components: Unicode Versions: Python 3.0, Python 3.1, Python 2.7, Python 2.6
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Generate numeric/space/linebreak from Unicode database.
View: 1571184
Assigned To: Nosy List: amaury.forgeotdarc, benjamin.peterson, ezio.melotti, loewis, vernondcole
Priority: normal Keywords: needs review, patch

Created on 2009-06-30 02:43 by vernondcole, last changed 2009-06-30 21:29 by amaury.forgeotdarc. This issue is now closed.

Files
File name                  Uploaded
unicode_tonumeric.patch    amaury.forgeotdarc, 2009-06-30 09:31
unicode-tonumeric-2.patch  amaury.forgeotdarc, 2009-06-30 14:28
unnamed                    vernondcole, 2009-06-30 17:31
Messages (8)
msg89899 - Author: Vernon Cole (vernondcole) Date: 2009-06-30 02:43
I am writing a demo program: a subclass of int that partially
implements PEP 313 (Roman numeral literals).
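A minimal sketch of such a class, written for Python 3 and assuming the number is given as a string of Unicode Roman numeral characters (U+2160 onwards) rather than as a literal. The class name `Roman` is hypothetical. Each character's value comes from `unicodedata.numeric()`, so on the Python 2.6.2 build below it would fail for U+2187 and U+2188 exactly as reported.

```python
import unicodedata


class Roman(int):
    """Hypothetical int subclass built from a string of Unicode Roman numerals.

    A smaller value written before a larger one is subtracted (as in IV == 4);
    otherwise values are added.
    """

    def __new__(cls, text):
        # unicodedata.numeric() raises ValueError for characters that have
        # no numeric value in the interpreter's Unicode tables.
        values = [int(unicodedata.numeric(ch)) for ch in text]
        total = 0
        for i, value in enumerate(values):
            if i + 1 < len(values) and value < values[i + 1]:
                total -= value  # subtractive notation
            else:
                total += value
        return int.__new__(cls, total)
```

For example, `Roman("\u216f\u216d\u216f")` (M, C, M) evaluates to 1900, and `Roman("\u2187")` to 50000 on an interpreter whose tables include U+2187.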

I discovered that my conversion routines fail for values above 50000
because of an error in unicodedata for the two code points U+2187 and
U+2188. unicodedata.numeric() should return 50000.0 and 100000.0 for
them, respectively, but raises ValueError instead. The following console
dump shows this, together with U+2181, which works correctly.

----- console dump follows -----

c:\BZR\roman>c:\python26\python.exe
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.name(u"\u2187")
'ROMAN NUMERAL FIFTY THOUSAND'
>>> unicodedata.numeric(u"\u2187")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not a numeric character
>>> unicodedata.name(u"\u2188")
'ROMAN NUMERAL ONE HUNDRED THOUSAND'
>>> unicodedata.numeric(u"\u2188")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not a numeric character
>>> unicodedata.name(u"\u2181")
'ROMAN NUMERAL FIVE THOUSAND'
>>> unicodedata.numeric(u"\u2181")
5000.0
>>>
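A small regression check for the values stated above, written for Python 3. The `EXPECTED` table and the `check_roman_numerics` name are illustrative, not part of unicodedata. On the 2.6.2 build shown it would flag U+2187 and U+2188; on an interpreter whose unicodedata agrees with the database it returns an empty list.

```python
import unicodedata

# Expected numeric values, as stated in the report.
EXPECTED = {
    "\u2181": 5000.0,    # ROMAN NUMERAL FIVE THOUSAND
    "\u2187": 50000.0,   # ROMAN NUMERAL FIFTY THOUSAND
    "\u2188": 100000.0,  # ROMAN NUMERAL ONE HUNDRED THOUSAND
}


def check_roman_numerics():
    """Return a list of (code point, problem) pairs; empty means all good."""
    problems = []
    for ch, expected in EXPECTED.items():
        try:
            actual = unicodedata.numeric(ch)
        except ValueError:
            problems.append(("U+%04X" % ord(ch), "not a numeric character"))
            continue
        if actual != expected:
            problems.append(("U+%04X" % ord(ch), "got %r" % actual))
    return problems


if __name__ == "__main__":
    print(unicodedata.unidata_version, check_roman_numerics() or "OK")
```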
msg89911 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-06-30 06:47
Python 2.6 and all later versions use version 5.1.0 of the Unicode
database [1] (see unicodedata.unidata_version).

The database does contain a numeric value for every code point from
U+2185 to U+2188 (inclusive), so the problem should not be in the data
itself.
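That the data is present can be checked directly against UnicodeData.txt [1]: each record is one line of 15 semicolon-separated fields, and field 8 (counting from 0) holds the numeric value, which may be an integer or a fraction such as 1/2. A minimal Python 3 sketch, assuming that layout; the two sample records are typed in for illustration in the file's format, and `numeric_values` is a hypothetical helper name.

```python
from fractions import Fraction

# Two records in UnicodeData.txt format; field 8 is the numeric value.
SAMPLE_LINES = [
    "2187;ROMAN NUMERAL FIFTY THOUSAND;Nl;0;L;;;;50000;N;;;;;",
    "2188;ROMAN NUMERAL ONE HUNDRED THOUSAND;Nl;0;L;;;;100000;N;;;;;",
]

NUMERIC_FIELD = 8


def numeric_values(lines):
    """Map code point -> Fraction for every record that has a numeric value."""
    result = {}
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if fields[NUMERIC_FIELD]:
            # Fraction accepts both "50000" and "1/2".
            result[int(fields[0], 16)] = Fraction(fields[NUMERIC_FIELD])
    return result
```

Feeding it the full file (for example via `open(path, encoding="ascii")`) would yield every code point the database marks as numeric, which is the set a correct `_PyUnicode_ToNumeric()` has to cover.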

[1]: ftp://ftp.unicode.org/Public/5.1.0/ucd/UnicodeData.txt
msg89913 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-06-30 09:31
The _PyUnicode_ToNumeric() function was out of sync with the Unicode
database.
Here is a new version of this function, together with the script to
generate its code.
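The patch itself is not reproduced here, but the general idea of generating the function from the database instead of maintaining it by hand can be sketched. This is a hypothetical Python 3 generator, not the code from the patch or from makeunicodedata.py: it takes a mapping from code point to value (for example, built from field 8 of UnicodeData.txt) and emits a C switch that groups code points sharing a value. The emitted signature, the Py_UNICODE parameter type and the -1.0 "not numeric" return are assumptions for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def emit_to_numeric(values, func_name):
    """Return C source for a switch over code points, grouped by value.

    `values` maps code point (int) -> Fraction.
    """
    by_value = defaultdict(list)
    for cp, value in values.items():
        by_value[value].append(cp)

    out = ["double %s(Py_UNICODE ch)" % func_name, "{", "    switch (ch) {"]
    # One group of case labels per distinct value, in ascending value order.
    for value in sorted(by_value):
        for cp in sorted(by_value[value]):
            out.append("    case 0x%04X:" % cp)
        if value.denominator == 1:
            out.append("        return (double) %d;" % value.numerator)
        else:
            # The cast binds before the division, so this is a floating divide.
            out.append("        return (double) %d / %d;"
                       % (value.numerator, value.denominator))
    out += ["    }", "    return -1.0;", "}"]
    return "\n".join(out)
```

Grouping by value keeps the generated switch compact, because many code points share the same numeric value.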
msg89919 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-06-30 12:32
Wouldn't it make more sense to move this into unicode_db.h?
msg89926 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-06-30 14:28
Right. Actually, unicodetype_db.h is the file included by unicodectype.c,
so I moved my script into makeunicodedata.py.

Here is a new patch. The code generated for _PyUnicode_ToNumeric is the
same as before (except for some tabs), see the old patch if you want to
check the actual changes in the function.
msg89946 - Author: Vernon Cole (vernondcole) Date: 2009-06-30 17:31
Wow! Quick response! My outstanding bug on IronPython has been open
since August of last year.
I don't really want to try compiling the standard library on my laptop,
but I do want to test my code fully soon. In which release can I first
expect to see this in binary form? 3.2 alpha?
--
Vernon
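Until a fixed binary is available, one possible interim workaround is to fall back to a small table when `unicodedata.numeric()` rejects the two code points from this report. A minimal Python 3 sketch, assuming only U+2187 and U+2188 need patching; `numeric_with_fallback` and `_ROMAN_FALLBACK` are hypothetical names, not part of unicodedata.

```python
import unicodedata

# Hypothetical fallback for interpreters whose unicodedata raises ValueError
# for these code points (as the Python 2.6.2 build in the report did).
_ROMAN_FALLBACK = {
    "\u2187": 50000.0,   # ROMAN NUMERAL FIFTY THOUSAND
    "\u2188": 100000.0,  # ROMAN NUMERAL ONE HUNDRED THOUSAND
}


def numeric_with_fallback(ch):
    """Like unicodedata.numeric(ch), but patched for U+2187 and U+2188."""
    try:
        return unicodedata.numeric(ch)
    except ValueError:
        if ch in _ROMAN_FALLBACK:
            return _ROMAN_FALLBACK[ch]
        raise  # genuinely non-numeric characters still raise
```

On an interpreter that already has the correct data the fallback is never consulted, so the helper can stay in place after upgrading.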

msg89947 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-06-30 19:11
Notice that this is a duplicate of the long-standing issue 1571184, which
has a patch that is more comprehensive than the one proposed here. So
rather than accepting Amaury's patch, I'd prefer to see Anders' patch
reviewed, and revised as necessary.
msg89949 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-06-30 21:29
Yes, my patch is entirely contained in the one from issue1571184.
I mark this one as a duplicate, and will review and update the other.
History
Date                 User                Action  Args
2009-06-30 21:29:30  amaury.forgeotdarc  set     status: open -> closed
                                                 superseder: Generate numeric/space/linebreak from Unicode database.
                                                 messages: + msg89949
                                                 dependencies: - Generate numeric/space/linebreak from Unicode database.
                                                 resolution: duplicate
2009-06-30 19:11:47  loewis              set     dependencies: + Generate numeric/space/linebreak from Unicode database.
                                                 messages: + msg89947
2009-06-30 17:31:10  vernondcole         set     files: + unnamed
                                                 messages: + msg89946
2009-06-30 14:28:47  amaury.forgeotdarc  set     files: + unicode-tonumeric-2.patch
                                                 messages: + msg89926
2009-06-30 12:32:57  benjamin.peterson   set     nosy: + loewis, benjamin.peterson
                                                 messages: + msg89919
2009-06-30 09:31:44  amaury.forgeotdarc  set     files: + unicode_tonumeric.patch
                                                 nosy: + amaury.forgeotdarc
                                                 messages: + msg89913
                                                 keywords: + needs review, patch
2009-06-30 06:47:53  ezio.melotti        set     priority: normal
                                                 messages: + msg89911
                                                 versions: + Python 3.0, Python 3.1, Python 2.7
2009-06-30 05:29:20  ezio.melotti        set     nosy: + ezio.melotti
2009-06-30 02:43:02  vernondcole         create