Issue 6383: error in unicodedata.numeric(u"\u2187") and 2188


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/50632

classification

Title:	error in unicodedata.numeric(u"\u2187") and 2188
Type:	behavior	Stage:
Components:	Unicode	Versions:	Python 3.0, Python 3.1, Python 2.7, Python 2.6

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:	Generate numeric/space/linebreak from Unicode database (issue1571184)
Assigned To:		Nosy List:	amaury.forgeotdarc, benjamin.peterson, ezio.melotti, loewis, vernondcole
Priority:	normal	Keywords:	needs review, patch

Created on 2009-06-30 02:43 by vernondcole, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
unicode_tonumeric.patch	amaury.forgeotdarc, 2009-06-30 09:31
unicode-tonumeric-2.patch	amaury.forgeotdarc, 2009-06-30 14:28
unnamed	vernondcole, 2009-06-30 17:31

Messages (8)
msg89899	Author: Vernon Cole (vernondcole)	Date: 2009-06-30 02:43
I am writing a demo program: a subclass of int that partially implements PEP 313 (Roman numeral literals). I found that my conversion routines fail for values of 50,000 and above because of an error in unicodedata for the two code points U+2187 and U+2188. unicodedata.numeric() should return 50000.0 and 100000.0 for them, respectively. The console dump below shows the failure. It also includes code point U+2181, which works correctly.

----- console dump follows -----

c:\BZR\roman>c:\python26\python.exe
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.name(u"\u2187")
'ROMAN NUMERAL FIFTY THOUSAND'
>>> unicodedata.numeric(u"\u2187")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not a numeric character
>>> unicodedata.name(u"\u2188")
'ROMAN NUMERAL ONE HUNDRED THOUSAND'
>>> unicodedata.numeric(u"\u2188")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not a numeric character
>>> unicodedata.name(u"\u2181")
'ROMAN NUMERAL FIVE THOUSAND'
>>> unicodedata.numeric(u"\u2181")
5000.0
>>>
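The report does not include Vernon's converter. The following is a hedged illustration only: a minimal Python 3 sketch of how such a routine might rely on unicodedata.numeric(). The name roman_to_int is hypothetical. The subtractive rule used here (a value smaller than the one after it is subtracted) is the usual Roman-numeral convention and is assumed, not taken from the report. On an interpreter whose unicodedata lacks values for U+2187 and U+2188, the lookup raises the same ValueError shown above.

```python
import unicodedata


def roman_to_int(numeral: str) -> int:
    """Convert a string of Unicode Roman numeral characters to an int.

    Hypothetical helper: each character's value comes from
    unicodedata.numeric(), and the usual subtractive rule applies
    (a value smaller than the one after it is subtracted).
    """
    # unicodedata.numeric() raises ValueError for characters that have no
    # numeric value, which is the failure reported for U+2187 and U+2188.
    values = [int(unicodedata.numeric(ch)) for ch in numeral]
    total = 0
    for i, value in enumerate(values):
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value  # subtractive position, e.g. 1000 before 10000
        else:
            total += value
    return total


if __name__ == "__main__":
    print(roman_to_int("\u2188\u2187"))  # 100000 + 50000, prints 150000
    print(roman_to_int("\u216f\u2182"))  # 1000 before 10000, prints 9000
```

With the database values in place, U+2187 and U+2188 behave like any other numeral, such as U+2181.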
msg89911	Author: Ezio Melotti (ezio.melotti) *	Date: 2009-06-30 06:47
Python 2.6 and all later versions use version 5.1.0 of the Unicode database [1] (see unicodedata.unidata_version). The database gives a numeric value for every code point from U+2185 to U+2188 inclusive, so the problem shouldn't be in the data itself.

[1]: ftp://ftp.unicode.org/Public/5.1.0/ucd/UnicodeData.txt
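You can reproduce Ezio's check by reading field 8 (Numeric_Value) of a semicolon-separated UnicodeData.txt record and comparing it with what unicodedata reports. This is a minimal sketch. The record below is written by hand in the UnicodeData.txt field layout and is not copied from the 5.1.0 file. numeric_field is a hypothetical helper. The sketch uses Fraction because the field can hold values such as 1/2.

```python
import unicodedata
from fractions import Fraction

# Hand-written record in the UnicodeData.txt layout (15 semicolon-separated
# fields). An assumption for illustration, not copied from the 5.1.0 file.
RECORD = "2187;ROMAN NUMERAL FIFTY THOUSAND;Nl;0;L;;;;50000;N;;;;;"

NUMERIC_FIELD = 8  # index of Numeric_Value in a UnicodeData.txt record


def numeric_field(record: str):
    """Return (code point, numeric value or None) parsed from one record."""
    fields = record.split(";")
    code_point = int(fields[0], 16)
    raw = fields[NUMERIC_FIELD]
    return code_point, (Fraction(raw) if raw else None)


if __name__ == "__main__":
    cp, expected = numeric_field(RECORD)
    # Passing a default makes numeric() return None instead of raising.
    actual = unicodedata.numeric(chr(cp), None)
    print(unicodedata.unidata_version, hex(cp), expected, actual)
    print("database and module agree:", actual == expected)
```

If the record has a value but the module returns None, the data is fine and the lookup code is at fault. That is the situation this issue describes.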
msg89913	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2009-06-30 09:31
The _PyUnicode_ToNumeric() function was out of sync with the Unicode database. Here is a new version of the function, together with the script that generates its code.
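The idea behind Amaury's patch is to emit the C lookup from data instead of maintaining it by hand. That idea fits in a few lines of Python. This sketch is not the patch's script and not makeunicodedata.py. generate_to_numeric, the emitted function name and its unsigned int parameter are all hypothetical. Values are read from the running interpreter's unicodedata rather than from UnicodeData.txt. Code points that share a value share one return statement.

```python
import unicodedata
from collections import defaultdict


def generate_to_numeric(code_points, func_name="ToNumeric"):
    """Emit C source for a switch-based numeric lookup.

    Hypothetical generator: values come from unicodedata.numeric(),
    and code points with the same value share one return statement.
    """
    by_value = defaultdict(list)
    for cp in code_points:
        value = unicodedata.numeric(chr(cp), None)
        if value is not None:
            by_value[value].append(cp)

    lines = [f"double {func_name}(unsigned int ch)", "{", "    switch (ch) {"]
    for value in sorted(by_value):
        for cp in by_value[value]:
            lines.append(f"    case 0x{cp:04X}:")
        lines.append(f"        return {value!r};")
    # -1.0 is this sketch's sentinel for "not a numeric character".
    lines += ["    }", "    return -1.0;", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Number Forms block, which contains U+2181 through U+2188.
    print(generate_to_numeric(range(0x2150, 0x2190)))
```

Because the table is regenerated from the data, new characters such as U+2187 and U+2188 appear automatically when the database is updated.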
msg89919	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2009-06-30 12:32
Wouldn't it make more sense to move this into unicode_db.h?
msg89926	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2009-06-30 14:28
Right. Actually, unicodetype_db.h is the header included in unicodectype.c; I moved my script into makeunicodedata.py.

Here is a new patch. The code generated for _PyUnicode_ToNumeric is the same as before (except for some tabs); see the old patch if you want to check the actual changes to the function.
msg89946	Author: Vernon Cole (vernondcole)	Date: 2009-06-30 17:31
Wow! Quick response! My outstanding bug on IronPython has been hanging out there since August of last year.

I don't really want to try compiling the standard library on my laptop, but I do want to fully test my code soon. Where is the first place I can expect to see this in binary form? 3.2 alpha?

--
Vernon
msg89947	Author: Martin v. Löwis (loewis) *	Date: 2009-06-30 19:11
Notice that this is a duplicate of the long-standing issue1571184, which has a more comprehensive patch than the one proposed here. So rather than accepting Amaury's patch, I'd prefer to see Anders' patch reviewed and revised as necessary.
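In this report, two views of the same character data disagreed: the Unicode database had values that unicodedata.numeric() could not return. Generating the lookup tables from the database is meant to keep such views in step. In Python 3, one way to test that is to check that str.isnumeric() and unicodedata.numeric() agree on which characters have a numeric value. The following is an illustrative regression check. The helper numeric_mismatches is hypothetical and is not part of Anders' patch or CPython's test suite.

```python
import unicodedata


def numeric_mismatches(code_points):
    """Return code points where str.isnumeric() and unicodedata.numeric()
    disagree about whether the character has a numeric value."""
    mismatches = []
    for cp in code_points:
        ch = chr(cp)
        has_value = unicodedata.numeric(ch, None) is not None
        if ch.isnumeric() != has_value:
            mismatches.append(cp)
    return mismatches


if __name__ == "__main__":
    # Number Forms block (U+2150..U+218F), where U+2187 and U+2188 live.
    print([hex(cp) for cp in numeric_mismatches(range(0x2150, 0x2190))])
```

An empty list means both views agree for the block. A non-empty list points at the kind of divergence reported in this issue.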
msg89949	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2009-06-30 21:29
Yes, my patch is entirely contained in the one from issue1571184. I am marking this one as a duplicate and will review and update the other.

History
Date	User	Action	Args
2022-04-11 14:56:50	admin	set	github: 50632
2009-06-30 21:29:30	amaury.forgeotdarc	set	status: open -> closed; superseder: Generate numeric/space/linebreak from Unicode database; messages: + msg89949; dependencies: - Generate numeric/space/linebreak from Unicode database; resolution: duplicate
2009-06-30 19:11:47	loewis	set	dependencies: + Generate numeric/space/linebreak from Unicode database; messages: + msg89947
2009-06-30 17:31:10	vernondcole	set	files: + unnamed; messages: + msg89946
2009-06-30 14:28:47	amaury.forgeotdarc	set	files: + unicode-tonumeric-2.patch; messages: + msg89926
2009-06-30 12:32:57	benjamin.peterson	set	nosy: + loewis, benjamin.peterson; messages: + msg89919
2009-06-30 09:31:44	amaury.forgeotdarc	set	files: + unicode_tonumeric.patch; nosy: + amaury.forgeotdarc; messages: + msg89913; keywords: + needs review, patch
2009-06-30 06:47:53	ezio.melotti	set	priority: normal; messages: + msg89911; versions: + Python 3.0, Python 3.1, Python 2.7
2009-06-30 05:29:20	ezio.melotti	set	nosy: + ezio.melotti
2009-06-30 02:43:02	vernondcole	create