Issue6383
Created on 2009-06-30 02:43 by vernondcole, last changed 2009-06-30 21:29 by amaury.forgeotdarc.
|
msg89899 - (view) |
Author: Vernon Cole (vernondcole) |
Date: 2009-06-30 02:43 |
|
I am writing a demo program: a class, a subclass of int, that provides
a partial implementation of PEP 313 (Roman numeral literals).
I discovered that my conversion routines fail for values > 50000 due to an
error in unicodedata for the two code points U+2187 and U+2188.
unicodedata.numeric() should return 50000.0 and 100000.0 for those two
code points, respectively. The following console dump shows the failure,
along with code point U+2181, which works correctly.
----- console dump follows -----
c:\BZR\roman>c:\python26\python.exe
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.name(u"\u2187")
'ROMAN NUMERAL FIFTY THOUSAND'
>>> unicodedata.numeric(u"\u2187")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: not a numeric character
>>> unicodedata.name(u"\u2188")
'ROMAN NUMERAL ONE HUNDRED THOUSAND'
>>> unicodedata.numeric(u"\u2188")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: not a numeric character
>>> unicodedata.name(u"\u2181")
'ROMAN NUMERAL FIVE THOUSAND'
>>> unicodedata.numeric(u"\u2181")
5000.0
>>>
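The check in the console dump can be extended to a whole range of code points. The following is a minimal sketch, written for Python 3 (the dump above uses 2.6); the helper names `numeric_or_none` and `missing_numeric` are hypothetical, and filtering on the "Nl" (letter number) category is an assumption about which characters are expected to carry a numeric value.

```python
import unicodedata


def numeric_or_none(ch):
    """Return unicodedata.numeric(ch), or None if no numeric value is known."""
    try:
        return unicodedata.numeric(ch)
    except ValueError:
        return None


def missing_numeric(first, last):
    """List letter-number (Nl) code points in [first, last] without a numeric value."""
    missing = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        # Only letter numbers such as the Roman numerals are checked here.
        if unicodedata.category(ch) == "Nl" and numeric_or_none(ch) is None:
            missing.append((cp, unicodedata.name(ch, "<unnamed>")))
    return missing


if __name__ == "__main__":
    # Scan the Roman numeral block that contains U+2187 and U+2188.
    for cp, name in missing_numeric(0x2160, 0x2188):
        print("U+%04X %s: no numeric value" % (cp, name))
```

On an interpreter affected by this issue, the two code points reported above would be expected to show up in the listing.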
|
|
msg89911 - (view) |
Author: Ezio Melotti (ezio.melotti) |
Date: 2009-06-30 06:47 |
|
Python 2.6 and all later versions use version 5.1.0 of the Unicode
database [1] (see unicodedata.unidata_version).
The database contains a numeric value for every code point from U+2185
to U+2188 (inclusive), so the problem should not be in the database itself.
[1]: ftp://ftp.unicode.org/Public/5.1.0/ucd/UnicodeData.txt
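One way to confirm this is to read the numeric field straight from UnicodeData.txt. The following is a rough sketch, assuming the usual semicolon-separated layout with the numeric value in field index 8; the three sample records were retyped by hand for illustration and should be checked against the real file, and `numeric_values` is a hypothetical helper name.

```python
from fractions import Fraction

# Illustrative records in the UnicodeData.txt layout (an assumption: these
# were typed by hand, not read from the file referenced in [1]).
SAMPLE = """\
2181;ROMAN NUMERAL FIVE THOUSAND;Nl;0;L;;;;5000;N;;;;;
2187;ROMAN NUMERAL FIFTY THOUSAND;Nl;0;L;;;;50000;N;;;;;
2188;ROMAN NUMERAL ONE HUNDRED THOUSAND;Nl;0;L;;;;100000;N;;;;;
"""


def numeric_values(text):
    """Map code point -> numeric value for every record that has one."""
    values = {}
    for line in text.splitlines():
        fields = line.split(";")
        # Field 0 is the code point in hex; field 8 is the numeric value,
        # which may be an integer ("50000") or a fraction ("1/2").
        if len(fields) > 8 and fields[8]:
            values[int(fields[0], 16)] = Fraction(fields[8])
    return values


if __name__ == "__main__":
    for cp, value in sorted(numeric_values(SAMPLE).items()):
        print("U+%04X -> %s" % (cp, value))
```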
|
|
msg89913 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) |
Date: 2009-06-30 09:31 |
|
The _PyUnicode_ToNumeric() function was out of sync with the Unicode
database.
Here is a new version of this function, together with the script to
generate its code.
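To illustrate the general idea of generating such a function from data, here is a minimal sketch. It is not the attached script: `generate_switch` and `c_double` are hypothetical names, the switch-statement layout is an assumption, and so is the -1.0 fallback for non-numeric characters.

```python
from fractions import Fraction


def c_double(value):
    """Format a Fraction as a C double expression."""
    if value.denominator == 1:
        return "%d.0" % value.numerator
    return "%d.0/%d.0" % (value.numerator, value.denominator)


def generate_switch(numeric_by_codepoint):
    """Return C source text mapping each code point to its numeric value."""
    # Group code points that share a value so they share one return statement.
    by_value = {}
    for cp, value in numeric_by_codepoint.items():
        by_value.setdefault(Fraction(value), []).append(cp)

    lines = ["switch (ch) {"]
    for value in sorted(by_value):
        for cp in sorted(by_value[value]):
            lines.append("case 0x%04X:" % cp)
        lines.append("    return (double) %s;" % c_double(value))
    lines.append("}")
    # Assumed fallback for characters with no numeric value.
    lines.append("return -1.0;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_switch({0x2181: 5000, 0x2187: 50000, 0x2188: 100000}))
```

Generating the cases from the database, rather than maintaining them by hand, is what keeps the function from drifting out of date when the Unicode data is updated.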
|
|
msg89919 - (view) |
Author: Benjamin Peterson (benjamin.peterson) |
Date: 2009-06-30 12:32 |
|
Wouldn't it make more sense to move this into unicode_db.h?
|
|
msg89926 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) |
Date: 2009-06-30 14:28 |
|
Right. Actually, unicodetype_db.h is the header included in unicodectype.c;
I moved my script into makeunicodedata.py.
Here is a new patch. The code generated for _PyUnicode_ToNumeric is the
same as before (except for some tabs), see the old patch if you want to
check the actual changes in the function.
|
|
msg89946 - (view) |
Author: Vernon Cole (vernondcole) |
Date: 2009-06-30 17:31 |
|
Wow! Quick response! My outstanding bug on IronPython has been hanging out
there since August of last year.
I don't really want to try compiling the standard library on my laptop,
but I do want to fully test my code soon. What is the first place I can
expect to see this in binary form? 3.2 alpha?
--
Vernon
|
|
msg89947 - (view) |
Author: Martin v. Löwis (loewis) |
Date: 2009-06-30 19:11 |
|
Notice that this is a duplicate of the longstanding issue1571184, which
has a patch that is more comprehensive than the one proposed here. So
rather than accepting Amaury's patch, I'd prefer to see Anders' patch
reviewed, and revised as necessary.
|
|
msg89949 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) |
Date: 2009-06-30 21:29 |
|
Yes, my patch is entirely contained in the one from issue1571184.
I am marking this one as a duplicate, and will review and update the other.
|
|
| Date | User | Action | Args |
| 2009-06-30 21:29:30 | amaury.forgeotdarc | set | status: open -> closed; superseder: Generate numeric/space/linebreak from Unicode database.; messages: + msg89949; dependencies: - Generate numeric/space/linebreak from Unicode database.; resolution: duplicate |
| 2009-06-30 19:11:47 | loewis | set | dependencies: + Generate numeric/space/linebreak from Unicode database.; messages: + msg89947 |
| 2009-06-30 17:31:10 | vernondcole | set | files: + unnamed; messages: + msg89946 |
| 2009-06-30 14:28:47 | amaury.forgeotdarc | set | files: + unicode-tonumeric-2.patch; messages: + msg89926 |
| 2009-06-30 12:32:57 | benjamin.peterson | set | nosy: + loewis, benjamin.peterson; messages: + msg89919 |
| 2009-06-30 09:31:44 | amaury.forgeotdarc | set | files: + unicode_tonumeric.patch; nosy: + amaury.forgeotdarc; messages: + msg89913; keywords: + needs review, patch |
| 2009-06-30 06:47:53 | ezio.melotti | set | priority: normal; messages: + msg89911; versions: + Python 3.0, Python 3.1, Python 2.7 |
| 2009-06-30 05:29:20 | ezio.melotti | set | nosy: + ezio.melotti |
| 2009-06-30 02:43:02 | vernondcole | create | |
|