Message89959
Here is a refreshed version of the patch, without the generated files.
The patch combines several changes that are fairly independent of
each other:
- Using the Unicode database to generate the functions adds 143 new
codepoints to PyUnicode_ToNumeric and one codepoint to
PyUnicode_IsWhitespace.
- In addition, PyUnicode_ToNumeric now contains code for all numerics;
previously, those that are also digits fell into the 'default:' case and
were converted with PyUnicode_ToDigit(). This adds 468 new codepoints
but removes the need to call PyUnicode_ToDigit().
- The Unihan.txt files (two files to download, 25 MB each) are now
parsed, which adds 73 more codepoints to PyUnicode_ToNumeric (there
are now 1009 entries in this function).
The 3.2.0 version of this file contains two huge numbers, 1e16 and 1e20,
so I had to widen the type of 'change_record.numeric_changed' from 'int'
to 'double'. It is possible that these were removed from the Unicode
database between versions 4.1 and 5.1.
- The database has a new flag, NUMERIC_MASK, used by
PyUnicode_IsNumeric. This adds ~350 lines to the arrays of numbers in
unicodetype_db.h.
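As a rough illustration of the first two changes, the sketch below reads the numeric-value field (field 8) of UnicodeData.txt records and emits one C case label per numeric codepoint, digits included, so nothing is left to a 'default:' fallback. The sample records, the helper names numeric_table and emit_switch, and the shape of the generated C are hypothetical; this is not the code in makeunicodedata.py.

```python
from fractions import Fraction

# Hypothetical excerpt of UnicodeData.txt (semicolon-separated, 15 fields).
# Field 8 holds the numeric value, which may be a fraction such as "1/2".
SAMPLE = """\
0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;
"""


def numeric_table(lines):
    """Map codepoint -> numeric value for every record whose field 8 is set."""
    table = {}
    for line in lines:
        fields = line.split(";")
        if len(fields) < 9 or not fields[8]:
            continue  # no numeric value, so no case label is generated
        table[int(fields[0], 16)] = float(Fraction(fields[8]))
    return table


def emit_switch(table):
    """Emit one C case per numeric codepoint (hypothetical layout).

    Digits are included, so no 'default:' fallback to a digit
    conversion is needed.
    """
    cases = []
    for cp, value in sorted(table.items()):
        cases.append(f"    case 0x{cp:04X}:\n        return (double) {value!r};")
    return "\n".join(cases)


print(emit_switch(numeric_table(SAMPLE.splitlines())))
```

Generating every entry from the same field keeps one code path for digits and non-digit numerics, at the cost of a longer switch.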
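The need for 'double' in the Unihan item can be checked with a small sketch: it parses Unihan-style lines (tab-separated codepoint, field name, value) for the kPrimaryNumeric, kAccountingNumeric and kOtherNumeric fields and reports values that do not fit a 32-bit C int. The sample lines and helper names are illustrative assumptions, not data copied from a particular Unihan release. Note that 1e20 also exceeds a signed 64-bit integer (maximum about 9.22e18), whereas a C double represents both 1e16 and 1e20 exactly.

```python
INT32_MAX = 2**31 - 1  # assumes the usual 32-bit C int

# Illustrative Unihan-style lines: "U+XXXX<TAB>field<TAB>value".
# Codepoints and values are examples, not copied from a specific release.
SAMPLE = """\
# comment lines start with '#'
U+4E07\tkPrimaryNumeric\t10000
U+4EAC\tkPrimaryNumeric\t10000000000000000
U+5793\tkPrimaryNumeric\t100000000000000000000
U+4E00\tkDefinition\tillustrative text
"""

NUMERIC_FIELDS = {"kPrimaryNumeric", "kAccountingNumeric", "kOtherNumeric"}


def parse_unihan_numerics(lines):
    """Return {codepoint: integer value} for the numeric Unihan fields."""
    table = {}
    for line in lines:
        if not line or line.startswith("#"):
            continue
        code, field, value = line.split("\t", 2)
        if field in NUMERIC_FIELDS:
            # A field may list several values; this sketch keeps the first.
            table[int(code[2:], 16)] = int(value.split()[0])
    return table


def too_big_for_int(table):
    """Codepoints whose value would overflow a 32-bit int."""
    return sorted(cp for cp, v in table.items() if v > INT32_MAX)


table = parse_unihan_numerics(SAMPLE.splitlines())
for cp in too_big_for_int(table):
    # float(v) == v checks that a C double holds the value exactly
    print(f"U+{cp:04X}: {table[cp]} exact as double: {float(table[cp]) == table[cp]}")
```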
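A flag such as NUMERIC_MASK is tested with a single bitwise AND. The sketch below uses hypothetical mask values and made-up property tuples (the real values in makeunicodedata.py and unicodectype.c may differ). It also shows one general way a new flag can enlarge generated tables when records are deduplicated: entries that used to share a record become distinct once the new bit is set for only some of them.

```python
# Hypothetical bit assignments, for illustration only; the real values
# used by makeunicodedata.py / unicodectype.c may differ.
ALPHA_MASK = 0x01
DECIMAL_MASK = 0x02
DIGIT_MASK = 0x04
NUMERIC_MASK = 0x08  # the new flag


def make_flags(is_alpha, has_decimal, has_digit, has_numeric, numeric_flag=True):
    """Pack properties into one flags word; numeric_flag=False mimics the old layout."""
    flags = 0
    if is_alpha:
        flags |= ALPHA_MASK
    if has_decimal:
        flags |= DECIMAL_MASK
    if has_digit:
        flags |= DIGIT_MASK
    if numeric_flag and has_numeric:
        flags |= NUMERIC_MASK
    return flags


def is_numeric(flags):
    """Single-bit test, the same idea a flag-based IsNumeric check relies on."""
    return (flags & NUMERIC_MASK) != 0


# Made-up property tuples: (is_alpha, has_decimal, has_digit, has_numeric)
CHARS = {
    "plain letter": (True, False, False, False),
    "letter with numeric value": (True, False, False, True),
    "decimal digit": (False, True, True, True),
}


def distinct_records(chars, numeric_flag):
    """Number of distinct flag values, i.e. records a deduplicating table needs."""
    return len({make_flags(*props, numeric_flag=numeric_flag) for props in chars.values()})


print(distinct_records(CHARS, numeric_flag=False))  # 2: both letters share a record
print(distinct_records(CHARS, numeric_flag=True))   # 3: the new bit splits them
```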
If this patch is accepted, the md5 checksum in test_unicodedata.py will
need to change.
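Such a checksum hashes a per-character summary of properties, so any change to a numeric value or flag changes the digest. A minimal sketch of the idea follows; the chosen properties, the property_checksum name and the 0x10000 range are assumptions and do not reproduce the exact data hashed by test_unicodedata.py. The result also depends on the Unicode version of the running Python.

```python
import hashlib
import unicodedata


def property_checksum(limit=0x10000, numeric=unicodedata.numeric):
    """MD5 over a per-character summary of a few properties (hypothetical choice)."""
    h = hashlib.md5()
    for i in range(limit):
        char = chr(i)
        data = [
            "01"[char.isnumeric()],
            "01"[char.isspace()],
            "%.12g" % numeric(char, -1),  # -1 when the character has no numeric value
        ]
        h.update("".join(data).encode("ascii"))
    return h.hexdigest()


print(property_checksum())
```

Passing a different numeric function (for example one that returns a new value for a single character) yields a different digest, which is why adding numeric entries forces the expected checksum to be updated.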

Date | User | Action | Args
2009-07-01 00:03:32 | amaury.forgeotdarc | set | recipients: + amaury.forgeotdarc, lemburg, ajaksu2, andersch, ezio.melotti, vernondcole
2009-07-01 00:03:32 | amaury.forgeotdarc | set | messageid: <1246406612.38.0.0754513900135.issue1571184@psf.upfronthosting.co.za>
2009-07-01 00:03:30 | amaury.forgeotdarc | link | issue1571184 messages
2009-07-01 00:03:30 | amaury.forgeotdarc | create |