Issue1571184
Created on 2006-10-05 07:57 by andersch, last changed 2009-10-06 21:35 by amaury.forgeotdarc.
|
msg51199 - (view) |
Author: Anders Chrigström (andersch) |
Date: 2006-10-05 07:57 |
|
This patch changes the functions _PyUnicode_ToNumeric,
_PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace so that
they are generated from data in the Unicode database
instead of having to be updated manually.
It will also read numeric values for characters whose
numeric type is defined in the Unihan.txt file and not
in the UnicodeData.txt file.
The patch should work for both the release25-maint
branch and the trunk.
The patch is so big that I had to split it into two files
for SourceForge to accept it.
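For readers who have not seen the generator, the idea can be sketched roughly as follows. This is not the code from the attached patch: the helper names (parse_unicode_data, parse_unihan, emit_switch) are hypothetical, the sample lines are a tiny hand-picked excerpt in the UnicodeData.txt and Unihan layouts, and the emitted C text only approximates the shape of a generated switch statement.

```python
from fractions import Fraction

# A few sample lines in the UnicodeData.txt layout (semicolon-separated;
# field 8 holds the numeric value). Illustrative excerpt, not the full file.
UNICODE_DATA_SAMPLE = """\
0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;
2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;
"""

# A few sample lines in the Unihan layout (tab-separated: code point, tag, value).
UNIHAN_SAMPLE = (
    "# comment lines start with a hash\n"
    "U+4E00\tkPrimaryNumeric\t1\n"
    "U+58F9\tkAccountingNumeric\t1\n"
)

NUMERIC_TAGS = {"kPrimaryNumeric", "kAccountingNumeric", "kOtherNumeric"}


def parse_unicode_data(text):
    """Map code point -> numeric value string for lines that define one."""
    table = {}
    for line in text.splitlines():
        fields = line.split(";")
        if len(fields) > 8 and fields[8]:
            table[int(fields[0], 16)] = fields[8]
    return table


def parse_unihan(text):
    """Map code point -> numeric value string for the numeric Unihan tags."""
    table = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        code, tag, value = line.split("\t", 2)
        if tag in NUMERIC_TAGS:
            table[int(code[2:], 16)] = value
    return table


def emit_switch(table):
    """Return C-like switch text grouping code points by numeric value."""
    by_value = {}
    for code, value in table.items():
        by_value.setdefault(value, []).append(code)
    lines = ["switch (ch) {"]
    for value in sorted(by_value, key=Fraction):
        for code in sorted(by_value[value]):
            lines.append("case 0x%04X:" % code)
        num, _, den = value.partition("/")
        c_value = "(double) %s / %s" % (num, den) if den else "(double) %s" % num
        lines.append("    return %s;" % c_value)
    lines += ["default:", "    return -1.0;", "}"]
    return "\n".join(lines)


numeric_table = parse_unicode_data(UNICODE_DATA_SAMPLE)
numeric_table.update(parse_unihan(UNIHAN_SAMPLE))
print(emit_switch(numeric_table))
```

Grouping the code points by value keeps the generated switch compact, since many characters share the same numeric value.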
|
|
msg51200 - (view) |
Author: Marc-Andre Lemburg (lemburg) |
Date: 2006-10-05 10:45 |
|
Instead of attaching the patch with the generated code,
could you please just attach the script that generates the
files and/or any patch needed to support the new generation
of the above three functions?
That makes reviewing this a lot easier.
Thanks.
|
|
msg51201 - (view) |
Author: Anders Chrigström (andersch) |
Date: 2006-10-06 09:44 |
|
Here is a patch without the generated files.
|
|
msg84457 - (view) |
Author: Daniel Diniz (ajaksu2) |
Date: 2009-03-30 02:04 |
|
I believe this one is out of date, but without a sample test to check
against, verifying that is harder...
|
|
msg89954 - (view) |
Author: Vernon Cole (vernondcole) |
Date: 2009-06-30 22:39 |
|
Adding Python 2.6 to the list of affected versions - as that is where I
found the bug reported in issue 6383 (now superseded by this one.)
|
|
msg89959 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) |
Date: 2009-07-01 00:03 |
|
Here is a refreshed version of the patch, without the generated files.
The patch combines several changes which are fairly independent of
each other:
- Using the Unicode database to generate the functions adds 143 new
codepoints to PyUnicode_ToNumeric, and one codepoint to
PyUnicode_IsWhitespace.
- In addition, PyUnicode_ToNumeric now contains code for all numerics;
previously those which are also digits fell into the 'default:' case and
were converted with PyUnicode_ToDigit(). This adds 468 new codepoints,
but removes the need to call PyUnicode_ToDigit().
- The Unihan.txt files (two files to download, 25 MB each) are now
parsed, and this adds 73 more codepoints to PyUnicode_ToNumeric. (There
are now 1009 entries in this function.)
The 3.2.0 version of this file contains two huge numbers, 1e16 and 1e20,
so I had to widen the type of 'change_record.numeric_changed' from 'int'
to 'double'. It is possible that these were removed from the Unicode
database between versions 4.1 and 5.1.
- The database has a new flag, NUMERIC_MASK, used by
PyUnicode_IsNumeric. This adds ~350 lines to the arrays of numbers in
unicodetype_db.h.
If this patch is accepted, the md5 checksum in test_unicodedata.py will
need to change.
|
msg93597 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) |
Date: 2009-10-05 12:33 |
|
Marc-Andre, could you comment on this patch?
The comments above were made by inspecting the generated code, comparing
with the previous version.
IMO the only drawback is the increased memory usage.
|
|
msg93600 - (view) |
Author: Marc-Andre Lemburg (lemburg) |
Date: 2009-10-05 12:55 |
|
I haven't tried applying the patch, but from reading it, it looks
good.
|
|
msg93663 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) |
Date: 2009-10-06 21:35 |
|
Patch applied with r75272.
Merged to py3k, adapted and regenerated files with r75274.
|
|
| Date | User | Action | Args |
| 2009-10-06 21:35:55 | amaury.forgeotdarc | set | status: open -> closed; resolution: fixed; messages: + msg93663 |
| 2009-10-05 12:55:02 | lemburg | set | messages: + msg93600 |
| 2009-10-05 12:38:07 | amaury.forgeotdarc | set | files: - unicodectype_ucs4-2.patch |
| 2009-10-05 12:37:39 | amaury.forgeotdarc | set | files: + unicodectype_ucs4-2.patch |
| 2009-10-05 12:33:21 | amaury.forgeotdarc | set | messages: + msg93597 |
| 2009-07-01 00:03:30 | amaury.forgeotdarc | set | files: + unicodedata-2.7.patch; nosy: + amaury.forgeotdarc; messages: + msg89959 |
| 2009-06-30 23:01:22 | ezio.melotti | set | nosy: + ezio.melotti |
| 2009-06-30 22:39:43 | vernondcole | set | nosy: + vernondcole; messages: + msg89954; versions: + Python 2.6, Python 3.0 |
| 2009-06-30 21:29:30 | amaury.forgeotdarc | unlink | issue6383 dependencies |
| 2009-06-30 21:29:30 | amaury.forgeotdarc | link | issue6383 superseder |
| 2009-06-30 19:11:47 | loewis | link | issue6383 dependencies |
| 2009-03-30 02:04:52 | ajaksu2 | link | issue1571170 dependencies |
| 2009-03-30 02:04:07 | ajaksu2 | set | versions: + Python 3.1, Python 2.7; nosy: + ajaksu2; messages: + msg84457; type: feature request; stage: test needed |
| 2006-10-05 07:57:32 | andersch | create | |