Generate numeric/space/linebreak from Unicode database. #44085

andersch · 2006-10-05T07:57:32Z

BPO	1571184
Nosy	@malemburg, @amauryfa, @devdanzin, @ezio-melotti
Files	Unicodedata_part1.patch: Generate unicodedata part1; Unicodedata_part2.patch: Generate unicodedata part2; Unicodedata.patch; unicodedata-2.7.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2009-10-06.21:35:55.116>
created_at = <Date 2006-10-05.07:57:32.000>
labels = ['interpreter-core', 'type-feature']
title = 'Generate numeric/space/linebreak from Unicode database.'
updated_at = <Date 2009-10-06.21:35:55.114>
user = 'https://bugs.python.org/andersch'

bugs.python.org fields:

activity = <Date 2009-10-06.21:35:55.114>
actor = 'amaury.forgeotdarc'
assignee = 'none'
closed = True
closed_date = <Date 2009-10-06.21:35:55.116>
closer = 'amaury.forgeotdarc'
components = ['Interpreter Core']
creation = <Date 2006-10-05.07:57:32.000>
creator = 'andersch'
dependencies = []
files = ['7564', '7565', '7566', '14413']
hgrepos = []
issue_num = 1571184
keywords = ['patch']
message_count = 9.0
messages = ['51199', '51200', '51201', '84457', '89954', '89959', '93597', '93600', '93663']
nosy_count = 6.0
nosy_names = ['lemburg', 'amaury.forgeotdarc', 'ajaksu2', 'andersch', 'ezio.melotti', 'vernondcole']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'test needed'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue1571184'
versions = ['Python 2.6', 'Python 3.0', 'Python 3.1', 'Python 2.7']

andersch · 2006-10-05T07:57:32Z

This patch changes the functions _PyUnicode_ToNumeric,
_PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace so that
they are generated from data in the Unicode database
instead of having to be updated manually.

It will also read numeric values for characters whose
numeric type is defined in the Unihan.txt file and not
in the UnicodeData.txt file.
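
As an illustration of the approach described above, here is a minimal sketch of deriving the three tables from records in UnicodeData.txt format and rendering one of them as a C switch. It is not the code from the patch. The helper names `parse` and `emit_switch` are hypothetical, and the classification rules (whitespace as category Zs or bidirectional class WS, B or S; line break as bidirectional class B) are assumptions made for the example. Unihan.txt handling is left out.

```python
from fractions import Fraction

# A few records in UnicodeData.txt format (semicolon-separated fields).
SAMPLE = """\
000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
0020;SPACE;Zs;0;WS;;;;;N;;;;;
0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;
"""


def parse(text):
    """Collect numeric values, whitespace and line-break code points."""
    numeric, spaces, linebreaks = {}, set(), set()
    for line in text.splitlines():
        fields = line.split(";")
        code = int(fields[0], 16)
        category, bidi, value = fields[2], fields[4], fields[8]
        if value:
            # Field 8 holds the numeric value, either "1" or "1/2".
            numeric[code] = Fraction(value)
        # Assumed rules for this sketch, not taken from the patch.
        if category == "Zs" or bidi in ("WS", "B", "S"):
            spaces.add(code)
        if bidi == "B":
            linebreaks.add(code)
    return numeric, spaces, linebreaks


def emit_switch(name, codes):
    """Render a C predicate as a switch over the given code points."""
    lines = ["int %s(Py_UNICODE ch)" % name, "{", "    switch (ch) {"]
    lines += ["    case 0x%04X:" % c for c in sorted(codes)]
    lines += ["        return 1;", "    }", "    return 0;", "}"]
    return "\n".join(lines)


numeric, spaces, linebreaks = parse(SAMPLE)
print(emit_switch("_PyUnicode_IsWhitespace", spaces))
```

Feeding the full UnicodeData.txt file to `parse` instead of `SAMPLE` would give complete tables under the same assumed rules.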

The patch should work for both the release25-maint
branch as well as the trunk.

The patch is so big I had to split it into two files
for SourceForge to accept it.

malemburg · 2006-10-05T10:45:25Z

Instead of attaching the patch with the generated code,
could you please just attach the script that generates the
files and/or any patch needed to support the new generation
of the above three functions?

That makes reviewing this a lot easier.

Thanks.

andersch · 2006-10-06T09:44:40Z

Here is a patch without the generated files.

devdanzin · 2009-03-30T02:04:08Z

I believe this one is out of date, but without a sample test to check
against, verifying that is harder...

vernondcole · 2009-06-30T22:39:44Z

Adding Python 2.6 to the list of affected versions - as that is where I
found the bug reported in bpo-6383 (now superseded by this one.)

amauryfa · 2009-07-01T00:03:26Z

Here is a refreshed version of the patch, without the generated files.
The patch combines several changes which are fairly independent from
each other:

- Using the Unicode database to generate the functions adds 143 new
  codepoints to PyUnicode_ToNumeric, and one codepoint to
  PyUnicode_IsWhitespace.
- In addition, PyUnicode_ToNumeric now contains code for all numerics;
  previously those which are also digits fell in the 'default:' case and
  were converted with PyUnicode_ToDigit(). This adds 468 new codepoints,
  but removes the need to call PyUnicode_ToDigit().
- The Unihan.txt files (two files to download, 25 MB each) are now
  parsed, and this adds 73 more codepoints to PyUnicode_ToNumeric. (There
  are now 1009 entries in this function.)
- The 3.2.0 version of this file contains two huge numbers, 1e16 and 1e20,
  so I had to widen the type of 'change_record.numeric_changed' from 'int'
  to 'double'. It is possible that these were removed from the Unicode
  database between versions 4.1 and 5.1.
- The database has a new flag, NUMERIC_MASK, used by
  PyUnicode_IsNumeric. This adds ~350 lines in the arrays of numbers in
  unicodetype_db.h.
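
The distinction between numerics and digits in the list above can be observed from Python through the `unicodedata` module. The sketch below is only an illustration: the helper name `classify` is hypothetical, and the totals it prints depend on the Unicode version bundled with the running interpreter, so they will not match the counts quoted in this message.

```python
import unicodedata


def classify(ch):
    """Return (numeric value or None, digit value or None) for one character."""
    return unicodedata.numeric(ch, None), unicodedata.digit(ch, None)


# DIGIT ONE, VULGAR FRACTION ONE HALF, ROMAN NUMERAL EIGHT,
# and a CJK ideograph whose numeric value comes from the Unihan data.
for ch in ("1", "\u00bd", "\u2167", "\u4e09"):
    print("U+%04X" % ord(ch), classify(ch))

# Count numeric code points and the subset that are also digits.
numerics = [
    c for c in map(chr, range(0x110000))
    if unicodedata.numeric(c, None) is not None
]
also_digits = [c for c in numerics if unicodedata.digit(c, None) is not None]
print(unicodedata.unidata_version, len(numerics), len(also_digits))
```

Characters that have a numeric value but no digit value are the ones that needed their own entries before the digit fallback was removed.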

If this patch is accepted, the md5 checksum in test_unicodedata.py will
need to change.
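
To show why a checksum has to change when the database changes, here is a rough sketch of hashing numeric properties over a range of code points. Which properties test_unicodedata.py actually hashes is not stated in this thread, so the selection below and the function name `numeric_checksum` are assumptions made for illustration.

```python
import hashlib
import unicodedata


def numeric_checksum(limit=0x10000):
    """Hash the digit, numeric and decimal values of the first `limit` code points."""
    h = hashlib.md5()
    for code in range(limit):
        ch = chr(code)
        # -1 stands in for "no value" so every code point contributes.
        data = "%.12g%.12g%.12g" % (
            unicodedata.digit(ch, -1),
            unicodedata.numeric(ch, -1),
            unicodedata.decimal(ch, -1),
        )
        h.update(data.encode("ascii"))
    return h.hexdigest()


print(unicodedata.unidata_version, numeric_checksum())
```

Any code point that gains a numeric value changes the hashed input, so a digest recorded before the change no longer matches.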

amauryfa · 2009-10-05T12:33:21Z

Marc-Andre, could you comment on this patch?
The comments above were made by inspecting the generated code, comparing
with the previous version.
IMO the only drawback is the increased memory usage.

malemburg · 2009-10-05T12:55:02Z

Amaury Forgeot d'Arc wrote:
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
>
> Marc-Andre, could you comment on this patch?
> The comments above were made by inspecting the generated code, comparing
> with the previous version.
> IMO the only drawback is the increased memory usage.

I haven't tried applying the patch, but from reading it, it looks
good.

amauryfa · 2009-10-06T21:35:55Z

Patch applied with r75272.
Merged to py3k, adapted and regenerated files with r75274.

andersch mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Oct 5, 2006

devdanzin mannequin added type-feature A feature request or enhancement labels Mar 30, 2009

amauryfa closed this as completed Oct 6, 2009

ezio-melotti transferred this issue from another repository Apr 10, 2022
