Message 144679 - Python tracker


Author	ezio.melotti
Recipients	ezio.melotti, gvanrossum, lemburg, loewis, mrabarnett, tchrist, terry.reedy
Date	2011-09-30.08:59:08
Message-id	<1317373151.27.0.18276033418.issue12753@psf.upfronthosting.co.za>
In-reply-to

Content

The attached patch changes Tools/unicode/makeunicodedata.py to create a list of names and codepoints taken from http://www.unicode.org/Public/6.0.0/ucd/NameAliases.txt and adds it to Modules/unicodename_db.h.
During lookup, the _getcode function at Modules/unicodedata.c:1055 loops over the 11 aliases and checks whether any of them matches.
The patch also includes tests for both unicodedata.lookup and \N{}.
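
The approach described above can be sketched in Python as a rough illustration; it is not the patch's C code. The sketch assumes the "codepoint;alias" line layout of NameAliases.txt and embeds two sample entries instead of downloading the file. The names SAMPLE_ALIASES, parse_aliases, find_alias and lookup_with_aliases are hypothetical and exist only for this example, and the match is a plain case-sensitive comparison to keep it short.

```python
import unicodedata

# Two sample entries in the assumed "codepoint;alias" layout.
# Any fields after the second one are ignored by the parser below.
SAMPLE_ALIASES = """\
# sample data for illustration only
01A2;LATIN CAPITAL LETTER GHA
01A3;LATIN SMALL LETTER GHA
"""


def parse_aliases(text):
    """Return a list of (alias, codepoint) pairs parsed from text."""
    aliases = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split(";")
        aliases.append((fields[1].strip(), int(fields[0], 16)))
    return aliases


def find_alias(name, aliases):
    """Linear scan over the alias list; returns a codepoint or None."""
    for alias, code in aliases:
        if alias == name:
            return code
    return None


def lookup_with_aliases(name, aliases):
    """Try the regular name lookup first, then fall back to the aliases."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        code = find_alias(name, aliases)
        if code is None:
            raise
        return chr(code)


aliases = parse_aliases(SAMPLE_ALIASES)
print(lookup_with_aliases("LATIN CAPITAL LETTER GHA", aliases))
```

A linear scan is reasonable here because the list is tiny (11 entries in the file the patch uses); a much larger alias table would probably call for a dictionary or a sorted lookup instead.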

I'm not sure this is the best way to implement this, and someone will probably want to review and tweak both the approach and the C code, but it works fine:
>>> "\N{LATIN CAPITAL LETTER GHA}"
'Ƣ'
>>> import unicodedata
>>> unicodedata.lookup("LATIN CAPITAL LETTER GHA")
'Ƣ'
>>> "\N{LATIN CAPITAL LETTER OI}"
'Ƣ'
>>> unicodedata.lookup("LATIN CAPITAL LETTER OI")
'Ƣ'

The patch doesn't include changes for NamedSequences.txt.

History
Date	User	Action	Args
2011-09-30 08:59:11	ezio.melotti	set	recipients: + ezio.melotti, lemburg, gvanrossum, loewis, terry.reedy, mrabarnett, tchrist
2011-09-30 08:59:11	ezio.melotti	set	messageid: <1317373151.27.0.18276033418.issue12753@psf.upfronthosting.co.za>
2011-09-30 08:59:10	ezio.melotti	link	issue12753 messages
2011-09-30 08:59:10	ezio.melotti	create