Message332318
Here's my implementation:
from unicodedata import name
from unicodedata import lookup as _lookup
from fnmatch import translate
from re import compile, I

_NAMES = None  # cache of all canonical character names, built on first use

def getnames():
    global _NAMES
    if _NAMES is None:
        _NAMES = []
        for i in range(0x110000):
            s = name(chr(i), '')  # '' for code points without a name
            if s:
                _NAMES.append(s)
    return _NAMES

def lookup(name_or_glob):
    # Treat the argument as a glob only if it contains a wildcard character.
    if any(c in name_or_glob for c in '*?['):
        match = compile(translate(name_or_glob), flags=I).match
        return [name for name in getnames() if match(name)]
    else:
        return _lookup(name_or_glob)
The major limitation of my implementation is that it doesn't match name aliases or sequences.
http://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt
http://www.unicode.org/Public/11.0.0/ucd/NamedSequences.txt
For example:
lookup('TAMIL SYLLABLE TAA*')  # NamedSequence
ought to return ['தா'] but doesn't.
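One way the sequence limitation might be addressed is to parse NamedSequences.txt and run the same glob over the sequence names. The following is only a minimal sketch under stated assumptions: SAMPLE is a hand-copied excerpt in the "NAME;CODE POINTS" format of that file, not the full data, and parse_sequences and lookup_sequences are hypothetical helper names, not part of unicodedata. It returns the sequences themselves, as in the expected result above.

```python
import re
from fnmatch import translate

# Hypothetical excerpt in the NamedSequences.txt format: "NAME;CODE POINTS".
SAMPLE = """\
# Sample lines only, not the full data file.
TAMIL SYLLABLE TAA;0BA4 0BBE
KEYCAP NUMBER SIGN;0023 FE0F 20E3
"""

def parse_sequences(text):
    """Map each sequence name to the string it denotes."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        seq_name, _, codepoints = line.partition(';')
        table[seq_name.strip()] = ''.join(
            chr(int(cp, 16)) for cp in codepoints.split())
    return table

def lookup_sequences(glob, table):
    """Return the sequences whose names match the glob, ignoring case."""
    match = re.compile(translate(glob), flags=re.I).match
    return [seq for seq_name, seq in table.items() if match(seq_name)]

table = parse_sequences(SAMPLE)
# '*' also matches the empty string, so this matches the exact name too.
result = lookup_sequences('TAMIL SYLLABLE TAA*', table)
```

Note that in fnmatch patterns '?' matches exactly one character, so a pattern ending in 'TAA?' would not match a name that ends in 'TAA'.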
Parts of the Unicode documentation use the convention that canonical names are in UPPERCASE, aliases are in lowercase, and sequences are in Mixed Case, and I think that we should follow that convention:
http://www.unicode.org/charts/aboutcharindex.html
That makes it easy to see what is the canonical name and what isn't.
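As a rough illustration of that convention, a result could be formatted according to its kind. This is a sketch only: display_name and the kind labels 'canonical', 'alias' and 'sequence' are hypothetical names made up for this example, and str.title() is just an approximation of Mixed Case.

```python
def display_name(name, kind):
    """Format a name by kind, following the case convention described above."""
    if kind == 'canonical':
        return name.upper()   # e.g. LATIN SMALL LETTER A
    if kind == 'alias':
        return name.lower()   # e.g. null
    if kind == 'sequence':
        return name.title()   # e.g. Tamil Syllable Taa
    raise ValueError(f'unknown kind: {kind!r}')
```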
Date | User | Action | Args
2018-12-22 01:01:29 | steven.daprano | set | recipients: + steven.daprano, vstinner, ezio.melotti, rominf
2018-12-22 01:01:27 | steven.daprano | set | messageid: <1545440487.14.0.98272194251.issue35549@roundup.psfhosted.org>
2018-12-22 01:01:27 | steven.daprano | link | issue35549 messages
2018-12-22 01:01:27 | steven.daprano | create |