Message191724
> Can a character or sequence have multiple aliases?
Yes. For example, most control characters have two aliases (and no name):
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
0002;STX;abbreviation
(See <http://www.unicode.org/Public/UNIDATA/NameAliases.txt>)
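To make the "multiple aliases per code point" structure concrete, here is a minimal sketch that groups the lines quoted above by code point. The function name parse_name_aliases and the SAMPLE string are illustrative only; they are not part of unicodedata, and the sketch assumes the three-field "code;alias;type" layout shown above.

```python
from collections import defaultdict

# Sample lines copied from the excerpt above (not the full NameAliases.txt).
SAMPLE = """\
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
0002;STX;abbreviation
"""


def parse_name_aliases(text):
    """Map each code point to its list of (alias, type) pairs, in file order."""
    aliases = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, alias_type = line.split(";")
        aliases[int(code, 16)].append((alias, alias_type))
    return dict(aliases)


aliases = parse_name_aliases(SAMPLE)
print(aliases[0x0000])  # [('NULL', 'control'), ('NUL', 'abbreviation')]
```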
> What will be a result type of unicodedata.name() with "abbreviation" keyword value?
Under my proposal:
>>> unicodedata.name('\N{ESCAPE}', type='abbreviation')
'ESC'
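Since unicodedata.name() does not accept a type keyword today, the proposed lookup can only be approximated outside the module. The sketch below is hypothetical: alias_name and the hand-copied ALIASES table are my own illustration, and returning the first alias when a code point has several of the same type is an assumption, not something the proposal settles.

```python
# Hand-copied subset of NameAliases.txt; a real implementation would
# consult the full database.
ALIASES = {
    0x001B: {"control": ["ESCAPE"], "abbreviation": ["ESC"]},
}


def alias_name(ch, type):
    """Return the first alias of the requested type as a str.

    Raises ValueError, like unicodedata.name(), when there is no such alias.
    """
    try:
        return ALIASES[ord(ch)][type][0]
    except (KeyError, IndexError):
        raise ValueError("no such name") from None


print(alias_name("\x1b", type="abbreviation"))  # ESC
```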
I would also like to consider changing the default slightly. I find the following behavior rather unhelpful:
>>> unicodedata.name('\N{ESC}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name
I think most users would expect 'ESCAPE' instead.
The following is more of a curiosity than a genuine problem, but it illustrates a general point well:
>>> unicodedata.name('\N{PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET}')
'PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET'
(Note that "BRACKET" is misspelled as "BRAKCET" in the output.)
Since the "correction" alias is the official mechanism for publishing corrections to Unicode names, I think unicodedata.name() should return the corrected name by default.
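One way to read the suggested default is as a lookup order: a "correction" alias first, then the formal name, then a "control" alias for characters that have no name. The sketch below is only an illustration of that reading; preferred_name and the small ALIASES table are hypothetical, and the exact precedence is my assumption rather than a settled design.

```python
import unicodedata

# Hand-copied subset of NameAliases.txt, for illustration only.
ALIASES = {
    0x001B: [("ESCAPE", "control"), ("ESC", "abbreviation")],
    0xFE18: [
        ("PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET",
         "correction"),
    ],
}


def preferred_name(ch):
    """Return a correction alias, else the formal name, else a control alias."""
    by_type = {}
    for alias, alias_type in ALIASES.get(ord(ch), []):
        by_type.setdefault(alias_type, alias)  # keep the first alias per type
    if "correction" in by_type:
        return by_type["correction"]
    try:
        return unicodedata.name(ch)
    except ValueError:
        # Control characters have no formal name; fall back to the alias.
        if "control" in by_type:
            return by_type["control"]
        raise


print(preferred_name("\x1b"))    # ESCAPE
print(preferred_name("\ufe18"))  # ... LENTICULAR BRACKET (corrected spelling)
```

Characters without any alias still go through unicodedata.name() unchanged, so existing callers would see a difference only for corrected names and unnamed control characters.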
Date | User | Action | Args
2013-06-23 19:41:26 | belopolsky | set | recipients: + belopolsky, lemburg, loewis, benjamin.peterson, ezio.melotti, serhiy.storchaka
2013-06-23 19:41:26 | belopolsky | set | messageid: <1372016486.66.0.0240512667644.issue18234@psf.upfronthosting.co.za>
2013-06-23 19:41:26 | belopolsky | link | issue18234 messages
2013-06-23 19:41:26 | belopolsky | create |