Message 191724 - Python tracker


Author	belopolsky
Recipients	belopolsky, benjamin.peterson, ezio.melotti, lemburg, loewis, serhiy.storchaka
Date	2013-06-23.19:41:26
Message-id	<1372016486.66.0.0240512667644.issue18234@psf.upfronthosting.co.za>
In-reply-to

Content

> Can a character or sequence have multiple aliases?

Yes. For example, most control characters have two aliases (and no name):

0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
0002;STX;abbreviation

(See <http://www.unicode.org/Public/UNIDATA/NameAliases.txt>)
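To make the "multiple aliases per character" point concrete, here is a minimal sketch that reads lines in the NameAliases.txt format quoted above into a per-character list. The names `SAMPLE`, `parse_name_aliases`, and `ALIASES` are made up for illustration and are not part of unicodedata.

```python
from collections import defaultdict

# The six lines quoted above, in NameAliases.txt format: code;alias;type
SAMPLE = """\
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
0002;STX;abbreviation
"""

def parse_name_aliases(text):
    """Map each character to a list of (alias, alias_type) pairs.

    Hypothetical helper, not an existing unicodedata function.
    """
    aliases = defaultdict(list)
    for line in text.splitlines():
        # Drop '#' comments and blank lines, as found in the real file.
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        code, alias, alias_type = line.split(';')
        aliases[chr(int(code, 16))].append((alias, alias_type))
    return dict(aliases)

ALIASES = parse_name_aliases(SAMPLE)
print(ALIASES['\x00'])  # [('NULL', 'control'), ('NUL', 'abbreviation')]
```

Because one code point maps to a list, any lookup by alias type has to decide what to do when a type is missing or, in principle, repeated.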

> What will be a result type of unicodedata.name() with "abbreviation" keyword value?

Under my proposal:

>>> unicodedata.name('\N{ESCAPE}', type='abbreviation')
'ESC'

I would also like to consider changing the default slightly.  I find the following behavior rather unhelpful:

>>> unicodedata.name('\N{ESC}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name

I think most users would expect 'ESCAPE' instead.
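A rough pure-Python sketch of the behavior I have in mind, under stated assumptions: `alias_name` and the hand-copied `ALIASES` table are hypothetical stand-ins (the real change would live inside unicodedata.name()), and falling back to the first listed alias is only one possible default.

```python
import unicodedata

# Hand-copied NameAliases.txt entries for U+001B (illustrative subset only).
ALIASES = {
    '\x1b': [('ESCAPE', 'control'), ('ESC', 'abbreviation')],
}

def alias_name(char, type=None):
    """Hypothetical stand-in for the proposed unicodedata.name(char, type=...).

    The keyword is called 'type' only to mirror the proposal; it shadows
    the builtin inside this function.
    """
    entries = ALIASES.get(char, [])
    if type is not None:
        # Explicit alias type requested: return the first alias of that type.
        for alias, alias_type in entries:
            if alias_type == type:
                return alias
        raise ValueError('no such name')
    try:
        return unicodedata.name(char)
    except ValueError:
        # Assumed default: fall back to the first alias for nameless characters.
        if entries:
            return entries[0][0]
        raise

print(alias_name('\N{ESCAPE}', type='abbreviation'))  # ESC
print(alias_name('\N{ESC}'))                          # ESCAPE
```

Characters that already have a formal name are unaffected by the fallback, since unicodedata.name() succeeds for them first.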

The following is more of a curiosity than a genuine problem, but it is a good illustration of a general point:

>>> unicodedata.name('\N{PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET}')
'PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET'

(Note the misspelled word "BRAKCET" in the output.)

Since the "correction" alias is the official method of publishing corrections to Unicode names, I think unicodedata.name() should return the corrected name by default.
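As a sketch of what "corrected name by default" could look like, assuming a wrapper around today's function: `corrected_name` and the one-entry `CORRECTIONS` table are hypothetical, and a real implementation would be driven by the "correction" entries in NameAliases.txt rather than a hand-written dict.

```python
import unicodedata

# One "correction" alias, copied by hand for illustration (U+FE18).
CORRECTIONS = {
    '\ufe18': 'PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET',
}

def corrected_name(char):
    """Return the correction alias when one exists, else the formal name.

    Hypothetical wrapper, not an existing unicodedata function.
    """
    if char in CORRECTIONS:
        return CORRECTIONS[char]
    return unicodedata.name(char)

print(unicodedata.name('\ufe18'))  # formal name, ends in BRAKCET
print(corrected_name('\ufe18'))    # corrected name, ends in BRACKET
```

The formal name would still be reachable for anyone who needs the exact, immutable Unicode name, for instance through an explicit keyword value.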
