Author loewis
Recipients ezio.melotti, gvanrossum, lemburg, loewis, mrabarnett, tchrist, terry.reedy
Date 2011-10-01.15:17:51
Message-id <4E872F1E.6050604@v.loewis.de>
In-reply-to <29624.1317420430@chthon>
Content
> You may wish unicode.name() to return the alias in preference, however.

-1. .name() is documented (and users familiar with it expect it) as
returning the name of the character from the UCD.

It doesn't really matter much to me if it's nonsensical - it's just
a label. Notice that many characters have names like "CJK UNIFIED
IDEOGRAPH-4E20", which isn't very descriptive, either. What does matter
is that the name returned matches the name used in many other places
on the net, which (rightfully) all use the UCD name (they might provide
the alias as well if they are aware of aliases, but often don't).
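
As a minimal sketch of this point (assuming Python 3 and the standard
unicodedata module), the name returned for U+4E20 is exactly the UCD
label, and looking that label up again gives back the same character:

```python
import unicodedata

# U+4E20 is the ideograph mentioned above.
ideograph = "\u4e20"

# name() returns the UCD name, however undescriptive it may be.
ucd_name = unicodedata.name(ideograph)
print(ucd_name)

# lookup() is the inverse mapping, from UCD name back to the character.
round_trip = unicodedata.lookup(ucd_name)
print(round_trip == ideograph)
```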

> If you mean, is it ok to add just the aliases and not the named sequences to
> \N{}, it is certainly better than not doing so at all.  Plus that way you do
> *not* have to figure out what in the world to do with [^a-c\N{sequence}],

Python doesn't use regexes in the language parser, but does do \N
escapes in the parser. So there is no way this transformation could
possibly be made - except when you are talking about escapes in regexes,
and not escapes in Unicode strings.
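
A small sketch of the distinction (assuming Python 3): in an ordinary,
non-raw string literal the \N escape is already resolved by the time the
re module sees the pattern. The two-code-point string below is only a
hypothetical stand-in for what a named sequence might expand to; it is
not something \N{} produces.

```python
import re

# The compiler resolves \N{...} while parsing the literal, so the
# regex engine only ever receives the plain character "d".
pattern = "[^a-c\N{LATIN SMALL LETTER D}]"
print(pattern == "[^a-cd]")

rejects_d = re.fullmatch(pattern, "d") is None
accepts_e = re.fullmatch(pattern, "e") is not None

# Hypothetical stand-in for a multi-code-point named sequence.
sequence = "\u0101\u0300"
seq_class = "[^a-c" + sequence + "]"

# Inside a character class the two code points are excluded one by one;
# the class has no notion of "the sequence as a whole".
rejects_combining_mark_alone = re.fullmatch(seq_class, "\u0300") is None
```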

> Perl does not provide the old 1.0 names at all.  We don't have a Unicode
> 1.0 legacy to support, which makes this cleaner.  However, we do provide
> for the names of the C0 and C1 Control Codes, because apart from Unicode
> 1.0, they don't condescend to name the ASCII or Latin1 control codes.  

If there were a reasonably official source for these names, and one
that guarantees that there is no collision with UCD names, I could
accept doing so for Python as well.
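
For illustration (a sketch assuming Python 3; the helper name
ucd_name_or_none is made up here), the control codes have no UCD name
for unicodedata.name() to return, so it either raises ValueError or
falls back to a caller-supplied default:

```python
import unicodedata

def ucd_name_or_none(ch):
    """Return the UCD name of ch, or None if the UCD assigns no name."""
    return unicodedata.name(ch, None)

nul_name = ucd_name_or_none("\x00")    # C0 control code
nel_name = ucd_name_or_none("\x85")    # C1 control code
letter_name = ucd_name_or_none("a")

# Without a default, an unnamed character raises ValueError.
try:
    unicodedata.name("\x00")
    raised = False
except ValueError:
    raised = True

print(nul_name, nel_name, letter_name, raised)
```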

> We also provide for certain well known aliases from the Names file:
> anything that says "* commonly abbreviated as ...", so things like LRO
> and ZWJ and such.

-1. Readability counts, writability not so much (I know this is
different for Perl :-). If there is too much aliasing, people will
wonder what these codes actually mean.
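
To make the readability argument concrete, here is a short sketch
(assuming Python 3) using the full UCD names of the two characters
mentioned in the quote. The spelled-out names already work in \N{}
and say what the character is:

```python
import unicodedata

# Full UCD names for the characters abbreviated as ZWJ and LRO.
zwj = "\N{ZERO WIDTH JOINER}"
lro = "\N{LEFT-TO-RIGHT OVERRIDE}"

print(hex(ord(zwj)), unicodedata.name(zwj))
print(hex(ord(lro)), unicodedata.name(lro))
```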

History
Date                 User    Action  Args
2011-10-01 15:17:52  loewis  set     recipients: + loewis, lemburg, gvanrossum, terry.reedy, ezio.melotti, mrabarnett, tchrist
2011-10-01 15:17:52  loewis  link    issue12753 messages
2011-10-01 15:17:51  loewis  create