Message 191777 - Python tracker

Author	lemburg
Recipients	belopolsky, benjamin.peterson, ezio.melotti, lemburg, loewis, serhiy.storchaka
Date	2013-06-24.15:07:49
Message-id	<51C860C1.9080305@egenix.com>
In-reply-to	<1372085891.98.0.811610593661.issue18234@psf.upfronthosting.co.za>

Content

On 24.06.2013 16:58, Alexander Belopolsky wrote:
> 
> Alexander Belopolsky added the comment:
> 
> Here is an example of "prior art" that is relevant to this discussion:
> 
> """
> charnames::viacode(code)
> ..
> As mentioned above under ALIASES, Unicode 6.1 defines extra names (synonyms or aliases) for some code points, most of which were already available as Perl extensions. All these are accepted by \N{...} and the other functions in this module, but viacode has to choose which one name to return for a given input code point, so it returns the "best" name. To understand how this works, it is helpful to know more about the Unicode name properties. All code points actually have only a single name, which (starting in Unicode 2.0) can never change once a character has been assigned to the code point. But mistakes have been made in assigning names, for example sometimes a clerical error was made during the publishing of the Standard which caused words to be misspelled, and there was no way to correct those. The Name_Alias property was eventually created to handle these situations. If a name was wrong, a corrected synonym would be published for it, using Name_Alias. viacode will return
> that corrected synonym as the "best" name for a code point. (It is even possible, though it hasn't happened yet, that the correction itself will need to be corrected, and so another Name_Alias can be created for that code point; viacode will return the most recent correction.)
> 
> The Unicode name for each of the control characters (such as LINE FEED) is the empty string. However almost all had names assigned by other standards, such as the ASCII Standard, or were in common use. viacode returns these names as the "best" ones available. Unicode 6.1 has created Name_Aliases for each of them, including alternate names, like NEW LINE. viacode uses the original name, "LINE FEED" in preference to the alternate. Similarly the name returned for U+FEFF is "ZERO WIDTH NO-BREAK SPACE", not "BYTE ORDER MARK".
> """ <http://perldoc.perl.org/charnames.html#charnames%3a%3aviacode(code)>
> 
> If .name() cannot be touched, what about implementing .bestname() with the above semantics?

I think it's better to let the programmer decide what the "best"
name should be, e.g. some people will like ESC better than ESCAPE or
\u001b or \x1b.

unicodedata only provides neutral access to what's in the Unicode database.
It doesn't make any decisions on what's good or bad ;-)
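
To illustrate the point, here is a minimal sketch of how a caller could layer their own notion of a "best" name on top of unicodedata. The PREFERRED_NAMES table and the best_name() helper are hypothetical names made up for this example; they are not part of unicodedata, and the U+XXXX fallback is just one possible choice for code points that have no name in the database.

```python
import unicodedata

# Hypothetical, caller-chosen preferences. unicodedata itself has no such
# table; each application would supply whatever names it likes best.
PREFERRED_NAMES = {
    "\x1b": "ESC",
    "\n": "LINE FEED",
}


def best_name(ch, preferred=PREFERRED_NAMES):
    """Return the caller's preferred name for ch, if one was supplied.

    Otherwise fall back to the name in the Unicode database, and finally
    to a U+XXXX label for code points without a name (assumed fallback).
    """
    if ch in preferred:
        return preferred[ch]
    fallback = "U+%04X" % ord(ch)
    # The second argument to unicodedata.name() is returned instead of
    # raising ValueError when the character has no name.
    return unicodedata.name(ch, fallback)


if __name__ == "__main__":
    for ch in ("\x1b", "A", "\ufeff", "\x00"):
        print(repr(ch), best_name(ch))
```

Someone who prefers ESCAPE to ESC only has to pass a different mapping as the second argument; unicodedata.name() stays untouched.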

History
Date	User	Action	Args
2013-06-24 15:07:49	lemburg	set	recipients: + lemburg, loewis, belopolsky, benjamin.peterson, ezio.melotti, serhiy.storchaka
2013-06-24 15:07:49	lemburg	link	issue18234 messages
2013-06-24 15:07:49	lemburg	create