Message 191747 - Python tracker



Author	lemburg
Recipients	belopolsky, benjamin.peterson, ezio.melotti, lemburg, loewis, serhiy.storchaka
Date	2013-06-24.07:54:04
Message-id	<51C7FB18.6040001@egenix.com>
In-reply-to	<1372020225.63.0.919311160342.issue18234@psf.upfronthosting.co.za>

Content

On 23.06.2013 22:43, Alexander Belopolsky wrote:
> 
> Alexander Belopolsky added the comment:
> 
> unicodedata.name() was discussed in #12353 (msg144739), where MvL argued that misspelled names are better than corrected ones because they are more likely to appear misspelled in other sources.  I am not sure I buy this argument.  Someone googling for 'BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS' will probably just enter BYZANTINE VASIS and find what he or she needs.  A more likely scenario is someone trying to get all FTHORA symbols using naive code like this: [hex(i) for i in range(1114112) if 'FTHORA' in ud.name(chr(i), '')].
> 
> An even more likely scenario is someone seeing a fancy symbol on the web and wanting to use it in a Python program.  Such a programmer would copy the symbol to the Python prompt, call unicodedata.name() and copy the result into the program.  Do we want to encourage people to perpetuate a mistake that Unicode has corrected?
> 
> I don't think the issue of control code names was discussed in #12353.  I see no downside to returning the first alias when no name is present.

We should stick to the rules. Please leave the function as it
is, i.e. a 1:1 mapping to the official, non-changing Unicode
name reference (including spelling errors, etc.). The same
goes for code points that have no name.
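
To make the current behaviour concrete, here is a minimal sketch,
assuming Python 3.3 or later. The helper code_points_named() and
the restriction to the Byzantine Musical Symbols block are only
for illustration; they are not part of unicodedata.

```python
import unicodedata

# Byzantine Musical Symbols block (U+1D000..U+1D0FF); scanning only this
# block keeps the example fast. The quoted one-liner scans all code points.
BYZANTINE_BLOCK = range(0x1D000, 0x1D100)


def code_points_named(fragment, code_points=BYZANTINE_BLOCK):
    """Illustrative helper: code points whose official name contains fragment."""
    return [cp for cp in code_points
            if fragment in unicodedata.name(chr(cp), '')]


vasis = '\U0001D0C5'

# .name() returns the official name, including the historical misspelling.
print(unicodedata.name(vasis))

# A search for the corrected spelling therefore misses this character,
# while a search for the misspelling finds it.
print(0x1D0C5 in code_points_named('FTHORA'))
print(0x1D0C5 in code_points_named('FHTORA'))

# Control codes have no official name, so the default is returned.
print(repr(unicodedata.name('\x00', '')))

# lookup() accepts the corrected alias, even though .name() does not return it.
corrected = 'BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS'
print(unicodedata.lookup(corrected) == vasis)
```

The asymmetry is the point: lookup() can resolve the corrected
spelling, while .name() keeps reporting the stable official one.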

If you want to expose the aliases, you can do so in a new
function, say .aliases(), which returns the list of aliases
of a character (including the original name, if available).
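
A rough sketch of what such a function could look like, written
outside the module. Everything here is hypothetical: unicodedata
has no aliases() function, and the three sample lines merely
follow the "code point;alias;type" layout of the UCD file
NameAliases.txt. A real implementation would use the full data.

```python
import unicodedata
from collections import defaultdict

# Tiny excerpt in the NameAliases.txt layout (code point;alias;type).
# Assumption: a real implementation would read the complete UCD file.
SAMPLE_NAME_ALIASES = """\
0000;NULL;control
0000;NUL;abbreviation
1D0C5;BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS;correction
"""


def load_aliases(text):
    """Parse NameAliases.txt-style text into {code point: [alias, ...]}."""
    table = defaultdict(list)
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, _kind = line.split(';')
        table[int(code, 16)].append(alias)
    return table


_ALIASES = load_aliases(SAMPLE_NAME_ALIASES)


def aliases(char):
    """Hypothetical helper: original name (if any), then the known aliases."""
    names = []
    original = unicodedata.name(char, None)
    if original is not None:
        names.append(original)
    names.extend(_ALIASES.get(ord(char), []))
    return names


print(aliases('\U0001D0C5'))  # official (misspelled) name, then the correction
print(aliases('\x00'))        # no official name, so only the aliases
```

This keeps .name() untouched and puts the more convenient view
in a separate, clearly named place.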

If we change the return values of .name() to whatever we think
would be more usable, we'd be modifying how Python programmers
see the Unicode database. That's not the purpose of the module.

History
Date	User	Action	Args
2013-06-24 07:54:05	lemburg	set	recipients: + lemburg, loewis, belopolsky, benjamin.peterson, ezio.melotti, serhiy.storchaka
2013-06-24 07:54:05	lemburg	link	issue18234 messages
2013-06-24 07:54:04	lemburg	create