Issue 20433: add aliasedname() and namedaliases() methods to unicodedata module



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/64632

classification

Title:	add aliasedname() and namedaliases() methods to unicodedata module
Type:	enhancement	Stage:	resolved
Components:	Unicode	Versions:	Python 3.3

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:	Unicodedata module should provide access to codepoint aliases View: 18234
Assigned To:		Nosy List:	ezio.melotti, jamadagni, serhiy.storchaka, vstinner
Priority:	normal	Keywords:

Created on 2014-01-29 07:42 by jamadagni, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
aliasedname.py	jamadagni, 2014-01-29 07:42	code that illustrates the desired behaviour of the requested functions

Messages (2)
msg209618 - (view)	Author: Shriramana Sharma (jamadagni)	Date: 2014-01-29 07:42
Currently we have unicodedata.name(), which returns the formal character name of the character chr as per the second column in UnicodeData.txt from http://www.unicode.org/Public/UNIDATA/. However, there are a few characters where the formal character name has spelling mistakes. Also, the control characters in the Basic Latin and Latin-1 blocks aren't really given meaningful character names. In one case, that of FEFF, the formal name ZERO WIDTH NO-BREAK SPACE refers to a deprecated usage of the character (and the alternate name BYTE ORDER MARK refers to the recommended usage).

In all these cases, improved names are provided as stable aliases in NameAliases.txt from the same UNIDATA source. These are also part of the stable standard and are intended to alleviate the naming situation w.r.t. the above issues. For the stability, see: http://www.unicode.org/policies/stability_policy.html#Formal_Name_Alias

Hence it would be most useful if the unicodedata module would add an aliasedname() method with the same signature as name(). For a character that has aliases it would return the official aliased name, and for a character without an alias it would return the same output as name().

As of Python 3.3, unicodedata.lookup() already uses/supports NameAliases.txt for returning the character given the name. The present requirement is to use it for returning the name given the character.

Note that NameAliases.txt has abbreviated names for some characters (where the third column reads "abbreviation"). While these would be useful for lookup(), they would not be useful to be returned for aliasedname(). For instance, one would prefer to see "SPACE" returned for 0020 rather than "SP". So these entries should be disregarded for aliasedname().

Also, NameAliases.txt has multiple entries for some characters even after discarding the abbreviation entries. In these cases, the first entry should be used (for want of a better rule). It is presumed that these are provided in some order of preference. It should be noted that discussion on this topic on the "unicore" (Unicode members) mailing list (on the thread "When normative aliases exist..." started 2014-01-21) indicates that the order of entries is subject to change although the entries themselves will not be removed. In this case, the first non-abbreviation entry may change. This is acceptable for the behaviour of aliasedname().

Also note that new aliases may be defined in the future. Thus the string returned by aliasedname() for a given character is not guaranteed to stay the same, but whatever it returns will surely be valid to use with lookup(). Those who desire a single immutable name and do not require the improvements provided by the aliases should use name() and not aliasedname().

Finally, for extended support, a namealiases() function should return all the aliases together with their types, allowing the user full choice of the desired but official alias. The attached code should clarify the required behaviour. (It is not a patch, just an illustration.)
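
For readers without access to the attachment, the behaviour described in this message can be sketched roughly as follows. This is not the attached aliasedname.py and not an existing unicodedata API: the functions aliasedname() and namealiases() are the ones proposed here, while parse_name_aliases(), SAMPLE_NAME_ALIASES and the _MISSING sentinel are hypothetical helpers introduced only for illustration. The embedded data is a small hand-copied excerpt in the NameAliases.txt format (code;alias;type) and should be checked against the real file before being relied on.

```python
import unicodedata
from collections import defaultdict

# Assumption: a tiny hand-copied excerpt in the NameAliases.txt format.
# A real implementation would read the full file from the UNIDATA source.
SAMPLE_NAME_ALIASES = """\
# code;alias;type
000A;LINE FEED;control
000A;NEW LINE;control
000A;END OF LINE;control
000A;LF;abbreviation
0020;SP;abbreviation
01A2;LATIN CAPITAL LETTER GHA;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
FEFF;ZWNBSP;abbreviation
"""


def parse_name_aliases(text):
    """Map each code point to its (alias, type) pairs, in file order."""
    table = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = line.split(";")
        table[int(code, 16)].append((alias, kind))
    return dict(table)


_ALIASES = parse_name_aliases(SAMPLE_NAME_ALIASES)
_MISSING = object()  # sentinel: distinguishes "no default given" from None


def namealiases(char):
    """Return every alias of char together with its type (may be empty)."""
    return list(_ALIASES.get(ord(char), []))


def aliasedname(char, default=_MISSING):
    """Return the first non-abbreviation alias, else fall back to name()."""
    for alias, kind in namealiases(char):
        if kind != "abbreviation":
            return alias
    if default is _MISSING:
        return unicodedata.name(char)  # raises ValueError if unnamed
    return unicodedata.name(char, default)


if __name__ == "__main__":
    for ch in ("\ufeff", " ", "\n", "A"):
        print(f"U+{ord(ch):04X}", aliasedname(ch), namealiases(ch))
```

With this sample data, U+0020 falls through to name() because its only alias is an abbreviation, and U+000A picks the first of its three control aliases. The fallback keeps the same optional default argument as name(), so callers could swap one function for the other.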
msg209621 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-01-29 07:50
This is a duplicate of issue18234.

History
Date	User	Action	Args
2022-04-11 14:57:57	admin	set	github: 64632
2014-02-10 08:38:15	ezio.melotti	set	status: pending -> closed stage: resolved
2014-01-29 07:50:26	serhiy.storchaka	set	status: open -> pending nosy: + serhiy.storchaka messages: + msg209621 superseder: Unicodedata module should provide access to codepoint aliases resolution: duplicate
2014-01-29 07:42:35	jamadagni	create