Author eryksun
Recipients eryksun, r.david.murray, zwol
Date 2016-07-12.15:08:29
Message-id <1468336109.3.0.336613219392.issue27496@psf.upfronthosting.co.za>
In-reply-to
Content
Character names are in field 1 of UnicodeData.txt [1][2]. For control characters, the name is just "<control>". In Tools/unicode/makeunicodedata.py, the makeunicodename function skips names that start with "<". Instead of skipping the character, it could fall back on the Unicode 1.0 name (field 10), if it's defined. For controls, this is the ISO 6429 name:

    (10) Old name as published in Unicode 1.0 or ISO 6429 names 
    for control functions. This field is empty unless it is 
    significantly different from the current name for the 
    character. No longer used in code chart production. See 
    Name_Alias. 
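
A rough sketch of that fallback, to make the idea concrete. This is not the actual makeunicodename code: pick_name and SAMPLE are hypothetical names, and the sample records are only meant to follow the semicolon-separated UnicodeData.txt layout.

```python
# Hypothetical sketch of the proposed fallback; not the real makeunicodename.
# SAMPLE holds a few records in UnicodeData.txt format (15 fields each).
SAMPLE = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0080;<control>;Cc;0;BN;;;;;N;;;;;
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
"""


def pick_name(record):
    """Return the name to store for one record, or None to skip it."""
    name = record[1].strip()
    if name and not name.startswith("<"):
        return name  # ordinary character name (field 1)
    # Field 1 is empty or a "<...>" placeholder: try the old name (field 10).
    old_name = record[10].strip()
    return old_name or None


names = {}
for line in SAMPLE.splitlines():
    record = line.split(";")
    name = pick_name(record)
    if name is not None:
        names[int(record[0], 16)] = name
```

With these records, U+0000 and U+000A pick up their field 10 names, while U+0080 and the range marker at U+3400 are still skipped because their field 10 is empty.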

The names of control characters are also in NameAliases.txt, which gets processed as the unicode.aliases list of (name, char) tuples.
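
For the alias route, something along these lines could pick a name for each control character. Again only a sketch: parse_aliases and first_control_alias are made-up helpers, the sample records just follow the "code point;alias;type" layout of NameAliases.txt, and keeping the type as a third tuple element is my assumption (the existing list holds only (name, char) pairs).

```python
# Hypothetical sketch; the helper names and the extra "kind" element are
# assumptions, not part of makeunicodedata.py.
SAMPLE = """\
# code point;alias;type
0000;NULL;control
0000;NUL;abbreviation
000A;LINE FEED;control
000A;NEW LINE;control
000A;LF;abbreviation
"""


def parse_aliases(text):
    """Parse NameAliases.txt-style text into (name, char, kind) tuples."""
    aliases = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        char, name, kind = line.split(";")
        aliases.append((name, int(char, 16), kind))
    return aliases


def first_control_alias(aliases):
    """Map each code point to its first alias of type 'control'."""
    result = {}
    for name, char, kind in aliases:
        if kind == "control":
            result.setdefault(char, name)
    return result


control_names = first_control_alias(parse_aliases(SAMPLE))
```

Taking the first "control" alias is an arbitrary choice here; a code point can have several, so a real change would need to decide which one to report.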

[1]: http://www.unicode.org/reports/tr44/#UnicodeData.txt
[2]: http://www.unicode.org/Public/8.0.0/ucd