classification
Title: Bugs and inconsistencies in unicodedata
Type: behavior Stage: needs patch
Components: Documentation, Unicode Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, dscorbett, ezio.melotti, lemburg, vstinner
Priority: normal Keywords:

Created on 2019-03-30 14:41 by dscorbett, last changed 2019-04-05 18:28 by terry.reedy.

Messages (1)
msg339203 - Author: David Corbett (dscorbett) Date: 2019-03-30 14:41
In `unicodedata`, the functions `lookup` and `name` have some bugs and inconsistencies.

`lookup` matches case-insensitively, except for the algorithmic names of Hangul syllables and CJK unified ideographs, which must be in all caps. The documentation does not explain how character names are fuzzily matched.
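A minimal probe for the case-sensitivity difference might look like the sketch below. `try_lookup` is a hypothetical helper, not part of `unicodedata`, and the results for the lowercase Hangul and CJK names depend on the interpreter version, so no expected output is given for them.

```python
import unicodedata


def try_lookup(name):
    """Return the character for `name`, or None if lookup() raises KeyError."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


# Compare an ordinary table-based name with the two algorithmic name families,
# each in upper case and in lower case.
for candidate in (
    "LATIN SMALL LETTER A",
    "latin small letter a",
    "HANGUL SYLLABLE GA",
    "hangul syllable ga",
    "CJK UNIFIED IDEOGRAPH-4E00",
    "cjk unified ideograph-4e00",
):
    print(f"{candidate!r}: {try_lookup(candidate)!r}")
```

On the versions this report is filed against, the lowercase Hangul and CJK names are the ones that come back as `None`.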

`lookup` accepts names like “CJK UNIFIED IDEOGRAPH-04E00”, where the code point has a leading zero.
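Until `lookup` itself is stricter, callers who need exact names could round-trip the result through `name`. The sketch below is one possible workaround, not a proposed patch; `strict_lookup` is a hypothetical name.

```python
import unicodedata


def strict_lookup(name):
    """Hypothetical helper: accept only the exact name that name() reports."""
    ch = unicodedata.lookup(name)  # may itself raise KeyError
    # lookup() can return a multi-character named sequence, while name()
    # accepts a single character only, so reject those before round-tripping.
    if len(ch) != 1 or unicodedata.name(ch, None) != name:
        raise KeyError(f"not an exact character name: {name!r}")
    return ch


print(repr(strict_lookup("CJK UNIFIED IDEOGRAPH-4E00")))
```

Because the comparison is exact, this also rejects lowercase spellings, formal name aliases, and named sequences, which may be stricter than some callers want.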

`lookup` and `name` don’t implement rule NR2, defined in chapter 4 of the Unicode Standard, for the names of Tangut ideographs.
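For reference, NR2 derives a name by appending the code point in uppercase hexadecimal to a fixed prefix. A fallback along these lines is sketched below. `name_with_nr2_fallback` is a hypothetical helper, and the range constants are a conservative assumption covering the Tangut ideographs of Unicode 9.0; later Unicode versions extend the set.

```python
import unicodedata

# Assumed range: Tangut ideographs as encoded in Unicode 9.0 (conservative).
TANGUT_FIRST = 0x17000
TANGUT_LAST = 0x187EC


def name_with_nr2_fallback(ch):
    """Return name(ch), deriving an NR2 name for Tangut ideographs if needed."""
    try:
        return unicodedata.name(ch)
    except ValueError:
        cp = ord(ch)
        if TANGUT_FIRST <= cp <= TANGUT_LAST:
            # NR2: prefix followed by the code point in uppercase hex.
            return f"TANGUT IDEOGRAPH-{cp:04X}"
        raise


print(name_with_nr2_fallback("\U00017000"))
```

The fallback is only reached when `name` raises `ValueError`, so an interpreter that already implements NR2 for this range would go through the normal path.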
History
Date User Action Args
2019-04-05 18:28:55 terry.reedy set stage: needs patch; versions: + Python 3.8
2019-03-30 14:48:43 xtreak set nosy: + lemburg
2019-03-30 14:41:13 dscorbett create