Author steven.daprano
Recipients ezio.melotti, hidr0.frbg, ronaldoussoren, steven.daprano, vstinner
Date 2020-12-10.12:15:13
Message-id <1607602513.39.0.743587122132.issue42614@roundup.psfhosted.org>
In-reply-to
Content
In addition, you are probably hitting normalization issues. The Cyrillic character 'й' can end up in your string in two ways: as a single precomposed code point, or as two code points (the base letter followed by a combining mark). The two look identical on screen but do not compare equal:

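>>> import unicodedata
>>> a = '\u0439'        # precomposed: one code point
>>> b = '\u0438\u0306'  # decomposed: base letter + combining mark
>>> len(a), unicodedata.name(a)
(1, 'CYRILLIC SMALL LETTER SHORT I')
>>> len(b), unicodedata.name(b[0]), unicodedata.name(b[1])
(2, 'CYRILLIC SMALL LETTER I', 'COMBINING BREVE')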
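
If that is what is happening, normalizing both strings to the same form before comparing them should make them match. Here is a minimal sketch using unicodedata.normalize; the helper name same_text is hypothetical, and the choice of NFC as the default form is an assumption (NFD works equally well for comparison, as long as both sides use the same form):

```python
import unicodedata

# Precomposed form: one code point (U+0439).
composed = "\u0439"
# Decomposed form: base letter U+0438 followed by COMBINING BREVE (U+0306).
decomposed = "\u0438\u0306"


def same_text(s1, s2, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, s1) == unicodedata.normalize(form, s2)


# Without normalization the strings differ, even though they render the same.
raw_equal = composed == decomposed

# After normalizing both sides to one form, they compare equal.
nfc_equal = same_text(composed, decomposed)
nfd_equal = same_text(composed, decomposed, "NFD")

# NFC composes to a single code point; NFD decomposes to two.
nfc_len = len(unicodedata.normalize("NFC", decomposed))
nfd_len = len(unicodedata.normalize("NFD", composed))

print(raw_equal, nfc_equal, nfd_equal, nfc_len, nfd_len)
```

Whether this applies to your case depends on where the string comes from, so it is worth checking len() and unicodedata.name() on your actual data first.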