Author Greg Price
Recipients Greg Price
Date 2019-08-05.01:06:02
Message-id <1564967162.48.0.525882022653.issue37758@roundup.psfhosted.org>
Content
The unicodedata module has two test cases that run through the database, compute a hash of its visible outputs for all code points, and compare that hash against a recorded checksum. These are helpful regression tests for making sure the behavior isn't changed by patches that didn't intend to change it.
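A minimal sketch of the checksum technique follows. The helper name `database_checksum` and the exact list of properties are illustrative assumptions, not copied from `test_unicodedata.py`. Each code point's property values are formatted as strings and fed into a single SHA-1 hash in code point order, so any change to any property of any code point changes the final digest.

```python
import hashlib
import unicodedata


def database_checksum(stop):
    """Hash several unicodedata properties for every code point below `stop`.

    Hypothetical helper: an illustrative sketch of the regression-test idea.
    """
    h = hashlib.sha1()
    for cp in range(stop):
        char = chr(cp)
        fields = [
            # -1 is the default returned when a property is undefined.
            format(unicodedata.digit(char, -1), '.12g'),
            format(unicodedata.numeric(char, -1), '.12g'),
            format(unicodedata.decimal(char, -1), '.12g'),
            unicodedata.category(char),
            unicodedata.bidirectional(char),
            unicodedata.decomposition(char),
            str(unicodedata.mirrored(char)),
            str(unicodedata.combining(char)),
        ]
        # All of these strings are ASCII, so the encoding never fails.
        h.update(''.join(fields).encode('ascii'))
    return h.hexdigest()


# A test would then compare against a digest recorded when the database
# was last intentionally updated (EXPECTED_CHECKSUM is hypothetical):
# assert database_checksum(0x10000) == EXPECTED_CHECKSUM
```

Because the digest depends on the Unicode version bundled with the interpreter, the recorded value has to be refreshed whenever the database is deliberately upgraded.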

But Unicode has grown since Python first gained support for it, when Unicode itself was still rather new. These test cases were added in commit 6a20ee7de back in 2000, and they haven't needed to change much since then... but they should be changed to look beyond the Basic Multilingual Plane (`range(0x10000)`) and cover all 17 planes of Unicode's final form (`range(0x110000)`).
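The change amounts to widening the loop bound. This small sketch (the constant names are illustrative) checks that 17 planes of 0x10000 code points each end exactly at `sys.maxunicode`, and that a BMP-only loop never reaches a character such as U+1F600 GRINNING FACE.

```python
import sys
import unicodedata

PLANE_SIZE = 0x10000  # code points per plane
NUM_PLANES = 17       # plane 0 (the BMP) through plane 16

bmp_only = range(PLANE_SIZE)                 # what the old tests iterated over
all_planes = range(NUM_PLANES * PLANE_SIZE)  # U+0000 .. U+10FFFF

# The full range ends at the interpreter's largest code point.
assert all_planes[-1] == sys.maxunicode == 0x10FFFF

# U+1F600 lies in plane 1, so its database entry is only exercised
# by a loop over all planes.
grinning = 0x1F600
print(unicodedata.name(chr(grinning)))              # GRINNING FACE
print(grinning in bmp_only, grinning in all_planes)  # False True
```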

Spotted in discussion on GH-15019 (https://github.com/python/cpython/pull/15019#discussion_r308947884). I have a patch for this, which I'll send shortly.