This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author: raulir
Date: 2007-06-09 22:45:42
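This patch implements the quick check for already-normalized strings
described in http://unicode.org/reports/tr15/#Annex8. With it,
unicodedata.normalize() can skip full normalization when its input is
already in the requested form.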
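Below is a minimal C sketch of the quick-check loop from UAX #15. It is not the patch's code, and the names QuickCheck, quick_check_nfc, get_ccc, get_nfc_qc and toy_db are hypothetical. The tiny lookup table holds real UCD values for three code points. For every other code point it assumes ccc 0 and NFC_QC=Yes, which is not true of real Unicode data. Strings are arrays of code points, so UCS2 surrogate handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Quick-check property values defined by UAX #15. */
typedef enum { QC_YES, QC_NO, QC_MAYBE } QuickCheck;

/* Hypothetical stand-in for the Unicode database: a few code points with
   their real UCD values. Every other code point is treated as ccc 0 and
   NFC_QC=Yes, which does NOT hold for real Unicode data. */
typedef struct {
    uint32_t cp;
    unsigned char ccc;
    QuickCheck nfc_qc;
} ToyRecord;

static const ToyRecord toy_db[] = {
    {0x0301, 230, QC_MAYBE}, /* COMBINING ACUTE ACCENT */
    {0x0323, 220, QC_MAYBE}, /* COMBINING DOT BELOW */
    {0x212B,   0, QC_NO},    /* ANGSTROM SIGN (singleton decomposition) */
};

static const ToyRecord *lookup(uint32_t cp) {
    for (size_t i = 0; i < sizeof toy_db / sizeof toy_db[0]; i++)
        if (toy_db[i].cp == cp)
            return &toy_db[i];
    return NULL;
}

static unsigned char get_ccc(uint32_t cp) {
    const ToyRecord *r = lookup(cp);
    return r ? r->ccc : 0;
}

static QuickCheck get_nfc_qc(uint32_t cp) {
    const ToyRecord *r = lookup(cp);
    return r ? r->nfc_qc : QC_YES;
}

/* Quick check as described in UAX #15: return NO as soon as a code point
   is NFC_QC=No or two non-zero combining classes are out of canonical
   order; return MAYBE if any code point is NFC_QC=Maybe; otherwise YES. */
QuickCheck quick_check_nfc(const uint32_t *s, size_t len) {
    assert(s != NULL || len == 0);
    unsigned char last_ccc = 0;
    QuickCheck result = QC_YES;
    for (size_t i = 0; i < len; i++) {
        unsigned char ccc = get_ccc(s[i]);
        if (last_ccc > ccc && ccc != 0)
            return QC_NO;
        QuickCheck check = get_nfc_qc(s[i]);
        if (check == QC_NO)
            return QC_NO;
        if (check == QC_MAYBE)
            result = QC_MAYBE;
        last_ccc = ccc;
    }
    return result;
}
```

A QC_MAYBE result still requires full normalization and a comparison. Only QC_YES lets the caller skip that work.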

The patch is against the 2.6 SVN trunk. The normalization test
passes on both UCS2 and UCS4 builds on Ubuntu Edgy.

API affected:

Expressions such as unicodedata.normalize('NFC', u'a') is u'a'
become true. When the input unicode object is found to be
already normalized, it is returned as-is instead of being copied.
The documentation does not specify either behavior.
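The identity behavior can be sketched in C, with code-point buffers standing in for unicode objects. When the quick check proves the input is normalized, the function returns the same pointer instead of a copy. Everything here is hypothetical. nfc_normalize and slow_normalize are illustrative names. The quick check is deliberately conservative, relying on the fact that code points below U+0300 have ccc 0 and NFC_QC=Yes. The slow path only copies the input and does not actually normalize anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Conservative quick check: code points below U+0300 all have canonical
   combining class 0 and NFC_QC=Yes, so a string made only of them is
   already in NFC. Anything else goes to the slow path, even if it is in
   fact normalized. */
static int surely_nfc(const uint32_t *s, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 0x0300)
            return 0;
    return 1;
}

/* Hypothetical slow path: a real implementation would decompose, reorder
   and recompose. This placeholder only copies the input so the ownership
   behavior can be shown without a Unicode database. */
static uint32_t *slow_normalize(const uint32_t *s, size_t len) {
    uint32_t *out = malloc((len ? len : 1) * sizeof *out);
    if (out != NULL && len > 0)
        memcpy(out, s, len * sizeof *out);
    return out;
}

/* Returns the input pointer itself when the quick check proves the string
   is already normalized (no allocation, no copy). Otherwise returns a new
   buffer, which the caller must free(). */
const uint32_t *nfc_normalize(const uint32_t *s, size_t len) {
    assert(s != NULL || len == 0);
    if (surely_nfc(s, len))
        return s;
    return slow_normalize(s, len);
}
```

Callers can compare the result with the input pointer to tell whether a new buffer was produced. In CPython, the analogous step is returning the input object with its reference count incremented, which is why the is comparison starts to hold.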

Added memory footprint:

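A new 8-bit field is added to _PyUnicode_DatabaseRecord,
and the generated _PyUnicode_Database_Records
array grows from 219 records to 304 records. The array holds one
record per distinct combination of property values, and the new
field splits some combinations that used to be shared. Each
record now looks like this: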

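typedef struct {
   const unsigned char category;         /* index into the category names */
   const unsigned char combining;        /* canonical combining class */
   const unsigned char bidirectional;    /* index into the bidirectional names */
   const unsigned char mirrored;         /* nonzero if mirrored in bidi text */
   const unsigned char east_asian_width; /* index into the East Asian width names */
   const unsigned char normalization_quick_check; /* new: quick-check data */
} _PyUnicode_DatabaseRecord;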

normalization_quick_check is the added field.
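The added footprint of the records array can be checked with a short calculation. On common ABIs a struct of unsigned char fields has no padding, so a record grows from 5 to 6 bytes. The array grows from 219 × 5 = 1,095 bytes to 304 × 6 = 1,824 bytes, an increase of 729 bytes. The sketch below lets the compiler confirm this arithmetic. OldRecord and NewRecord are hypothetical names for the two layouts, and the index tables that map code points to records are not counted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name for the record layout before the patch: five 8-bit fields. */
typedef struct {
    const unsigned char category;
    const unsigned char combining;
    const unsigned char bidirectional;
    const unsigned char mirrored;
    const unsigned char east_asian_width;
} OldRecord;

/* Hypothetical name for the layout after the patch: one extra 8-bit field. */
typedef struct {
    const unsigned char category;
    const unsigned char combining;
    const unsigned char bidirectional;
    const unsigned char mirrored;
    const unsigned char east_asian_width;
    const unsigned char normalization_quick_check;
} NewRecord;

/* Record counts quoted in the patch description. */
enum { OLD_RECORD_COUNT = 219, NEW_RECORD_COUNT = 304 };

/* Size of the records array alone, before and after the patch. */
static const size_t old_records_bytes = OLD_RECORD_COUNT * sizeof(OldRecord);
static const size_t new_records_bytes = NEW_RECORD_COUNT * sizeof(NewRecord);

/* Fails to compile on an ABI that pads these structs (C11 static_assert). */
static_assert(sizeof(OldRecord) == 5 && sizeof(NewRecord) == 6,
              "records are expected to have no padding");
```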
History
Date                 User   Action  Args
2007-08-23 15:58:42  admin  link    issue1734234 messages
2007-08-23 15:58:42  admin  create