Message52743
This patch implements the quick check for already-normalized
strings described in http://unicode.org/reports/tr15/#Annex8
The patch is against the 2.6 SVN trunk. The normalization test
passes on both UCS2 and UCS4 builds on Ubuntu Edgy.
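For readers who have not seen the quick check in the linked annex, here is a rough Python sketch of the idea, not the C code in the patch. The names quick_check_nfc and NFC_QC_EXCERPT are made up for illustration, and the table is a tiny hand-picked excerpt of NFC_Quick_Check values; the real data is generated from the Unicode Character Database.

```python
import unicodedata

YES, MAYBE, NO = "yes", "maybe", "no"

# Hypothetical excerpt of NFC_Quick_Check values (assumption: illustration only).
# Any code point missing from this excerpt is treated as YES here.
NFC_QC_EXCERPT = {
    0x0301: MAYBE,  # COMBINING ACUTE ACCENT
    0x0340: NO,     # COMBINING GRAVE TONE MARK
    0x212B: NO,     # ANGSTROM SIGN
}


def quick_check_nfc(text):
    """Return YES, MAYBE or NO for whether text is already in NFC."""
    last_ccc = 0
    result = YES
    for ch in text:
        ccc = unicodedata.combining(ch)
        # Combining marks out of canonical order: definitely not normalized.
        if ccc != 0 and last_ccc > ccc:
            return NO
        check = NFC_QC_EXCERPT.get(ord(ch), YES)
        if check == NO:
            return NO
        if check == MAYBE:
            result = MAYBE
        last_ccc = ccc
    return result
```

Only a YES result lets normalize() skip the work entirely; MAYBE and NO still need the full normalization pass.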
API affected:
Expressions such as unicodedata.normalize('NFC', s) is s (for example
with s = u'a') become true, because the unicode object is returned
as-is rather than copied when it is found to be already normalized.
The documentation does not specify either way.
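A small way to observe the difference, written as a sketch in current Python syntax rather than 2.6. The helper name compare_with_input is made up for illustration. Since the documentation does not promise identity either way, the identity result is reported and not relied on.

```python
import unicodedata


def compare_with_input(form, text):
    """Return (equal, identical) for normalize(form, text) versus text."""
    result = unicodedata.normalize(form, text)
    return result == text, result is text


# Already normalized input: always equal; identical only if the
# implementation returns the input object without copying it.
equal_ascii, identical_ascii = compare_with_input("NFC", "abc")

# Input that NFC must change: 'e' + COMBINING ACUTE ACCENT composes to U+00E9.
equal_composed, identical_composed = compare_with_input("NFC", "e\u0301")
```

Code that only needs to know whether normalization changed anything should compare with == and not with is.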
Added memory footprint:
A new 8-bit field is added to _PyUnicode_DatabaseRecord,
and the generated _PyUnicode_Database_Records
array grows from 219 records to 304 records. Each
record looks like this:
typedef struct {
    const unsigned char category;
    const unsigned char combining;
    const unsigned char bidirectional;
    const unsigned char mirrored;
    const unsigned char east_asian_width;
    const unsigned char normalization_quick_check;
} _PyUnicode_DatabaseRecord;
normalization_quick_check is the added field.
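A back-of-the-envelope estimate of the records array alone, using ctypes to mirror the struct layout. This is a sketch under assumptions: it ignores any change to the index tables that map code points to records, and the Python class names are made up for illustration.

```python
import ctypes

OLD_FIELDS = ("category", "combining", "bidirectional",
              "mirrored", "east_asian_width")
NEW_FIELDS = OLD_FIELDS + ("normalization_quick_check",)


class OldRecord(ctypes.Structure):
    # Layout before the patch: five one-byte fields.
    _fields_ = [(name, ctypes.c_ubyte) for name in OLD_FIELDS]


class NewRecord(ctypes.Structure):
    # Layout after the patch: the quick-check byte is appended.
    _fields_ = [(name, ctypes.c_ubyte) for name in NEW_FIELDS]


old_bytes = 219 * ctypes.sizeof(OldRecord)   # 219 records before the patch
new_bytes = 304 * ctypes.sizeof(NewRecord)   # 304 records after the patch
growth = new_bytes - old_bytes
```

Under these assumptions the array grows from 1095 to 1824 bytes, an increase of 729 bytes.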
Date | User | Action | Args
2007-08-23 15:58:42 | admin | link | issue1734234 messages
2007-08-23 15:58:42 | admin | create |