Author loewis
Recipients ezio.melotti, gvanrossum, lemburg, loewis, mrabarnett, tchrist, terry.reedy
Date 2011-10-03.18:44:24
Message-id <1317667465.84.0.262627039258.issue12753@psf.upfronthosting.co.za>
Content
The patch needs to take versioning into account. It seems that NamedSequences were added in Unicode 4.1, and NameAliases in 5.0. So for the moment, when using 3.2 (i.e. when self is not NULL), it is fine to look up neither. Please put an assertion into makeunicodedata stating that this needs to be reviewed when an old version other than 3.2 needs to be supported.
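A minimal sketch of the version gating and the requested assertion, assuming makeunicodedata keeps its old database versions in a list like `OLD_VERSIONS`. That name, and the helpers `version_tuple` and `extra_name_data`, are hypothetical here. The version thresholds come from the note above.

```python
# Assumption: mirrors makeunicodedata's list of old UCD versions to support.
OLD_VERSIONS = ["3.2.0"]

# Unicode versions that first shipped each data file (per the note above).
NAMED_SEQUENCES_SINCE = (4, 1)
NAME_ALIASES_SINCE = (5, 0)


def version_tuple(v):
    """Turn '3.2.0' into (3, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def extra_name_data(version):
    """Report which extra name files exist for a given UCD version."""
    vt = version_tuple(version)
    return {
        "named_sequences": vt >= NAMED_SEQUENCES_SINCE,
        "name_aliases": vt >= NAME_ALIASES_SINCE,
    }


# Guard: the old-version path skips both lookups, which is only correct
# because 3.2 predates both files. Fail loudly if the list ever changes.
for old in OLD_VERSIONS:
    assert old == "3.2.0", (
        "old version %s: review named-sequence/alias lookup" % old)
```

With this, `extra_name_data("3.2.0")` reports neither file, so skipping both lookups for the 3.2 database is consistent.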

The size of the DB does matter; there are frequent complaints about it. The named sequences take 20 kB on my system; I'm not sure whether that's too much. If you want to reduce the size (and also speed up lookup), you could use private-use characters, like so:
- add the named sequences as PUA characters to the names table of makeunicodename, in range(P, P+418) (for some P).
- in lookup, check whether the _getcode result is in range(P, P+418). If so, subtract P from the code and use the result as an index into _namedsequences.
- add a _getcode wrapper that filters out all private-use characters, for use in regular lookup.
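The three steps above can be sketched in pure Python, with plain dicts and lists standing in for the generated C tables. `PUA_START` (the P above) is a hypothetical value chosen inside Supplementary Private Use Area-A. `build_tables`, `getcode`, `getcode_chars_only` and `lookup` are illustrative stand-ins, not the real makeunicodedata or unicodedata code. The sample named sequence is for illustration only.

```python
# Hypothetical P: start of a block inside Supplementary Private Use Area-A.
PUA_START = 0xF0200


def is_private_use(cp):
    """True for code points in any of the three Private Use Areas."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


def build_tables(chars, sequences, start=PUA_START):
    """Step 1: store each named sequence in the names table as a PUA code.

    chars:     dict mapping character name -> code point
    sequences: list of (name, [code points]) pairs
    Returns (names, seq_table); seq_table[i] is the sequence stored
    under code point start + i.
    """
    names = dict(chars)
    seq_table = []
    for i, (name, cps) in enumerate(sequences):
        names[name] = start + i
        seq_table.append(tuple(cps))
    return names, seq_table


def getcode(names, name):
    """Stand-in for _getcode: raw table lookup, PUA codes included."""
    return names.get(name)


def getcode_chars_only(names, name):
    """Step 3: wrapper for regular lookup that hides all private-use codes."""
    cp = getcode(names, name)
    if cp is None or is_private_use(cp):
        return None
    return cp


def lookup(names, seq_table, name, start=PUA_START):
    """Step 2: return a single character or a whole named sequence."""
    cp = getcode(names, name)
    if cp is None:
        raise KeyError("undefined character name %r" % name)
    if start <= cp < start + len(seq_table):
        # Subtract P to get the index into the sequence table.
        return "".join(chr(c) for c in seq_table[cp - start])
    return chr(cp)


names, seqs = build_tables(
    {"LATIN SMALL LETTER A": 0x61},
    [("LATIN CAPITAL LETTER A WITH MACRON AND GRAVE", [0x0100, 0x0300])],
)
```

Filtering the whole private-use range in the wrapper loses nothing, because genuine private-use code points have no character names. Only the injected sequence entries can ever be hidden by it.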
History
Date                 User    Action  Args
2011-10-03 18:44:25  loewis  set     recipients: + loewis, lemburg, gvanrossum, terry.reedy, ezio.melotti, mrabarnett, tchrist
2011-10-03 18:44:25  loewis  set     messageid: <1317667465.84.0.262627039258.issue12753@psf.upfronthosting.co.za>
2011-10-03 18:44:25  loewis  link    issue12753 messages
2011-10-03 18:44:24  loewis  create