Message 144832 - Python tracker



Author	loewis
Recipients	ezio.melotti, gvanrossum, lemburg, loewis, mrabarnett, tchrist, terry.reedy
Date	2011-10-03.18:44:24
Message-id	<1317667465.84.0.262627039258.issue12753@psf.upfronthosting.co.za>
In-reply-to

Content

The patch needs to take versioning into account. It seems that NamedSequences were added in 4.1, and NameAliases in 5.0. So for the moment, when using 3.2 (i.e. when self is not NULL), it is fine to look up neither. Please put an assertion into makeunicodedata stating that this needs to be reviewed if an old version other than 3.2 ever needs to be supported.
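
Such an assertion could look roughly like the sketch below. It assumes the generator script keeps its supported old versions in a list (called `old_versions` here, with the string "3.2.0"); the helper name `check_old_versions` is hypothetical and only illustrates the idea.

```python
# Assumption: the generator script lists the old database versions it
# supports in a variable like this one.
old_versions = ["3.2.0"]

# Old versions for which skipping named sequences and name aliases is known
# to be correct (both were introduced after Unicode 3.2).
VERSIONS_WITHOUT_ALIASES_OR_SEQUENCES = {"3.2.0"}


def check_old_versions(versions):
    """Hypothetical helper: fail loudly if an unreviewed old version is added."""
    unreviewed = set(versions) - VERSIONS_WITHOUT_ALIASES_OR_SEQUENCES
    assert not unreviewed, (
        "review named-sequence/alias lookup for old versions: %s"
        % ", ".join(sorted(unreviewed))
    )


check_old_versions(old_versions)
```

The point of the assertion is only to force a review; it does not itself decide which tables an added version should see.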

The size of the DB does matter; there are frequent complaints about it. The named sequences take 20 kB on my system; I'm not sure whether that's too much. If you want to reduce the size (and also speed up lookup), you could use private-use characters, like so:
- add the named sequences as PUA characters to the names table of makeunicodename, in range(P, P+418) (for some P).
- in lookup, check whether the _getcode result is in range(P, P+418). If so, subtract P from the code and use the result as an index into _namedsequences.
- add a _getcode wrapper that filters out all private-use characters, for regular lookup.
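
The three steps above can be sketched in Python as follows. This is only a toy model of the idea, not the C code in unicodedata: the value chosen for P (`PUA_BASE`), the dictionary standing in for the generated names table, the two sample entries, and the helper names `_is_private_use` and `getcode_regular` are all assumptions made for illustration.

```python
# Assumption: P is some code point in a private-use area with room for all
# named sequences; U+F0000 starts Supplementary Private Use Area-A.
PUA_BASE = 0xF0000

# Toy stand-in for _namedsequences (the real table would hold 418 entries).
_namedsequences = [
    "\u0100\u0300",  # LATIN CAPITAL LETTER A WITH MACRON AND GRAVE
    "\u0101\u0300",  # LATIN SMALL LETTER A WITH MACRON AND GRAVE
]

# Toy stand-in for the generated names table: ordinary characters map to
# their code point, named sequences map to PUA_BASE + index.
_names = {
    "LATIN SMALL LETTER A": 0x61,
    "LATIN CAPITAL LETTER A WITH MACRON AND GRAVE": PUA_BASE + 0,
    "LATIN SMALL LETTER A WITH MACRON AND GRAVE": PUA_BASE + 1,
}


def _getcode(name):
    """Raw table lookup; may return a PUA code that stands for a sequence."""
    return _names.get(name)


def _is_private_use(code):
    """True for code points in any of the three private-use ranges."""
    return (0xE000 <= code <= 0xF8FF
            or 0xF0000 <= code <= 0xFFFFD
            or 0x100000 <= code <= 0x10FFFD)


def getcode_regular(name):
    """Wrapper for regular lookup: never exposes private-use codes."""
    code = _getcode(name)
    if code is None or _is_private_use(code):
        return None
    return code


def lookup(name):
    """Return the character or named sequence registered under name."""
    code = _getcode(name)
    if code is None:
        raise KeyError(name)
    if PUA_BASE <= code < PUA_BASE + len(_namedsequences):
        # Subtract P and use the result as an index into _namedsequences.
        return _namedsequences[code - PUA_BASE]
    return chr(code)
```

With this layout the sequence names share the existing names table, so no separate name index is needed; the cost is that every caller expecting a single real code point has to go through the filtering wrapper.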

History
Date	User	Action	Args
2011-10-03 18:44:25	loewis	set	recipients: + loewis, lemburg, gvanrossum, terry.reedy, ezio.melotti, mrabarnett, tchrist
2011-10-03 18:44:25	loewis	set	messageid: <1317667465.84.0.262627039258.issue12753@psf.upfronthosting.co.za>
2011-10-03 18:44:25	loewis	link	issue12753 messages
2011-10-03 18:44:24	loewis	create