Author Jim.Jewett
Recipients Arfrever, Jean-Michel.Fauth, Jim.Jewett, belopolsky, benjamin.peterson, ezio.melotti, gvanrossum, mrabarnett, pitrou, python-dev, tchrist
Date 2012-01-16.00:24:46
Message-id <1326673487.26.0.947477273583.issue12736@psf.upfronthosting.co.za>
In-reply-to
Content
Why was the delta-processing removed from the casing functions?

As best I can tell, the whole point of going through multiple levels of indirection (courtesy of splitbins) is to maximize compression and minimize the amount of cache that the Unicode data might occupy.
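To illustrate the kind of indirection meant here, the following is a minimal sketch of a two-level table split. It is not the actual splitbins code; `split_table`, `lookup`, and the toy data are hypothetical names chosen for illustration, and a single fixed `shift` is assumed rather than searching for the best one.

```python
def split_table(values, shift):
    """Split a flat table into an index plus deduplicated blocks of 2**shift entries."""
    size = 1 << shift
    if len(values) % size:
        raise ValueError("table length must be a multiple of the block size")
    seen = {}    # block contents -> block number
    blocks = []  # unique blocks, stored back to back
    index = []   # one block number per run of `size` entries
    for start in range(0, len(values), size):
        block = tuple(values[start:start + size])
        if block not in seen:
            seen[block] = len(seen)
            blocks.extend(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, shift, i):
    """Recover values[i] from the two-level form."""
    mask = (1 << shift) - 1
    return blocks[(index[i >> shift] << shift) + (i & mask)]


# Toy data (hypothetical): record 0 almost everywhere, record 1 for a short run.
flat = [0] * 100 + [1] * 26 + [0] * 130
index, blocks = split_table(flat, shift=4)
print(len(flat), len(index) + len(blocks))
```

The more often the same record number repeats across a block, the more blocks are shared, which is why the number of distinct records matters for the size of the index tables as well.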

With deltas, only one record is needed for each combination of (upper - lower, upper - title), and there are generally only one or two such combinations per script.

Without deltas, nearly every cased letter needs its own record, and the index tables also get bigger. (The data seems to be about 2.6 times as large, but the cache effects may be worse than that, since letters from the same script will no longer share a record or an index chain.)
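A rough way to see the difference is to count distinct records for a few alphabets under both schemes. This is only a sketch: it uses Python's own `str.upper`, `str.lower`, and `str.title` rather than the generated tables, it stores deltas relative to the code point itself (an assumption about the layout), and `case_records` and `SAMPLE` are hypothetical names.

```python
def case_records(code_points):
    """Return (absolute, delta) sets of casing records for the given code points."""
    absolute = set()
    delta = set()
    for cp in code_points:
        ch = chr(cp)
        upper, lower, title = ch.upper(), ch.lower(), ch.title()
        if not (len(upper) == len(lower) == len(title) == 1):
            continue  # skip multi-character mappings in this sketch
        u, l, t = ord(upper), ord(lower), ord(title)
        absolute.add((u, l, t))             # stores the target code points
        delta.add((u - cp, l - cp, t - cp)) # stores offsets from cp
    return absolute, delta


# Basic Latin, Greek alpha..rho, and basic Cyrillic, upper and lower case.
SAMPLE = [
    *range(0x41, 0x5B), *range(0x61, 0x7B),
    *range(0x391, 0x3A2), *range(0x3B1, 0x3C2),
    *range(0x410, 0x450),
]

absolute, delta = case_records(SAMPLE)
print(len(absolute), "absolute records vs", len(delta), "delta records")
```

In this sample all three scripts share the same offsets, so the delta form needs far fewer records than the absolute form, which needs one per letter pair.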

If the concern is that there is not enough room for flags, then the decimal and digit fields could be combined. They are always the same, unless the number isn't decimal (in which case the flag is enough).
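The claim that the two values coincide can be checked against the `unicodedata` module of whatever Python is at hand. The sketch below is an illustration only; `decimal_digit_mismatches` and `combined_fields` are hypothetical helpers, and the combined layout (one shared value plus an is-decimal flag) is an assumption about how the fields might be merged.

```python
import sys
import unicodedata


def decimal_digit_mismatches():
    """List code points whose decimal value exists but differs from the digit value."""
    mismatches = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decimal = unicodedata.decimal(ch, None)
        if decimal is not None and unicodedata.digit(ch, None) != decimal:
            mismatches.append(cp)
    return mismatches


def combined_fields(ch):
    """Hypothetical merged layout: (is_decimal flag, single shared digit value)."""
    is_decimal = unicodedata.decimal(ch, None) is not None
    return is_decimal, unicodedata.digit(ch, None)


print(decimal_digit_mismatches())
print(combined_fields("7"), combined_fields("\u00b2"))
```

If the mismatch list is empty, a single stored value together with the flag carries the same information as the two separate fields.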
History
Date                 User        Action  Args
2012-01-16 00:24:47  Jim.Jewett  set     recipients: + Jim.Jewett, gvanrossum, belopolsky, pitrou, benjamin.peterson, ezio.melotti, mrabarnett, Arfrever, Jean-Michel.Fauth, python-dev, tchrist
2012-01-16 00:24:47  Jim.Jewett  set     messageid: <1326673487.26.0.947477273583.issue12736@psf.upfronthosting.co.za>
2012-01-16 00:24:46  Jim.Jewett  link    issue12736 messages
2012-01-16 00:24:46  Jim.Jewett  create