I like Serhiy's patch, too, but it feels like the single-digit case should be enough. I found this comment by Yury a good argument:

> I can see improvements in micro benchmarks, but even more importantly, Serhiy's patch reduces memory fragmentations. 99% of all long allocations are coming from freelist when it's there.

Did that comment come from a benchmark suite run (i.e. actual applications rather than micro benchmarks)? And does it show a difference between the single- and multi-digit cases?
