Message185378
If a dict is used a cache, e.g. in functools.lru_cache,
the reduced resize factor in 3.3 can cause excessive resizing.
This can lead to a significant performance regression.
When the the number of deletions and insertions is roughly in balance
the reduced head room in the dict (compare to 3.2) causes a large increase in the number of resizes.
The reason for this above-linear increase is that with fewer dummy keys, the chance of a dummy being overwritten is reduced *and* is there is less overhead as well.
A dictionary with 128 items will have a capacity of 256 and only 43 dummy keys. If it had a capacity of 512 (as it would have done in 3.2) then it will have 214 keys, making a resize at least 10 times less frequent.
Changing the growth function from round_up_to_power_2(used*2) to
round_up_to_power_2(used*2+capacity/2) the desirable property of only doubling in size when growing can be preserved, yet ensuring sufficient overhead when used as a cache.
Consider a dict which grows to n items and then remains that size, with frequent deletions and insertions, using the proposed growth function:
Items Capacity Steady state Capacity
on reaching n capacity under 3.2
2 8 8 8
4 8 16 16
6 16 32 32
8 16 32 32
10 16 64 64
12 32 64 64
15 32 64 64
20 32 128 128
30 64 128 128
50 128 256 256
80 128 512 512
128 256 512 512
Thanks to Raymond Hettinger for bringing this to my attention.
Patch attached. |
|
Date |
User |
Action |
Args |
2013-03-27 21:33:59 | Mark.Shannon | set | recipients:
+ Mark.Shannon |
2013-03-27 21:33:59 | Mark.Shannon | set | messageid: <1364420039.63.0.203772094216.issue17563@psf.upfronthosting.co.za> |
2013-03-27 21:33:59 | Mark.Shannon | link | issue17563 messages |
2013-03-27 21:33:59 | Mark.Shannon | create | |
|