classification
Title: Excessive resizing of dicts when used as a cache
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Mark.Shannon, eric.snow, python-dev, rhettinger, vstinner
Priority: normal Keywords: patch

Created on 2013-03-27 21:33 by Mark.Shannon, last changed 2013-05-17 10:29 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
resize.patch Mark.Shannon, 2013-03-27 21:33 Patch review
Messages (2)
msg185378 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2013-03-27 21:33
If a dict is used a cache, e.g. in functools.lru_cache, 
the reduced resize factor in 3.3 can cause excessive resizing.
This can lead to a significant performance regression.

When the the number of deletions and insertions is roughly in balance
the reduced head room in the dict (compare to 3.2) causes a large increase in the number of resizes.

The reason for this above-linear increase is that with fewer dummy keys, the chance of a dummy being overwritten is reduced *and* is there is less overhead as well.
A dictionary with 128 items will have a capacity of 256 and only 43 dummy keys. If it had a capacity of 512 (as it would have done in 3.2) then it will have 214 keys, making a resize at least 10 times less frequent.

Changing the growth function from round_up_to_power_2(used*2) to
round_up_to_power_2(used*2+capacity/2) the desirable property of only doubling in size when growing can be preserved, yet ensuring sufficient overhead when used as a cache.

Consider a dict which grows to n items and then remains that size, with frequent deletions and insertions, using the proposed growth function:

Items    Capacity         Steady state     Capacity
         on reaching n    capacity         under 3.2
  2        8                  8               8
  4        8                  16              16 
  6        16                 32              32 
  8        16                 32              32 
  10       16                 64              64 
  12       32                 64              64 
  15       32                 64              64 
  20       32                 128             128
  30       64                 128             128
  50       128                256             256
  80       128                512             512
  128      256                512             512


Thanks to Raymond Hettinger for bringing this to my attention.

Patch attached.
msg189443 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-05-17 10:25
New changeset cd2457463eeb by Raymond Hettinger in branch '3.3':
Issue #17563: Fix dict resize performance regression.
http://hg.python.org/cpython/rev/cd2457463eeb
History
Date User Action Args
2013-05-17 10:29:11rhettingersetstatus: open -> closed
resolution: fixed
2013-05-17 10:25:17python-devsetnosy: + python-dev
messages: + msg189443
2013-03-29 05:45:31eric.snowsetnosy: + eric.snow
2013-03-28 01:27:25rhettingersetassignee: rhettinger

nosy: + rhettinger
2013-03-27 23:37:54vstinnersetnosy: + vstinner
2013-03-27 21:33:59Mark.Shannoncreate