diff -r 26e2ee402a0b Objects/dictnotes.txt --- a/Objects/dictnotes.txt Tue Jun 12 23:04:11 2012 +0200 +++ b/Objects/dictnotes.txt Wed Jun 13 16:29:04 2012 +0100 @@ -70,42 +70,8 @@ Tunable Dictionary Parameters ----------------------------- -* PyDict_STARTSIZE. Starting size of dict (unless an instance dict). - Currently set to 8. Must be a power of two. - New dicts have to zero-out every cell. - Increasing improves the sparseness of small dictionaries but costs - time to read in the additional cache lines if they are not already - in cache. That case is common when keyword arguments are passed. - Prior to version 3.3, PyDict_MINSIZE was used as the starting size - of a new dict. - -* PyDict_MINSIZE. Minimum size of a dict. - Currently set to 4 (to keep instance dicts small). - Must be a power of two. Prior to version 3.3, PyDict_MINSIZE was - set to 8. - -* USABLE_FRACTION. Maximum dictionary load in PyDict_SetItem. - Currently set to 2/3. Increasing this ratio makes dictionaries more - dense resulting in more collisions. Decreasing it improves sparseness - at the expense of spreading entries over more cache lines and at the - cost of total memory consumed. - -* Growth rate upon hitting maximum load. Currently set to *2. - Raising this to *4 results in half the number of resizes, less - effort to resize, better sparseness for some (but not all dict sizes), - and potentially doubles memory consumption depending on the size of - the dictionary. Setting to *4 eliminates every other resize step. - -* Maximum sparseness (minimum dictionary load). What percentage - of entries can be unused before the dictionary shrinks to - free up memory and speed up iteration? (The current CPython - code does not represent this parameter directly.) - -* Shrinkage rate upon exceeding maximum sparseness. The current - CPython code never even checks sparseness when deleting a - key. When a new key is added, it resizes based on the number - of active keys, so that the addition may trigger shrinkage - rather than growth. +See comments for PyDict_MINSIZE_SPLIT, PyDict_MINSIZE_COMBINED, +USABLE_FRACTION and GROWTH_RATE in dictobject.c Tune-ups should be measured across a broad range of applications and use cases. A change to any parameter will help in some situations and diff -r 26e2ee402a0b Objects/dictobject.c --- a/Objects/dictobject.c Tue Jun 12 23:04:11 2012 +0200 +++ b/Objects/dictobject.c Wed Jun 13 16:29:04 2012 +0100 @@ -270,7 +270,13 @@ #define DK_MASK(dk) (((dk)->dk_size)-1) #define IS_POWER_OF_2(x) (((x) & (x-1)) == 0) -/* USABLE_FRACTION must obey the following: +/* USABLE_FRACTION is the maximum dictionary load. + * Currently set to (2n+1)/3. Increasing this ratio makes dictionaries more + * dense resulting in more collisions. Decreasing it improves sparseness + * at the expense of spreading entries over more cache lines and at the + * cost of total memory consumed. + * + * USABLE_FRACTION must obey the following: * (0 < USABLE_FRACTION(n) < n) for all n >= 2 * * USABLE_FRACTION should be very quick to calculate. @@ -290,6 +296,14 @@ * #define USABLE_FRACTION(n) (((n) >> 1) + ((n) >> 2) - ((n) >> 3)) */ +/* GROWTH_RATE. Growth rate upon hitting maximum load. Currently set to *2. + * Raising this to *4 doubles memory consumption depending on the size of + * the dictionary, but results in half the number of resizes, less effort to + * resize and better sparseness for some (but not all dict sizes). + * Setting to *4 eliminates every other resize step. + * GROWTH_RATE was set to *4 up to version 3.2. + */ +#define GROWTH_RATE(x) ((x) * 2) #define ENSURE_ALLOWS_DELETIONS(d) \ if ((d)->ma_keys->dk_lookup == lookdict_unicode_nodummy) { \ @@ -767,13 +781,7 @@ static int insertion_resize(PyDictObject *mp) { - /* - * Double the size of the dict, - * Previous versions quadrupled size, but doing so may result in excessive - * memory use. Doubling keeps the number of resizes low without wasting - * too much memory. - */ - return dictresize(mp, 2 * mp->ma_used); + return dictresize(mp, GROWTH_RATE(mp->ma_used)); } /*