This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Mark.Shannon
Recipients Jim.Jewett, Mark.Shannon, benjamin.peterson, giampaolo.rodola, gregory.p.smith, jcea, jcon, pitrou, pjenvey, rhettinger, terry.reedy, vstinner
Date 2012-02-29.16:24:25
SpamBayes Score 2.220446e-16
Marked as misclassified No
Message-id <4F4E513C.2010906@hotpy.org>
In-reply-to <1330528537.29.0.642752292673.issue13903@psf.upfronthosting.co.za>
Content
Jim Jewett wrote:
> Jim Jewett <jimjjewett@gmail.com> added the comment:
> 
> As of Feb 28, 2012, the PEP mentions an additional optimization of storing the values in an array indexed by (effectively) key insertion order, rather than key position. ("Alternative Implementation")
> 
> It states that this would reduce memory usage for the values array by 1/3.  1/3 is a worst-case measurement; average is 1/2.  (At savings of less than 1/3, the keys would resize, to initial savings of 2/3.  And yes, that means in practice, the average savings would be greater than half, because the frequency of dicts of size N decreases with N.)
> 

I presume you mean to allocate a values array of size == actual keys
rather than size == usable keys.

Making the  values array smaller than the the number of usable keys
causes a number of issues:

1. The values_size will need to be kept in the dict, not in the keys, 
which would make the dict larger for size 3 or 5, the same size for
size 2 or 4, but it would save memory for sizes 6-8 (see below).
So it might not save memory at all, it depends on the distribution of 
the sizes of shared-key dicts.
2. dk_usable would need to be reverted to dk_fill (c.f ma_fill) in order
to quickly allocate values arrays. This would slow down the resizing 
check, which is now done before insertion, so might be slower.
(not really a problem, but it might slow down inserts)
3. An extra check needs to be performed for each read to ensure the 
index is in-bounds
(again not really a problem, but it might slow down reads)

Comparative sizes of values array (including size field):

Items Proposed    Alternative    Other Alternative    Delta
                (size == usuable) (size == keys (+1))
   1      4           3                 2              -1	
   2      4           3                 3               0
   3      4           3                 4              +1
   4      8           5                 5               0
   5      8           5                 6              +1
   6     16          10                 7              -3
   7     16          10                 8              -2
   8     16          10                 9              -1
   9     16          10                10               0
  10     16          10                11              +1
  12     32          21                13              -8
  14     32          21                15              -6

The memory savings of the two alternative implementations are broadly
similar.

Clearly, either of the alternatives are attractive in terms of memory 
use. I think it is definitely worth investigating further, but I would 
like to get the proposed implementation into CPython first.

> It states that the keys table would need an additional "values_size" field, but in the absence of dummies, this is just ma_used.

values_size != ma_used
values_size is the size of the values array, not the number of values.

Don't forget deletions or the case where items are inserted in a 
different order from that of another dict sharing the same keys.
Although there are no dummies, (key != NULL, value == NULL) is a legal
pair representing a missing or deleted value.

Cheers,
Mark.
History
Date User Action Args
2012-02-29 16:24:26Mark.Shannonsetrecipients: + Mark.Shannon, rhettinger, terry.reedy, gregory.p.smith, jcea, pitrou, vstinner, giampaolo.rodola, pjenvey, benjamin.peterson, jcon, Jim.Jewett
2012-02-29 16:24:26Mark.Shannonlinkissue13903 messages
2012-02-29 16:24:25Mark.Shannoncreate