Message154645
Jim Jewett wrote:
> Jim Jewett <jimjjewett@gmail.com> added the comment:
>
> As of Feb 28, 2012, the PEP mentions an additional optimization of storing the values in an array indexed by (effectively) key insertion order, rather than key position. ("Alternative Implementation")
>
> It states that this would reduce memory usage for the values array by 1/3. 1/3 is a worst-case measurement; average is 1/2. (At savings of less than 1/3, the keys would resize, to initial savings of 2/3. And yes, that means in practice, the average savings would be greater than half, because the frequency of dicts of size N decreases with N.)
>
I presume you mean to allocate a values array of size == actual keys
rather than size == usable keys.
Making the values array smaller than the the number of usable keys
causes a number of issues:
1. The values_size will need to be kept in the dict, not in the keys,
which would make the dict larger for size 3 or 5, the same size for
size 2 or 4, but it would save memory for sizes 6-8 (see below).
So it might not save memory at all, it depends on the distribution of
the sizes of shared-key dicts.
2. dk_usable would need to be reverted to dk_fill (c.f ma_fill) in order
to quickly allocate values arrays. This would slow down the resizing
check, which is now done before insertion, so might be slower.
(not really a problem, but it might slow down inserts)
3. An extra check needs to be performed for each read to ensure the
index is in-bounds
(again not really a problem, but it might slow down reads)
Comparative sizes of values array (including size field):
Items Proposed Alternative Other Alternative Delta
(size == usuable) (size == keys (+1))
1 4 3 2 -1
2 4 3 3 0
3 4 3 4 +1
4 8 5 5 0
5 8 5 6 +1
6 16 10 7 -3
7 16 10 8 -2
8 16 10 9 -1
9 16 10 10 0
10 16 10 11 +1
12 32 21 13 -8
14 32 21 15 -6
The memory savings of the two alternative implementations are broadly
similar.
Clearly, either of the alternatives are attractive in terms of memory
use. I think it is definitely worth investigating further, but I would
like to get the proposed implementation into CPython first.
> It states that the keys table would need an additional "values_size" field, but in the absence of dummies, this is just ma_used.
values_size != ma_used
values_size is the size of the values array, not the number of values.
Don't forget deletions or the case where items are inserted in a
different order from that of another dict sharing the same keys.
Although there are no dummies, (key != NULL, value == NULL) is a legal
pair representing a missing or deleted value.
Cheers,
Mark. |
|
Date |
User |
Action |
Args |
2012-02-29 16:24:26 | Mark.Shannon | set | recipients:
+ Mark.Shannon, rhettinger, terry.reedy, gregory.p.smith, jcea, pitrou, vstinner, giampaolo.rodola, pjenvey, benjamin.peterson, jcon, Jim.Jewett |
2012-02-29 16:24:26 | Mark.Shannon | link | issue13903 messages |
2012-02-29 16:24:25 | Mark.Shannon | create | |
|