New shared-keys dictionary implementation #58111

Closed
markshannon opened this issue Jan 29, 2012 · 43 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@markshannon
Member

BPO 13903
Nosy @loewis, @rhettinger, @terryjreedy, @gpshead, @jcea, @pitrou, @vstinner, @pjenvey, @benjaminp, @markshannon, @JimJJewett
Files
  • 691ce331f955.diff: Patch to rev. 06a6fed0da56
  • 49b7e7e4a27c.diff
  • 257e16e71654.diff: patch to revision 2a142141e5fd
  • 372d0bca85ae.diff: Patch to rev. 9fcb2676696c
  • 73423916a242.diff: Patch to rev. 73423916a242
  • str_subclass.patch
  • gc_tracking.patch: Patch
  • make_split_table_error.patch
  • cached_keys.patch
  • dkdebug.patch
  • insertdict.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-05-02.17:09:34.992>
    created_at = <Date 2012-01-29.14:26:46.386>
    labels = ['interpreter-core', 'performance']
    title = 'New shared-keys dictionary implementation'
    updated_at = <Date 2012-05-12.21:52:57.593>
    user = 'https://github.com/markshannon'

    bugs.python.org fields:

    activity = <Date 2012-05-12.21:52:57.593>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-05-02.17:09:34.992>
    closer = 'benjamin.peterson'
    components = ['Interpreter Core']
    creation = <Date 2012-01-29.14:26:46.386>
    creator = 'Mark.Shannon'
    dependencies = []
    files = ['24510', '24670', '24766', '25096', '25313', '25318', '25339', '25340', '25381', '25385', '25422']
    hgrepos = ['112']
    issue_num = 13903
    keywords = ['patch']
    message_count = 43.0
    messages = ['152229', '152253', '152293', '152345', '152351', '152366', '152373', '152380', '152419', '152863', '152885', '154637', '154645', '155246', '155248', '156096', '157365', '157466', '157477', '157482', '157500', '157649', '158051', '158192', '158723', '159018', '159028', '159031', '159032', '159049', '159055', '159139', '159143', '159161', '159172', '159188', '159478', '159484', '159485', '159489', '159683', '159698', '160496']
    nosy_count = 14.0
    nosy_names = ['loewis', 'rhettinger', 'terry.reedy', 'gregory.p.smith', 'jcea', 'pitrou', 'vstinner', 'pjenvey', 'benjamin.peterson', 'Yury.Selivanov', 'Mark.Shannon', 'python-dev', 'jcon', 'Jim.Jewett']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue13903'
    versions = ['Python 3.3']

    @markshannon
    Member Author

    The proposed dictionary implementation allows sharing of keys & hashes between dictionaries. This leads to substantial memory savings for object-oriented programs. For non-OO programs the impact is negligible.
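
    To make the effect concrete, here is a hedged sketch (the class is made up, and the exact byte counts vary by CPython version and platform): instances of the same class with the same attribute names can share a single keys-and-hashes table, so each per-instance __dict__ only has to store its own values.

    ```python
    import sys

    class Point:                      # hypothetical example class
        def __init__(self, x, y):
            self.x = x
            self.y = y

    points = [Point(i, i) for i in range(3)]

    # An ordinary dict literal carries its own keys and hashes...
    combined = {"x": 0, "y": 0}
    # ...while instance dicts of the same class are candidates for key sharing,
    # so their per-instance footprint can be smaller.
    split = points[0].__dict__

    print(sys.getsizeof(combined), sys.getsizeof(split))
    ```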

    @markshannon markshannon added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jan 29, 2012
    @terryjreedy
    Member

    In the initial comment, 'Dummy' to 'Deleted' here but only here:

    • Holds an active (key, value) pair. Active can transition to Dummy
      + Holds an active (key, value) pair. Active can transition to Deleted

In Lib/test/test_pprint.py
    def test_set_reprs(self): ...
    # Consequently, this test is fragile and ...
    + # XXX So why include this "test" in the first place?
Raymond, I believe you added this in 44927 and revised it for 3.x in 45067.
I imagine it will also be a problem with randomized hashes. Should it be removed or somehow revised?

    @vstinner
    Member

    I see two unrelated parts in your patch:

    • change dictionary structure in memory
    • change many constants linked to optimization: PyDICT_MAXFREELIST: 80->40, 2/3->5/8, etc.

You may want to open a new issue for the second part, unless I am wrong and you do need to change the constants for the first part?

    (I don't understand why you changed constants and how you chose new values.)

    @rhettinger
    Contributor

    This would likely require a PEP before having a chance of being considered for inclusion.

    @markshannon
    Member Author

    Does this really need a PEP?
    There is no new language feature and no change to any API.
    It is just saving some memory.

    The only possible issue is changing dict repr() ordering,
    but bpo-13703 will do that anyway.

    @rhettinger rhettinger self-assigned this Jan 31, 2012
    @rhettinger
    Contributor

Changing dictionaries is a big deal. You're changing many pieces at once (not a good idea), including changing tunable parameters that are well-studied (I spent a month testing whether 5/8 was a better idea than 2/3 for resizing, or whether the ideal small dict size was 4, 8, or 16). You're changing the meaning of the fields in dictobject.h, which will likely break any code that relied on those.

The ideas may be good ones, but they warrant a good deal of thought. Dicts weren't just slapped together -- the current code is the product of two decades of tweaking by engineers who devoted significant time to the task. It would be easy to unknowingly undo some of their work.

    @markshannon
    Member Author

Raymond Hettinger wrote:

> Changing dictionaries is a big deal. You're changing many pieces at once (not a good idea), including changing tunable parameters that are well-studied (I spent a month testing whether 5/8 was a better idea than 2/3 for resizing, or whether the ideal small dict size was 4, 8, or 16). You're changing the meaning of the fields in dictobject.h, which will likely break any code that relied on those.
>
> The ideas may be good ones, but they warrant a good deal of thought. Dicts weren't just slapped together -- the current code is the product of two decades of tweaking by engineers who devoted significant time to the task. It would be easy to unknowingly undo some of their work.

    OK.
    I'll write a PEP.

By the way, I'm trying not to change the tunable parameters for the
unshared-keys case, just for the shared-keys case. Of course, they do
interact with each other.

    @pitrou
    Member

    pitrou commented Jan 31, 2012

Like Victor, I think the tunables could be changed separately, unless they truly get in the way of the shared-keys optimization.

By the way, there will need to be a "protection" for the case where instances are used as bags of key/value pairs (dict-alikes with attribute access). Perhaps bound the size of the keys array to 100 entries and then switch into unshared mode.
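
    To illustrate the pattern being guarded against, here is a hedged sketch (the class name and the loop are made up; the 100-entry bound above is Antoine's suggestion, not something implemented here): each instance introduces a new attribute name, so a naively shared keys table would keep growing while each per-instance values array stays mostly empty.

    ```python
    # Hypothetical worst case for key sharing: instances used as ad-hoc
    # key/value bags, where each instance sets a mostly different attribute name.
    class Record:
        pass

    records = []
    for i in range(1000):
        r = Record()
        setattr(r, "field_%d" % i, i)   # a new attribute name per instance
        records.append(r)

    # Without a cut-off, a shared keys table for Record would need a slot for
    # every name ever used on any instance; the suggested protection would make
    # such dicts fall back to ordinary unshared (combined) tables instead.
    ```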

    @gpshead
    Member

    gpshead commented Feb 1, 2012

    FYI - I strongly support this type of work to reduce memory use of the Python interpreter! :)

Also, yes, the constant changes should be separate from this change, but the constants are worth occasionally re-measuring and justifying, as common computer architectures have changed since the original decisions were made.

    @pitrou
    Member

    pitrou commented Feb 8, 2012

Looking at your latest patch, I worry about "any deletion (including pop & popitem) causes a split table to become a combined table". Why wouldn't you use a dummy pointer (such as ((PyObject *) 1)) to signal deleted slots?

    @markshannon
    Member Author

Antoine Pitrou wrote:

> Looking at your latest patch, I worry about "any deletion (including pop & popitem) causes a split table to become a combined table". Why wouldn't you use a dummy pointer (such as ((PyObject *) 1)) to signal deleted slots?

In fact there is no need for a dummy pointer.
When deleting from a split table, the value can just be set to NULL;
the resulting (key, NULL) pair is legal in a split table.
Your suggestion doesn't make the code any more complex, so I've included it.

In practice, it will be very rare that a deletion occurs in a split table
(since they are only used for attribute dictionaries).
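
    As a rough illustration of that rule, here is a pure-Python toy model (a sketch only -- it models the idea, not the C implementation, and all class and method names are made up): the shared keys are append-only, each dict keeps just its own values list indexed by the key's slot, and "deleting" an entry clears that dict's value slot while leaving the key in place for the other sharers.

    ```python
    # Toy model of a shared-keys ("split") table, for illustration only.
    class SharedKeys:
        def __init__(self):
            self.names = []                       # shared by all split dicts

        def slot(self, name):
            if name not in self.names:
                self.names.append(name)
            return self.names.index(name)

    class SplitDict:
        def __init__(self, shared):
            self.shared = shared                  # shared keys, never shrunk
            self.values = []                      # this dict's values only

        def __setitem__(self, name, value):
            i = self.shared.slot(name)
            self.values.extend([None] * (i + 1 - len(self.values)))
            self.values[i] = value

        def __getitem__(self, name):
            i = self.shared.names.index(name)     # ValueError if never seen
            if i >= len(self.values) or self.values[i] is None:
                raise KeyError(name)
            return self.values[i]

        def __delitem__(self, name):
            # The shared keys are not touched: a (key, None) pair simply means
            # "no value here", mirroring the (key != NULL, value == NULL)
            # convention described above.
            self.values[self.shared.names.index(name)] = None

    keys = SharedKeys()
    a, b = SplitDict(keys), SplitDict(keys)
    a["x"], b["x"] = 1, 2
    del a["x"]          # only a's value slot is cleared; b["x"] is still 2
    ```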

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented Feb 29, 2012

    As of Feb 28, 2012, the PEP mentions an additional optimization of storing the values in an array indexed by (effectively) key insertion order, rather than key position. ("Alternative Implementation")

    It states that this would reduce memory usage for the values array by 1/3. 1/3 is a worst-case measurement; average is 1/2. (At savings of less than 1/3, the keys would resize, to initial savings of 2/3. And yes, that means in practice, the average savings would be greater than half, because the frequency of dicts of size N decreases with N.)

    It states that the keys table would need an additional "values_size" field, but in the absence of dummies, this is just ma_used.

    Note that this would also simplify resizing, as the values arrays would not have to be re-ordered, and would not have to be modified at all unless/until that particular instance received a value for a position beyond its current size.

    @markshannon
    Member Author

Jim Jewett wrote:

> As of Feb 28, 2012, the PEP mentions an additional optimization of storing the values in an array indexed by (effectively) key insertion order, rather than key position. ("Alternative Implementation")
>
> It states that this would reduce memory usage for the values array by 1/3. 1/3 is a worst-case measurement; average is 1/2. (At savings of less than 1/3, the keys would resize, to initial savings of 2/3. And yes, that means in practice, the average savings would be greater than half, because the frequency of dicts of size N decreases with N.)

    I presume you mean to allocate a values array of size == actual keys
    rather than size == usable keys.

Making the values array smaller than the number of usable keys
    causes a number of issues:

    1. The values_size will need to be kept in the dict, not in the keys,
      which would make the dict larger for size 3 or 5, the same size for
      size 2 or 4, but it would save memory for sizes 6-8 (see below).
      So it might not save memory at all, it depends on the distribution of
      the sizes of shared-key dicts.
    2. dk_usable would need to be reverted to dk_fill (c.f ma_fill) in order
      to quickly allocate values arrays. This would slow down the resizing
      check, which is now done before insertion, so might be slower.
      (not really a problem, but it might slow down inserts)
    3. An extra check needs to be performed for each read to ensure the
      index is in-bounds
      (again not really a problem, but it might slow down reads)

    Comparative sizes of values array (including size field):

Items   Proposed   Alternative        Other alternative     Delta
                   (size == usable)   (size == keys (+1))
  1         4             3                    2              -1
  2         4             3                    3               0
  3         4             3                    4              +1
  4         8             5                    5               0
  5         8             5                    6              +1
  6        16            10                    7              -3
  7        16            10                    8              -2
  8        16            10                    9              -1
  9        16            10                   10               0
 10        16            10                   11              +1
 12        32            21                   13              -8
 14        32            21                   15              -6

    The memory savings of the two alternative implementations are broadly
    similar.

Clearly, either of the alternatives is attractive in terms of memory
use. I think it is definitely worth investigating further, but I would
like to get the proposed implementation into CPython first.

> It states that the keys table would need an additional "values_size" field, but in the absence of dummies, this is just ma_used.

    values_size != ma_used
    values_size is the size of the values array, not the number of values.

    Don't forget deletions or the case where items are inserted in a
    different order from that of another dict sharing the same keys.
    Although there are no dummies, (key != NULL, value == NULL) is a legal
    pair representing a missing or deleted value.

    Cheers,
    Mark.

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented Mar 9, 2012

    > As of Feb 28, 2012, the PEP mentions an additional
    > optimization of storing the values in an array indexed
    > by (effectively) key insertion order, rather than key
    > position. ("Alternative Implementation")

    > It states that this would reduce memory usage for the
    > values array by 1/3. 1/3 is a worst-case measurement;
    > average is 1/2. (At savings of less than 1/3, the keys
    > would resize, to initial savings of 2/3. And yes, that
    > means in practice, the average savings would be greater
    > than half, because the frequency of dicts of size N
    > decreases with N.)

> I presume you mean to allocate a values array of
> size == actual keys rather than size == usable keys.

    Actually, I meant size==maximum(keys in use for this dict),
    so that for objects with a potentially complex lifecycle,
    those instances that had not yet needed the new attributes
    would not need to allocate space for them.

    But I see now that just allocating space for each of the
    potential keys is indeed simpler. And I suppose that a
    class which won't eventually need all the attributes for
    every instance is an unusual case.

    So to get beneath 2/3 without lots of reallocation
    probably requires knowing when the key set is likely
    to be complete, and that is indeed more complex than
    the current changes. (That said, you have left code
    in for immutable keys, so there may be cases where
    this isn't so hard after all.)

> Making the values array smaller than the number
> of usable keys causes a number of issues:
>
> 1. The values_size will need to be kept in the dict,
>    not in the keys,

    That was indeed true for my proposal. If you just allocate
    2/3, then you don't need to store the value, unless you
    want to be lazy about reallocating when the keys grow.
    Even then, there are few enough potential sizes to fit
    the information in a byte, or we could get the info for
    free *if* the dict were already timestamped or versioned.

    > It states that the keys table would need an additional
    > "values_size" field, but in the absence of dummies, this
    > is just ma_used.

    I was mixing two structures in my mind. The current key
    count (which is sufficient for a new values array) is
    actually USABLE_FRACTION(dk_size) - dk_free.

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented Mar 9, 2012

On Fri, Mar 9, 2012 at 12:13 PM, Jim Jewett wrote:

> So to get beneath 2/3 without lots of reallocation
> probably requires knowing when the key set is likely
> to be complete, and that is indeed more complex than
> the current changes.  (That said, you have left code
> in for immutable keys, so there may be cases where
> this isn't so hard after all.)

    On second thought, avoiding reallocation doesn't have
    to be perfect -- just good enough to work on average.

    For a *normal* class, the keyset won't change after
    the first instance has completed its __init__.

    Which of course again leads to autoslots and a
    normally NULL extra_dict. And having done that,
    it makes sense to combine the (normal) instance
    dict with the type dict to simplify the lookup chain,
    but ... that is probably too aggressive for the 3.3
    schedule. One silver lining to your patch plus hash
randomization is that 3.4 should have a
    pretty free hand with regards to internal details.
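
    A small illustration of the "normal class" assumption above (purely illustrative; the class and attribute names are made up): once the first instance has finished __init__, the set of attribute names is known, so later instances can reuse it, and it is only after-the-fact attribute assignment that forces the key set to grow.

    ```python
    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    first = Point(0, 0)      # establishes the key set {'x', 'y'}
    later = Point(1, 2)      # same attribute names; nothing new is needed

    later.label = "origin"   # an after-the-fact attribute: the key set must
                             # grow, which is the case a "complete after the
                             # first __init__" heuristic would get wrong
    ```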

    @pitrou
    Member

    pitrou commented Mar 16, 2012

    The latest patch has a significant (negative) effect on some benchmarks:

    ### silent_logging ###
    Min: 0.057927 -> 0.068228: 1.18x slower
    Avg: 0.058218 -> 0.068660: 1.18x slower
    Significant (t=-36.06)

    ### mako ###
    Min: 0.118240 -> 0.140451: 1.19x slower
    Avg: 0.120145 -> 0.142422: 1.19x slower
    Significant (t=-61.66)

    These regressions should be fixed before going any further, IMO.

    @markshannon
    Member Author

    I'm not bothered by the regression in "silent_logging",
    as it is a micro benchmark with a very short running time.

    The regression in "mako" is, I think, caused by competition for the
    data cache between the new dict implementation and the method-cache
    used by _PyType_Lookup. Although the new-dict uses less memory overall,
    the size of a single split-table dict (keys + value) is larger than a combined table.

    Reducing the method-cache size from 2**10 to 2**9 allows the working set to fit better in the cache.
    This fixes the regression in "mako", but makes OO programs that use few objects (such as richards) a bit slower.
Compared with tip, the new-dict implementation is 4% slower and uses 7% less
memory for mako, and is 6% slower and uses 5% less memory for richards.

    @pitrou
    Member

    pitrou commented Apr 4, 2012

> I'm not bothered by the regression in "silent_logging",
> as it is a micro benchmark with a very short running time.

    I'm not concerned about the micro-benchmark itself but the fact that it
    might hint at a wider problem.
    Also, I don't get your remark about it running in a short time. Your
    patch AFAICT doesn't need any warm up period to exhibit any
    improvements.

> Reducing the method-cache size from 2**10 to 2**9 allows the working
> set to fit better in the cache.
> This fixes the regression in "mako", but makes OO programs that use
> few objects (such as richards) a bit slower.

    I don't think we should reduce the size of the method cache. 1024 is not
    a very large number, and might even be too small for complex OO
    programs.

    I also think that, apart from the dict storage changes, your patch
    should strive not to change any other tunables. Otherwise we're really
    comparing apples to oranges.

    @markshannon
    Member Author

Antoine Pitrou wrote:

>> I'm not bothered by the regression in "silent_logging",
>> as it is a micro benchmark with a very short running time.
>
> I'm not concerned about the micro-benchmark itself but the fact that it
> might hint at a wider problem.

    Or it might not.
    Micro-benchmarks produce micro-optimisations.
    That's why I dislike them.

> Also, I don't get your remark about it running in a short time. Your
> patch AFAICT doesn't need any warm up period to exhibit any
> improvements.

    What I mean is that the runtime is so short, no one would notice any
    change, so who cares?

>> Reducing the method-cache size from 2**10 to 2**9 allows the working
>> set to fit better in the cache.
>> This fixes the regression in "mako", but makes OO programs that use
>> few objects (such as richards) a bit slower.
>
> I don't think we should reduce the size of the method cache. 1024 is not
> a very large number, and might even be too small for complex OO
> programs.

    "not very large" or "too small", by what metric?

The optimum size (for speed) of the method cache depends on the size of
the hardware data cache, the complexity of the program, and many other factors.
Attempts to reason about it are pretty much futile.
Empiricism is the only way to go.

> I also think that, apart from the dict storage changes, your patch
> should strive not to change any other tunables. Otherwise we're really
> comparing apples to oranges.

    If the implementation changes, shouldn't the tunable parameters be retuned?

    Cheers,
    Mark.

    @pitrou
    Member

    pitrou commented Apr 4, 2012

>> Also, I don't get your remark about it running in a short time. Your
>> patch AFAICT doesn't need any warm up period to exhibit any
>> improvements.
>
> What I mean is that the runtime is so short, no one would notice any
> change, so who cares?

    None of the benchmarks used here are real-world workloads, so you might
    as well claim that they are all irrelevant. But then we'll have a hard
    time assessing the consequences of your patch.

>> I don't think we should reduce the size of the method cache. 1024 is not
>> a very large number, and might even be too small for complex OO
>> programs.
>
> "not very large" or "too small", by what metric?

    By the metric of the number of classes and methods in a complex OO
    application (for example something based on Twisted or SQLAlchemy).

>> I also think that, apart from the dict storage changes, your patch
>> should strive not to change any other tunables. Otherwise we're really
>> comparing apples to oranges.
>
> If the implementation changes, shouldn't the tunable parameters be retuned?

    Only if there's a reasoning behind it. Perhaps the retuning would have
    given the same results without the rest of your patch.

    @rhettinger
    Contributor

    [Antoine]

    I also think that, apart from the dict storage changes,
    your patch should strive not to change any other tunables.

    I agree. Please keep the patch focused on the single task, the shared keys.

    @markshannon
    Member Author

    How about changing the method-cache size in a separate patch?

    On my machine, reducing the method-cache size from 2**10 to 2**9 results
    in a very small improvement in performance (with the original dict).

    That way we don't get a performance regression with the new dict
    and the patch is better focussed.

    @markshannon
    Member Author

    I don't really understand your objection to changing the method-cache size. As I said, I can remove the change, but that will cause the performance regression that Antoine noticed.

    @rhettinger
    Contributor

    Moving the "assigned to" to "nobody". I won't have a chance for a thorough review for another ten days. Hopefully, someone else will have a chance to review it before then.

    @rhettinger rhettinger removed their assignment Apr 13, 2012
    @loewis
    Mannequin

    loewis mannequin commented Apr 19, 2012

    Mark, can you please submit a contributor form?

    @markshannon
    Member Author

    I've updated the repository and uploaded a new patch in response to Benjamin's review. (And the contributor form is in the post).

    One remaining issue is the return value of __sizeof__().
    If it is an int, then it cannot accurately reflect the memory use,
    but returning a float may seem rather surprising.
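
    To spell out the arithmetic behind that remark (the byte counts below are made up for illustration and are not what CPython reports): when one keys table is shared by several dicts, a "fair" per-dict share of its memory is generally fractional, which an int-returning __sizeof__ cannot express exactly.

    ```python
    # Illustrative numbers only -- not actual CPython sizes.
    keys_table_bytes = 256      # one keys/hashes table shared by n dicts
    values_array_bytes = 48     # per-dict values storage
    n_sharers = 10

    fair_share = values_array_bytes + keys_table_bytes / n_sharers    # 73.6
    as_int = values_array_bytes + keys_table_bytes // n_sharers       # 73

    print(fair_share, as_int)   # the int answer silently drops the fraction
    ```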

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 23, 2012

    New changeset 6e5855854a2e by Benjamin Peterson in branch 'default':
    Implement PEP-412: Key-sharing dictionaries (closes bpo-13903)
    http://hg.python.org/cpython/rev/6e5855854a2e

    @python-dev python-dev mannequin closed this as completed Apr 23, 2012
    @benjaminp
    Contributor

    Okay. I committed the latest patch. Subtleties like __sizeof__ can be worked out as people start using it.

    @YurySelivanov
    Mannequin

    YurySelivanov mannequin commented Apr 23, 2012

    Mark, did you add the test that your patch initially was failing with? http://mail.python.org/pipermail/python-dev/2012-February/116605.html

    @markshannon
    Member Author

    I fixed it back then, but didn't add the test.
    It subsequently regressed.
    Should know better.

    Patch (with test this time) attached.

    @markshannon markshannon reopened this Apr 23, 2012
    @python-dev
    Mannequin

    python-dev mannequin commented Apr 23, 2012

    New changeset 34b6998efd2c by Benjamin Peterson in branch 'default':
    fix instance dicts with str subclasses (bpo-13903)
    http://hg.python.org/cpython/rev/34b6998efd2c

    @markshannon
    Member Author

GC tracking was not being maintained in setdefault and copy (for split tables).

    Patch attached

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 24, 2012

    New changeset 507a6703d6a3 by Benjamin Peterson in branch 'default':
    fix dict gc tracking (bpo-13903)
    http://hg.python.org/cpython/rev/507a6703d6a3

    @markshannon
    Member Author

    Failed to differentiate between failure and error in make_split_table().

    Patch attached

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 24, 2012

    New changeset b044e0568be2 by Martin v. Loewis in branch 'default':
    Account for shared keys in type's __sizeof__ (bpo-13903).
    http://hg.python.org/cpython/rev/b044e0568be2

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 24, 2012

    New changeset 5d5b72a71898 by Benjamin Peterson in branch 'default':
    distiguish between refusing to creating shared keys and error (bpo-13903)
    http://hg.python.org/cpython/rev/5d5b72a71898

    @markshannon
    Member Author

    Decref cached-keys when type is deallocated.

    Patch attached.

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 27, 2012

    New changeset a3beae842f13 by Benjamin Peterson in branch 'default':
    decref cached keys on type deallocation (bpo-13903)
    http://hg.python.org/cpython/rev/a3beae842f13

    @pitrou
    Member

    pitrou commented Apr 27, 2012

> New changeset a3beae842f13 by Benjamin Peterson in branch 'default':
> decref cached keys on type deallocation (bpo-13903)
> http://hg.python.org/cpython/rev/a3beae842f13

    Is there any way to detect / avoid leaks on this separate refcounting
    scheme?

    @pitrou
    Member

    pitrou commented Apr 27, 2012

    This patch integrates the dictkeys' refcounting into the refcount checking framework. Seems to work ok, but it would be better if someone more acquainted with the code could confirm it.

    @markshannon
    Member Author

Change insertdict to follow normal (non-stealing) ref-counting behaviour, which fixes possible leakage.

    Patch attached.

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 30, 2012

    New changeset c5fd332e5857 by Benjamin Peterson in branch 'default':
    change insertdict to not steal references (bpo-13903)
    http://hg.python.org/cpython/rev/c5fd332e5857

    @python-dev
    Mannequin

    python-dev mannequin commented May 12, 2012

    New changeset 10e8b97d0fd7 by Antoine Pitrou in branch 'default':
    Make the reference counting of dictkeys objects participate in refleak hunting
    http://hg.python.org/cpython/rev/10e8b97d0fd7
