New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New shared-keys dictionary implementation #58111
Comments
The proposed dictionary implementation allows sharing of keys & hashes between dictionaries. This leads to substantial memory savings for object-oriented programs. For non-OO programs the impact is negligible. |
In the initial comment, 'Dummy' to 'Deleted' here but only here:
Im Lib/test/test_pprint.py |
I see two unrelated parts in your patch:
You may open a new issue for the second part, except if I am wrong and you need to change constants for the first part? (I don't understand why you changed constants and how you chose new values.) |
This would likely require a PEP before having a chance of being considered for inclusion. |
Does this really need a PEP? The only possible issue is changing dict repr() ordering, |
Changing dictionaries is a big deal. You're changing many pieces at once (not a good idea) including changing tunable parameters that are well-studied (I spent a month testing whether 5/8 was better idea that 2/3 for resizing or when the ideal small dict size was 4, 8, or 16). You're changing the meaning of the fields in dictobject.h which will likely break any code that relied on those. The ideas may be good ones but they warrant a good deal of thought. Dicts weren't just slapped together -- the current code is the product to two decades of tweaking by engineers who devoted significant time to the task. It would be easy to unknowingly undo some of their work. |
Raymond Hettinger wrote:
OK. By the way, I'm trying not changing the tunable parameters for the |
As Victor I think the tunables could be changed separately, unless they truely get in the way of the shared-keys optimization. By the way, there will need a "protection" for the case where instances are used as bags of key/value pairs (dict-alikes with attribute access). Perhaps bound the size of the keys array to 100 entries and then switch into unshared mode. |
FYI - I strongly support this type of work to reduce memory use of the Python interpreter! :) Also, yes, constant changing should be separate from this change but are worth occasionally re-measuring and justifying as common computer architectures have changed since the original decisions were made. |
Looking at your latest patch, I worry about "any deletion |
Antoine Pitrou wrote:
In fact here is no need for a dummy pointer. In practice, it will very rare that a deletion occurs in a split table |
As of Feb 28, 2012, the PEP mentions an additional optimization of storing the values in an array indexed by (effectively) key insertion order, rather than key position. ("Alternative Implementation") It states that this would reduce memory usage for the values array by 1/3. 1/3 is a worst-case measurement; average is 1/2. (At savings of less than 1/3, the keys would resize, to initial savings of 2/3. And yes, that means in practice, the average savings would be greater than half, because the frequency of dicts of size N decreases with N.) It states that the keys table would need an additional "values_size" field, but in the absence of dummies, this is just ma_used. Note that this would also simplify resizing, as the values arrays would not have to be re-ordered, and would not have to be modified at all unless/until that particular instance received a value for a position beyond its current size. |
Jim Jewett wrote:
I presume you mean to allocate a values array of size == actual keys Making the values array smaller than the the number of usable keys
Comparative sizes of values array (including size field): Items Proposed Alternative Other Alternative Delta The memory savings of the two alternative implementations are broadly Clearly, either of the alternatives are attractive in terms of memory
values_size != ma_used Don't forget deletions or the case where items are inserted in a Cheers, |
Actually, I meant size==maximum(keys in use for this dict), But I see now that just allocating space for each of the So to get beneath 2/3 without lots of reallocation
That was indeed true for my proposal. If you just allocate
I was mixing two structures in my mind. The current key |
On Fri, Mar 9, 2012 at 12:13 PM, Jim Jewett
On second thought, avoiding reallocation doesn't have For a *normal* class, the keyset won't change after Which of course again leads to autoslots and a |
The latest patch has a significant (negative) effect on some benchmarks: ### silent_logging ### ### mako ### These regressions should be fixed before going any further, IMO. |
I'm not bothered by the regression in "silent_logging", The regression in "mako" is, I think, caused by competition for the Reducing the method-cache size from 2**10 to 2**9 allows the working set to fit better in the cache. |
I'm not concerned about the micro-benchmark itself but the fact that it
I don't think we should reduce the size of the method cache. 1024 is not I also think that, apart from the dict storage changes, your patch |
Antoine Pitrou wrote:
Or it might not.
What I mean is that the runtime is so short, no one would notice any
"not very large" or "too small", by what metric? The optimum size (for speed) of the method cache depends on the size of
If the implementation changes, shouldn't the tunable parameters be retuned? Cheers, |
None of the benchmarks used here are real-world workloads, so you might
By the metric of the number of classes and methods in a complex OO
Only if there's a reasoning behind it. Perhaps the retuning would have |
[Antoine]
I agree. Please keep the patch focused on the single task, the shared keys. |
How about changing the method-cache size in a separate patch? On my machine, reducing the method-cache size from 2**10 to 2**9 results That way we don't get a performance regression with the new dict |
I don't really understand your objection to changing the method-cache size. As I said, I can remove the change, but that will cause the performance regression that Antoine noticed. |
Moving the "assigned to" to "nobody". I won't have a chance for a thorough review for another ten days. Hopefully, someone else will have a chance to review it before then. |
Mark, can you please submit a contributor form? |
I've updated the repository and uploaded a new patch in response to Benjamin's review. (And the contributor form is in the post). One remaining issue is the return value of __sizeof__(). |
New changeset 6e5855854a2e by Benjamin Peterson in branch 'default': |
Okay. I committed the latest patch. Subtleties like __sizeof__ can be worked out as people start using it. |
Mark, did you add the test that your patch initially was failing with? http://mail.python.org/pipermail/python-dev/2012-February/116605.html |
I fixed it back then, but didn't add the test. Patch (with test this time) attached. |
New changeset 34b6998efd2c by Benjamin Peterson in branch 'default': |
Failing to maintain GC tracking in setdefault and copy (for split-tables) Patch attached |
New changeset 507a6703d6a3 by Benjamin Peterson in branch 'default': |
Failed to differentiate between failure and error in make_split_table(). Patch attached |
New changeset b044e0568be2 by Martin v. Loewis in branch 'default': |
New changeset 5d5b72a71898 by Benjamin Peterson in branch 'default': |
Decref cached-keys when type is deallocated. Patch attached. |
New changeset a3beae842f13 by Benjamin Peterson in branch 'default': |
Is there any way to detect / avoid leaks on this separate refcounting |
This patch integrates the dictkeys' refcounting into the refcount checking framework. Seems to work ok, but it would be better if someone more acquainted with the code could confirm it. |
Change insertdict to follow normal (non-stealing) ref-counting behaviour which fixes possible leakage. Patch attached. |
New changeset c5fd332e5857 by Benjamin Peterson in branch 'default': |
New changeset 10e8b97d0fd7 by Antoine Pitrou in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: