New shared-keys dictionary implementation #58111

Closed
markshannon opened this issue Jan 29, 2012 · 43 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@markshannon
Member

BPO 13903
Nosy @loewis, @rhettinger, @terryjreedy, @gpshead, @jcea, @pitrou, @vstinner, @pjenvey, @benjaminp, @markshannon, @JimJJewett
Files
  • 691ce331f955.diff: Patch to rev. 06a6fed0da56
  • 49b7e7e4a27c.diff
  • 257e16e71654.diff: patch to revision 2a142141e5fd
  • 372d0bca85ae.diff: Patch to rev. 9fcb2676696c
  • 73423916a242.diff: Patch to rev. 73423916a242
  • str_subclass.patch
  • gc_tracking.patch: Patch
  • make_split_table_error.patch
  • cached_keys.patch
  • dkdebug.patch
  • insertdict.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-05-02.17:09:34.992>
    created_at = <Date 2012-01-29.14:26:46.386>
    labels = ['interpreter-core', 'performance']
    title = 'New shared-keys dictionary implementation'
    updated_at = <Date 2012-05-12.21:52:57.593>
    user = 'https://github.com/markshannon'

    bugs.python.org fields:

    activity = <Date 2012-05-12.21:52:57.593>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-05-02.17:09:34.992>
    closer = 'benjamin.peterson'
    components = ['Interpreter Core']
    creation = <Date 2012-01-29.14:26:46.386>
    creator = 'Mark.Shannon'
    dependencies = []
    files = ['24510', '24670', '24766', '25096', '25313', '25318', '25339', '25340', '25381', '25385', '25422']
    hgrepos = ['112']
    issue_num = 13903
    keywords = ['patch']
    message_count = 43.0
    messages = ['152229', '152253', '152293', '152345', '152351', '152366', '152373', '152380', '152419', '152863', '152885', '154637', '154645', '155246', '155248', '156096', '157365', '157466', '157477', '157482', '157500', '157649', '158051', '158192', '158723', '159018', '159028', '159031', '159032', '159049', '159055', '159139', '159143', '159161', '159172', '159188', '159478', '159484', '159485', '159489', '159683', '159698', '160496']
    nosy_count = 14.0
    nosy_names = ['loewis', 'rhettinger', 'terry.reedy', 'gregory.p.smith', 'jcea', 'pitrou', 'vstinner', 'pjenvey', 'benjamin.peterson', 'Yury.Selivanov', 'Mark.Shannon', 'python-dev', 'jcon', 'Jim.Jewett']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue13903'
    versions = ['Python 3.3']

    @markshannon
    Member Author

    The proposed dictionary implementation allows sharing of keys & hashes between dictionaries. This leads to substantial memory savings for object-oriented programs. For non-OO programs the impact is negligible.
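
    To make the effect concrete, here is a hedged sketch (the class is made up, and the exact byte counts vary by CPython version and platform): instances of the same class with the same attribute names can share a single keys-and-hashes table, so each per-instance __dict__ only has to store its own values.

    ```python
    import sys

    class Point:                      # hypothetical example class
        def __init__(self, x, y):
            self.x = x
            self.y = y

    points = [Point(i, i) for i in range(3)]

    # An ordinary dict literal carries its own keys and hashes...
    combined = {"x": 0, "y": 0}
    # ...while instance dicts of the same class are candidates for key sharing,
    # so their per-instance footprint can be smaller.
    split = points[0].__dict__

    print(sys.getsizeof(combined), sys.getsizeof(split))
    ```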

    @markshannon markshannon added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jan 29, 2012
    @terryjreedy
    Member

    In the initial comment, 'Dummy' to 'Deleted' here but only here:

    • Holds an active (key, value) pair. Active can transition to Dummy
      + Holds an active (key, value) pair. Active can transition to Deleted

In Lib/test/test_pprint.py
    def test_set_reprs(self): ...
    # Consequently, this test is fragile and ...
    + # XXX So why include this "test" in the first place?
Raymond, I believe you added this in 44927 and revised it for 3.x in 45067.
I imagine it will also be a problem with randomized hashes. Should it be removed or somehow revised?

    @vstinner
    Member

    I see two unrelated parts in your patch:

    • change dictionary structure in memory
    • change many constants linked to optimization: PyDICT_MAXFREELIST: 80->40, 2/3->5/8, etc.

You may want to open a new issue for the second part, unless I am wrong and you do need to change the constants for the first part?

    (I don't understand why you changed constants and how you chose new values.)

    @rhettinger
    Contributor

    This would likely require a PEP before having a chance of being considered for inclusion.

    @markshannon
    Member Author

    Does this really need a PEP?
    There is no new language feature and no change to any API.
    It is just saving some memory.

    The only possible issue is changing dict repr() ordering,
    but bpo-13703 will do that anyway.

    @rhettinger rhettinger self-assigned this Jan 31, 2012
    @rhettinger
    Contributor

Changing dictionaries is a big deal. You're changing many pieces at once (not a good idea), including changing tunable parameters that are well-studied (I spent a month testing whether 5/8 was a better idea than 2/3 for resizing, or whether the ideal small dict size was 4, 8, or 16). You're changing the meaning of the fields in dictobject.h, which will likely break any code that relied on those.

The ideas may be good ones, but they warrant a good deal of thought. Dicts weren't just slapped together -- the current code is the product of two decades of tweaking by engineers who devoted significant time to the task. It would be easy to unknowingly undo some of their work.

    @markshannon
    Member Author

Raymond Hettinger wrote:

> Changing dictionaries is a big deal. You're changing many pieces at once (not a good idea), including changing tunable parameters that are well-studied (I spent a month testing whether 5/8 was a better idea than 2/3 for resizing, or whether the ideal small dict size was 4, 8, or 16). You're changing the meaning of the fields in dictobject.h, which will likely break any code that relied on those.
>
> The ideas may be good ones, but they warrant a good deal of thought. Dicts weren't just slapped together -- the current code is the product of two decades of tweaking by engineers who devoted significant time to the task. It would be easy to unknowingly undo some of their work.

    OK.
    I'll write a PEP.

By the way, I'm trying not to change the tunable parameters for the
unshared-keys case, just for the shared-keys case. Of course, they do
interact with each other.

    @pitrou
    Member

    pitrou commented Jan 31, 2012

Like Victor, I think the tunables could be changed separately, unless they truly get in the way of the shared-keys optimization.

By the way, there will need to be a "protection" for the case where instances are used as bags of key/value pairs (dict-alikes with attribute access). Perhaps bound the size of the keys array to 100 entries and then switch into unshared mode.
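
    To illustrate the pattern being guarded against, here is a hedged sketch (the class name and the loop are made up; the 100-entry bound above is Antoine's suggestion, not something implemented here): each instance introduces a new attribute name, so a naively shared keys table would keep growing while each per-instance values array stays mostly empty.

    ```python
    # Hypothetical worst case for key sharing: instances used as ad-hoc
    # key/value bags, where each instance sets a mostly different attribute name.
    class Record:
        pass

    records = []
    for i in range(1000):
        r = Record()
        setattr(r, "field_%d" % i, i)   # a new attribute name per instance
        records.append(r)

    # Without a cut-off, a shared keys table for Record would need a slot for
    # every name ever used on any instance; the suggested protection would make
    # such dicts fall back to ordinary unshared (combined) tables instead.
    ```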

    @gpshead
    Member

    gpshead commented Feb 1, 2012

    FYI - I strongly support this type of work to reduce memory use of the Python interpreter! :)

Also, yes, the constant changes should be separate from this change, but the constants are worth occasionally re-measuring and justifying, as common computer architectures have changed since the original decisions were made.

    @pitrou
    Member

    pitrou commented Feb 8, 2012

Looking at your latest patch, I worry about "any deletion (including pop & popitem) causes a split table to become a combined table". Why wouldn't you use a dummy pointer (such as ((PyObject *) 1)) to signal deleted slots?

    @markshannon
    Member Author

Antoine Pitrou wrote:

> Looking at your latest patch, I worry about "any deletion (including pop & popitem) causes a split table to become a combined table". Why wouldn't you use a dummy pointer (such as ((PyObject *) 1)) to signal deleted slots?

In fact there is no need for a dummy pointer.
When deleting from a split table, the value can just be set to NULL;
the resulting (key, NULL) pair is legal in a split table.
Your suggestion doesn't make the code any more complex, so I've included it.

In practice, it will be very rare that a deletion occurs in a split table
(since they are only used for attribute dictionaries).
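
    As a rough illustration of that rule, here is a pure-Python toy model (a sketch only -- it models the idea, not the C implementation, and all class and method names are made up): the shared keys are append-only, each dict keeps just its own values list indexed by the key's slot, and "deleting" an entry clears that dict's value slot while leaving the key in place for the other sharers.

    ```python
    # Toy model of a shared-keys ("split") table, for illustration only.
    class SharedKeys:
        def __init__(self):
            self.names = []                       # shared by all split dicts

        def slot(self, name):
            if name not in self.names:
                self.names.append(name)
            return self.names.index(name)

    class SplitDict:
        def __init__(self, shared):
            self.shared = shared                  # shared keys, never shrunk
            self.values = []                      # this dict's values only

        def __setitem__(self, name, value):
            i = self.shared.slot(name)
            self.values.extend([None] * (i + 1 - len(self.values)))
            self.values[i] = value

        def __getitem__(self, name):
            i = self.shared.names.index(name)     # ValueError if never seen
            if i >= len(self.values) or self.values[i] is None:
                raise KeyError(name)
            return self.values[i]

        def __delitem__(self, name):
            # The shared keys are not touched: a (key, None) pair simply means
            # "no value here", mirroring the (key != NULL, value == NULL)
            # convention described above.
            self.values[self.shared.names.index(name)] = None

    keys = SharedKeys()
    a, b = SplitDict(keys), SplitDict(keys)
    a["x"], b["x"] = 1, 2
    del a["x"]          # only a's value slot is cleared; b["x"] is still 2
    ```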

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented Feb 29, 2012

    As of Feb 28, 2012, the PEP mentions an additional optimization of storing the values in an array indexed by (effectively) key insertion order, rather than key position. ("Alternative Implementation")

    It states that this would reduce memory usage for the values array by 1/3. 1/3 is a worst-case measurement; average is 1/2. (At savings of less than 1/3, the keys would resize, to initial savings of 2/3. And yes, that means in practice, the average savings would be greater than half, because the frequency of dicts of size N decreases with N.)

    It states that the keys table would need an additional "values_size" field, but in the absence of dummies, this is just ma_used.

    Note that this would also simplify resizing, as the values arrays would not have to be re-ordered, and would not have to be modified at all unless/until that particular instance received a value for a position beyond its current size.

    @markshannon
    Member Author

Jim Jewett wrote:

> As of Feb 28, 2012, the PEP mentions an additional optimization of storing the values in an array indexed by (effectively) key insertion order, rather than key position. ("Alternative Implementation")
>
> It states that this would reduce memory usage for the values array by 1/3. 1/3 is a worst-case measurement; average is 1/2. (At savings of less than 1/3, the keys would resize, to initial savings of 2/3. And yes, that means in practice, the average savings would be greater than half, because the frequency of dicts of size N decreases with N.)

    I presume you mean to allocate a values array of size == actual keys
    rather than size == usable keys.

Making the values array smaller than the number of usable keys
    causes a number of issues:

    1. The values_size will need to be kept in the dict, not in the keys,
      which would make the dict larger for size 3 or 5, the same size for
      size 2 or 4, but it would save memory for sizes 6-8 (see below).
      So it might not save memory at all, it depends on the distribution of
      the sizes of shared-key dicts.
    2. dk_usable would need to be reverted to dk_fill (c.f ma_fill) in order
      to quickly allocate values arrays. This would slow down the resizing
      check, which is now done before insertion, so might be slower.
      (not really a problem, but it might slow down inserts)
    3. An extra check needs to be performed for each read to ensure the
      index is in-bounds
      (again not really a problem, but it might slow down reads)

    Comparative sizes of values array (including size field):

Items   Proposed   Alternative        Other alternative     Delta
                   (size == usable)   (size == keys (+1))
  1         4             3                    2              -1
  2         4             3                    3               0
  3         4             3                    4              +1
  4         8             5                    5               0
  5         8             5                    6              +1
  6        16            10                    7              -3
  7        16            10                    8              -2
  8        16            10                    9              -1
  9        16            10                   10               0
 10        16            10                   11              +1
 12        32            21                   13              -8
 14        32            21                   15              -6

    The memory savings of the two alternative implementations are broadly
    similar.

Clearly, either of the alternatives is attractive in terms of memory
use. I think it is definitely worth investigating further, but I would
like to get the proposed implementation into CPython first.

> It states that the keys table would need an additional "values_size" field, but in the absence of dummies, this is just ma_used.

    values_size != ma_used
    values_size is the size of the values array, not the number of values.

    Don't forget deletions or the case where items are inserted in a
    different order from that of another dict sharing the same keys.
    Although there are no dummies, (key != NULL, value == NULL) is a legal
    pair representing a missing or deleted value.

    Cheers,
    Mark.

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented Mar 9, 2012

    > As of Feb 28, 2012, the PEP mentions an additional
    > optimization of storing the values in an array indexed
    > by (effectively) key insertion order, rather than key
    > position. ("Alternative Implementation")

    > It states that this would reduce memory usage for the
    > values array by 1/3. 1/3 is a worst-case measurement;
    > average is 1/2. (At savings of less than 1/3, the keys
    > would resize, to initial savings of 2/3. And yes, that
    > means in practice, the average savings would be greater
    > than half, because the frequency of dicts of size N
    > decreases with N.)

> I presume you mean to allocate a values array of
> size == actual keys rather than size == usable keys.

    Actually, I meant size==maximum(keys in use for this dict),
    so that for objects with a potentially complex lifecycle,
    those instances that had not yet needed the new attributes
    would not need to allocate space for them.

    But I see now that just allocating space for each of the
    potential keys is indeed simpler. And I suppose that a
    class which won't eventually need all the attributes for
    every instance is an unusual case.

    So to get beneath 2/3 without lots of reallocation
    probably requires knowing when the key set is likely
    to be complete, and that is indeed more complex than
    the current changes. (That said, you have left code
    in for immutable keys, so there may be cases where
    this isn't so hard after all.)

> Making the values array smaller than the number
> of usable keys causes a number of issues:
>
> 1. The values_size will need to be kept in the dict,
>    not in the keys,

    That was indeed true for my proposal. If you just allocate
    2/3, then you don't need to store the value, unless you
    want to be lazy about reallocating when the keys grow.
    Even then, there are few enough potential sizes to fit
    the information in a byte, or we could get the info for
    free *if* the dict were already timestamped or versioned.

    > It states that the keys table would need an additional
    > "values_size" field, but in the absence of dummies, this
    > is just ma_used.

    I was mixing two structures in my mind. The current key
    count (which is sufficient for a new values array) is
    actually USABLE_FRACTION(dk_size) - dk_free.

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented Mar 9, 2012

On Fri, Mar 9, 2012 at 12:13 PM, Jim Jewett wrote:

> So to get beneath 2/3 without lots of reallocation
> probably requires knowing when the key set is likely
> to be complete, and that is indeed more complex than
> the current changes.  (That said, you have left code
> in for immutable keys, so there may be cases where
> this isn't so hard after all.)

    On second thought, avoiding reallocation doesn't have
    to be perfect -- just good enough to work on average.

    For a *normal* class, the keyset won't change after
    the first instance has completed its __init__.

    Which of course again leads to autoslots and a
    normally NULL extra_dict. And having done that,
    it makes sense to combine the (normal) instance
    dict with the type dict to simplify the lookup chain,
    but ... that is probably too aggressive for the 3.3
    schedule. One silver lining to your patch plus hash
randomization is that 3.4 should have a
    pretty free hand with regards to internal details.
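
    A small illustration of the "normal class" assumption above (purely illustrative; the class and attribute names are made up): once the first instance has finished __init__, the set of attribute names is known, so later instances can reuse it, and it is only after-the-fact attribute assignment that forces the key set to grow.

    ```python
    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    first = Point(0, 0)      # establishes the key set {'x', 'y'}
    later = Point(1, 2)      # same attribute names; nothing new is needed

    later.label = "origin"   # an after-the-fact attribute: the key set must
                             # grow, which is the case a "complete after the
                             # first __init__" heuristic would get wrong
    ```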

    @pitrou
    Member

    pitrou commented Mar 16, 2012

    The latest patch has a significant (negative) effect on some benchmarks:

    ### silent_logging ###
    Min: 0.057927 -> 0.068228: 1.18x slower
    Avg: 0.058218 -> 0.068660: 1.18x slower
    Significant (t=-36.06)

    ### mako ###
    Min: 0.118240 -> 0.140451: 1.19x slower
    Avg: 0.120145 -> 0.142422: 1.19x slower
    Significant (t=-61.66)

    These regressions should be fixed before going any further, IMO.

    @markshannon
    Member Author

    I'm not bothered by the regression in "silent_logging",
    as it is a micro benchmark with a very short running time.

    The regression in "mako" is, I think, caused by competition for the
    data cache between the new dict implementation and the method-cache
    used by _PyType_Lookup. Although the new-dict uses less memory overall,
    the size of a single split-table dict (keys + value) is larger than a combined table.

    Reducing the method-cache size from 2**10 to 2**9 allows the working set to fit better in the cache.
    This fixes the regression in "mako", but makes OO programs that use few objects (such as richards) a bit slower.
Compared with tip, the new-dict implementation is 4% slower and uses 7% less
memory for mako, and is 6% slower and uses 5% less memory for richards.

    @pitrou
    Member

    pitrou commented Apr 4, 2012

> I'm not bothered by the regression in "silent_logging",
> as it is a micro benchmark with a very short running time.

    I'm not concerned about the micro-benchmark itself but the fact that it
    might hint at a wider problem.
    Also, I don't get your remark about it running in a short time. Your
    patch AFAICT doesn't need any warm up period to exhibit any
    improvements.

> Reducing the method-cache size from 2**10 to 2**9 allows the working
> set to fit better in the cache.
> This fixes the regression in "mako", but makes OO programs that use
> few objects (such as richards) a bit slower.

    I don't think we should reduce the size of the method cache. 1024 is not
    a very large number, and might even be too small for complex OO
    programs.

    I also think that, apart from the dict storage changes, your patch
    should strive not to change any other tunables. Otherwise we're really
    comparing apples to oranges.

    @markshannon
    Member Author

Antoine Pitrou wrote:

>> I'm not bothered by the regression in "silent_logging",
>> as it is a micro benchmark with a very short running time.
>
> I'm not concerned about the micro-benchmark itself but the fact that it
> might hint at a wider problem.

    Or it might not.
    Micro-benchmarks produce micro-optimisations.
    That's why I dislike them.

> Also, I don't get your remark about it running in a short time. Your
> patch AFAICT doesn't need any warm up period to exhibit any
> improvements.

    What I mean is that the runtime is so short, no one would notice any
    change, so who cares?

>> Reducing the method-cache size from 2**10 to 2**9 allows the working
>> set to fit better in the cache.
>> This fixes the regression in "mako", but makes OO programs that use
>> few objects (such as richards) a bit slower.
>
> I don't think we should reduce the size of the method cache. 1024 is not
> a very large number, and might even be too small for complex OO
> programs.

    "not very large" or "too small", by what metric?

The optimum size (for speed) of the method cache depends on the size of
the hardware data cache, the complexity of the program, and many other factors.
Attempts to reason about it are pretty much futile.
Empiricism is the only way to go.

> I also think that, apart from the dict storage changes, your patch
> should strive not to change any other tunables. Otherwise we're really
> comparing apples to oranges.

    If the implementation changes, shouldn't the tunable parameters be retuned?

    Cheers,
    Mark.

    @pitrou
    Member

    pitrou commented Apr 4, 2012

>> Also, I don't get your remark about it running in a short time. Your
>> patch AFAICT doesn't need any warm up period to exhibit any
>> improvements.
>
> What I mean is that the runtime is so short, no one would notice any
> change, so who cares?

    None of the benchmarks used here are real-world workloads, so you might
    as well claim that they are all irrelevant. But then we'll have a hard
    time assessing the consequences of your patch.

>> I don't think we should reduce the size of the method cache. 1024 is not
>> a very large number, and might even be too small for complex OO
>> programs.
>
> "not very large" or "too small", by what metric?

    By the metric of the number of classes and methods in a complex OO
    application (for example something based on Twisted or SQLAlchemy).

>> I also think that, apart from the dict storage changes, your patch
>> should strive not to change any other tunables. Otherwise we're really
>> comparing apples to oranges.
>
> If the implementation changes, shouldn't the tunable parameters be retuned?

    Only if there's a reasoning behind it. Perhaps the retuning would have
    given the same results without the rest of your patch.

    @rhettinger
    Contributor

    [Antoine]

    I also think that, apart from the dict storage changes,
    your patch should strive not to change any other tunables.

    I agree. Please keep the patch focused on the single task, the shared keys.

    @markshannon
    Member Author

    How about changing the method-cache size in a separate patch?

    On my machine, reducing the method-cache size from 2**10 to 2**9 results
    in a very small improvement in performance (with the original dict).

    That way we don't get a performance regression with the new dict
    and the patch is better focussed.

    @markshannon
    Member Author

    I don't really understand your objection to changing the method-cache size. As I said, I can remove the change, but that will cause the performance regression that Antoine noticed.

    @rhettinger
    Contributor

    Moving the "assigned to" to "nobody". I won't have a chance for a thorough review for another ten days. Hopefully, someone else will have a chance to review it before then.

    @rhettinger rhettinger removed their assignment Apr 13, 2012
    @loewis
    Mannequin

    loewis mannequin commented Apr 19, 2012

    Mark, can you please submit a contributor form?

    @markshannon
    Member Author

    I've updated the repository and uploaded a new patch in response to Benjamin's review. (And the contributor form is in the post).

    One remaining issue is the return value of __sizeof__().
    If it is an int, then it cannot accurately reflect the memory use,
    but returning a float may seem rather surprising.
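
    To spell out the arithmetic behind that remark (the byte counts below are made up for illustration and are not what CPython reports): when one keys table is shared by several dicts, a "fair" per-dict share of its memory is generally fractional, which an int-returning __sizeof__ cannot express exactly.

    ```python
    # Illustrative numbers only -- not actual CPython sizes.
    keys_table_bytes = 256      # one keys/hashes table shared by n dicts
    values_array_bytes = 48     # per-dict values storage
    n_sharers = 10

    fair_share = values_array_bytes + keys_table_bytes / n_sharers    # 73.6
    as_int = values_array_bytes + keys_table_bytes // n_sharers       # 73

    print(fair_share, as_int)   # the int answer silently drops the fraction
    ```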

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 23, 2012

    New changeset 6e5855854a2e by Benjamin Peterson in branch 'default':
    Implement PEP-412: Key-sharing dictionaries (closes bpo-13903)
    http://hg.python.org/cpython/rev/6e5855854a2e

    @python-dev python-dev mannequin closed this as completed Apr 23, 2012
    @benjaminp
    Contributor

    Okay. I committed the latest patch. Subtleties like __sizeof__ can be worked out as people start using it.

    @YurySelivanov
    Mannequin

    YurySelivanov mannequin commented Apr 23, 2012

    Mark, did you add the test that your patch initially was failing with? http://mail.python.org/pipermail/python-dev/2012-February/116605.html

    @markshannon
    Member Author

    I fixed it back then, but didn't add the test.
    It subsequently regressed.
    Should know better.

    Patch (with test this time) attached.

    @markshannon markshannon reopened this Apr 23, 2012
    @python-dev
    Mannequin

    python-dev mannequin commented Apr 23, 2012

    New changeset 34b6998efd2c by Benjamin Peterson in branch 'default':
    fix instance dicts with str subclasses (bpo-13903)
    http://hg.python.org/cpython/rev/34b6998efd2c

    @markshannon
    Member Author

GC tracking was not being maintained in setdefault and copy (for split tables).

    Patch attached

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 24, 2012

    New changeset 507a6703d6a3 by Benjamin Peterson in branch 'default':
    fix dict gc tracking (bpo-13903)
    http://hg.python.org/cpython/rev/507a6703d6a3

    @markshannon
    Member Author

    Failed to differentiate between failure and error in make_split_table().

    Patch attached

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 24, 2012

    New changeset b044e0568be2 by Martin v. Loewis in branch 'default':
    Account for shared keys in type's __sizeof__ (bpo-13903).
    http://hg.python.org/cpython/rev/b044e0568be2

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 24, 2012

    New changeset 5d5b72a71898 by Benjamin Peterson in branch 'default':
    distiguish between refusing to creating shared keys and error (bpo-13903)
    http://hg.python.org/cpython/rev/5d5b72a71898

    @markshannon
    Member Author

    Decref cached-keys when type is deallocated.

    Patch attached.

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 27, 2012

    New changeset a3beae842f13 by Benjamin Peterson in branch 'default':
    decref cached keys on type deallocation (bpo-13903)
    http://hg.python.org/cpython/rev/a3beae842f13

    @pitrou
    Member

    pitrou commented Apr 27, 2012

> New changeset a3beae842f13 by Benjamin Peterson in branch 'default':
> decref cached keys on type deallocation (bpo-13903)
> http://hg.python.org/cpython/rev/a3beae842f13

    Is there any way to detect / avoid leaks on this separate refcounting
    scheme?

    @pitrou
    Member

    pitrou commented Apr 27, 2012

    This patch integrates the dictkeys' refcounting into the refcount checking framework. Seems to work ok, but it would be better if someone more acquainted with the code could confirm it.

    @markshannon
    Member Author

Change insertdict to follow normal (non-stealing) ref-counting behaviour, which fixes possible leakage.

    Patch attached.

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 30, 2012

    New changeset c5fd332e5857 by Benjamin Peterson in branch 'default':
    change insertdict to not steal references (bpo-13903)
    http://hg.python.org/cpython/rev/c5fd332e5857

    @python-dev
    Mannequin

    python-dev mannequin commented May 12, 2012

    New changeset 10e8b97d0fd7 by Antoine Pitrou in branch 'default':
    Make the reference counting of dictkeys objects participate in refleak hunting
    http://hg.python.org/cpython/rev/10e8b97d0fd7
