
Symmetric difference on dict_views is inefficient #85066

Closed
rhettinger opened this issue Jun 6, 2020 · 11 comments
Labels
3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@rhettinger (Contributor)

BPO 40889
Nosy @rhettinger, @serhiy-storchaka, @sweeneyde
PRs
  • bpo-40889: special-case dict.items() ^ dict.items() for performance #20718
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-06-10.05:57:30.023>
    created_at = <Date 2020-06-06.17:09:59.550>
    labels = ['interpreter-core', '3.10', 'performance']
    title = 'Symmetric difference on dict_views is inefficient'
    updated_at = <Date 2020-06-10.06:50:59.313>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2020-06-10.06:50:59.313>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-06-10.05:57:30.023>
    closer = 'methane'
    components = ['Interpreter Core']
    creation = <Date 2020-06-06.17:09:59.550>
    creator = 'rhettinger'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40889
    keywords = ['patch']
    message_count = 11.0
    messages = ['370839', '370933', '370954', '370956', '370957', '370976', '371048', '371082', '371139', '371161', '371167']
    nosy_count = 3.0
    nosy_names = ['rhettinger', 'serhiy.storchaka', 'Dennis Sweeney']
    pr_nums = ['20718']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue40889'
    versions = ['Python 3.10']

    @rhettinger (Contributor, Author)

    Running "d1.items() ^ d2.items()" will rehash every key and value in both dictionaries regardless of how much they overlap.

    By taking advantage of the known hashes, the analysis step could avoid making any calls to __hash__(). Only the result tuples would need to be hashed.

    Currently the code below calls hash for every key and value on the left and for every key and value on the right:

      >>> left = {1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7}
      >>> right = {1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 8: -8, 9: -9}
      >>> left.items() ^ right.items()        # Total work: 28 __hash__() calls
      {(6, -6), (7, -7), (8, -8), (9, -9)}

    Compare that with the workload for set symmetric difference which makes zero calls to __hash__():

      >>> set(left) ^ set(right)
      {6, 7, 8, 9}

    FWIW, I do have an important use case where this matters.
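
    The zero-call claim for set symmetric difference can be checked with a counting key subclass (a small sketch, not from the original thread; the `Key` class is illustrative):

    ```python
    # Sketch (not from the thread): count __hash__ calls via an int subclass.
    class Key(int):
        hash_calls = 0
        def __hash__(self):
            Key.hash_calls += 1
            return super().__hash__()

    left = {Key(i): -i for i in (1, 2, 3, 4, 5, 6, 7)}
    right = {Key(i): -i for i in (1, 2, 3, 4, 5, 8, 9)}

    Key.hash_calls = 0
    result = set(left) ^ set(right)   # CPython's set() has a fast path for
    print(sorted(result))             # exact dicts that reuses cached hashes
    print(Key.hash_calls)
    ```

    Building the two sets and taking their symmetric difference makes no `__hash__` calls, because the key hashes cached in each dict's hash table are copied directly.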

    @rhettinger rhettinger added 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jun 6, 2020
    @serhiy-storchaka (Member)

    But hashes of items are not known. Hashes of keys are known; hashes of values and of the item tuples are not. We could add a special case for dict views in the set constructor and inline the hashing code for tuples, but that is a lot of code for a pretty rare case. And since the value hashes have to be calculated in any case, and significant time is spent allocating the 2-tuples, the benefit of such an optimization would be minor.
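
    The point that item hashes are not cached can be illustrated with a sketch (not from the thread; the `Noisy` class is a hypothetical helper): hashing a `(key, value)` tuple has to hash both elements, so value hashes get computed either way:

    ```python
    class Noisy:
        """Illustrative helper that counts __hash__ calls."""
        calls = 0
        def __init__(self, x):
            self.x = x
        def __hash__(self):
            Noisy.calls += 1
            return hash(self.x)
        def __eq__(self, other):
            return isinstance(other, Noisy) and self.x == other.x

    key, value = Noisy(1), Noisy(2)
    Noisy.calls = 0
    item_hash = hash((key, value))   # tuple hashing hashes every element
    print(Noisy.calls)               # both the key and the value were hashed
    ```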

    @rhettinger (Contributor, Author)

    It really depends on whether the key hashes are cheap or not.

    @sweeneyde (Member)

    What about returning another dict_items instead of a set? As in (using the convention d.items().mapping is d):

        dict_items = type({}.items())
    
        def __xor__(self: dict_items, other):
            if isinstance(other, dict_items):
                new = self.mapping.copy()
                MISSING = object()
                for key, value in other:
                    existing = new.get(key, MISSING)
                    if existing is MISSING:
                        # (key, value) in (other - self)
                        new[key] = value
                    elif existing == value:
                        # (key, value) in (self & other)
                        del new[key]
                else:
                    # shared key, differing values: (key, existing) is in
                    # (self - other) and (key, value) is in (other - self);
                    # a dict can keep only one item per key
                    new[key] = value
                return new.items()
            else:
                return set(self) ^ set(other)

    I believe this wouldn't require any re-hashing. It would also allow for non-hashable values.
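
    The sketch above can be exercised as a standalone function (the name `items_xor` is assumed here; note the inherent limitation mentioned in the comments — when a key appears in both dicts with different values, the returned mapping can hold only one of the two differing items):

    ```python
    def items_xor(d1, d2):
        """Sketch of the proposed dict-based symmetric difference."""
        new = d1.copy()
        MISSING = object()
        for key, value in d2.items():
            existing = new.get(key, MISSING)
            if existing is MISSING:
                new[key] = value      # item only in d2
            elif existing == value:
                del new[key]          # item in both: drop it
            else:
                new[key] = value      # shared key, differing values: only
                                      # one of the two items can be kept
        return new.items()

    left = {1: -1, 2: -2, 3: -3, 6: -6}
    right = {1: -1, 2: -2, 3: -3, 8: -8}
    print(dict(items_xor(left, right)))   # {6: -6, 8: -8}
    ```

    No key or value is re-hashed beyond the dict lookups themselves, and unhashable values are handled, which a set-returning implementation cannot do.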

    @rhettinger (Contributor, Author)

    What about returning another dict_items instead of a set?

    That API ship has already sailed. The published API guarantees that a set is returned — there is no changing that.

    The question is whether we care enough to provide an optimized implementation that doesn't call __hash__() when the hash value has already been computed and stored.

    @sweeneyde (Member)

    PR 20718 helps somewhat by only creating and hashing the tuples that wind up in the final set. Here's a benchmark:

    python -m pyperf timeit -s "d1 = {i:i for i in range(100_000)}; d2 = {i:i|1 for i in range(100_000)}" "d1.items() ^ d2.items()"

      Master:   37.9 ms +- 0.4 ms
      PR 20718: 25.5 ms +- 0.4 ms  (https://github.com/python/cpython/pull/20718)
    

    @sweeneyde (Member)

    A demo:

    >>> class Int(int):
    ...     hash_calls = 0
    ...     def __hash__(self):
    ...         Int.hash_calls += 1
    ...         return super().__hash__()
    ...
    >>> left = {Int(1): -1, Int(2): -2, Int(3): -3, Int(4): -4, Int(5): -5, Int(6): -6, Int(7): -7}
    >>> right = {Int(1): -1, Int(2): -2, Int(3): -3, Int(4): -4, Int(5): -5, Int(8): -8, Int(9): -9}
    >>> Int.hash_calls = 0
    >>> left.items() ^ right.items()
    {(9, -9), (7, -7), (8, -8), (6, -6)}
    >>> Int.hash_calls
    <Result: 14 on Master, but only 4 on PR 20718>

    It looks like the same trick (searching by key and comparing values before maybe constructing a 2-tuple) might give similar performance improvements for dict_items.__or__, dict_items.__and__, and dict_items.__sub__. Is it worth discussing these other operators in this issue?
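
    At the C level the trick reuses the key hashes already cached in each dict; a pure-Python sketch cannot avoid re-hashing on lookup, but it shows the structure for intersection: search by key, compare values, and construct (and hash) a result tuple only for actual matches. This is an illustrative sketch, not the patch's code, and the name `items_and` is assumed:

    ```python
    def items_and(d1, d2):
        """Sketch of dict_items intersection: tuples built only for matches."""
        if len(d2) < len(d1):
            d1, d2 = d2, d1           # iterate the smaller, probe the larger
        result = set()
        for key, value in d1.items():
            try:
                other_value = d2[key]
            except KeyError:
                continue
            if other_value == value:
                result.add((key, value))   # only matching items become tuples
        return result

    d1 = {1: -1, 2: -2, 3: -3}
    d2 = {1: -1, 2: -99, 4: -4}
    print(items_and(d1, d2))   # {(1, -1)}
    ```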

    @serhiy-storchaka (Member)

    So this requires adding roughly 100 lines of C code to speed up a pretty uncommon operation for arguments of a particular type.

    @rhettinger (Contributor, Author)

    To my eyes, the patch looks somewhat tame. It is readable enough and doesn't do anything tricky.

    The set object implementation aimed to never recompute the hash when it was already known. It is reasonable that other set-like operations share that aspiration.

    @methane (Member) commented Jun 10, 2020

    New changeset 07d8112 by Dennis Sweeney in branch 'master':
    bpo-40889: Optimize dict.items() ^ dict.items() (GH-20718)

    @serhiy-storchaka (Member)

    I don't like the tendency to optimize very uncommon cases. We could optimize every operation for specific types by inlining code, but that increases the maintenance cost and can have a negative net effect on performance, because it grows the code size and adds extra runtime checks. In the past we have usually rejected this kind of patch.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    4 participants