Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop set.difference when set is empty #72258

Closed
terryjreedy opened this issue Sep 10, 2016 · 8 comments
Closed

Stop set.difference when set is empty #72258

terryjreedy opened this issue Sep 10, 2016 · 8 comments
Assignees
Labels
3.7 (EOL) end of life performance Performance or resource usage

Comments

@terryjreedy
Copy link
Member

BPO 28071
Nosy @rhettinger, @terryjreedy, @serhiy-storchaka, @tirkarthi
Files
  • set_diff_early_out.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2016-09-12.05:43:41.329>
    created_at = <Date 2016-09-10.22:22:04.587>
    labels = ['3.7', 'performance']
    title = 'Stop set.difference when set is empty'
    updated_at = <Date 2019-06-11.05:50:18.981>
    user = 'https://github.com/terryjreedy'

    bugs.python.org fields:

    activity = <Date 2019-06-11.05:50:18.981>
    actor = 'xtreak'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2016-09-12.05:43:41.329>
    closer = 'rhettinger'
    components = []
    creation = <Date 2016-09-10.22:22:04.587>
    creator = 'terry.reedy'
    dependencies = []
    files = ['44565']
    hgrepos = []
    issue_num = 28071
    keywords = ['patch']
    message_count = 8.0
    messages = ['275707', '275835', '275841', '275962', '275965', '275967', '306973', '345185']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'terry.reedy', 'python-dev', 'serhiy.storchaka', 'xtreak']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue28071'
    versions = ['Python 3.6', 'Python 3.7']

    @terryjreedy
    Copy link
    Member Author

    Proposal from SO: in the iteration loop for 'myset.difference(iterable)', add equivalent of "if not myset: break'.
    https://stackoverflow.com/questions/39378043/why-does-pythons-set-difference-method-take-time-with-an-empty-set

    In the toy example for testing that this is currently not so, myset starts empty, but it was noted by the OP that a more realistic example would be that myset becomes empty after deletions.

    Postulated reasons why not to do this (as opposed to why it has not been done yet ;-):

    1. averaged over all uses, the time saved not iterating would be less than the time spent testing emptyness.

    2. an implicit guarantee to complete the iteration for possible side-effects.

    One answer notes that myset.difference(anotherset) is special-cased and faster than the equivalent non-set iterable.

    I searched the tracker for 'empty set difference' and got no hits. If I remember, I will post any disposition of this issue back to SO.

    @terryjreedy terryjreedy added 3.7 (EOL) end of life performance Performance or resource usage labels Sep 10, 2016
    @rhettinger rhettinger self-assigned this Sep 11, 2016
    @rhettinger
    Copy link
    Contributor

    This is reasonable, cheap, and not hard to do. I'll whip-up a patch shortly.

    @rhettinger
    Copy link
    Contributor

    New timings look nice:

         $ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference(r)"
        10000000 loops, best of 3: 0.104 usec per loop
         $ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference(r)"
        10000000 loops, best of 3: 0.105 usec per loop
         $ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference_update(r)"
        10000000 loops, best of 3: 0.0659 usec per loop
         $ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference_update(r)"
        10000000 loops, best of 3: 0.0684 usec per loop

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 12, 2016

    New changeset fa0af1a6344d by Raymond Hettinger in branch 'default':
    Issue bpo-28071: Add early-out for differencing from an empty set.
    https://hg.python.org/cpython/rev/fa0af1a6344d

    @terryjreedy
    Copy link
    Member Author

    The patch covers sets that start empty, but not sets that become empty during iteration. Are you thinking about or rejecting the latter? (And should this then be closed?)

    @terryjreedy terryjreedy reopened this Sep 12, 2016
    @rhettinger
    Copy link
    Contributor

    I only want a single quick check upfront. It doesn't make much sense to put this in the inner loop.

    @serhiy-storchaka
    Copy link
    Member

    This optimization caused a behavior change:

    Python 3.5:

    >>> set().difference([[]])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'

    Python 3.6:

    >>> set().difference([[]])
    set()

    @tirkarthi
    Copy link
    Member

    Behavior change in msg306973 is tracked with bpo-37219

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants