Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a -= b should be fast if a is a small set and b is a large set #52672

Closed
abacabadabacaba mannequin opened this issue Apr 16, 2010 · 27 comments
Closed

a -= b should be fast if a is a small set and b is a large set #52672

abacabadabacaba mannequin opened this issue Apr 16, 2010 · 27 comments
Assignees
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@abacabadabacaba
Copy link
Mannequin

abacabadabacaba mannequin commented Apr 16, 2010

BPO 8425
Nosy @rhettinger, @mdickinson, @abalkin, @ericvsmith, @ezio-melotti, @bitdancer, @mmaker, @serhiy-storchaka, @csabella
PRs
  • bpo-8425: Set inplace difference fash path for large complement sets #15590
  • Files
  • issue8425.diff
  • issue8425a.diff
  • issue8425-with-tests.diff: Patch against py3k revision 81048
  • issue8425.patch
  • issue8425.1.patch
  • issue8425.2.patch
  • issue8425.3.patch
  • bench.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2019-08-29.16:03:37.386>
    created_at = <Date 2010-04-16.17:19:41.106>
    labels = ['interpreter-core', '3.8', 'performance']
    title = 'a -= b should be fast if a is a small set and b is a large set'
    updated_at = <Date 2019-08-29.16:03:37.385>
    user = 'https://bugs.python.org/abacabadabacaba'

    bugs.python.org fields:

    activity = <Date 2019-08-29.16:03:37.385>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2019-08-29.16:03:37.386>
    closer = 'rhettinger'
    components = ['Interpreter Core']
    creation = <Date 2010-04-16.17:19:41.106>
    creator = 'abacabadabacaba'
    dependencies = []
    files = ['17282', '17283', '17284', '27265', '27272', '27292', '27303', '27306']
    hgrepos = []
    issue_num = 8425
    keywords = ['patch']
    message_count = 27.0
    messages = ['103343', '103552', '105450', '105451', '105452', '105455', '105492', '122954', '122957', '171049', '171106', '171117', '171153', '171159', '171160', '171162', '171164', '171232', '171306', '171307', '171312', '171322', '171375', '193784', '193807', '342713', '350795']
    nosy_count = 11.0
    nosy_names = ['rhettinger', 'mark.dickinson', 'belopolsky', 'eric.smith', 'stutzbach', 'ezio.melotti', 'r.david.murray', 'abacabadabacaba', 'maker', 'serhiy.storchaka', 'cheryl.sabella']
    pr_nums = ['15590']
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue8425'
    versions = ['Python 3.8']

    @abacabadabacaba
    Copy link
    Mannequin Author

    abacabadabacaba mannequin commented Apr 16, 2010

    >> small_set = set(range(2000))
    >> large_set = set(range(20000000))
    >> large_set -= small_set # Fast
    >> small_set -= large_set # Slow, should be fast
    >> small_set = small_set - large_set # Fast

    @abacabadabacaba abacabadabacaba mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 16, 2010
    @rhettinger rhettinger self-assigned this Apr 16, 2010
    @ezio-melotti
    Copy link
    Member

    Raymond, have you forgotten to attach the patch or have you just set the stage to 'patch review' by mistake?

    @abalkin
    Copy link
    Member

    abalkin commented May 10, 2010

    The problem is apparently due to the fact that small_set -= large_set
    iterates over the large set removing entries from small_set while more
    efficient small_set - large_set builds a new set iterating over a
    small_set and adding items that are not in the large set. Since
    lookups are O(1), it is more efficient to iterate over a small set
    while looking up a large set than the other way around. I am
    attaching a minimalist patch which gives up in-place update for s1 -=
    s2 if len(s1) < len(s2) and effectively replaces it with s1 = s1 - s2.
    The code can be further optimized by replicating set_difference logic
    in set_isub instead of relying on fall back behavior, but the big
    question here with whether it is acceptable to give up preserving set
    identity in in-place subtract to gain performance.

    @bitdancer
    Copy link
    Member

    The answer is almost certainly "no".

    @abalkin
    Copy link
    Member

    abalkin commented May 10, 2010

    On the second thought, it is possible to preserve set identity and avoid iteration over a large set. See issue8425a.diff attached.

    It looks like the unit tests don't check identity preservation. I will submit additional tests shortly.

    @abalkin
    Copy link
    Member

    abalkin commented May 10, 2010

    Note that issue8425a.diff leaves small_set.difference_update(large_dict) unoptimized:

    $ ./python.exe -m timeit -s "s = {1}; l = dict.fromkeys(range(1000));" "s.difference_update(l)"
    1000 loops, best of 3: 842 usec per loop
    $ ./python.exe -m timeit -s "s = {1}; l = dict.fromkeys(range(1000));" "s.difference(l)"
    100000 loops, best of 3: 13.2 usec per loop

    It would be easy to add an extra type check, but I noticed that small_set.difference_update(large_dict.viewkeys()) is not optimized either:

    $ ./python.exe -m timeit -s "s = {1}; l = dict.fromkeys(range(1000));" "s.difference(l.viewkeys())"
    1000 loops, best of 3: 842 usec per loop

    It would be nice if there was a uniform C-API that would allow to uniformly check that an object supports O(1) lookup and invoke such lookup efficiently. I don't think such C-API exists at the moment.

    I would like to hear what others have to say before adding optimizations to the patch that go beyond the scope of this issue.

    @mdickinson
    Copy link
    Member

    See also bpo-8685.

    @rhettinger
    Copy link
    Contributor

    Deferring to 3.3.

    @rhettinger
    Copy link
    Contributor

    Any new logic should make maximum use of existing tools:

    def __isub__(self, other)
        if len(other) > len(self)*8:
            other = self & other
        . . .
        # rest of isub unchanged

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Sep 23, 2012

    Something like this would also need unit tests?

    $ ./python.exe -m timeit -s "s= set(range(2000)); l = set(range(20000000)); s=s-l"
    10000000 loops, best of 3: 0.0622 usec per loop
    [48787 refs]
    $ ./python.exe -m timeit -s "s= set(range(2000)); l = set(range(20000000)); s-=l"
    10000000 loops, best of 3: 0.0591 usec per loop
    [48787 refs]

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Sep 24, 2012

    Reviewed with Ezio.

    @serhiy-storchaka
    Copy link
    Member

    $ ./python.exe -m timeit -s "s= set(range(2000)); l = set(range(20000000)); s=s-l"

    You are measure empty loop.

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Sep 24, 2012

    woops, sry. Re-posting the benchmark, with three tests: the first one proposed (@abacabadabacaba) with very large sets; another one with smaller sets.

    $ ./python.exe -m timeit  -n 100 -s "s= set(range(2000)); l = set(range(20000000))"   "s-=l"
    100 loops, best of 3: 9.71 usec per loop
    [48787 refs]
    
    $ ./python.exe -m timeit  -n 100 -s "s= set(range(1)); l = set(range(20))"   "s-=l"
    100 loops, best of 3: 0.366 usec per loop
    
    
    $ hg co -C
    $ make -j3

    ----[!PATCHED]--------------------------------------------------

    $ ./python.exe -m timeit  -n 100 -s "s= set(range(2000)); l = set(range(20000000))"   "s-=l"
    100 loops, best of 3: 665 msec per loop
    [48787 refs]
    
    $ ./python.exe -m timeit  -n 100 -s "s= set(range(1)); l = set(range(20))"   "s-=l"
    100 loops, best of 3: 0.849 usec per loop
    [48787 refs]

    @serhiy-storchaka
    Copy link
    Member

    $ ./python.exe -m timeit -n 100 -s "s= set(range(2000)); l = set(range(20000000))" "s-=l"

    s is empty set after first loop. Try this:

    $ ./python.exe -m timeit  -n 100 -s "s= set(range(2000)); l = set(range(2000,2000+20000000))"   "s-=l"

    @serhiy-storchaka
    Copy link
    Member

    Michele, in any case you patch is not preserve set identity.

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Sep 24, 2012

    What do you mean by "does not preserve set identity"? Because I see:

    $ ./python.exe
    Python 3.3.0rc2+ (default:178f9042af81+, Sep 24 2012, 18:54:31) 
    [GCC 4.2.1 Compatible Apple Clang 2.1 (tags/Apple/clang-163.7.1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = set(range(1))
    [65967 refs]
    >>> b = set(range(20))
    [65989 refs]
    >>> id(a)
    4540421032
    [65992 refs]
    >>> a -= b
    [65991 refs]
    >>> id(a)
    4540421032

    @serhiy-storchaka
    Copy link
    Member

    Sorry, I was wrong.

    I think that the proposed changes should be applied in set_difference_update_internal() directly.

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Sep 25, 2012

    Updated.

    tumbolandia:cpython maker$ hg import --no-commit -f bpo-8425.2.patch && make -j3 >/dev/null 2>/dev/null
    sto applicando bpo-8425.2.patch
    tumbolandia:cpython maker$ ./python.exe bench.py starting...
    setting up tests...
    testing...
    big_timer_no_intersection 0.0005461599998852762
    big_timer_subtraction 382.0618862710003
    small_timer 0.0034195990001535392
    big_timer 603.5105132449999
    std_timer_subtraction 0.0006865100003778934
    big_timer_reversed 292.44042680999974
    std_timer 8.092500047496287e-05
    [44705 refs]
    tumbolandia:cpython maker$ hg co -C && make -j3 >/dev/null 2>/dev/null1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    tumbolandia:cpython maker$ ./python.exe bench.py
    starting...
    setting up tests...
    testing...
    big_timer_subtraction 611.292889542
    big_timer 465.68463925800006
    small_timer 0.002618359999360109
    big_timer_reversed 256.5112134430001
    std_timer 0.00011092699969594833
    big_timer_no_intersection 0.0005648139995173551
    std_timer_subtraction 2.8619999284273945e-05
    [44705 refs]

    @serhiy-storchaka
    Copy link
    Member

    I added a few comments in Rietveld.

    Did you run benchmarks in debug mode?

    In order for results to be convenient to compare please sort it by name.

    @serhiy-storchaka
    Copy link
    Member

    s/it/them/

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Sep 25, 2012

    I'm not sure of the usefulness of this comment.
    removing, then.

    "else" redundant here.

    Instead of using a local variable "intersected", you can simply add "else
    Py_INCREF(other)" here and then decref "other" unconditionally. It will be
    shorter and perhaps a little clearer, but a bit slower.

    I find my first patch more readable and efficient, but if these comments are conformant to PEP-7..
    Attaching updated patch (bpo-8425.3.patch)

    Did you run benchmarks in debug mode?
    Yes, but bench.py is available, fell free to run it with whatever configuration; my example was done with a --with-pydebug and clang compiler.

    In order for results to be convenient to compare please sort it by name.
    Done.

    @rhettinger
    Copy link
    Contributor

    Please don't apply this until I've signed-off on it.

    @serhiy-storchaka
    Copy link
    Member

    I find my first patch more readable and efficient, but if these comments
    are conformant to PEP-7.. Attaching updated patch (bpo-8425.3.patch)

    It was only a suggestion. Both forms are good enougth.

    Yes, but bench.py is available, fell free to run it with whatever
    configuration; my example was done with a --with-pydebug and clang
    compiler.

    Yes, I ran it in release mode (gcc, 32-bit Linux) and confirm your results.

    In general except minor whitespace defects last patch LGTM.

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Jul 27, 2013

    ping.

    @rhettinger
    Copy link
    Contributor

    There's no rush on this. I have other work I want to do on set objects before applying further optimizations, so I want to hold off on it for a bit.

    @csabella
    Copy link
    Contributor

    @maker, would you be interested in converting your patch to a Github pull request? Thanks!

    @csabella csabella added the 3.8 only security fixes label May 17, 2019
    @rhettinger
    Copy link
    Contributor

    New changeset 88ea166 by Raymond Hettinger in branch 'master':
    bpo-8425: Fast path for set inplace difference when the second set is large (GH-15590)
    88ea166

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants