a -= b should be fast if a is a small set and b is a large set #52672

abacabadabacaba · 2010-04-16T17:19:41Z

BPO	8425
Nosy	@rhettinger, @mdickinson, @abalkin, @ericvsmith, @ezio-melotti, @bitdancer, @mmaker, @serhiy-storchaka, @csabella
PRs	bpo-8425: Set inplace difference fash path for large complement sets #15590
Files	issue8425.diff issue8425a.diff issue8425-with-tests.diff: Patch against py3k revision 81048 issue8425.patch issue8425.1.patch issue8425.2.patch issue8425.3.patch bench.py

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2019-08-29.16:03:37.386>
created_at = <Date 2010-04-16.17:19:41.106>
labels = ['interpreter-core', '3.8', 'performance']
title = 'a -= b should be fast if a is a small set and b is a large set'
updated_at = <Date 2019-08-29.16:03:37.385>
user = 'https://bugs.python.org/abacabadabacaba'

bugs.python.org fields:

activity = <Date 2019-08-29.16:03:37.385>
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2019-08-29.16:03:37.386>
closer = 'rhettinger'
components = ['Interpreter Core']
creation = <Date 2010-04-16.17:19:41.106>
creator = 'abacabadabacaba'
dependencies = []
files = ['17282', '17283', '17284', '27265', '27272', '27292', '27303', '27306']
hgrepos = []
issue_num = 8425
keywords = ['patch']
message_count = 27.0
messages = ['103343', '103552', '105450', '105451', '105452', '105455', '105492', '122954', '122957', '171049', '171106', '171117', '171153', '171159', '171160', '171162', '171164', '171232', '171306', '171307', '171312', '171322', '171375', '193784', '193807', '342713', '350795']
nosy_count = 11.0
nosy_names = ['rhettinger', 'mark.dickinson', 'belopolsky', 'eric.smith', 'stutzbach', 'ezio.melotti', 'r.david.murray', 'abacabadabacaba', 'maker', 'serhiy.storchaka', 'cheryl.sabella']
pr_nums = ['15590']
priority = 'low'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'resource usage'
url = 'https://bugs.python.org/issue8425'
versions = ['Python 3.8']

abacabadabacaba · 2010-04-16T17:19:41Z

>> small_set = set(range(2000))
>> large_set = set(range(20000000))
>> large_set -= small_set # Fast
>> small_set -= large_set # Slow, should be fast
>> small_set = small_set - large_set # Fast

ezio-melotti · 2010-04-19T04:20:04Z

Raymond, have you forgotten to attach the patch or have you just set the stage to 'patch review' by mistake?

abalkin · 2010-05-10T17:44:35Z

The problem is apparently due to the fact that small_set -= large_set
iterates over the large set removing entries from small_set while more
efficient small_set - large_set builds a new set iterating over a
small_set and adding items that are not in the large set. Since
lookups are O(1), it is more efficient to iterate over a small set
while looking up a large set than the other way around. I am
attaching a minimalist patch which gives up in-place update for s1 -=
s2 if len(s1) < len(s2) and effectively replaces it with s1 = s1 - s2.
The code can be further optimized by replicating set_difference logic
in set_isub instead of relying on fall back behavior, but the big
question here with whether it is acceptable to give up preserving set
identity in in-place subtract to gain performance.

bitdancer · 2010-05-10T18:07:41Z

The answer is almost certainly "no".

abalkin · 2010-05-10T18:13:06Z

On the second thought, it is possible to preserve set identity and avoid iteration over a large set. See issue8425a.diff attached.

It looks like the unit tests don't check identity preservation. I will submit additional tests shortly.

abalkin · 2010-05-10T18:51:22Z

Note that issue8425a.diff leaves small_set.difference_update(large_dict) unoptimized:

$ ./python.exe -m timeit -s "s = {1}; l = dict.fromkeys(range(1000));" "s.difference_update(l)"
1000 loops, best of 3: 842 usec per loop
$ ./python.exe -m timeit -s "s = {1}; l = dict.fromkeys(range(1000));" "s.difference(l)"
100000 loops, best of 3: 13.2 usec per loop

It would be easy to add an extra type check, but I noticed that small_set.difference_update(large_dict.viewkeys()) is not optimized either:

$ ./python.exe -m timeit -s "s = {1}; l = dict.fromkeys(range(1000));" "s.difference(l.viewkeys())"
1000 loops, best of 3: 842 usec per loop

It would be nice if there was a uniform C-API that would allow to uniformly check that an object supports O(1) lookup and invoke such lookup efficiently. I don't think such C-API exists at the moment.

I would like to hear what others have to say before adding optimizations to the patch that go beyond the scope of this issue.

mdickinson · 2010-05-11T09:31:01Z

See also bpo-8685.

rhettinger · 2010-11-30T23:24:48Z

Deferring to 3.3.

rhettinger · 2010-11-30T23:48:30Z

Any new logic should make maximum use of existing tools:

def __isub__(self, other)
    if len(other) > len(self)*8:
        other = self & other
    . . .
    # rest of isub unchanged

mmaker · 2012-09-23T15:59:36Z

Something like this would also need unit tests?

$ ./python.exe -m timeit -s "s= set(range(2000)); l = set(range(20000000)); s=s-l"
10000000 loops, best of 3: 0.0622 usec per loop
[48787 refs]
$ ./python.exe -m timeit -s "s= set(range(2000)); l = set(range(20000000)); s-=l"
10000000 loops, best of 3: 0.0591 usec per loop
[48787 refs]

mmaker · 2012-09-24T09:24:21Z

Reviewed with Ezio.

serhiy-storchaka · 2012-09-24T11:28:15Z

$ ./python.exe -m timeit -s "s= set(range(2000)); l = set(range(20000000)); s=s-l"

You are measure empty loop.

mmaker · 2012-09-24T15:06:34Z

woops, sry. Re-posting the benchmark, with three tests: the first one proposed (@abacabadabacaba) with very large sets; another one with smaller sets.

$ ./python.exe -m timeit  -n 100 -s "s= set(range(2000)); l = set(range(20000000))"   "s-=l"
100 loops, best of 3: 9.71 usec per loop
[48787 refs]

$ ./python.exe -m timeit  -n 100 -s "s= set(range(1)); l = set(range(20))"   "s-=l"
100 loops, best of 3: 0.366 usec per loop


$ hg co -C
$ make -j3

----[!PATCHED]--------------------------------------------------

$ ./python.exe -m timeit  -n 100 -s "s= set(range(2000)); l = set(range(20000000))"   "s-=l"
100 loops, best of 3: 665 msec per loop
[48787 refs]

$ ./python.exe -m timeit  -n 100 -s "s= set(range(1)); l = set(range(20))"   "s-=l"
100 loops, best of 3: 0.849 usec per loop
[48787 refs]

serhiy-storchaka · 2012-09-24T16:23:39Z

$ ./python.exe -m timeit -n 100 -s "s= set(range(2000)); l = set(range(20000000))" "s-=l"

s is empty set after first loop. Try this:

$ ./python.exe -m timeit  -n 100 -s "s= set(range(2000)); l = set(range(2000,2000+20000000))"   "s-=l"

serhiy-storchaka · 2012-09-24T16:38:37Z

Michele, in any case you patch is not preserve set identity.

mmaker · 2012-09-24T17:02:38Z

What do you mean by "does not preserve set identity"? Because I see:

$ ./python.exe
Python 3.3.0rc2+ (default:178f9042af81+, Sep 24 2012, 18:54:31) 
[GCC 4.2.1 Compatible Apple Clang 2.1 (tags/Apple/clang-163.7.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = set(range(1))
[65967 refs]
>>> b = set(range(20))
[65989 refs]
>>> id(a)
4540421032
[65992 refs]
>>> a -= b
[65991 refs]
>>> id(a)
4540421032

serhiy-storchaka · 2012-09-24T17:33:11Z

Sorry, I was wrong.

I think that the proposed changes should be applied in set_difference_update_internal() directly.

mmaker · 2012-09-25T08:22:38Z

Updated.

tumbolandia:cpython maker$ hg import --no-commit -f bpo-8425.2.patch && make -j3 >/dev/null 2>/dev/null
sto applicando bpo-8425.2.patch
tumbolandia:cpython maker$ ./python.exe bench.py starting...
setting up tests...
testing...
big_timer_no_intersection 0.0005461599998852762
big_timer_subtraction 382.0618862710003
small_timer 0.0034195990001535392
big_timer 603.5105132449999
std_timer_subtraction 0.0006865100003778934
big_timer_reversed 292.44042680999974
std_timer 8.092500047496287e-05
[44705 refs]
tumbolandia:cpython maker$ hg co -C && make -j3 >/dev/null 2>/dev/null1 files updated, 0 files merged, 0 files removed, 0 files unresolved
tumbolandia:cpython maker$ ./python.exe bench.py
starting...
setting up tests...
testing...
big_timer_subtraction 611.292889542
big_timer 465.68463925800006
small_timer 0.002618359999360109
big_timer_reversed 256.5112134430001
std_timer 0.00011092699969594833
big_timer_no_intersection 0.0005648139995173551
std_timer_subtraction 2.8619999284273945e-05
[44705 refs]

serhiy-storchaka · 2012-09-25T19:03:07Z

I added a few comments in Rietveld.

Did you run benchmarks in debug mode?

In order for results to be convenient to compare please sort it by name.

serhiy-storchaka · 2012-09-25T19:04:56Z

s/it/them/

mmaker · 2012-09-25T19:27:16Z

I'm not sure of the usefulness of this comment.
removing, then.

"else" redundant here.

Instead of using a local variable "intersected", you can simply add "else
Py_INCREF(other)" here and then decref "other" unconditionally. It will be
shorter and perhaps a little clearer, but a bit slower.

I find my first patch more readable and efficient, but if these comments are conformant to PEP-7..
Attaching updated patch (bpo-8425.3.patch)

Did you run benchmarks in debug mode?
Yes, but bench.py is available, fell free to run it with whatever configuration; my example was done with a --with-pydebug and clang compiler.

In order for results to be convenient to compare please sort it by name.
Done.

rhettinger · 2012-09-25T23:06:47Z

Please don't apply this until I've signed-off on it.

serhiy-storchaka · 2012-09-27T14:16:18Z

I find my first patch more readable and efficient, but if these comments
are conformant to PEP-7.. Attaching updated patch (bpo-8425.3.patch)

It was only a suggestion. Both forms are good enougth.

Yes, but bench.py is available, fell free to run it with whatever
configuration; my example was done with a --with-pydebug and clang
compiler.

Yes, I ran it in release mode (gcc, 32-bit Linux) and confirm your results.

In general except minor whitespace defects last patch LGTM.

mmaker · 2013-07-27T13:57:30Z

ping.

rhettinger · 2013-07-28T04:10:08Z

There's no rush on this. I have other work I want to do on set objects before applying further optimizations, so I want to hold off on it for a bit.

csabella · 2019-05-17T12:44:34Z

@maker, would you be interested in converting your patch to a Github pull request? Thanks!

rhettinger · 2019-08-29T16:03:05Z

New changeset 88ea166 by Raymond Hettinger in branch 'master':
bpo-8425: Fast path for set inplace difference when the second set is large (GH-15590)
88ea166

abacabadabacaba mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 16, 2010

rhettinger self-assigned this Apr 16, 2010

rhettinger added the easy label Apr 18, 2010

serhiy-storchaka removed the easy label Sep 24, 2012

csabella added the 3.8 only security fixes label May 17, 2019

rhettinger closed this as completed Aug 29, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

a -= b should be fast if a is a small set and b is a large set #52672

a -= b should be fast if a is a small set and b is a large set #52672

abacabadabacaba mannequin commented Apr 16, 2010

abacabadabacaba mannequin commented Apr 16, 2010

ezio-melotti commented Apr 19, 2010

abalkin commented May 10, 2010

bitdancer commented May 10, 2010

abalkin commented May 10, 2010

abalkin commented May 10, 2010

mdickinson commented May 11, 2010

rhettinger commented Nov 30, 2010

rhettinger commented Nov 30, 2010

mmaker mannequin commented Sep 23, 2012

mmaker mannequin commented Sep 24, 2012

serhiy-storchaka commented Sep 24, 2012

mmaker mannequin commented Sep 24, 2012

serhiy-storchaka commented Sep 24, 2012

serhiy-storchaka commented Sep 24, 2012

mmaker mannequin commented Sep 24, 2012

serhiy-storchaka commented Sep 24, 2012

mmaker mannequin commented Sep 25, 2012

serhiy-storchaka commented Sep 25, 2012

serhiy-storchaka commented Sep 25, 2012

mmaker mannequin commented Sep 25, 2012

rhettinger commented Sep 25, 2012

serhiy-storchaka commented Sep 27, 2012

mmaker mannequin commented Jul 27, 2013

rhettinger commented Jul 28, 2013

csabella commented May 17, 2019

rhettinger commented Aug 29, 2019

a -= b should be fast if a is a small set and b is a large set #52672

a -= b should be fast if a is a small set and b is a large set #52672

Comments

abacabadabacaba mannequin commented Apr 16, 2010

abacabadabacaba mannequin commented Apr 16, 2010

ezio-melotti commented Apr 19, 2010

abalkin commented May 10, 2010

bitdancer commented May 10, 2010

abalkin commented May 10, 2010

abalkin commented May 10, 2010

mdickinson commented May 11, 2010

rhettinger commented Nov 30, 2010

rhettinger commented Nov 30, 2010

mmaker mannequin commented Sep 23, 2012

mmaker mannequin commented Sep 24, 2012

serhiy-storchaka commented Sep 24, 2012

mmaker mannequin commented Sep 24, 2012

serhiy-storchaka commented Sep 24, 2012

serhiy-storchaka commented Sep 24, 2012

mmaker mannequin commented Sep 24, 2012

serhiy-storchaka commented Sep 24, 2012

mmaker mannequin commented Sep 25, 2012

serhiy-storchaka commented Sep 25, 2012

serhiy-storchaka commented Sep 25, 2012

mmaker mannequin commented Sep 25, 2012

rhettinger commented Sep 25, 2012

serhiy-storchaka commented Sep 27, 2012

mmaker mannequin commented Jul 27, 2013

rhettinger commented Jul 28, 2013

csabella commented May 17, 2019

rhettinger commented Aug 29, 2019