classification
Title: Stop set.difference when set is empty
Type: performance Stage: patch review
Components: Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: python-dev, rhettinger, serhiy.storchaka, terry.reedy, xtreak
Priority: normal Keywords: patch

Created on 2016-09-10 22:22 by terry.reedy, last changed 2019-06-11 05:50 by xtreak. This issue is now closed.

Files
File name Uploaded Description Edit
set_diff_early_out.diff rhettinger, 2016-09-11 19:53 review
Messages (8)
msg275707 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-09-10 22:22
Proposal from SO: in the iteration loop for 'myset.difference(iterable)', add equivalent of "if not myset: break'.
https://stackoverflow.com/questions/39378043/why-does-pythons-set-difference-method-take-time-with-an-empty-set

In the toy example for testing that this is currently not so, myset starts empty, but it was noted by the OP that a more realistic example would be that myset becomes empty after deletions.

Postulated reasons why not to do this (as opposed to why it has not been done yet ;-):

1) averaged over all uses, the time saved not iterating would be less than the time spent testing emptyness.

2) an implicit guarantee to complete the iteration for possible side-effects.

One answer notes that myset.difference(anotherset) is special-cased and faster than the equivalent non-set iterable.

I searched the tracker for 'empty set difference' and got no hits.  If I remember, I will post any disposition of this issue back to SO.
msg275835 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-11 19:24
This is reasonable, cheap, and not hard to do.   I'll whip-up a patch shortly.
msg275841 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-11 19:53
New timings look nice:

     $ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference(r)"
    10000000 loops, best of 3: 0.104 usec per loop
     $ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference(r)"
    10000000 loops, best of 3: 0.105 usec per loop
     $ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference_update(r)"
    10000000 loops, best of 3: 0.0659 usec per loop
     $ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference_update(r)"
    10000000 loops, best of 3: 0.0684 usec per loop
msg275962 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-12 05:02
New changeset fa0af1a6344d by Raymond Hettinger in branch 'default':
Issue #28071: Add early-out for differencing from an empty set.
https://hg.python.org/cpython/rev/fa0af1a6344d
msg275965 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-09-12 05:15
The patch covers sets that start empty, but not sets that become empty during iteration.  Are you thinking about or rejecting the latter?  (And should this then be closed?)
msg275967 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-12 05:43
I only want a single quick check upfront.  It doesn't make much sense to put this in the inner loop.
msg306973 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-25 20:35
This optimization caused a behavior change:

Python 3.5:

>>> set().difference([[]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Python 3.6:

>>> set().difference([[]])
set()
msg345185 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-06-11 05:50
Behavior change in msg306973 is tracked with issue37219
History
Date User Action Args
2019-06-11 05:50:18xtreaksetnosy: + xtreak
messages: + msg345185
2017-11-25 20:35:11serhiy.storchakasetmessages: + msg306973
2016-09-12 05:43:41rhettingersetstatus: open -> closed
resolution: fixed
messages: + msg275967
2016-09-12 05:31:37serhiy.storchakasetnosy: + serhiy.storchaka
2016-09-12 05:15:15terry.reedysetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg275965
2016-09-12 05:04:55rhettingersetstatus: open -> closed
resolution: fixed
2016-09-12 05:02:38python-devsetnosy: + python-dev
messages: + msg275962
2016-09-11 19:53:14rhettingersetfiles: + set_diff_early_out.diff
keywords: + patch
messages: + msg275841

stage: needs patch -> patch review
2016-09-11 19:24:08rhettingersetmessages: + msg275835
2016-09-11 19:19:53rhettingersetassignee: rhettinger
2016-09-10 22:22:04terry.reedycreate