classification
Title: One operation on sets < 1/100th as efficient in 3.9 than before
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Dennis Sweeney, eamartin, rhettinger, tim.peters, zach.ware
Priority: normal Keywords: patch

Created on 2021-02-11 00:15 by eamartin, last changed 2021-03-24 22:33 by rhettinger. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 25010 merged rhettinger, 2021-03-24 06:18
Messages (9)
msg386815 - (view) Author: Eric Martin (eamartin) Date: 2021-02-11 00:15
Run on a MacBook Pro (15-inch, 2019)
         2.3 GHz 8-Core Intel Core i9
         16 GB 2400 MHz DDR4


Python 3.8.7 (default, Dec 24 2020, 19:07:18) 
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> S = set(range(10_000))
>>> timeit('S.remove(5000); S.add(5000)', globals=globals(), number=100_000)
0.01933291699999984

$ python3.9
Python 3.9.1 (default, Dec  8 2020, 13:10:53) 
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> S = set(range(10_000))
>>> timeit('S.remove(5000); S.add(5000)', globals=globals(), number=100_000)
3.8517225489999998
msg386816 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2021-02-11 00:31
Can reproduce on Linux:

$ for python in /usr/bin/python3.? /usr/local/bin/python3.?
do 
    $python -VV
    $python -m timeit -r 10 -n 100_000 -u usec -s 'S = set(range(10_000))' 'S.remove(5000);S.add(5000)'
done
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0]
100000 loops, best of 10: 0.0831 usec per loop
Python 3.6.11 (tags/v3.6.11:d56cd4006a, Jul  1 2020, 10:47:06) 
[GCC 9.3.0]
100000 loops, best of 10: 0.0968 usec per loop
Python 3.7.9 (tags/v3.7.9:13c94747c7, Oct 19 2020, 23:50:03) 
[GCC 9.3.0]
100000 loops, best of 10: 0.0935 usec per loop
Python 3.9.0 (tags/v3.9.0:9cf6752276, Nov 14 2020, 15:41:20) 
[GCC 9.3.0]
100000 loops, best of 10: 50.2 usec per loop
msg386817 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python triager) Date: 2021-02-11 03:27
I bisected the change to here:

https://github.com/python/cpython/pull/19881
commit 3dd2157febae5087ca3333d24f69b6de9cbd13cd
Author: Raymond Hettinger <rhettinger@users.noreply.github.com>
Date:   Sun May 3 04:51:05 2020 -0700

    Simplify set entry insertion logic. (GH-19881)

"""Dictionaries no longer reuse dummy entries. Instead, dummies accumulate until cleared by resizes. That is a good strategy because it tightens the inner search loop and it makes the code cleaner. Here, we do the same thing for sets"""


It looks like this was a deliberate change to speed up lookups, but it also looks like it adversely affected this case.  With dictionaries, mutation typically occurs as setting a different value for the same key---no dummy entries get created, so dummy entries are less common. But with sets, mutation can only come from adding/discarding, so maybe this optimization is less worthwhile for sets.
msg386819 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-02-11 05:40
This only affects the contrived case of adding and removing exactly the same element in a tight loop, when the set is large.  Optimizing that one corner case came at the expense of all other cases.  The code is now simpler and slightly faster than before.  This is also what we do for dictionaries.

$ python3.8 -m timeit -r 11 -s 's=set(range(10_000))' 'for i in range(10_000): (s.discard(i), s.add(10_000 - i))'
200 loops, best of 11: 1.72 msec per loop

$ python3.9 -m timeit -r 11 -s 's=set(range(10_000))' 'for i in range(10_000): (s.discard(i), s.add(10_000 - i))'
200 loops, best of 11: 1.09 msec per loop

Thank you for the report, but this was an intended change.
msg386848 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-02-12 03:04
Addenda [reposted to fix a premature submission].

Top use cases for sets (roughly in order of commonality):

1. Eliminate duplicates from an iterable input and loop over the result.
2. Store elements in a set just once but do many membership tests.
3. Perform data analysis on multiple sets using union, intersection, difference, and isdisjoint.
4. Remove elements from a set, one at a time, until it is empty.
5. Algorithms and that alternately add and remove different elements over time (toposort, NFA, DFA, etc).  

The first three are all penalized by an extra inner loop test and the loss of the register to track the freeslot.  Those use cases get no offsetting benefit.

Case 4 doesn't exercise the dummy reuse at all.  So, it is unaffected.

The last is approximately a net wash.  It pays the inner loop price, suffers the loss of a register, and incurs an extra test outside the loop, but it occasionally gets lucky and reuses a freeslot. The benefit for reuse is that it is delays the inevitable resize which would have reclaimed the dummy entries earlier. (The comparable use case for dicts is LRU/LFU caches where new entries are added at the same rate as old entries are removed.  That use case also showed a net wash when freeslot tracking was removed from dicts).

Not on the list at all is the case of a large set, where exactly the same element is added and removed in a tight loop.  The payoff for this case is that the O(n) resize step never occurs; however, this case is so rare that there is no reason to preference it over the common cases.

If the addition and removal of the same element happens only a few times, with no other set updates, the performance is about the same as case 5.

However, if there are many such updates and the set is large (as in the OP's code sample), the add() operation becomes quadratic because the chain of dummy or used entries grows larger and larger with each reinsertion.  (FWIW, dicts also face the same issue and some additional ones related to maintaining insertion order).  The question is whether this unhappy case is worth burdening all of the normal use cases.  If this turns out to be a real issue in practice, we can reopen this; otherwise, let's stick with what we have.
msg386849 - (view) Author: Eric Martin (eamartin) Date: 2021-02-12 03:40
Many thanks Raymond for these insights and thorough explanations, I appreciate it very much. Your conclusions are of course most reasonable. For me, I cannot say that it is a real issue in practice. I have teaching notes in which I illustrate with plots the differences in complexity when working with sets versus lists. One such plot showed that the cost of removing an element from a set does not depend on what the element is (which is why I reused the same element again and again) and is orders of magnitude less than for removing an element from the middle of a list. I just found out that something had changed with Python3.9 when I last run the cell of the jupyter notebook in which that plot was produced...
msg388326 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-09 02:49
I've had misgivings about this and will revert the change for Python 3.10.  

Usually, it is safe for sets to implement whatever has been battle-tested with dicts, but sets should aspire to do better for alternating deletions and re-additions of the same element, even if that isn't a common case.
msg388349 - (view) Author: Eric Martin (eamartin) Date: 2021-03-09 10:50
Thank you for letting us know, I am actually not surprised the issue would bugger you and you would change your mind... Thinking further about it and experimenting further, I thought it might not be such an edge case and there could be a significant cost in some applications. Many thanks Raymond for everything you give to the community!
msg389491 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-24 22:33
New changeset 72789592a3491bab49f144bb292679b1484885d9 by Raymond Hettinger in branch 'master':
bpo-43198:  Revert 3dd2157 that removed freeslot tracking. (#25010)
https://github.com/python/cpython/commit/72789592a3491bab49f144bb292679b1484885d9
History
Date User Action Args
2021-03-24 22:33:57rhettingersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-03-24 22:33:36rhettingersetmessages: + msg389491
2021-03-24 06:18:39rhettingersetkeywords: + patch
stage: resolved -> patch review
pull_requests: + pull_request23767
2021-03-12 19:40:47terry.reedysettitle: Operations on sets more than hundred times less efficient with python3.9 than with previous versions -> One operation on sets < 1/100th as efficient in 3.9 than before
2021-03-09 10:50:25eamartinsetmessages: + msg388349
2021-03-09 02:49:32rhettingersetstatus: closed -> open
versions: + Python 3.10, - Python 3.9
nosy: + tim.peters

messages: + msg388326

resolution: not a bug -> (no value)
2021-02-12 03:40:53eamartinsetmessages: + msg386849
2021-02-12 03:04:30rhettingersetmessages: + msg386848
2021-02-12 02:52:36rhettingersetmessages: - msg386847
2021-02-12 02:45:54rhettingersetmessages: + msg386847
2021-02-11 19:27:12rhettingersetassignee: rhettinger
2021-02-11 05:40:22rhettingersetstatus: open -> closed
resolution: not a bug
messages: + msg386819

stage: resolved
2021-02-11 03:27:04Dennis Sweeneysetnosy: + Dennis Sweeney
messages: + msg386817
2021-02-11 00:31:49zach.waresetnosy: + rhettinger, zach.ware
messages: + msg386816
2021-02-11 00:15:07eamartincreate