
Author spiv
Recipients spiv
Date 2010-05-11.09:04:15
Message-id <1273568659.49.0.865922101958.issue8685@psf.upfronthosting.co.za>
Content
set.difference(other), when other is also a set, basically does::

    res = set()
    for elem in self:
        if elem not in other:
            res.add(elem)
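
For experimenting outside the interpreter internals, the loop above can be written as a standalone function. This is only a pure-Python sketch of the behaviour described (the real code is in C), and `naive_difference` is a hypothetical name used for illustration:

```python
def naive_difference(self_set, other):
    """Pure-Python sketch of the loop above (hypothetical helper)."""
    res = set()
    # One membership test per element of self_set, so the cost grows with
    # len(self_set) no matter how small `other` is.
    for elem in self_set:
        if elem not in other:
            res.add(elem)
    return res
```

The work is proportional to len(self_set) even when `other` is empty, which is the case measured in this report.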

This is wasteful when len(self) is much greater than len(other):

$ python -m timeit -s "s = set(range(100000)); sd = s.difference; empty = set()" "sd(empty)"
100 loops, best of 3: 12.8 msec per loop
$ python -m timeit -s "s = set(range(10)); sd = s.difference; empty = set()" "sd(empty)"
1000000 loops, best of 3: 1.18 usec per loop
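
The same measurement can be reproduced from within Python using the standard `timeit` module. This is a minimal sketch; `time_difference` is a hypothetical helper name, and absolute timings will vary with the machine and Python version:

```python
import timeit


def time_difference(n, number=100):
    """Best-of-3 seconds per call of s.difference(empty) for len(s) == n."""
    s = set(range(n))
    empty = set()
    sd = s.difference  # bind the method once, as in the shell commands
    timings = timeit.repeat(lambda: sd(empty), number=number, repeat=3)
    return min(timings) / number


if __name__ == "__main__":
    for n in (10, 100000):
        print(n, time_difference(n))
```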

Here's a patch that compares the lengths of self and other before that loop and, if len(self) is greater, swaps them.  With the patch applied, the timeit results are:

$ python -m timeit -s "s = set(range(100000)); sd = s.difference; empty = set()" "sd(empty)"
1000000 loops, best of 3: 0.289 usec per loop
$ python -m timeit -s "s = set(range(10)); sd = s.difference; empty = set()" "sd(empty)"
1000000 loops, best of 3: 0.294 usec per loop
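
One way to take advantage of the size comparison can be sketched in pure Python as follows. This is an illustration under assumptions, not necessarily what the patch itself does: `size_aware_difference` is a hypothetical name, and the threshold (a plain `len` comparison) is chosen only for simplicity. Because difference is not symmetric, both branches must return elements of the first set:

```python
def size_aware_difference(self_set, other):
    """Return self_set - other, looping over whichever set is smaller.

    Hypothetical sketch; the name and the threshold are assumptions.
    """
    if len(self_set) > len(other):
        # Bulk-copy self_set, then discard the few elements found in other.
        result = set(self_set)
        for elem in other:
            result.discard(elem)
        return result
    # Otherwise use the element-by-element membership test.
    return {elem for elem in self_set if elem not in other}
```

The copy in the first branch still touches every element of self_set, but it avoids a separate membership test and insertion for each one.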