
Author bru
Recipients bru, dhaffner, ezio.melotti, hhm, josh.r, pconnell, rhettinger, vstinner
Date 2014-11-27.13:38:46
Message-id <1417095527.36.0.696431757909.issue18032@psf.upfronthosting.co.za>
Content
Here is an updated patch based on Dustin's work, incorporating Josh's comments. I also added a test that never finishes on an unpatched Python interpreter.

Since this is a performance issue, I benchmarked the results. Timings are essentially unchanged when the argument is a set or a dict, but they are much better for other iterables when the method can return early.
For each type of argument, I test one case where "set.issubset" returns True and one case where it returns False.
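
For anyone who wants to reproduce the measurements from Python rather than the shell, here is a rough sketch using the standard `timeit` module. The helper name `best_usec` and the fixed loop count are my own choices for illustration, not part of the patch; `python -m timeit` picks the loop count automatically.

```python
import timeit

def best_usec(stmt, setup, number=1000, repeat=3):
    """Best per-loop time in microseconds over `repeat` runs.

    Hypothetical helper: mirrors the "best of 3" figure printed by
    `python -m timeit`, but with a fixed loop count.
    """
    times = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(times) / number * 1e6

# One case where issubset returns True and one where it returns False.
true_case = best_usec(
    "s1.issubset(s2)",
    "s1 = set(range(1000)); s2 = set(range(1000))",
)
false_case = best_usec(
    "s1.issubset(s2)",
    "s1 = set(range(1000)); s2 = set(range(1, 1000))",
)
print("True case:  %.3f usec per loop" % true_case)
print("False case: %.3f usec per loop" % false_case)
```

Absolute numbers will differ by machine and build, so only the patched/unpatched comparison on the same machine is meaningful.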


(a) simple argument (results unchanged)

$ ./python -m timeit -s "s1 = set(range(1000)); s2 = set(range(1000))" "s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 63.7 usec per loop
Patched:   10000 loops, best of 3: 63.5 usec per loop

$ ./python -m timeit -s "s1 = set(range(1000)); s2 = set(range(1, 1000))" "s1.issubset(s2)"
Unpatched: 1000000 loops, best of 3: 0.248 usec per loop
Patched:   1000000 loops, best of 3: 0.25 usec per loop

$ ./python -m timeit -s "s1 = set(range(1000)); s2 = dict(enumerate(range(1000)))" "s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 107 usec per loop
Patched:   10000 loops, best of 3: 108 usec per loop

$ ./python -m timeit -s "s1 = set(range(1000)); s2 = dict(enumerate(range(1, 1000)))" "s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 43.5 usec per loop
Patched:   10000 loops, best of 3: 42.6 usec per loop


(b) iterable argument (speed improvement)

1) no improvement, or slight degradation, when the whole iterable must be consumed

$ ./python -m timeit -s "s1 = set(range(1000))" "s1.issubset(range(1000))"
Unpatched: 1000 loops, best of 3: 263 usec per loop
Patched:   1000 loops, best of 3: 263 usec per loop

$ ./python -m timeit -s "s1 = set(range(1000))" "s1.issubset(range(1, 1000))"
Unpatched: 10000 loops, best of 3: 201 usec per loop
Patched:   1000 loops, best of 3: 259 usec per loop

$ ./python -m timeit -s "s1 = set(range(100))" "s1.issubset(range(1, 1000))"
Unpatched: 1000 loops, best of 3: 198 usec per loop
Patched:   1000 loops, best of 3: 218 usec per loop

2) tremendous improvements when it can return early

$ ./python -m timeit -s "s1 = set(range(100))" "s1.issubset(range(1000))"
Unpatched: 1000 loops, best of 3: 209 usec per loop
Patched:   100000 loops, best of 3: 12.1 usec per loop

$ ./python -m timeit -s "s1 = set('a'); s2 = ['a'] + ['b'] * 10000" "s1.issubset(s2)"
Unpatched: 1000 loops, best of 3: 368 usec per loop
Patched:   1000000 loops, best of 3: 0.934 usec per loop

$ ./python -m timeit -s "s1 = set('a'); from itertools import repeat" "s1.issubset(repeat('a'))"
Unpatched: NEVER FINISHES
Patched:   1000000 loops, best of 3: 1.33 usec per loop
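
The early-return behaviour measured above can be illustrated in pure Python. This is only a sketch of the idea, assuming the approach is to stop consuming the iterable once every element of the set has been seen; it is not the C code from the patch, and the function name `issubset_early_exit` is made up for this example.

```python
from itertools import repeat

def issubset_early_exit(s, other):
    """Return True if every element of the set `s` occurs in `other`.

    Illustrative only: the name and the approach are assumptions,
    not the actual implementation in the patch.
    """
    # Sets and dicts support fast membership tests, so there is
    # nothing to gain from consuming them lazily.
    if isinstance(other, (set, frozenset)):
        return s <= other
    if isinstance(other, dict):
        return all(key in other for key in s)

    # Generic iterable: track which elements of `s` are still missing
    # and stop as soon as none are left.
    remaining = set(s)
    if not remaining:
        return True
    for item in other:
        remaining.discard(item)
        if not remaining:
            return True  # early exit: rest of `other` is never consumed
    return False

# Terminates even on an infinite iterator, like the last benchmark.
print(issubset_early_exit({'a'}, repeat('a')))
# Element 0 never appears, so the whole range has to be consumed.
print(issubset_early_exit(set(range(1000)), range(1, 1000)))
```

When the answer is False, or the last needed element comes at the end, the iterable still has to be consumed in full, which is consistent with the lack of improvement in case 1).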