Message231760
Here is an updated patch based on Dustin's work, incorporating Josh's comments. I also added a test which takes forever on an unpatched Python interpreter.
Since this is a performance issue, I've benchmarked the patch. Timings are essentially unchanged when the argument is a set or a dict, but they are much better for other iterables whenever the method can return early.
For every type of argument I test one case where "set.issubset" returns True and one case where it returns False.
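As a quick sanity check on the True/False pairing, here is a small sketch that evaluates "set.issubset" on the same kinds of arguments used in the timings. The labels and the names `cases` and `results` are only illustrative; it assumes nothing beyond the standard behavior of sets, dicts (whose keys are what get compared) and ranges.

```python
s1 = set(range(1000))

# (label, argument) pairs mirroring the setups used in the timings.
cases = [
    ("set, all elements", set(range(1000))),
    ("set, missing 0", set(range(1, 1000))),
    # For a dict, issubset looks at the keys: enumerate yields keys 0..999 here
    ("dict, keys 0..999", dict(enumerate(range(1000)))),
    # ...and keys 0..998 here, so 999 is missing.
    ("dict, keys 0..998", dict(enumerate(range(1, 1000)))),
    ("range(1000)", range(1000)),
    ("range(1, 1000)", range(1, 1000)),
]

results = {label: s1.issubset(arg) for label, arg in cases}
for label, outcome in results.items():
    print("{}: {}".format(label, outcome))
```

Each type of argument shows up once with a True result and once with a False result.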
(a) simple argument (results unchanged)
$ ./python -m timeit -s "s1 = set(range(1000)); s2 = set(range(1000))" "s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 63.7 usec per loop
Patched: 10000 loops, best of 3: 63.5 usec per loop
$ ./python -m timeit -s "s1 = set(range(1000)); s2 = set(range(1, 1000))" "s1.issubset(s2)"
Unpatched: 1000000 loops, best of 3: 0.248 usec per loop
Patched: 1000000 loops, best of 3: 0.25 usec per loop
$ ./python -m timeit -s "s1 = set(range(1000)); s2 = dict(enumerate(range(1000)))" "s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 107 usec per loop
Patched: 10000 loops, best of 3: 108 usec per loop
$ ./python -m timeit -s "s1 = set(range(1000)); s2 = dict(enumerate(range(1, 1000)))" "s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 43.5 usec per loop
Patched: 10000 loops, best of 3: 42.6 usec per loop
(b) iterable argument (speed improvement)
1) no improvement, or a slight degradation, when the whole iterable must be consumed
$ ./python -m timeit -s "s1 = set(range(1000))" "s1.issubset(range(1000))"
Unpatched: 1000 loops, best of 3: 263 usec per loop
Patched: 1000 loops, best of 3: 263 usec per loop
$ ./python -m timeit -s "s1 = set(range(1000))" "s1.issubset(range(1, 1000))"
Unpatched: 10000 loops, best of 3: 201 usec per loop
Patched: 1000 loops, best of 3: 259 usec per loop
$ ./python -m timeit -s "s1 = set(range(100))" "s1.issubset(range(1, 1000))"
Unpatched: 1000 loops, best of 3: 198 usec per loop
Patched: 1000 loops, best of 3: 218 usec per loop
2) tremendous improvement when the method can return early
$ ./python -m timeit -s "s1 = set(range(100))" "s1.issubset(range(1000))"
Unpatched: 1000 loops, best of 3: 209 usec per loop
Patched: 100000 loops, best of 3: 12.1 usec per loop
$ ./python -m timeit -s "s1 = set('a'); s2 = ['a'] + ['b'] * 10000" "s1.issubset(s2)"
Unpatched: 1000 loops, best of 3: 368 usec per loop
Patched: 1000000 loops, best of 3: 0.934 usec per loop
$ ./python -m timeit -s "s1 = set('a'); from itertools import repeat" "s1.issubset(repeat('a'))"
Unpatched: NEVER FINISHES
Patched: 1000000 loops, best of 3: 1.33 usec per loop
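To illustrate why the early-return cases in (b) behave so differently, here is a rough pure-Python sketch of the idea. It is not the patch itself (the real change is in the C implementation and may differ in its details), and the names `issubset_early_exit` and `remaining` are hypothetical. The sketch consumes the iterable lazily and stops as soon as every element of the set has been seen, instead of first building a full set from the argument.

```python
from itertools import repeat


def issubset_early_exit(s, iterable):
    """Return True if every element of the set `s` appears in `iterable`.

    Illustrative only: stops consuming `iterable` as soon as the answer
    is known to be True.
    """
    remaining = set(s)  # elements of s that have not been seen yet
    if not remaining:
        return True  # the empty set is a subset of anything
    for item in iterable:
        remaining.discard(item)
        if not remaining:
            return True  # everything was found, no need to read further
    return False  # iterable exhausted with some elements still missing


# Returns after reading a single item, even though repeat('a') is infinite.
print(issubset_early_exit({'a'}, repeat('a')))
# Must read the whole range, because 0 never appears.
print(issubset_early_exit(set(range(100)), range(1, 1000)))
```

The extra bookkeeping per item is also a plausible reason for the slight slowdown in the cases where the whole iterable has to be consumed anyway.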
Date | User | Action | Args
2014-11-27 13:38:47 | bru | set | recipients: + bru, rhettinger, vstinner, ezio.melotti, hhm, dhaffner, pconnell, josh.r
2014-11-27 13:38:47 | bru | set | messageid: <1417095527.36.0.696431757909.issue18032@psf.upfronthosting.co.za>
2014-11-27 13:38:47 | bru | link | issue18032 messages
2014-11-27 13:38:47 | bru | create |