speed of set.update( #46924
I was performance profiling some of my own code, and I ran into
I double checked my findings with timeit:

With python 2.4.3:

$ python -m timeit -s 'x = set(range(10000))' 'x.update([])'
1000000 loops, best of 3: 0.296 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])'
1000000 loops, best of 3: 0.837 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])'
1000000 loops, best of 3: 0.462 usec per loop

With 2.5.1 (on a different machine)

So generally, it is about 2x faster to create the empty list expression
This has nothing to do with set.update, the difference is due to the

$ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'
1000000 loops, best of 3: 0.38 usec per loop
$ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])' 'x.update(y)'
1000000 loops, best of 3: 0.335 usec per loop
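The distinction drawn in the commands above (argument built in the setup string versus built inside the timed statement) can also be reproduced with the `timeit` module directly. This is only a sketch under assumptions: it targets Python 3 rather than the 2.4/2.5 interpreters used in the thread, the variable names and the loop count are made up for illustration, and absolute numbers will differ by machine and version.

```python
import timeit

# Shared setup: a 10000-element set and a prebuilt empty list.
SETUP = "x = set(range(10000)); y = []"
LOOPS = 100000  # arbitrary loop count chosen for this sketch

# Argument built once in setup: the timed statement is only the update call.
t_prebuilt = timeit.timeit("x.update(y)", setup=SETUP, number=LOOPS)

# Argument built inside the timed statement: every loop pays for
# creating the generator expression as well as for the update call.
t_inline_genexp = timeit.timeit("x.update(i for i in [])", setup=SETUP, number=LOOPS)

# Same idea with a list comprehension built on every loop.
t_inline_listcomp = timeit.timeit("x.update([i for i in []])", setup=SETUP, number=LOOPS)

for label, total in [("prebuilt list", t_prebuilt),
                     ("inline genexp", t_inline_genexp),
                     ("inline listcomp", t_inline_listcomp)]:
    # Convert total seconds to microseconds per loop, like the CLI output.
    print("%-16s %.3f usec per loop" % (label, total / LOOPS * 1e6))
```

Comparing the first figure against the other two gives a rough idea of how much of each measurement is argument construction rather than the update itself.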
I concur. The source code for set_update() in Objects/setobject.c
Alexander Belopolsky wrote:

That is true, though if I just force a generator overhead:

% python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'

So if you compare consuming a generator multiple times to creating it

So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec?

(I'm certainly willing to believe that set.update() is generator/list

John
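One thing worth keeping in mind when a generator is created in the timeit setup string: a generator is one-shot, so only the first timed loop actually iterates over its items and every later loop sees an exhausted generator. A small sketch of that behaviour, with made-up sample values (Python 3 assumed):

```python
# A set and a one-shot generator; the values 10 and 11 are arbitrary sample data.
x = set(range(10))
y = (i for i in [10, 11])

x.update(y)                   # first pass consumes both items
size_after_first = len(x)

x.update(y)                   # later passes see an exhausted generator and add nothing
size_after_second = len(x)

leftover = list(y)            # an exhausted generator yields no further items

print(size_after_first, size_after_second, leftover)
```

With an empty source list, as in the commands in this thread, the generator has nothing to yield even on the first pass, so this only matters for how the measurement is interpreted, not for the result of the update.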
On Thu, Apr 24, 2008 at 2:23 PM, John Arbash Meinel
I've seen a similar strangeness in timings:

$ python -m timeit '(i for i in [])'
100000 loops, best of 3: 4.16 usec per loop

but

$ python -m timeit -s 'x = set()' 'x.update(i for i in [])'
1000000 loops, best of 3: 1.31 usec per loop

on the other hand,

$ python -m timeit -s 'x = []' 'x.extend(i for i in [])'
100000 loops, best of 3: 4.54 usec per loop

How can x.update(i for i in []) take *less* time than simply creating a genexp?

Note that there are no apparent bytecode tricks here:

  1           0 LOAD_CONST               0 (<code object <genexpr> at 0xf7e88920, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 BUILD_LIST               0
              9 GET_ITER
             10 CALL_FUNCTION            1
             13 RETURN_VALUE
>>> dis(lambda:x.update(i for i in []))
  1           0 LOAD_GLOBAL              0 (x)
              3 LOAD_ATTR                1 (update)
              6 LOAD_CONST               0 (<code object <genexpr> at 0xf7e88920, file "<stdin>", line 1>)
              9 MAKE_FUNCTION            0
             12 BUILD_LIST               0
             15 GET_ITER
             16 CALL_FUNCTION            1
             19 CALL_FUNCTION            1
             22 RETURN_VALUE
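For anyone wanting to repeat the bytecode comparison on a current interpreter, the following sketch lists the opcode names for both lambdas using `dis.get_instructions` (available since Python 3.4). The opcode names and offsets on Python 3 will not match the 2.x listings above, and the helper name `opnames` is made up for this example; the point is only that the update version contains the genexp-building instructions plus the extra lookup and call.

```python
import dis

def opnames(func):
    """Return the opcode names of func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

# The two expressions compared in the thread. The lambdas are never called,
# so the global name x does not need to exist for disassembly.
make_genexp = lambda: (i for i in [])
update_with_genexp = lambda: x.update(i for i in [])

genexp_ops = opnames(make_genexp)
update_ops = opnames(update_with_genexp)

print("genexp only :", genexp_ops)
print("with update :", update_ops)
print("extra instructions:", len(update_ops) - len(genexp_ops))
```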
John, when y=[], the update method has to create a new list iterator on

Also, when doing timings, it can be helpful to factor-out the attribute

python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)'
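The suggestion above, binding the method once in setup so the timed statement does not include the attribute lookup, looks roughly like this when done through the `timeit` module. It is a sketch assuming Python 3; the loop count is arbitrary and the size of the difference will vary by interpreter version.

```python
import timeit

# Setup mirrors the command above: xu is the bound method x.update.
SETUP = "x = set(range(10000)); y = []; xu = x.update"
LOOPS = 200000  # arbitrary loop count chosen for this sketch

# Attribute lookup happens inside every timed loop.
t_with_lookup = timeit.timeit("x.update(y)", setup=SETUP, number=LOOPS)

# Attribute lookup factored out: only the call itself is timed.
t_prebound = timeit.timeit("xu(y)", setup=SETUP, number=LOOPS)

print("x.update(y): %.3f usec per loop" % (t_with_lookup / LOOPS * 1e6))
print("xu(y):       %.3f usec per loop" % (t_prebound / LOOPS * 1e6))

# Sanity check that the prebound method still updates the same set.
x = set()
xu = x.update
xu([1, 2])
bound_result = sorted(x)
```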
Raymond Hettinger wrote:

Sure, I wasn't surprised at the "set.update(y)" versus "set.update([])"

What I was surprised at is the time for: "(i for i in [])" being about 4x longer than

Anyway, the original issue is probably closed, whether we want to track

John