Issue2672
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008-04-23 02:44 by jameinel, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (7) | |||
---|---|---|---|
msg65694 - (view) | Author: John Arbash Meinel (jameinel) | Date: 2008-04-23 02:44 | |
I was performance profiling some of my own code, and I ran into something unexpected. Specifically, set.update(empty_generator_expression) was significantly slower than set.update(empty_list_expression). I double checked my findings with timeit: With python 2.4.3: $ python -m timeit -s 'x = set(range(10000))' 'x.update([])' 1000000 loops, best of 3: 0.296 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])' 1000000 loops, best of 3: 0.837 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])' 1000000 loops, best of 3: 0.462 usec per loop With 2.5.1 (on a different machine) $ python -m timeit -s 'x = set(range(10000))' 'x.update([])' 1000000 loops, best of 3: 0.265 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])' 1000000 loops, best of 3: 0.717 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])' 1000000 loops, best of 3: 0.39 usec per loop So generally, it is about 2x faster to create the empty list expression and pass it in than to use an empty generator. |
|||
msg65704 - (view) | Author: Alexander Belopolsky (belopolsky) * ![]() |
Date: 2008-04-23 20:19 | |
This has nothing to do with set.update, the difference is due to the time to setup the generator: $ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)' 1000000 loops, best of 3: 0.38 usec per loop $ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])' 'x.update(y)' 1000000 loops, best of 3: 0.335 usec per loop |
|||
msg65705 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2008-04-23 20:38 | |
I concur. The source code for set_update() in Objects/setobject.c shows that both versions are accessed through the iterator protocol, so what you're seeing are the basic performance differences (start-up and per-item costs) for genexps vs list iterators. |
|||
msg65737 - (view) | Author: John Arbash Meinel (jameinel) | Date: 2008-04-24 18:23 | |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Alexander Belopolsky wrote: > Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment: > > This has nothing to do with set.update, the difference is due to the > time to setup the generator: > > $ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)' > 1000000 loops, best of 3: 0.38 usec per loop > $ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])' > 'x.update(y)' > 1000000 loops, best of 3: 0.335 usec per loop > > ---------- > nosy: +belopolsky > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue2672> > __________________________________ > That is true, though if I just force a generator overhead: % python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)' 1000000 loops, best of 3: 0.204 usec per loop % python -m timeit -s 'x = set(range(10000)); y = (i for i in [])' 'x.update(y)' 10000000 loops, best of 3: 0.173 usec per loop % python -m timeit -s 'x = set(range(10000)); l = []' 'x.update(i for i in l)' 1000000 loops, best of 3: 0.662 usec per loop python -m timeit -s 'x = set(range(10000)); l = []; y = (i for i in l)' '(i for i in l); x.update(y)' 1000000 loops, best of 3: 1.87 usec per loop So if you compare consuming a generator multiple times to creating it each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator. So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec. (I'm certainly willing to believe that set.update() is generator/list agnostic, but something weird is still happening.) John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIENAoJdeBCYSNAAMRAk2yAJ4okAalR6zWD0/E5XHei/ckce+L7QCgstEQ l+6+bl7oAJMhdJ70viqicnQ= =pLX6 -----END PGP SIGNATURE----- |
|||
msg65742 - (view) | Author: Alexander Belopolsky (belopolsky) * ![]() |
Date: 2008-04-24 19:29 | |
On Thu, Apr 24, 2008 at 2:23 PM, John Arbash Meinel <report@bugs.python.org> wrote: .. > So if you compare consuming a generator multiple times to creating it > each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator. > > So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec. > > (I'm certainly willing to believe that set.update() is generator/list > agnostic, but something weird is still happening.) I've seen a similar strangeness in timings: $ python -m timeit '(i for i in [])' 100000 loops, best of 3: 4.16 usec per loop but $ python -m timeit -s 'x = set()' 'x.update(i for i in [])' 1000000 loops, best of 3: 1.31 usec per loop on the other hand, $ python -m timeit -s 'x = []' 'x.extend(i for i in [])' 100000 loops, best of 3: 4.54 usec per loop How can x.update(i for i in []) take *less* time than simply creating a genexp? Note that there is no apparent bytecode tricks here: 1 0 LOAD_CONST 0 (<code object <genexpr> at 0xf7e88920, file "<stdin>", line 1>) 3 MAKE_FUNCTION 0 6 BUILD_LIST 0 9 GET_ITER 10 CALL_FUNCTION 1 13 RETURN_VALUE >>> dis(lambda:x.update(i for i in [])) 1 0 LOAD_GLOBAL 0 (x) 3 LOAD_ATTR 1 (update) 6 LOAD_CONST 0 (<code object <genexpr> at 0xf7e88920, file "<stdin>", line 1>) 9 MAKE_FUNCTION 0 12 BUILD_LIST 0 15 GET_ITER 16 CALL_FUNCTION 1 19 CALL_FUNCTION 1 22 RETURN_VALUE |
|||
msg65752 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2008-04-24 21:24 | |
John, when y=[], the update method has to create a new list iterator on each invocation. But when y is a genexp, it is self-iterable (iow, iter (y) will return self, not a new object). Also, when doing timings, it can be helpful to factor-out the attribute lookup: python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)' |
|||
msg65753 - (view) | Author: John Arbash Meinel (jameinel) | Date: 2008-04-24 21:39 | |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Raymond Hettinger wrote: > Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment: > > John, when y=[], the update method has to create a new list iterator on > each invocation. But when y is a genexp, it is self-iterable (iow, iter > (y) will return self, not a new object). > > Also, when doing timings, it can be helpful to factor-out the attribute > lookup: > > python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)' > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue2672> > __________________________________ > Sure, I wasn't surprised at the "set.update(y)" versus "set.update([])" What I was surprised at is the time for: "(i for i in [])" being about 4x longer than "set.update(i for i in [])" Anyway, the original issue is probably closed, whether we want to track into the generator stuff or not is probably a different issue. John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIEP4EJdeBCYSNAAMRAq+MAKC6tLjEtIBX7YgLNoYEfqjRKB4DzACglXjh cEVLEP5Hu3vpeVgVYdTbAVc= =94ja -----END PGP SIGNATURE----- |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:33 | admin | set | github: 46924 |
2008-04-24 21:39:35 | jameinel | set | messages: + msg65753 |
2008-04-24 21:24:48 | rhettinger | set | messages: + msg65752 |
2008-04-24 19:29:34 | belopolsky | set | messages: + msg65742 |
2008-04-24 18:23:47 | jameinel | set | messages:
+ msg65737 title: speed of set.update([]) -> speed of set.update( |
2008-04-23 20:38:27 | rhettinger | set | status: open -> closed resolution: not a bug messages: + msg65705 |
2008-04-23 20:19:54 | belopolsky | set | nosy:
+ belopolsky messages: + msg65704 |
2008-04-23 03:59:35 | rhettinger | set | assignee: rhettinger nosy: + rhettinger |
2008-04-23 02:44:55 | jameinel | set | type: performance |
2008-04-23 02:44:42 | jameinel | create |