This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: speed of set.update(
Type: performance Stage:
Components: Interpreter Core Versions: Python 2.4, Python 2.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: belopolsky, jameinel, rhettinger
Priority: normal Keywords:

Created on 2008-04-23 02:44 by jameinel, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg65694 - (view) Author: John Arbash Meinel (jameinel) Date: 2008-04-23 02:44
I was performance profiling some of my own code, and I ran into
something unexpected. Specifically,
set.update(empty_generator_expression) was significantly slower than
set.update(empty_list_expression).

I double checked my findings with timeit:

With python 2.4.3:

$ python -m timeit -s 'x = set(range(10000))' 'x.update([])'
1000000 loops, best of 3: 0.296 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])'
1000000 loops, best of 3: 0.837 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])'
1000000 loops, best of 3: 0.462 usec per loop

With 2.5.1 (on a different machine)
$ python -m timeit -s 'x = set(range(10000))' 'x.update([])'
1000000 loops, best of 3: 0.265 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])'
1000000 loops, best of 3: 0.717 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])'
1000000 loops, best of 3: 0.39 usec per loop

So generally, it is about 2x faster to create the empty list expression
and pass it in than to use an empty generator.
msg65704 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-23 20:19
This has nothing to do with set.update, the difference is due to the
time to setup the generator:

$ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'
1000000 loops, best of 3: 0.38 usec per loop
$ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])'
'x.update(y)'
1000000 loops, best of 3: 0.335 usec per loop
msg65705 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-04-23 20:38
I concur.  The source code for set_update() in Objects/setobject.c 
shows that both versions are accessed through the iterator protocol, so 
what you're seeing are the basic performance differences (start-up and 
per-item costs) for genexps vs list iterators.
msg65737 - (view) Author: John Arbash Meinel (jameinel) Date: 2008-04-24 18:23
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Belopolsky wrote:
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> This has nothing to do with set.update, the difference is due to the
> time to setup the generator:
> 
> $ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'
> 1000000 loops, best of 3: 0.38 usec per loop
> $ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])'
> 'x.update(y)'
> 1000000 loops, best of 3: 0.335 usec per loop
> 
> ----------
> nosy: +belopolsky
> 
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue2672>
> __________________________________
> 

That is true, though if I just force a generator overhead:

% python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'
1000000 loops, best of 3: 0.204 usec per loop
% python -m timeit -s 'x = set(range(10000)); y = (i for i in [])'
'x.update(y)'
10000000 loops, best of 3: 0.173 usec per loop
% python -m timeit -s 'x = set(range(10000)); l = []' 'x.update(i for i
in l)'
1000000 loops, best of 3: 0.662 usec per loop
 python -m timeit -s 'x = set(range(10000)); l = []; y = (i for i in l)'
'(i for i in l); x.update(y)'
1000000 loops, best of 3: 1.87 usec per loop

So if you compare consuming a generator multiple times to creating it
each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator.

So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec.

(I'm certainly willing to believe that set.update() is generator/list
agnostic, but something weird is still happening.)

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIENAoJdeBCYSNAAMRAk2yAJ4okAalR6zWD0/E5XHei/ckce+L7QCgstEQ
l+6+bl7oAJMhdJ70viqicnQ=
=pLX6
-----END PGP SIGNATURE-----
msg65742 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-24 19:29
On Thu, Apr 24, 2008 at 2:23 PM, John Arbash Meinel
<report@bugs.python.org> wrote:
..
>  So if you compare consuming a generator multiple times to creating it
>  each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator.
>
>  So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec.
>
>  (I'm certainly willing to believe that set.update() is generator/list
>  agnostic, but something weird is still happening.)

I've seen a similar strangeness in timings:

$ python -m timeit '(i for i in [])'
100000 loops, best of 3: 4.16 usec per loop

but

$ python -m timeit -s 'x = set()' 'x.update(i for i in [])'
1000000 loops, best of 3: 1.31 usec per loop

on the other hand,

$ python -m timeit -s 'x = []' 'x.extend(i for i in [])'
100000 loops, best of 3: 4.54 usec per loop

How can x.update(i for i in []) take *less* time than simply creating a genexp?

Note that there is no apparent bytecode tricks here:

  1           0 LOAD_CONST               0 (<code object <genexpr> at
0xf7e88920, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 BUILD_LIST               0
              9 GET_ITER
             10 CALL_FUNCTION            1
             13 RETURN_VALUE
>>> dis(lambda:x.update(i for i in []))
  1           0 LOAD_GLOBAL              0 (x)
              3 LOAD_ATTR                1 (update)
              6 LOAD_CONST               0 (<code object <genexpr> at
0xf7e88920, file "<stdin>", line 1>)
              9 MAKE_FUNCTION            0
             12 BUILD_LIST               0
             15 GET_ITER
             16 CALL_FUNCTION            1
             19 CALL_FUNCTION            1
             22 RETURN_VALUE
msg65752 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-04-24 21:24
John, when y=[], the update method has to create a new list iterator on 
each invocation.  But when y is a genexp, it is self-iterable (iow, iter
(y) will return self, not a new object).

Also, when doing timings, it can be helpful to factor-out the attribute 
lookup:

python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)'
msg65753 - (view) Author: John Arbash Meinel (jameinel) Date: 2008-04-24 21:39
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Raymond Hettinger wrote:
> Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment:
> 
> John, when y=[], the update method has to create a new list iterator on 
> each invocation.  But when y is a genexp, it is self-iterable (iow, iter
> (y) will return self, not a new object).
> 
> Also, when doing timings, it can be helpful to factor-out the attribute 
> lookup:
> 
> python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)'
> 
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue2672>
> __________________________________
> 

Sure, I wasn't surprised at the "set.update(y)" versus "set.update([])"

What I was surprised at is the time for:

"(i for i in [])" being about 4x longer than
"set.update(i for i in [])"

Anyway, the original issue is probably closed, whether we want to track
into the generator stuff or not is probably a different issue.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIEP4EJdeBCYSNAAMRAq+MAKC6tLjEtIBX7YgLNoYEfqjRKB4DzACglXjh
cEVLEP5Hu3vpeVgVYdTbAVc=
=94ja
-----END PGP SIGNATURE-----
History
Date User Action Args
2022-04-11 14:56:33adminsetgithub: 46924
2008-04-24 21:39:35jameinelsetmessages: + msg65753
2008-04-24 21:24:48rhettingersetmessages: + msg65752
2008-04-24 19:29:34belopolskysetmessages: + msg65742
2008-04-24 18:23:47jameinelsetmessages: + msg65737
title: speed of set.update([]) -> speed of set.update(
2008-04-23 20:38:27rhettingersetstatus: open -> closed
resolution: not a bug
messages: + msg65705
2008-04-23 20:19:54belopolskysetnosy: + belopolsky
messages: + msg65704
2008-04-23 03:59:35rhettingersetassignee: rhettinger
nosy: + rhettinger
2008-04-23 02:44:55jameinelsettype: performance
2008-04-23 02:44:42jameinelcreate