speed of set.update( #46924

jameinel · 2008-04-23T02:44:43Z

BPO	2672
Nosy	@rhettinger, @abalkin

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2008-04-23.20:38:27.084>
created_at = <Date 2008-04-23.02:44:42.915>
labels = ['interpreter-core', 'invalid', 'performance']
title = 'speed of set.update('
updated_at = <Date 2008-04-24.21:39:35.411>
user = 'https://bugs.python.org/jameinel'

bugs.python.org fields:

activity = <Date 2008-04-24.21:39:35.411>
actor = 'jameinel'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2008-04-23.20:38:27.084>
closer = 'rhettinger'
components = ['Interpreter Core']
creation = <Date 2008-04-23.02:44:42.915>
creator = 'jameinel'
dependencies = []
files = []
hgrepos = []
issue_num = 2672
keywords = []
message_count = 7.0
messages = ['65694', '65704', '65705', '65737', '65742', '65752', '65753']
nosy_count = 3.0
nosy_names = ['rhettinger', 'belopolsky', 'jameinel']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue2672'
versions = ['Python 2.5', 'Python 2.4']

jameinel · 2008-04-23T02:44:42Z

I was performance profiling some of my own code, and I ran into
something unexpected. Specifically,
set.update(empty_generator_expression) was significantly slower than
set.update(empty_list_expression).

I double checked my findings with timeit:

With python 2.4.3:

$ python -m timeit -s 'x = set(range(10000))' 'x.update([])'
1000000 loops, best of 3: 0.296 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])'
1000000 loops, best of 3: 0.837 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])'
1000000 loops, best of 3: 0.462 usec per loop

With 2.5.1 (on a different machine)
$ python -m timeit -s 'x = set(range(10000))' 'x.update([])'
1000000 loops, best of 3: 0.265 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])'
1000000 loops, best of 3: 0.717 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])'
1000000 loops, best of 3: 0.39 usec per loop

So generally, it is about 2x faster to create the empty list expression
and pass it in than to use an empty generator.

abalkin · 2008-04-23T20:19:53Z

This has nothing to do with set.update, the difference is due to the
time to setup the generator:

$ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'
1000000 loops, best of 3: 0.38 usec per loop
$ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])'
'x.update(y)'
1000000 loops, best of 3: 0.335 usec per loop

rhettinger · 2008-04-23T20:38:26Z

I concur. The source code for set_update() in Objects/setobject.c
shows that both versions are accessed through the iterator protocol, so
what you're seeing are the basic performance differences (start-up and
per-item costs) for genexps vs list iterators.

jameinel · 2008-04-24T18:23:47Z

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Belopolsky wrote:

Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:

This has nothing to do with set.update, the difference is due to the
time to setup the generator:

$ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'
1000000 loops, best of 3: 0.38 usec per loop
$ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])'
'x.update(y)'
1000000 loops, best of 3: 0.335 usec per loop

----------
nosy: +belopolsky

Tracker <report@bugs.python.org>
<http://bugs.python.org/issue2672\>

That is true, though if I just force a generator overhead:

% python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'
1000000 loops, best of 3: 0.204 usec per loop
% python -m timeit -s 'x = set(range(10000)); y = (i for i in [])'
'x.update(y)'
10000000 loops, best of 3: 0.173 usec per loop
% python -m timeit -s 'x = set(range(10000)); l = []' 'x.update(i for i
in l)'
1000000 loops, best of 3: 0.662 usec per loop
python -m timeit -s 'x = set(range(10000)); l = []; y = (i for i in l)'
'(i for i in l); x.update(y)'
1000000 loops, best of 3: 1.87 usec per loop

So if you compare consuming a generator multiple times to creating it
each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator.

So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec.

(I'm certainly willing to believe that set.update() is generator/list
agnostic, but something weird is still happening.)

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIENAoJdeBCYSNAAMRAk2yAJ4okAalR6zWD0/E5XHei/ckce+L7QCgstEQ
l+6+bl7oAJMhdJ70viqicnQ=
=pLX6
-----END PGP SIGNATURE-----

abalkin · 2008-04-24T19:29:34Z

On Thu, Apr 24, 2008 at 2:23 PM, John Arbash Meinel
<report@bugs.python.org> wrote:
..

So if you compare consuming a generator multiple times to creating it
each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator.

So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec.

(I'm certainly willing to believe that set.update() is generator/list
agnostic, but something weird is still happening.)

I've seen a similar strangeness in timings:

$ python -m timeit '(i for i in [])'
100000 loops, best of 3: 4.16 usec per loop

but

$ python -m timeit -s 'x = set()' 'x.update(i for i in [])'
1000000 loops, best of 3: 1.31 usec per loop

on the other hand,

$ python -m timeit -s 'x = []' 'x.extend(i for i in [])'
100000 loops, best of 3: 4.54 usec per loop

How can x.update(i for i in []) take *less* time than simply creating a genexp?

Note that there is no apparent bytecode tricks here:

  1           0 LOAD_CONST               0 (<code object <genexpr> at
0xf7e88920, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 BUILD_LIST               0
              9 GET_ITER
             10 CALL_FUNCTION            1
             13 RETURN_VALUE
>>> dis(lambda:x.update(i for i in []))
  1           0 LOAD_GLOBAL              0 (x)
              3 LOAD_ATTR                1 (update)
              6 LOAD_CONST               0 (<code object <genexpr> at
0xf7e88920, file "<stdin>", line 1>)
              9 MAKE_FUNCTION            0
             12 BUILD_LIST               0
             15 GET_ITER
             16 CALL_FUNCTION            1
             19 CALL_FUNCTION            1
             22 RETURN_VALUE

rhettinger · 2008-04-24T21:24:48Z

John, when y=[], the update method has to create a new list iterator on
each invocation. But when y is a genexp, it is self-iterable (iow, iter
(y) will return self, not a new object).

Also, when doing timings, it can be helpful to factor-out the attribute
lookup:

python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)'

jameinel · 2008-04-24T21:39:35Z

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Raymond Hettinger wrote:

Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment:

John, when y=[], the update method has to create a new list iterator on
each invocation. But when y is a genexp, it is self-iterable (iow, iter
(y) will return self, not a new object).

Also, when doing timings, it can be helpful to factor-out the attribute
lookup:

python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)'

Tracker <report@bugs.python.org>
<http://bugs.python.org/issue2672\>

Sure, I wasn't surprised at the "set.update(y)" versus "set.update([])"

What I was surprised at is the time for:

"(i for i in [])" being about 4x longer than
"set.update(i for i in [])"

Anyway, the original issue is probably closed, whether we want to track
into the generator stuff or not is probably a different issue.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIEP4EJdeBCYSNAAMRAq+MAKC6tLjEtIBX7YgLNoYEfqjRKB4DzACglXjh
cEVLEP5Hu3vpeVgVYdTbAVc=
=94ja
-----END PGP SIGNATURE-----

jameinel mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 23, 2008

rhettinger self-assigned this Apr 23, 2008

rhettinger closed this as completed Apr 23, 2008

rhettinger added the invalid label Apr 23, 2008

jameinel mannequin changed the title ~~speed of set.update([])~~ speed of set.update( Apr 24, 2008

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

speed of set.update( #46924

speed of set.update( #46924

jameinel mannequin commented Apr 23, 2008

jameinel mannequin commented Apr 23, 2008

abalkin commented Apr 23, 2008

rhettinger commented Apr 23, 2008

jameinel mannequin commented Apr 24, 2008

abalkin commented Apr 24, 2008

rhettinger commented Apr 24, 2008

jameinel mannequin commented Apr 24, 2008

speed of set.update( #46924

speed of set.update( #46924

Comments

jameinel mannequin commented Apr 23, 2008

jameinel mannequin commented Apr 23, 2008

abalkin commented Apr 23, 2008

rhettinger commented Apr 23, 2008

jameinel mannequin commented Apr 24, 2008

abalkin commented Apr 24, 2008

rhettinger commented Apr 24, 2008

jameinel mannequin commented Apr 24, 2008