Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed of set.update( #46924

Closed
jameinel mannequin opened this issue Apr 23, 2008 · 7 comments
Closed

speed of set.update( #46924

jameinel mannequin opened this issue Apr 23, 2008 · 7 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@jameinel
Copy link
Mannequin

jameinel mannequin commented Apr 23, 2008

BPO 2672
Nosy @rhettinger, @abalkin

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2008-04-23.20:38:27.084>
created_at = <Date 2008-04-23.02:44:42.915>
labels = ['interpreter-core', 'invalid', 'performance']
title = 'speed of set.update('
updated_at = <Date 2008-04-24.21:39:35.411>
user = 'https://bugs.python.org/jameinel'

bugs.python.org fields:

activity = <Date 2008-04-24.21:39:35.411>
actor = 'jameinel'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2008-04-23.20:38:27.084>
closer = 'rhettinger'
components = ['Interpreter Core']
creation = <Date 2008-04-23.02:44:42.915>
creator = 'jameinel'
dependencies = []
files = []
hgrepos = []
issue_num = 2672
keywords = []
message_count = 7.0
messages = ['65694', '65704', '65705', '65737', '65742', '65752', '65753']
nosy_count = 3.0
nosy_names = ['rhettinger', 'belopolsky', 'jameinel']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue2672'
versions = ['Python 2.5', 'Python 2.4']

@jameinel
Copy link
Mannequin Author

jameinel mannequin commented Apr 23, 2008

I was performance profiling some of my own code, and I ran into
something unexpected. Specifically,
set.update(empty_generator_expression) was significantly slower than
set.update(empty_list_expression).

I double checked my findings with timeit:

With python 2.4.3:

$ python -m timeit -s 'x = set(range(10000))' 'x.update([])'
1000000 loops, best of 3: 0.296 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])'
1000000 loops, best of 3: 0.837 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])'
1000000 loops, best of 3: 0.462 usec per loop

With 2.5.1 (on a different machine)
$ python -m timeit -s 'x = set(range(10000))' 'x.update([])'
1000000 loops, best of 3: 0.265 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])'
1000000 loops, best of 3: 0.717 usec per loop
$ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])'
1000000 loops, best of 3: 0.39 usec per loop

So generally, it is about 2x faster to create the empty list expression
and pass it in than to use an empty generator.

@jameinel jameinel mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 23, 2008
@rhettinger rhettinger self-assigned this Apr 23, 2008
@abalkin
Copy link
Member

abalkin commented Apr 23, 2008

This has nothing to do with set.update, the difference is due to the
time to setup the generator:

$ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'
1000000 loops, best of 3: 0.38 usec per loop
$ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])'
'x.update(y)'
1000000 loops, best of 3: 0.335 usec per loop

@rhettinger
Copy link
Contributor

I concur. The source code for set_update() in Objects/setobject.c
shows that both versions are accessed through the iterator protocol, so
what you're seeing are the basic performance differences (start-up and
per-item costs) for genexps vs list iterators.

@jameinel
Copy link
Mannequin Author

jameinel mannequin commented Apr 24, 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Belopolsky wrote:

Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:

This has nothing to do with set.update, the difference is due to the
time to setup the generator:

$ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'
1000000 loops, best of 3: 0.38 usec per loop
$ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])'
'x.update(y)'
1000000 loops, best of 3: 0.335 usec per loop

----------
nosy: +belopolsky


Tracker <report@bugs.python.org>
<http://bugs.python.org/issue2672\>


That is true, though if I just force a generator overhead:

% python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)'
1000000 loops, best of 3: 0.204 usec per loop
% python -m timeit -s 'x = set(range(10000)); y = (i for i in [])'
'x.update(y)'
10000000 loops, best of 3: 0.173 usec per loop
% python -m timeit -s 'x = set(range(10000)); l = []' 'x.update(i for i
in l)'
1000000 loops, best of 3: 0.662 usec per loop
python -m timeit -s 'x = set(range(10000)); l = []; y = (i for i in l)'
'(i for i in l); x.update(y)'
1000000 loops, best of 3: 1.87 usec per loop

So if you compare consuming a generator multiple times to creating it
each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator.

So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec.

(I'm certainly willing to believe that set.update() is generator/list
agnostic, but something weird is still happening.)

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIENAoJdeBCYSNAAMRAk2yAJ4okAalR6zWD0/E5XHei/ckce+L7QCgstEQ
l+6+bl7oAJMhdJ70viqicnQ=
=pLX6
-----END PGP SIGNATURE-----

@jameinel jameinel mannequin changed the title speed of set.update([]) speed of set.update( Apr 24, 2008
@abalkin
Copy link
Member

abalkin commented Apr 24, 2008

On Thu, Apr 24, 2008 at 2:23 PM, John Arbash Meinel
<report@bugs.python.org> wrote:
..

So if you compare consuming a generator multiple times to creating it
each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator.

So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec.

(I'm certainly willing to believe that set.update() is generator/list
agnostic, but something weird is still happening.)

I've seen a similar strangeness in timings:

$ python -m timeit '(i for i in [])'
100000 loops, best of 3: 4.16 usec per loop

but

$ python -m timeit -s 'x = set()' 'x.update(i for i in [])'
1000000 loops, best of 3: 1.31 usec per loop

on the other hand,

$ python -m timeit -s 'x = []' 'x.extend(i for i in [])'
100000 loops, best of 3: 4.54 usec per loop

How can x.update(i for i in []) take *less* time than simply creating a genexp?

Note that there is no apparent bytecode tricks here:

  1           0 LOAD_CONST               0 (<code object <genexpr> at
0xf7e88920, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 BUILD_LIST               0
              9 GET_ITER
             10 CALL_FUNCTION            1
             13 RETURN_VALUE
>>> dis(lambda:x.update(i for i in []))
  1           0 LOAD_GLOBAL              0 (x)
              3 LOAD_ATTR                1 (update)
              6 LOAD_CONST               0 (<code object <genexpr> at
0xf7e88920, file "<stdin>", line 1>)
              9 MAKE_FUNCTION            0
             12 BUILD_LIST               0
             15 GET_ITER
             16 CALL_FUNCTION            1
             19 CALL_FUNCTION            1
             22 RETURN_VALUE

@rhettinger
Copy link
Contributor

John, when y=[], the update method has to create a new list iterator on
each invocation. But when y is a genexp, it is self-iterable (iow, iter
(y) will return self, not a new object).

Also, when doing timings, it can be helpful to factor-out the attribute
lookup:

python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)'

@jameinel
Copy link
Mannequin Author

jameinel mannequin commented Apr 24, 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Raymond Hettinger wrote:

Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment:

John, when y=[], the update method has to create a new list iterator on
each invocation. But when y is a genexp, it is self-iterable (iow, iter
(y) will return self, not a new object).

Also, when doing timings, it can be helpful to factor-out the attribute
lookup:

python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)'


Tracker <report@bugs.python.org>
<http://bugs.python.org/issue2672\>


Sure, I wasn't surprised at the "set.update(y)" versus "set.update([])"

What I was surprised at is the time for:

"(i for i in [])" being about 4x longer than
"set.update(i for i in [])"

Anyway, the original issue is probably closed, whether we want to track
into the generator stuff or not is probably a different issue.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIEP4EJdeBCYSNAAMRAq+MAKC6tLjEtIBX7YgLNoYEfqjRKB4DzACglXjh
cEVLEP5Hu3vpeVgVYdTbAVc=
=94ja
-----END PGP SIGNATURE-----

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
Projects
None yet
Development

No branches or pull requests

2 participants