Issue 29724: Itertools docs propose a harmful “speedup” without any explanation

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/73910

classification

Title:	Itertools docs propose a harmful “speedup” without any explanation
Type:	performance	Stage:	resolved
Components:	Documentation	Versions:	Python 3.7, Python 3.6, Python 3.3, Python 3.4, Python 3.5, Python 2.7

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	Kwpolska, docs@python, lemburg, leovp, rhettinger, serhiy.storchaka, steven.daprano, vstinner
Priority:	normal	Keywords:

Created on 2017-03-05 16:32 by Kwpolska, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (9)
msg289020 - (view)	Author: Chris Warrick (Kwpolska)	Date: 2017-03-05 16:32
The itertools recipes list [0] ends with the following dubious advice: > Note, many of the above recipes can be optimized by replacing global lookups with local variables defined as default values. For example, the dotproduct recipe can be written as: > > def dotproduct(vec1, vec2, sum=sum, map=map, mul=operator.mul): > return sum(map(mul, vec1, vec2)) This is presented in the document without any explanation. It may confuse beginners into always doing it in their code (as evidenced in #python today), leading to unreadable code and function signatures. There is also no proof of there being a significant speed difference by using this “trick”. In my opinion, this should not be part of the documentation, or should provide proof that it can provide a real, noticeable speedup and is not premature optimization. (Added in [1] by Raymond Hettinger — added to nosy list) [0]: https://docs.python.org/3/library/itertools.html#itertools-recipes [1]: https://github.com/python/cpython/commit/fc91aa28fd8dad5280fd4d3a4747b5e08ee37ac0
msg289028 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2017-03-05 17:32
On my computer, running Python 3.5 and continuing to do other tasks while the tests are running, I get a reproducible 5% speedup by using the "default values" trick. Here's my code: import operator def dotproduct(vec1, vec2): return sum(map(operator.mul, vec1, vec2)) def dotproduct2(vec1, vec2, sum=sum, map=map, mul=operator.mul): return sum(map(mul, vec1, vec2)) setup = 'from __main__ import a, b, dotproduct, dotproduct2' from random import random as r a = [[r(), r(), r()] for i in range(10000)] b = [[r(), r(), r()] for i in range(10000)] t1 = Timer('for v1, v2 in zip(a, b): dotproduct(v1, v2)', setup) t2 = Timer('for v1, v2 in zip(a, b): dotproduct2(v1, v2)', setup) I then ran and compared min(t1.repeat(number=200, repeat=10)) min(t2.repeat(number=200, repeat=10)) a few times while reading email and doing local editing of files. Normal desktop activity. Each time, t2 (the dotproduct with the micro-optimizations) was about 5% faster. Victor will probably tell me I'm micro-benchmarking this the wrong way, so to satisfy him I did one more run: py> import statistics py> d1 = t1.repeat(number=200, repeat=10) py> d2 = t2.repeat(number=200, repeat=10) py> py> statistics.mean(d1); statistics.stdev(d1) 5.277554708393291 0.15216686556059497 py> statistics.mean(d2); statistics.stdev(d2) 4.929395379964262 0.05397586490809523 So I'm satisfied that this trick gives a real, if small, speed up for at least the example given. YMMV.
msg289029 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-03-05 17:45
How it comparing with return vec1[0]vec2[0]+vec1[1]vec2[1]+vec1[2]vec2[2] and x1, y1, z1 = vec1 x2, y2, z2 = vec2 return x1x2+y1y2+z1z2 and x1, y1, z1 = vec1 x2, y2, z2 = vec2 return sum(a*b for a, b in zip(vec1, vec2)) ? A 5% speed up may be not enough high for cluttering the educational example.
msg289030 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2017-03-05 17:47
The localization using keyword parameters is a very old trick to avoid global lookups. It does give a noticeable speedup, esp. when the localized variables are used in tight loops or the function itself is used in such loops. The 5% speedup Steven measured matches my experience with this trick as well. In some cases, it can provide a more dramatic speedup, but this depends a lot on how the code is written.
msg289031 - (view)	Author: Leonid R. (leovp)	Date: 2017-03-05 17:56
I think this is a "last resort" optimization and shouldn't be encouraged, especially in the official documentation. There are plenty of real optimizations you can do before doing micro-optimizations like this, and there is no reason it should be mentioned in the docs.
msg289034 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-03-05 18:04
I have got 15% speed up. But two of my variants (the first and the second) speed up the calculation by 2.8 and 3.6 times correspondingly. It doesn't make sense to talk about microoptimization of the example if rewriting it without using iterator tools can get much larger speed up. The example still is good for the case when use vectors of large dimensions. But in that case the effect of the microoptimization is dwarfed by the time of iteration and calculation.
msg289038 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-03-05 18:32
The reason that this comment was added was to note the recipes themselves are unoptimized and to provide some guideance on how that could be done. The recipes themselves are provided in the cleaner looking form. FWIW, localizing variables is one of the few code optimizations available to python programmers. It is not uncommon in the standard library eventhough there are some people who just don't like it (including apparently the OP). Sorry Chris, I'm going to reject this. The note is correct, the optimization is valid, and so is the original purpose for adding it.
msg289042 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-03-05 19:09
Additional note: For folks interested in performance, it is common for people not to realize that function call overhead can dominate their timings (see this recent SO question as a recent, but typical example http://stackoverflow.com/questions/41772054 ). In any case, the note needs to remain because people commonly move these recipes into production code and need to be warned that basic optimizations were skipped. This is important in a module where improved performance is one of the primary goals.
msg289047 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-03-05 22:01
> Victor will probably tell me I'm micro-benchmarking this the wrong way, so to satisfy him I did one more run: You are doing it the wrong way :-D Please replace Timeit(...) with runner=perf.Runner() and then runner.timeit('bench1', ...). Runner spaws multiples processes and then test if a comparison is significant (perf compare_to a.json b.json). Moreover, for microbenchmaks, I highly recommand to tune your system for benchmarks: python3 -m perf system tune, and see also perf doc ;-) 5% is not significant on a microbenchmark, it can be pure noise.

History
Date	User	Action	Args
2022-04-11 14:58:43	admin	set	github: 73910
2017-03-05 22:01:01	vstinner	set	messages: + msg289047
2017-03-05 19:09:12	rhettinger	set	messages: + msg289042
2017-03-05 18:32:37	rhettinger	set	status: open -> closed resolution: not a bug messages: + msg289038 stage: resolved
2017-03-05 18:26:44	rhettinger	set	assignee: docs@python -> rhettinger
2017-03-05 18:04:26	serhiy.storchaka	set	messages: + msg289034
2017-03-05 17:56:02	leovp	set	nosy: + leovp messages: + msg289031
2017-03-05 17:47:35	lemburg	set	nosy: + lemburg messages: + msg289030
2017-03-05 17:45:07	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg289029
2017-03-05 17:32:38	steven.daprano	set	nosy: + vstinner type: performance messages: + msg289028
2017-03-05 16:57:56	steven.daprano	set	nosy: + steven.daprano
2017-03-05 16:32:04	Kwpolska	create