Title: Itertools docs propose a harmful “speedup” without any explanation
Type: performance Stage: resolved
Components: Documentation Versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Kwpolska, docs@python, lemburg, leovp, rhettinger, serhiy.storchaka, steven.daprano, vstinner
Priority: normal Keywords:

Created on 2017-03-05 16:32 by Kwpolska, last changed 2017-03-05 22:01 by vstinner. This issue is now closed.

Messages (9)
msg289020 - (view) Author: Chris Warrick (Kwpolska) Date: 2017-03-05 16:32
The itertools recipes list [0] ends with the following dubious advice:

> Note, many of the above recipes can be optimized by replacing global lookups with local variables defined as default values. For example, the dotproduct recipe can be written as:
> def dotproduct(vec1, vec2, sum=sum, map=map, mul=operator.mul):
>     return sum(map(mul, vec1, vec2))

This is presented in the document without any explanation. It may confuse beginners into always doing it in their own code (as evidenced in #python today), leading to unreadable code and function signatures. There is also no evidence that this “trick” makes a significant speed difference. In my opinion, this should not be part of the documentation, or it should at least demonstrate a real, noticeable speedup rather than encourage premature optimization.

(Added in [1] by Raymond Hettinger — added to nosy list)
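[Editor's note: a minimal sketch, not from the original thread, of what the trick actually changes. The plain recipe resolves `sum`, `map`, and `operator` as globals on every call, while the "default values" variant binds them once at definition time so they become fast local reads:]

```python
import dis
import operator

def dotproduct(vec1, vec2):
    return sum(map(operator.mul, vec1, vec2))

def dotproduct2(vec1, vec2, sum=sum, map=map, mul=operator.mul):
    return sum(map(mul, vec1, vec2))

# Collect the opcode names each version uses.
ops_plain = {ins.opname for ins in dis.get_instructions(dotproduct)}
ops_local = {ins.opname for ins in dis.get_instructions(dotproduct2)}

print('LOAD_GLOBAL' in ops_plain)   # True: the plain recipe needs global lookups
print('LOAD_GLOBAL' in ops_local)   # False: the localized version reads only locals
```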

msg289028 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2017-03-05 17:32
On my computer, running Python 3.5 and continuing to do other tasks while the tests are running, I get a reproducible 5% speedup by using the "default values" trick. Here's my code:

import operator
from timeit import Timer
def dotproduct(vec1, vec2):
    return sum(map(operator.mul, vec1, vec2))

def dotproduct2(vec1, vec2, sum=sum, map=map, mul=operator.mul):
    return sum(map(mul, vec1, vec2))

setup = 'from __main__ import a, b, dotproduct, dotproduct2'
from random import random as r
a = [[r(), r(), r()] for i in range(10000)]
b = [[r(), r(), r()] for i in range(10000)]
t1 = Timer('for v1, v2 in zip(a, b): dotproduct(v1, v2)', setup)
t2 = Timer('for v1, v2 in zip(a, b): dotproduct2(v1, v2)', setup)

I then ran and compared 

min(t1.repeat(number=200, repeat=10))
min(t2.repeat(number=200, repeat=10))

a few times while reading email and doing local editing of files. Normal desktop activity. Each time, t2 (the dotproduct with the micro-optimizations) was about 5% faster.

Victor will probably tell me I'm micro-benchmarking this the wrong way, so to satisfy him I did one more run:

py> import statistics
py> d1 = t1.repeat(number=200, repeat=10)
py> d2 = t2.repeat(number=200, repeat=10)
py> statistics.mean(d1); statistics.stdev(d1)
py> statistics.mean(d2); statistics.stdev(d2)

So I'm satisfied that this trick gives a real, if small, speed up for at least the example given. YMMV.
msg289029 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-05 17:45
How does it compare with:

    return vec1[0]*vec2[0]+vec1[1]*vec2[1]+vec1[2]*vec2[2]


    x1, y1, z1 = vec1
    x2, y2, z2 = vec2
    return x1*x2+y1*y2+z1*z2


    return sum(a*b for a, b in zip(vec1, vec2))


A 5% speedup may not be large enough to justify cluttering the educational example.
msg289030 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-03-05 17:47
The localization using keyword parameters is a very old trick to avoid global lookups. It does give a noticeable speedup, esp. when the localized variables are used in tight loops or the function itself is used in such loops.

The 5% speedup Steven measured matches my experience with this trick as well. In some cases, it can provide a more dramatic speedup, but this depends a lot on how the code is written.
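[Editor's note: a hedged illustration of the tight-loop point above (my own example, not from the thread). The win comes from replacing a global-plus-attribute lookup per iteration with a single local read:]

```python
import math

def roots_plain(values):
    # `math.sqrt` is resolved (global lookup + attribute lookup)
    # on every iteration of the loop.
    return [math.sqrt(v) for v in values]

def roots_local(values, sqrt=math.sqrt):
    # The lookup happens once, at function definition time; inside
    # the loop, `sqrt` is a fast local-variable access.
    return [sqrt(v) for v in values]

print(roots_plain([1.0, 4.0, 9.0]))  # [1.0, 2.0, 3.0]
print(roots_local([1.0, 4.0, 9.0]))  # [1.0, 2.0, 3.0]
```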
msg289031 - (view) Author: Leonid R. (leovp) Date: 2017-03-05 17:56
I think this is a "last resort" optimization and shouldn't be encouraged, especially in the official documentation. There are plenty of real optimizations you can do before doing micro-optimizations like this, and there is no reason it should be mentioned in the docs.
msg289034 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-05 18:04
I have got 15% speed up.

But two of my variants (the first and the second) speed up the calculation by 2.8x and 3.6x respectively. It doesn't make sense to talk about micro-optimizing the example when rewriting it without iterator tools gives a much larger speedup.

The example is still good for the case of vectors of large dimension. But in that case the effect of the micro-optimization is dwarfed by the time spent on iteration and calculation.
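[Editor's note: a small sketch, not from the thread, checking that the unrolled variant above agrees with the generic recipe; it assumes fixed 3-D vectors:]

```python
import operator

def dotproduct(vec1, vec2):
    # Generic recipe: works for vectors of any dimension.
    return sum(map(operator.mul, vec1, vec2))

def dotproduct_unrolled(vec1, vec2):
    # Unrolled variant: fixed 3-D, no iteration machinery at all.
    return vec1[0]*vec2[0] + vec1[1]*vec2[1] + vec1[2]*vec2[2]

v1, v2 = (1.5, 2.0, -3.0), (4.0, 0.5, 2.0)
print(dotproduct(v1, v2) == dotproduct_unrolled(v1, v2))  # True
```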
msg289038 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-05 18:32
The reason this comment was added was to note that the recipes themselves are unoptimized and to provide some guidance on how that could be done.  The recipes themselves are presented in the cleaner-looking form.

FWIW, localizing variables is one of the few code optimizations available to Python programmers.  It is not uncommon in the standard library, even though some people just don't like it (including, apparently, the OP).

Sorry Chris, I'm going to reject this.  The note is correct, the optimization is valid, and so is the original purpose for adding it.
msg289042 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-05 19:09
Additional note:  For folks interested in performance, it is common for people not to realize that function call overhead can dominate their timings (see this recent SO question as a typical example).

In any case, the note needs to remain because people commonly move these recipes into production code and need to be warned that basic optimizations were skipped.  This is important in a module where improved performance is one of the primary goals.
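[Editor's note: a minimal, hedged sketch of the call-overhead point (my own example; timings vary by machine, so no numbers are claimed):]

```python
import timeit

def square(x):
    return x * x

# Time calling a trivial function vs. evaluating the same expression inline.
t_call = timeit.timeit('square(x)', globals={'square': square, 'x': 5},
                       number=100_000)
t_inline = timeit.timeit('x * x', globals={'x': 5}, number=100_000)

# On typical CPython builds t_call is noticeably larger than t_inline,
# because each call pays frame-setup and argument-passing costs.
print(t_call > 0 and t_inline > 0)  # True
```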
msg289047 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-05 22:01
> Victor will probably tell me I'm micro-benchmarking this the wrong way,
> so to satisfy him I did one more run:

You are doing it the wrong way :-D Please replace Timer(...) with
runner = perf.Runner() and then runner.timeit('bench1', ...).

Runner spawns multiple processes and then tests whether a comparison is
significant (perf compare_to a.json b.json).

Moreover, for microbenchmarks, I highly recommend tuning your system for
benchmarks: python3 -m perf system tune, and see also the perf docs ;-)

5% is not significant in a microbenchmark; it can be pure noise.
Date User Action Args
2017-03-05 22:01:01  vstinner          set  messages: + msg289047
2017-03-05 19:09:12  rhettinger        set  messages: + msg289042
2017-03-05 18:32:37  rhettinger        set  status: open -> closed
                                            resolution: not a bug
                                            messages: + msg289038
                                            stage: resolved
2017-03-05 18:26:44  rhettinger        set  assignee: docs@python -> rhettinger
2017-03-05 18:04:26  serhiy.storchaka  set  messages: + msg289034
2017-03-05 17:56:02  leovp             set  nosy: + leovp
                                            messages: + msg289031
2017-03-05 17:47:35  lemburg           set  nosy: + lemburg
                                            messages: + msg289030
2017-03-05 17:45:07  serhiy.storchaka  set  nosy: + serhiy.storchaka
                                            messages: + msg289029
2017-03-05 17:32:38  steven.daprano    set  nosy: + vstinner
                                            type: performance
                                            messages: + msg289028
2017-03-05 16:57:56  steven.daprano    set  nosy: + steven.daprano
2017-03-05 16:32:04  Kwpolska          create