classification
Title: Decrease iterating overhead in timeit
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Guido.van.Rossum, alex, arigo, georg.brandl, gvanrossum, pitrou, r.david.murray, rhettinger, serhiy.storchaka, steven.daprano, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2014-07-16 09:06 by serhiy.storchaka, last changed 2014-07-22 12:10 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
timeit_unroll_loops.patch serhiy.storchaka, 2014-07-16 09:06 review
Messages (14)
msg223185 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-16 09:06
Currently timeit has significant iterating overhead when testing fast statements. Such overhead makes it hard to measure the effect of micro-optimizations. To decrease the overhead and get more precise results, we have to repeat the tested statement many times:

$ ./python -m timeit -s "x=10"  "x+x"
1000000 loops, best of 3: 0.2 usec per loop
$ ./python -m timeit -s "x=10"  "x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x"
100000 loops, best of 3: 14.6 usec per loop

The proposed patch does this automatically for the user. It unrolls and vectorizes the loop, decreasing the iterating overhead by a factor of 1000:

$ ./python -m timeit -s "x=10"  "x+x"
10000000 loops, best of 3: 0.141 usec per loop

The user gets a precise value without cumbersome explicit repetition.
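For reference, the manual workaround shown in the first message can also be scripted without the patch by building the unrolled statement programmatically. A minimal sketch (the repeat counts here are illustrative, not taken from the patch):

```python
import timeit

stmt = "x + x"
setup = "x = 10"
unroll = 100     # copies of the statement per loop iteration
number = 10_000  # loop iterations per timing run

# Join many copies of the statement so the per-iteration loop overhead
# is amortized, then divide the best time back out per single statement.
unrolled = "; ".join([stmt] * unroll)
best = min(timeit.repeat(unrolled, setup=setup, number=number, repeat=3))
per_stmt = best / (number * unroll)
print(f"best of 3: {per_stmt * 1e9:.1f} nsec per statement")
```

This is essentially what the long `x+x; x+x; ...` command line above does by hand.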
msg223187 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2014-07-16 11:06
Looks good, but I think it is better to have an "unroll" option rather than do it automatically. I'm okay with the default being to unroll, but sometimes I want to compare the speed between different versions of Python, and having unroll=False to ensure the same behaviour between versions would be good.
msg223190 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-07-16 11:59
Indeed, what's good for CPython may be quite annoying for e.g. a JIT-enabled Python implementation. I wonder what the PyPy developers think about this.
msg223212 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2014-07-16 14:39
I think this is likely to make timeit less representative of how code actually performs in the real world on systems with a JIT, because the cost of sequential operations is not strictly "additive" on PyPy.

If you have statements `a` and `b` and you run `a; b` on PyPy, `a; b` is usually faster than the sum of `a` and `b` run separately, assuming they are not 100% independent.

This is because the JIT will be able to remove type checks that were already performed. Since this just repeats the same statement, the cost of the unrolled iterations beyond the first will be massively lower in many cases, producing confusing results.
msg223213 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-07-16 14:40
Thanks, then I guess I'm -1 on the patch.
msg223215 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2014-07-16 14:47
The opposite argument might be relevant too: in some cases, a tracing JIT compiler seeing a long block of code might perform artificially worse.  If each repeated line creates a branching path with two outcomes of roughly equal likelihood, then if the line is repeated 20 times, the JIT will need to compile 2**20 different paths before it has fully warmed up.  In practice, it will never fully warm up and will run with the constant huge overhead of finding and compiling more paths.
msg223216 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2014-07-16 14:49
...but I don't think PyPy should be by itself a good enough reason to reject this patch.  It would be fine if timeit detects which interpreter it runs on, and only tries to unroll on CPython, for example.
msg223231 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2014-07-16 16:43
On Wed, Jul 16, 2014 at 02:49:31PM +0000, Armin Rigo wrote:
> ...but I don't think PyPy should be by itself a good enough reason to 
> reject this patch.  It would be fine if timeit detects which 
> interpreter it runs on, and only tries to unroll on CPython, for 
> example.

I would *much* rather a parameter to timeit which controls whether or 
not to unroll, rather than timeit trying to guess whether you want it to 
unroll or not. PyPy can default to off, CPython to on, and other 
implementations can choose whichever default makes sense for them.
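No `unroll` parameter exists in timeit; the following is a hypothetical wrapper sketching what the option Steven describes might do. All names here are invented for illustration:

```python
import timeit

def timed_with_unroll(stmt, setup="pass", number=10_000, unroll=1):
    """Hypothetical helper emulating a timeit `unroll` option: repeat the
    statement `unroll` times per loop iteration, then normalize the result
    back to seconds per single statement."""
    body = "; ".join([stmt] * unroll) if unroll > 1 else stmt
    total = timeit.timeit(body, setup=setup, number=number)
    return total / (number * unroll)

# unroll=1 preserves the current behaviour; unroll > 1 amortizes the
# loop overhead, which is the CPython-friendly default being debated.
t_plain = timed_with_unroll("x + x", setup="x = 10", unroll=1)
t_unrolled = timed_with_unroll("x + x", setup="x = 10", unroll=50)
```

Under this scheme each implementation could pick its own default for `unroll`, as Steven suggests.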
msg223240 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-07-16 17:43
Le 16/07/2014 12:43, Steven D'Aprano a écrit :
>
> I would *much* rather a parameter to timeit which controls whether or
> not to unroll, rather than timeit trying to guess whether you want it to
> unroll or not. PyPy can default to off, CPython to on, and other
> implementations can choose whichever default makes sense for them.

I think that is overkill. Apart from rather silly microbenchmarks, there 
isn't much point in adding a loop-unrolling facility. In the real world, 
even cheap operations such as "x = x + 1" will be surrounded by less 
cheap operations, so if an improvement cannot yield tangible benefits 
inside a simple for loop, then it doesn't deserve to be committed.
msg223311 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2014-07-17 03:10
I'm afraid "microoptimizations" aren't worth measuring to begin with, since, well, they're "micro" ;-)  Seriously, switch compilers, compilation flags, or move to a new release of a single compiler, and a micro-optimization often turns into a micro-pessimization.  If I _want_ to measure something unrolled, I'll unroll it myself.  But since I almost never want to measure something unrolled anyway, an option to do so would just be "yet another attractive nuisance" to me.
msg223312 - (view) Author: Guido van Rossum (Guido.van.Rossum) Date: 2014-07-17 03:24
I don't see the value in this complication. Please close as won't fix.
msg223313 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-07-17 03:30
(Had to switch identities to close it.)
msg223433 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-18 20:57
Guido: I've added developer privs to your Guido.van.Rossum account.
msg223666 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-22 12:10
OK. In any case I don't like this patch; it breaks the simplicity and elegance of the current code.
History
Date User Action Args
2014-07-22 12:10:22serhiy.storchakasetmessages: + msg223666
2014-07-18 20:57:19r.david.murraysetnosy: + r.david.murray
messages: + msg223433
2014-07-17 03:41:58ezio.melottisetstage: patch review -> resolved
2014-07-17 03:30:19gvanrossumsetstatus: open -> closed
resolution: wont fix
messages: + msg223313
2014-07-17 03:24:09Guido.van.Rossumsetnosy: + Guido.van.Rossum
messages: + msg223312
2014-07-17 03:10:49tim.peterssetmessages: + msg223311
2014-07-17 03:01:17rhettingersetnosy: + gvanrossum, tim.peters, rhettinger
2014-07-16 17:43:38pitrousetmessages: + msg223240
2014-07-16 16:43:00steven.dapranosetmessages: + msg223231
2014-07-16 14:49:31arigosetmessages: + msg223216
2014-07-16 14:47:25arigosetnosy: + arigo
messages: + msg223215
2014-07-16 14:40:24pitrousetmessages: + msg223213
2014-07-16 14:39:31alexsetmessages: + msg223212
2014-07-16 11:59:50pitrousetnosy: + alex, pitrou
messages: + msg223190
2014-07-16 11:06:35steven.dapranosetnosy: + steven.daprano
messages: + msg223187
2014-07-16 09:31:18serhiy.storchakasettitle: Decrease iterating overhead it timeit -> Decrease iterating overhead in timeit
2014-07-16 09:06:24serhiy.storchakacreate