msg223185 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-07-16 09:06 |
Currently timeit has significant iterating overhead when testing fast statements. Such overhead makes it hard to measure the effects of microoptimizations. To decrease the overhead and get more precise results, we have to repeat the tested statement many times:
$ ./python -m timeit -s "x=10" "x+x"
1000000 loops, best of 3: 0.2 usec per loop
$ ./python -m timeit -s "x=10" "x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x; x+x"
100000 loops, best of 3: 14.6 usec per loop
The proposed patch does this automatically for the user. It unrolls and vectorizes the loop, decreasing the iterating overhead 1000-fold:
$ ./python -m timeit -s "x=10" "x+x"
10000000 loops, best of 3: 0.141 usec per loop
The user gets a precise value without explicit, cumbersome repetition.
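For comparison, the same measurement can be approximated by hand: repeat the statement inside the Timer and divide by the repetition count. A minimal sketch of that idea (not the actual patch; the UNROLL factor and loop count below are arbitrary choices for illustration):

    # Minimal sketch, not the patch itself: emulate unrolling by hand and
    # divide by the repetition count to recover the per-statement time.
    import timeit

    UNROLL = 100   # hypothetical unroll factor; the patch would pick this automatically
    stmt = "x + x"
    unrolled = "; ".join([stmt] * UNROLL)

    number = 10000   # loops per measurement
    best = min(timeit.repeat(unrolled, setup="x = 10", repeat=3, number=number))
    print("%.3g usec per statement" % (best / (number * UNROLL) * 1e6))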
|
msg223187 - (view) |
Author: Steven D'Aprano (steven.daprano) *  |
Date: 2014-07-16 11:06 |
Looks good, but I think it would be better to have an "unroll" option rather than doing it automatically. I'm okay with the default being to unroll, but sometimes I want to compare the speed between different versions of Python, and having unroll=False to ensure the same behaviour between versions would be good.
|
msg223190 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-07-16 11:59 |
Indeed, what's good for CPython may be quite annoying for e.g. a JIT-enabled Python implementation. I wonder what the PyPy developers think about this.
|
msg223212 - (view) |
Author: Alex Gaynor (alex) *  |
Date: 2014-07-16 14:39 |
I think this is likely to make timeit less representative of how code actually performs in the real world on systems with a JIT, because the cost of sequential operations is not strictly "additive" on PyPy.
If you have statements `a` and `b`, running `a; b` on PyPy is usually faster than the sum of running `a` and `b` separately, assuming they are not 100% independent.
This is because the JIT will be able to remove type checks that were already performed. Since this just repeats the same statement, the cost of the unrolled iterations beyond the first will be massively lower in many cases, producing confusing results.
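A rough way to observe this non-additivity (illustrative only, not part of the patch) is to time a statement on its own and doubled, then compare the ratio; on CPython it stays near 2, while a JIT can push it well below 2:

    # Illustrative sketch: is the cost of "a; b" additive on this interpreter?
    import timeit

    setup = "x = 10"
    t_single = min(timeit.repeat("x + x", setup=setup, repeat=5, number=1000000))
    t_double = min(timeit.repeat("x + x; x + x", setup=setup, repeat=5, number=1000000))
    print("ratio:", t_double / t_single)   # ~2 if additive, lower if the JIT shares work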
|
msg223213 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-07-16 14:40 |
Thanks, then I guess I'm -1 on the patch.
|
msg223215 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2014-07-16 14:47 |
The opposite argument might be relevant too: in some cases, a tracing JIT compiler seeing a long block of code might perform artificially worse. If each repeated line creates a branching path with two outcomes of roughly equal likeliness, then if the line is repeated 20 times, the JIT will need to compile 2**20 different paths before it has fully warmed up. In practice, it will never fully warm up and will run with the constant huge overhead of finding and compiling more paths.
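As a back-of-the-envelope check of that figure (the numbers are purely illustrative):

    # Each repeated copy of a 50/50-branching line doubles the number of
    # distinct traces a tracing JIT may have to compile before warming up.
    copies = 20
    paths = 2 ** copies
    print(paths)   # 1048576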
|
msg223216 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2014-07-16 14:49 |
...but I don't think PyPy should be by itself a good enough reason to reject this patch. It would be fine if timeit detects which interpreter it runs on, and only tries to unroll on CPython, for example.
|
msg223231 - (view) |
Author: Steven D'Aprano (steven.daprano) *  |
Date: 2014-07-16 16:43 |
On Wed, Jul 16, 2014 at 02:49:31PM +0000, Armin Rigo wrote:
> ...but I don't think PyPy should be by itself a good enough reason to
> reject this patch. It would be fine if timeit detects which
> interpreter it runs on, and only tries to unroll on CPython, for
> example.
I would *much* rather a parameter to timeit which controls whether or
not to unroll, rather than timeit trying to guess whether you want it to
unroll or not. PyPy can default to off, CPython to on, and other
implementations can choose whichever default makes sense for them.
|
msg223240 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-07-16 17:43 |
On 16/07/2014 12:43, Steven D'Aprano wrote:
>
> I would *much* rather a parameter to timeit which controls whether or
> not to unroll, rather than timeit trying to guess whether you want it to
> unroll or not. PyPy can default to off, CPython to on, and other
> implementations can choose whichever default makes sense for them.
I think it is overkill. Apart from rather silly microbenchmarks, there
isn't much point in adding the loop unrolling facility. In the real world,
even cheap operations such as "x = x + 1" will be surrounded by less
cheap operations, so if an improvement cannot yield tangible benefits
inside a simple for loop, then it doesn't deserve to be committed.
|
msg223311 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2014-07-17 03:10 |
I'm afraid "microoptimizations" aren't worth measuring to begin with, since, well, they're "micro" ;-) Seriously, switch compilers, compilation flags, or move to a new release of a single compiler, and a micro-optimization often turns into a micro-pessimization. If I _want_ to measure something unrolled, I'll unroll it myself. But since I almost never want to measure something unrolled anyway, an option to do so would just be "yet another attractive nuisance" to me.
|
msg223312 - (view) |
Author: Guido van Rossum (Guido.van.Rossum) |
Date: 2014-07-17 03:24 |
I don't see the value in this complication. Please close as won't fix.
|
msg223313 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2014-07-17 03:30 |
(Had to switch identities to close it.)
|
msg223433 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2014-07-18 20:57 |
Guido: I've added developer privs to your Guido.van.Rossum account.
|
msg223666 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-07-22 12:10 |
OK. In any case I don't like this patch; it breaks the simplicity and elegance of the current code.
|
Date | User | Action | Args
2022-04-11 14:58:06 | admin | set | github: 66187
2014-07-22 12:10:22 | serhiy.storchaka | set | messages: + msg223666
2014-07-18 20:57:19 | r.david.murray | set | nosy: + r.david.murray; messages: + msg223433
2014-07-17 03:41:58 | ezio.melotti | set | stage: patch review -> resolved
2014-07-17 03:30:19 | gvanrossum | set | status: open -> closed; resolution: wont fix; messages: + msg223313
2014-07-17 03:24:09 | Guido.van.Rossum | set | nosy: + Guido.van.Rossum; messages: + msg223312
2014-07-17 03:10:49 | tim.peters | set | messages: + msg223311
2014-07-17 03:01:17 | rhettinger | set | nosy: + gvanrossum, tim.peters, rhettinger
2014-07-16 17:43:38 | pitrou | set | messages: + msg223240
2014-07-16 16:43:00 | steven.daprano | set | messages: + msg223231
2014-07-16 14:49:31 | arigo | set | messages: + msg223216
2014-07-16 14:47:25 | arigo | set | nosy: + arigo; messages: + msg223215
2014-07-16 14:40:24 | pitrou | set | messages: + msg223213
2014-07-16 14:39:31 | alex | set | messages: + msg223212
2014-07-16 11:59:50 | pitrou | set | nosy: + alex, pitrou; messages: + msg223190
2014-07-16 11:06:35 | steven.daprano | set | nosy: + steven.daprano; messages: + msg223187
2014-07-16 09:31:18 | serhiy.storchaka | set | title: Decrease iterating overhead it timeit -> Decrease iterating overhead in timeit
2014-07-16 09:06:24 | serhiy.storchaka | create |