Message 277148 - Python tracker


Author	steven.daprano
Recipients	brett.cannon, fijall, ned.deily, pitrou, serhiy.storchaka, steven.daprano, tim.peters, vstinner, yselivanov
Date	2016-09-21.14:33:27
Message-id	<20160921143321.GV22471@ando.pearwood.info>
In-reply-to	<1474466930.94.0.352912034658.issue28240@psf.upfronthosting.co.za>

Content

> * Display the average, rather than the minimum, of the timings *and* 
> display the standard deviation. It should help a little bit to get 
> more reproducible results.

I'm still not convinced that the average is the right statistic to use 
here. I cannot comment about Victor's perf project, but for timeit, it 
seems to me that Tim's original warning that the mean is not useful is 
correct.

Fundamentally, the problem with taking an average is that the timing 
errors are all one-sided. If the unknown "true" or "real" time taken by 
a piece of code is T, then the random error epsilon is always positive: 
we're measuring T + ε, not T ± ε.

If the errors are evenly divided into positive and negative, then on 
average the mean() or median() of the measurements will tend to cancel 
the errors, and you get a good estimate of T. But if the errors are all 
one-sided, then they don't cancel and you are actually estimating T plus 
some unknown, average error. In that case, min() is the estimate which 
is closest to T.
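
To make the argument concrete, here is a minimal simulation sketch. It 
assumes the noise is exponentially distributed, which is purely an 
illustrative choice on my part and not something measured; the names 
simulate_timings, T and noise_scale are hypothetical.

```python
import random
import statistics


def simulate_timings(true_time, n, noise_scale, seed=0):
    """Return n simulated measurements of the form T + epsilon, epsilon >= 0."""
    rng = random.Random(seed)
    # expovariate() only yields non-negative values, so the error is one-sided.
    # Exponential noise is an assumption made for illustration only.
    return [true_time + rng.expovariate(1.0 / noise_scale) for _ in range(n)]


T = 1.0e-6  # hypothetical "true" time of the snippet, in seconds
samples = simulate_timings(T, n=1000, noise_scale=0.5e-6)

best = min(samples)
mean = statistics.mean(samples)
median = statistics.median(samples)

# Every estimate overshoots T; min() overshoots by the least.
print(f"min    error: {best - T:.3e}")
print(f"median error: {median - T:.3e}")
print(f"mean   error: {mean - T:.3e}")
```

Under this assumption no measurement can fall below T, so mean() and 
median() both land on T plus some typical error, while min() approaches 
T as the number of samples grows.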

Unless you know that the average error is tiny compared to T, I don't 
think the average is very useful. Since these are typically 
micro-benchmarks, the error is often quite large relative to the 
unknown T.

> * Change the default repeat from 3 to 5 to have a better distribution 
> of timings. It makes the timeit CLI 66% slower (ex: 1 second instead 
> of 600 ms). That's the price of stable benchmarks :-)

I nearly always run with repeat=5, so I agree with this.
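
For reference, a minimal sketch of running with repeat=5 from the API. 
The statement being timed and the loop count are arbitrary examples, not 
taken from this issue.

```python
import timeit

LOOPS = 10_000  # arbitrary loop count chosen for this example

# Run the whole timing loop five times and keep each total.
timings = timeit.repeat(stmt="sum(range(100))", repeat=5, number=LOOPS)

# Report the fastest run, per the reasoning about one-sided errors.
best = min(timings)
print(f"best of {len(timings)}: {best / LOOPS * 1e6:.3f} usec per loop")
```

On the command line the same repeat count is selected with -r 5.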

> * Don't disable the garbage collector anymore! Disabling the GC is not 
> fair: real applications use it.

But that's just adding noise: you're not timing the code snippet, you're 
timing the code snippet plus the garbage collector.

I disagree with this change, although I would accept it if there was an 
optional flag to control the gc.
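
As things stand, the gc can already be switched back on per benchmark by 
enabling it in the setup string, which is the approach the timeit 
documentation describes. A minimal sketch comparing the two; the 
statement is an arbitrary allocation-heavy example of my own choosing:

```python
import timeit

# Arbitrary example statement that allocates many container objects.
stmt = "[[] for _ in range(1000)]"

# Default behaviour: timeit disables the garbage collector while timing.
gc_off = min(timeit.repeat(stmt, repeat=5, number=200))

# Re-enable the collector as the first statement of the setup string.
gc_on = min(timeit.repeat(stmt, setup="import gc; gc.enable()",
                          repeat=5, number=200))

print(f"gc disabled: {gc_off:.6f} s")
print(f"gc enabled:  {gc_on:.6f} s")
```

Whether the two figures differ noticeably depends on the snippet and the 
machine, so no particular result should be expected here.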

> * autorange: start with 1 loop instead of 10 for slow benchmarks like 
> time.sleep(1)

That seems reasonable.
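
A minimal sketch of what this looks like through Timer.autorange(), 
which returns the chosen loop count and the time taken. It uses a 
shorter sleep than the quoted example to keep the run brief, and the 
loop count it settles on depends on the Python version in use.

```python
import timeit

# A deliberately slow statement; 0.25 s is an arbitrary stand-in
# for the time.sleep(1) example above.
timer = timeit.Timer("time.sleep(0.25)", setup="import time")

# autorange() increases the loop count until the total time is
# at least 0.2 seconds, then returns (number, time_taken).
number, time_taken = timer.autorange()
print(f"{number} loop(s) took {time_taken:.3f} s")
```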

> * Display large number of loops as power of 10 for readability, ex: 
> "10^6" instead of "1000000". Also accept "10^6" syntax for the --num 
> parameter.

Shouldn't we use 10**6 or 1e6 rather than bitwise XOR? :-)

This is aimed at Python programmers. We expect ** to mean 
exponentiation, not ^.
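
To spell out the difference for anyone reading along, a short sketch of 
what each spelling evaluates to in Python:

```python
xor_result = 10 ^ 6      # bitwise XOR: 0b1010 ^ 0b0110 == 0b1100
power_result = 10 ** 6   # integer exponentiation
float_result = 1e6       # float literal in scientific notation

print(xor_result)    # 12
print(power_result)  # 1000000
print(float_result)  # 1000000.0
```

Note that 1e6 is a float, so an option accepting that spelling would 
need to convert it to an int before using it as a loop count.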

> * Add support for "ns" unit: nanoseconds (10^-9 second)

Seems reasonable.

History
Date	User	Action	Args
2016-09-21 14:33:27	steven.daprano	set	recipients: + steven.daprano, tim.peters, brett.cannon, pitrou, vstinner, ned.deily, fijall, serhiy.storchaka, yselivanov
2016-09-21 14:33:27	steven.daprano	link	issue28240 messages
2016-09-21 14:33:27	steven.daprano	create