This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Enhance the timeit module: display average +- std dev instead of minimum
Type: performance
Stage: resolved
Components: Benchmarks
Versions: Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: brett.cannon, jcea, lemburg, ned.deily, pitrou, python-dev, rhettinger, serhiy.storchaka, steven.daprano, tim.peters, vstinner
Priority: normal
Keywords: patch

Created on 2016-09-21 14:08 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
timeit.patch vstinner, 2016-09-21 14:08 review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
PR 7419 merged vstinner, 2018-06-05 10:43
PR 7457 merged miss-islington, 2018-06-06 15:56
Messages (38)
msg277142 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-21 14:08
Attached patch makes different changes to the timeit module:

* Display the average, rather than the minimum, of the timings *and* display the standard deviation. It should help a little bit to get more reproducible results (see the sketch after this list).
* Change the default repeat from 3 to 5 to have a better distribution of timings. It makes the timeit CLI 66% slower (ex: 1 second instead of 600 ms). That's the price of stable benchmarks :-)
* Don't disable the garbage collector anymore! Disabling the GC is not fair: real applications use it.
* autorange: start with 1 loop instead of 10 for slow benchmarks like time.sleep(1)
* Display large numbers of loops as powers of 10 for readability, ex: "10^6" instead of "1000000". Also accept the "10^6" syntax for the --num parameter.
* Add support for "ns" unit: nanoseconds (10^-9 second)
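
For illustration, here is a minimal sketch (not the attached patch) of what "average +- std dev" reporting could look like, built on timeit.repeat() and the stdlib statistics module:

import statistics
import timeit

number = 100_000
timings = timeit.repeat("[1,2]*1000", repeat=5, number=number)
per_loop = [dt / number for dt in timings]

print("%.3g usec +- %.3g usec per loop"
      % (statistics.mean(per_loop) * 1e6, statistics.stdev(per_loop) * 1e6))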

I consider that these changes are well contained enough to still be OK for 3.6 beta 2. But I'm adding Ned Deily as CC to double check ;-)


This patch is related to my work on Python benchmarks, see:

* http://perf.readthedocs.io/en/latest/
* https://github.com/python/performance

The perf module runs benchmarks in multiple child processes to test different memory layouts (Linux uses ASLR by default) and different hash functions. It helps to get more stable benchmark results, but it's probably overkill for the tiny timeit module. By the way, the "pyperf timeit" command reuses the timeit module of the stdlib.
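
For illustration, a minimal sketch (not the perf module itself) of that multi-process idea: run the same timeit micro-benchmark in several child processes, each with a different PYTHONHASHSEED, to observe the run-to-run variation:

import os
import subprocess
import sys

STMT = "sorted({'a': 1, 'b': 2, 'c': 3})"

for seed in range(5):
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    proc = subprocess.run([sys.executable, "-m", "timeit", STMT],
                          env=env, capture_output=True, text=True, check=True)
    print("PYTHONHASHSEED=%s: %s" % (seed, proc.stdout.strip()))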


Note: The timeit module still uses the old getopt module, which is very strict. For example, "python3 -m timeit pass -v" is not recognized ("-v" is read as part of the benchmark statement, not as --verbose). But I was too lazy to also modify this part; I may do it later ;-)
msg277143 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-21 14:13
The perf module displays the median rather than the mean (arithmetic average). The difference between median and mean is probably too subtle for most users :-/ The term "median" is also probably unknown to most users...

I chose to use average: well known, easy to understand (we learn the formula at school: sum/count).
msg277144 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-21 14:16
> * Display the average, rather than the minimum, of the timings *and* display the standard deviation. It should help a little bit to get more reproducible results.

Rationale:

* http://blog.kevmod.com/2016/06/benchmarking-minimum-vs-average/
* https://haypo.github.io/journey-to-stable-benchmark-average.html
msg277145 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2016-09-21 14:19
> Display the average, rather than the minimum, of the timings *and* display the standard deviation. It should help a little bit to get more reproducible results

That entirely depends on which benchmark you are running.

> Disabling the GC is not fair: real applications use it

timeit was never meant to benchmark real applications; it is a micro-benchmarking tool.
msg277146 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-21 14:20
Maciej Fijalkowski also sent me the following article a few months ago; it explains indirectly why using the minimum for benchmarks is not reliable:

"Virtual Machine Warmup Blows Hot and Cold"
http://arxiv.org/pdf/1602.00602.pdf

Even though the article is more focused on JIT compilers, it shows that benchmarks are not straightforward but always full of bad surprises.

A benchmark doesn't produce a single value but a *distribution*. The real question is how to summarize the full distribution without losing too much information.

In the perf module I decided not to decide: a JSON file stores *all* the data :-D But by default, perf displays mean +- std dev.
msg277147 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2016-09-21 14:24
Another point: timeit is often used to compare performance between Python versions. By changing the behaviour of timeit in a given Python version, you'll make it more difficult to compare results.
msg277148 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-09-21 14:33
> * Display the average, rather than the minimum, of the timings *and* 
> display the standard deviation. It should help a little bit to get 
> more reproducible results.

I'm still not convinced that the average is the right statistic to use 
here. I cannot comment about Victor's perf project, but for timeit, it 
seems to me that Tim's original warning that the mean is not useful is 
correct.

Fundamentally, the problem with taking an average is that the timing 
errors are all one sided. If the unknown "true" or "real" time taken by 
a piece of code is T, then the random error epsilon is always positive: 
we're measuring T + ε, not T ± ε.

If the errors are evenly divided into positive and negative, then on 
average the mean() or median() of the measurements will tend to cancel 
the errors, and you get a good estimate of T. But if the errors are all 
one-sided, then they don't cancel and you are actually estimating T plus 
some unknown, average error. In that case, min() is the estimate which 
is closest to T.

Unless you know that average error is tiny compared to T, I don't think 
the average is very useful. Since these are typically micro-benchmarks, 
the error is often quite large relative to the unknown T.
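
A small simulation (illustrative only, with made-up numbers) of this point:
with purely one-sided noise, min() stays close to the true time T while the
mean is biased upward by the average error.

import random
import statistics

T = 100.0                      # hypothetical "true" time, arbitrary units
random.seed(0)
# every measurement is T plus a non-negative error (mean error = 5)
samples = [T + random.expovariate(1 / 5.0) for _ in range(1000)]

print("min :", min(samples))               # close to T
print("mean:", statistics.mean(samples))   # roughly T + 5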

> * Change the default repeat from 3 to 5 to have a better distribution 
> of timings. It makes the timeit CLI 66% slower (ex: 1 second instead 
> of 600 ms). That's the price of stable benchmarks :-)

I nearly always run with repeat=5, so I agree with this.

> * Don't disable the garbage collector anymore! Disabling the GC is not 
> fair: real applications use it.

But that's just adding noise: you're not timing the code snippet, you're 
timing the code snippet plus the garbage collector.

I disagree with this change, although I would accept it if there was an 
optional flag to control the gc.

> * autorange: start with 1 loop instead of 10 for slow benchmarks like 
> time.sleep(1)

That seems reasonable.

> * Display large number of loops as power of 10 for readability, ex: 
> "10^6" instead of "1000000". Also accept "10^6" syntax for the --num 
> parameter.

Shouldn't we use 10**6 or 1e6 rather than bitwise XOR? :-)

This is aimed at Python programmers. We expect ** to mean 
exponentiation, not ^.

> * Add support for "ns" unit: nanoseconds (10^-9 second)

Seems reasonable.
msg277149 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-21 14:41
> Another point: timeit is often used to compare performance between Python versions. By changing the behaviour of timeit in a given Python version, you'll make it more difficult to compare results.

Hum, that's a good argument against my change :-)

So to be able to compare Python 3.5 vs 3.6 or Python 2.7 vs Python 3.6, we would need to somehow backport the average feature to the timeit module of older Python versions. One option would be to put the timeit module on the Python Cheeseshop (PyPI). Hum, but there is already such a module: my perf module.

A solution would be to redirect users to the perf module in the timeit documentation, and maybe also document that timeit results are not reliable?

A different solution would be to add a --python parameter to timeit to run the benchmark on a specific Python version (ex: "python3 -m timeit --python=python2 ..."). But this solution is more complex to develop, since we would have to make timeit.py compatible with Python 2.7 and find a reliable way to load it in the other tested Python interpreter.

Note: I plan to add a --python parameter to my perf module, but I haven't implemented it yet. Since the perf module spawns child processes and is a third-party module, it is simpler to implement this option there.

--

A more general remark: timeit is commonly used to compare the performance of two Python versions. People run timeit twice and then compare the results manually. But only two numbers are compared. It would be more reliable to compare all timings and make sure that the difference is significant. Again, the perf module implements such a function:
http://perf.readthedocs.io/en/latest/api.html#perf.is_significant

I didn't implement a full CLI for perf timeit to directly compare two Python versions. You have to run timeit twice, store all timings in JSON files, and then use the "perf compare" command to reload the timings and compare them.
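
For illustration, here is a rough sketch of the kind of check such a function performs (a two-sample Student's t-test); this is not the perf module's actual code, and the 2.0 threshold is only a crude stand-in for a proper significance level:

import math
import statistics

def is_significant(samples1, samples2, threshold=2.0):
    """Return True if the two sets of timings likely differ."""
    n1, n2 = len(samples1), len(samples2)
    mean1, mean2 = statistics.mean(samples1), statistics.mean(samples2)
    var1, var2 = statistics.variance(samples1), statistics.variance(samples2)
    t = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
    return abs(t) > threshold

print(is_significant([3.2, 3.3, 3.1, 3.25, 3.3],
                     [2.9, 2.8, 2.95, 2.85, 2.9]))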
msg277157 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-21 15:17
> * Display the average, rather than the minimum, of the timings *and* display the standard deviation. It should help a little bit to get more reproducible results.

This makes it hard to compare results with older Python versions.

> * Change the default repeat from 3 to 5 to have a better distribution of timings. It makes the timeit CLI 66% slower (ex: 1 second instead of 600 ms). That's the price of stable benchmarks :-)

Currently a default timeit run takes from 0.8 to 8 sec. Adding yet another 5 sec makes users more angry.

> * Don't disable the garbage collector anymore! Disabling the GC is not fair: real applications use it.

But this makes short microbenchmarks less stable.

> * autorange: start with 1 loop instead of 10 for slow benchmarks like time.sleep(1)

This is good if you run a relatively slow benchmark, but it makes the result less reliable. You can always specify -n1, but at your own risk.

> * Display large number of loops as power of 10 for readability, ex: "10^6" instead of "1000000". Also accept "10^6" syntax for the --num parameter.

10^6 syntax doesn't look Pythonic. And this change breaks third-party scripts that run timeit.

> * Add support for "ns" unit: nanoseconds (10^-9 second)

Even "pass" takes at least 0.02 usec on my computer. What you want to measure that takes < 1 ns? I think timeit is just wrong tool for this.

The patch also makes the warning about unreliable results go to stdout and be always visible. This is yet another compatibility break. The current code allows the user to control the visibility of the warning with the -W Python option, and doesn't mix the warning with the result output.
msg277161 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-21 15:28
Serhiy Storchaka added the comment:
>> * Change the default repeat from 3 to 5 to have a better distribution of timings. It makes the timeit CLI 66% slower (ex: 1 second instead of 600 ms). That's the price of stable benchmarks :-)
>
> Currently a default timeit run takes from 0.8 to 8 sec. Adding yet another 5 sec makes users more angry.

Ah yes, I forgot that timeit uses powers of 10 to get a nice looking
"xxx loops". I chose to use powers of 2 in the perf module to have
shorter benchmarks, but powers of 2 are displayed as 2^n to remain
readable.

>> * Display large number of loops as power of 10 for readability, ex: "10^6" instead of "1000000". Also accept "10^6" syntax for the --num parameter.
>
> 10^6 syntax doesn't look Pythonic. And this change breaks third-party scripts that run timeit.

Do you mean scripts parsing the timeit output (stdout)?

>> * Add support for "ns" unit: nanoseconds (10^-9 second)
>
> Even "pass" takes at least 0.02 usec on my computer. What you want to measure that takes < 1 ns?

IMO 20 ns is more readable than 0.02 usec.

> I think timeit is just wrong tool for this.

Even if timeit is not reliable, it *is* used to benchmark operations
taking less than 1 us.

> The patch also makes a warning about unreliable results output to stdout and always visible. This is yet one compatibility break. Current code allows the user to control the visibility of the warning by the -W Python option, and don't mix the warning with result output.

Oh, I forgot to document this change. I made it because the old code
displays a surprising ":0: " prefix. I chose to use print instead. I
don't think that hiding the warning using -Wignore is a common usage.
msg277177 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-09-21 18:35
We had a similar discussion a while back for pybench. Consensus
then was to use the minimum as the basis for benchmarking:

https://mail.python.org/pipermail/python-dev/2006-June/065525.html

I had used the average before this discussion in pybench 1.0:

https://mail.python.org/pipermail/python-announce-list/2001-November/001081.html

There are arguments both for and against using min or avg values.

I'd suggest to display all values and base the findings on
all available values, rather than just one:
min, max, avg, median, stddev.
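
For illustration, a minimal sketch of such a report built on timeit.repeat() and the stdlib statistics module (this is not pybench code):

import statistics
import timeit

number = 10_000
timings = timeit.repeat("sorted(range(1000))", repeat=7, number=number)
per_loop = [dt / number for dt in timings]

print("min   :", min(per_loop))
print("max   :", max(per_loop))
print("avg   :", statistics.mean(per_loop))
print("median:", statistics.median(per_loop))
print("stddev:", statistics.stdev(per_loop))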
msg277186 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-09-22 00:12
> I'd suggest to display all values and base the findings on
> all available values, rather than just one:
> min, max, avg, median, stddev.

If we're going to go down that path, I suggest using something like:

https://en.wikipedia.org/wiki/Five-number_summary

But at this point, we're surely looking at 3.7?
msg277251 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-23 08:21
Marc-Andre: "Consensus then was to use the minimum as basis for benchmarking: (...) There are arguments both pro and con using min or avg values."

To be honest, I expected that most developers are already aware that the minimum is evil and that I wouldn't have to convince you. I already posted two links for the rationale. Since you are not convinced yet, it seems like I have to prepare a better rationale :-)

Quick rationale: the purpose of displaying the average rather than the minimum in timeit is to make timeit more *reliable*. My goal is that running timeit 5 times would give exactly the same result. With the current design of timeit (a single process), that's just impossible (for the different reasons listed in my first article linked from this issue). But displaying the average is less bad than displaying the minimum for making results more reproducible.
msg277259 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2016-09-23 09:11
On 23/09/2016 at 10:21, STINNER Victor wrote:
> Quick rationale: the purpose of displaying the average rather than the
> minimum in timeit is to make timeit more *reliable*. My goal is that
> running timeit 5 times would give exactly the same result.

Why would it? System noise can vary from run to run.
msg277271 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-09-23 12:19
> Marc-Andre: "Consensus then was to use the minimum as basis for benchmarking: (...) There are arguments both pro and con using min or avg values."
> 
> To be honest, I expected that most developers are already aware that the minimum is evil and that I wouldn't have to convince you. I already posted two links for the rationale. Since you are not convinced yet, it seems like I have to prepare a better rationale :-)

I'm not sure I follow. The first link clearly says that "So for better or worse, the choice of which one is better comes down to what we think the underlying distribution will be like." and it ends with "So personally I use the minimum when I benchmark.".

http://blog.kevmod.com/2016/06/benchmarking-minimum-vs-average/

If we display all available numbers, people who run timeit can then see where things vary and possibly look deeper to find the reason.

As I said, and as the above article also underlines: there are cases where min is better and others where avg is better. So in the end, having both numbers available gives you all the relevant information.

I focused on the average in pybench 1.0 and then switched to the minimum for pybench 2.0. Using the minimum resulted in more reproducible results, at least on the computers I ran pybench on, but do note that pybench 2.0 still prints out the average values as well. The latter is mostly due to some test runs I found where (probably because the CPU timers were not working correctly) the min value sometimes dropped to very low values which did not really make sense compared to the average values.
msg277292 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-23 19:03
I concur with all of Marc-Andre's comments.

FWIW, when I do timings with the existing timeit, I use a repeat of 7.  It gives stable and consistent results.  The whole idea of using the minimum of those runs is to pick the least noisy measurement.
msg277303 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2016-09-23 21:57
Until we have a consensus on this change and a final, reviewed patch, it is premature to consider inclusion in 3.6.  If there is such prior to 3.6.0b2, we can reconsider.
msg278133 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-10-05 15:39
Oh, cfbolz just modified timeit in PyPy to display average (mean) and standard deviation:
https://bitbucket.org/pypy/pypy/commits/fb6bb835369e

Moreover, PyPy timeit now displays the following warning:
---
WARNING: timeit is a very unreliable tool. use perf or something else for real measurements
---
msg278890 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-18 15:14
New changeset 3aba5552b976 by Victor Stinner in branch 'default':
timeit: start autorange with 1 iteration, not 10
https://hg.python.org/cpython/rev/3aba5552b976

New changeset 2dafb2f3e7ff by Victor Stinner in branch 'default':
timeit: change default repeat to 5, instead of 3
https://hg.python.org/cpython/rev/2dafb2f3e7ff
msg278891 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-18 15:21
New changeset 975df4c13db6 by Victor Stinner in branch 'default':
timeit: remove --clock and --time options
https://hg.python.org/cpython/rev/975df4c13db6
msg278895 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-18 15:59
New changeset 4d611957732b by Victor Stinner in branch 'default':
timeit: enhance format of raw timings (in verbose mode)
https://hg.python.org/cpython/rev/4d611957732b

New changeset c3a93069111d by Victor Stinner in branch 'default':
timeit: add nsec (nanosecond) unit for format timings
https://hg.python.org/cpython/rev/c3a93069111d

New changeset 40e97c9dae7a by Victor Stinner in branch 'default':
timeit: add newlines to output for readability
https://hg.python.org/cpython/rev/40e97c9dae7a
msg278896 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-10-18 16:02
Steven D'Aprano:
>> * Don't disable the garbage collector anymore! Disabling the GC is not 
>> fair: real applications use it.

> But that's just adding noise: you're not timing the code snippet, you're
> timing the code snippet plus the garbage collector.
>
> I disagree with this change, although I would accept it if there was an
> optional flag to control the gc.

IMO it's a lie to display the minimum timing with the garbage collector disabled. The garbage collector is enabled in all applications.
msg278898 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-10-18 16:05
Steven D'Aprano:
>> * Display large number of loops as power of 10 for readability, ex: 
>> "10^6" instead of "1000000". Also accept "10^6" syntax for the --num 
>> parameter.
>
> Shouldn't we use 10**6 or 1e6 rather than bitwise XOR? :-)

Hum, with "10**6" syntax, I see a risk of typo: "10*6" instead of "10**6".

I don't know if the x^y syntax is common or not, but I like it. LaTeX uses it for example.
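
For illustration, a tiny sketch (a hypothetical helper, not the actual timeit code) of accepting both a plain integer and the "10^6" notation for --num:

def parse_number(text):
    """Parse '1000000' or '10^6' into an int."""
    if "^" in text:
        base, _, exp = text.partition("^")
        return int(base) ** int(exp)
    return int(text)

assert parse_number("10^6") == 1_000_000
assert parse_number("5000") == 5000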
msg278901 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-10-18 16:16
> Currently a default timeit run takes from 0.8 to 8 sec. Adding yet another 5 sec makes users more angry.

See the issue #28469 "timeit: use powers of 2 in autorange(), instead of powers of 10" for a simple fix to reduce the total duration of the worst case (reduce it to 2.4 seconds).
msg278903 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-10-18 16:21
Serhiy Storchaka:
>> * autorange: start with 1 loop instead of 10 for slow benchmarks like time.sleep(1)
> This is good if you run a relatively slow benchmark, but it makes the result less reliable. You can always specify -n1, but at your own risk.

Sorry, I don't understand how running 1 iteration instead of 10 makes the benchmark less reliable. IMO the reliability is more impacted by the number of repetitions (-r). I changed the default from 3 to 5 repetitions, so timeit should be *more* reliable in Python 3.7 than in 3.6.

> Even "pass" takes at least 0.02 usec on my computer. What you want to measure that takes < 1 ns? I think timeit is just wrong tool for this.

It's just a matter of formatting. IMO clock precision is good enough to display nanoseconds when the benchmark uses many iterations (which is the case by default, since autorange uses a minimum of 200 ms per benchmark).

Before:

$ python3.6 -m timeit 'pass'
100000000 loops, best of 3: 0.0339 usec per loop

After:

$ python3.7 -m timeit 'pass'
10000000 loops, best of 5: 33.9 nsec per loop

IMO "33.9" is more readable than "0.0339".
msg278907 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-10-18 16:30
I'm disappointed by the discussion on minimum vs average. Using the perf module (python3 -m perf timeit), it's very easy to show that the average is more reliable than the minimum. The perf module runs 20 worker processes by default: with so many processes, it's easy to see that each process has a different timing because of the random address space layout and the randomized Python hash function.

Serhiy: "This makes hard to compare results with older Python versions."

Serhiy is right. I see two options: display the average _and_ the minimum (which can be confusing for users!) or display the same warning as PyPy:

"WARNING: timeit is a very unreliable tool. use perf or something else for real measurements"

But since I'm grumpy now, I will just close the issue :-) I pushed enough changes to timeit for today ;-)
msg278913 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-18 17:00
> Sorry, I don't understand how running 1 iteration instead of 10 makes the benchmark less reliable. IMO the reliability is more impacted by the number of repetitions (-r). I changed the default from 3 to 5 repetitions, so timeit should be *more* reliable in Python 3.7 than in 3.6.

Caches. Not high-level caching that can make the measurement senseless, but low-level caching, for example memory caching, that can cause a small difference (but this difference can be larger than the effect that you measure). On every repetition you first run the setup code, and then run the testing code in loops. After the first loop the memory cache is filled with the used data and the next loops can be faster. On the next repetition, running the setup code can evict this data from the memory cache, and the next loop will need to load it back from slow memory. Thus on every repetition the first loop is slower than the following ones. If you run 10 or 100 loops the difference can be negligible, but if you run only one loop, the result can differ by 10% or more.
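
For illustration, a sketch of the warmup idea this suggests (the helper below is hypothetical, not stdlib code): run the statement once before timing, so the first measured repetition does not pay the cache-filling cost.

import timeit

def repeat_with_warmup(stmt, setup="pass", repeat=5, number=1000):
    timer = timeit.Timer(stmt, setup=setup)
    timer.timeit(number=number)   # warmup run, result discarded
    return timer.repeat(repeat=repeat, number=number)

print(repeat_with_warmup("sum(range(100))"))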

> $ python3.6 -m timeit 'pass'
> 100000000 loops, best of 3: 0.0339 usec per loop

This is a senseless example. 0.0339 usec is not the time of executing "pass", it is the overhead of the iteration. You can't use timeit to measure the performance of code that takes such a small time. You just can't get a reliable result for it. Even for code that takes an order of magnitude more time, the result is not very reliable. Thus there is no need to worry about timings much less than 1 usec.
msg278914 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-10-18 17:03
Serhiy Storchaka:
> This is a senseless example. 0.0339 usec is not a time of executing "pass", it is an overhead of the iteration. You can't use timeit for measuring the performance of the code that takes such small time. You just can't get the reliable result for it. Even for code that takes an order larger time the result is not very reliable. Thus no need to worry about timing much less than 1 usec.

I will not argue about the reliability of the timeit module.

It's common to see code snippets using timeit for short
microbenchmarks taking less than 1 us, especially for
micro-optimizations of CPython.
msg278917 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-18 17:08
It may be worth emitting a warning when timeit is used for code that is too short.
msg278918 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-10-18 17:10
Serhiy: "It may be worth to emit a warning in case of using timeit for too short code."

I suggest to simply warn users that timeit is not reliable at all :-)

By the way, I am closing this issue, so I suggest opening new issues if you want further enhancements.
msg278919 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-10-18 17:15
Serhiy Storchaka added the comment:
>> Sorry, I don't understand how running 1 iteration instead of 10 makes the benchmark less reliable. IMO the reliability is more impacted by the number

> Caches. Not high-level caching that can make the measurement senseless, but low-level caching, for example memory caching, that can cause small difference (but this difference can be larger than the effect that you measure). On every repetition you first run a setup code, and then run testing code in loops. After the first loop the memory cache is filled with used data and next loops can be faster. On next repetition running a setup code can unload this data from the memory cache, and the next loop will need to load it back from slow memory. Thus on every repetition the first loop is slower that the followings. If you run 10 or 100 loops the difference can be negligible, but if run the only one loop, the result can differs on 10% or more.

It seems like you give a time budget of less than 20 seconds to timeit
according to one of your previous messages. IMO reliability is
incompatible with a quick timeit command. If you want a reliable
benchmark, you need much more repetition than just 5. perf uses
20x(1+3) by default: it always runs the benchmark once to "warmup" the
benchmark, but ignores this timing. All parameters can be tuned on the
command line (number of processes, warmups, samples, etc.).

Well, I'm not really interested in timeit in the stdlib anymore, since
it seems impossible to make significant enhancements without
bikeshedding. So I let you revert my change if you consider that it
makes timeit less reliable.
msg278974 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-19 13:49
New changeset 4e4d4e9183f5 by Victor Stinner in branch 'default':
Issue #28240: Fix formatting of the warning.
https://hg.python.org/cpython/rev/4e4d4e9183f5
msg317128 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-19 16:42
Wait, I didn't notice the change to the format of the raw timings. It looks like a regression to me.
msg317279 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-22 13:37
> Wait, I didn't notice the change to the format of the raw timings. It looks like a regression to me.

Do you mean that some applications may run timeit as a CLI and parse stdout to get the raw values? Why do that? timeit is a Python module; it's trivial to use its API instead of the CLI, no?

I don't think the CLI output is something that must never change.

master branch:

vstinner@apu$ ./python -m timeit -v '[1,2]*1000'
1 loop -> 1.73e-05 secs
2 loops -> 6.49e-05 secs
5 loops -> 0.000107 secs
10 loops -> 0.000173 secs
20 loops -> 0.000331 secs
50 loops -> 0.000798 secs
100 loops -> 0.00159 secs
200 loops -> 0.00304 secs
500 loops -> 0.00777 secs
1000 loops -> 0.0163 secs
2000 loops -> 0.0315 secs
5000 loops -> 0.0775 secs
10000 loops -> 0.154 secs
20000 loops -> 0.311 secs

raw times: 310 msec, 313 msec, 308 msec, 303 msec, 304 msec

20000 loops, best of 5: 15.2 usec per loop

Python 3.6:

vstinner@apu$ python3 -m timeit -v '[1,2]*1000'
10 loops -> 3.41e-05 secs
100 loops -> 0.000345 secs
1000 loops -> 0.00327 secs
10000 loops -> 0.0332 secs
100000 loops -> 0.325 secs
raw times: 0.319 0.316 0.319
100000 loops, best of 3: 3.16 usec per loop

Hum, the calibration timings (the "xx loops -> ..." lines) should use the same time-formatting function, so they also use ns, ms, etc.
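
For illustration, a sketch of such a unit-picking formatter (a hypothetical helper, not the actual timeit implementation):

def format_time(dt):
    """Format a duration in seconds using sec/msec/usec/nsec units."""
    units = (("sec", 1.0), ("msec", 1e-3), ("usec", 1e-6), ("nsec", 1e-9))
    for unit, scale in units:
        if dt >= scale:
            return "%.3g %s" % (dt / scale, unit)
    return "%.3g nsec" % (dt / 1e-9)

print(format_time(1.73e-05))   # 17.3 usec
print(format_time(0.154))      # 154 msec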
msg317290 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-22 14:01
Yes, that was my thought. But it seems you are right: it is easier to use Python as a programming language. In the past I used the CLI because the programming interface didn't support autoranging.

Although I would change the human-readable output to

raw times (msec): 310 313 308 303 304

But it may be too late for 3.7.
msg318838 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-06 15:55
New changeset 3ef769fcd378a7f1cda19c0dfec2e79613d79e48 by Victor Stinner in branch 'master':
bpo-28240: timeit: Update repeat() doc (GH-7419)
https://github.com/python/cpython/commit/3ef769fcd378a7f1cda19c0dfec2e79613d79e48
msg318851 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-06 17:05
New changeset cebd4b009adca6611e92eb337747f59818e941a6 by Victor Stinner (Miss Islington (bot)) in branch '3.7':
bpo-28240: timeit: Update repeat() doc (GH-7419) (GH-7457)
https://github.com/python/cpython/commit/cebd4b009adca6611e92eb337747f59818e941a6
msg318852 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-06 17:06
I updated the documentation in 3.7 and master branches.
History
Date | User | Action | Args
2022-04-11 14:58:37 | admin | set | github: 72427
2018-06-08 18:59:43 | terry.reedy | link | issue33771 superseder
2018-06-06 17:06:41 | vstinner | set | messages: + msg318852
2018-06-06 17:05:48 | vstinner | set | messages: + msg318851
2018-06-06 15:56:39 | miss-islington | set | pull_requests: + pull_request7080
2018-06-06 15:55:20 | vstinner | set | messages: + msg318838
2018-06-05 10:43:13 | vstinner | set | pull_requests: + pull_request7044
2018-05-22 14:01:01 | serhiy.storchaka | set | status: open -> closed; messages: + msg317290; stage: resolved
2018-05-22 13:37:19 | vstinner | set | messages: + msg317279
2018-05-19 16:42:38 | serhiy.storchaka | set | status: closed -> open; messages: + msg317128
2017-05-17 19:21:49 | jcea | set | nosy: + jcea
2017-03-31 16:36:20 | dstufft | set | pull_requests: + pull_request945
2016-11-29 19:54:36 | serhiy.storchaka | link | issue24015 superseder
2016-10-19 13:49:45 | python-dev | set | messages: + msg278974
2016-10-18 17:15:12 | vstinner | set | messages: + msg278919
2016-10-18 17:10:53 | vstinner | set | messages: + msg278918
2016-10-18 17:08:10 | serhiy.storchaka | set | messages: + msg278917
2016-10-18 17:03:58 | vstinner | set | messages: + msg278914
2016-10-18 17:00:36 | serhiy.storchaka | set | messages: + msg278913
2016-10-18 16:30:11 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg278907
2016-10-18 16:21:23 | vstinner | set | messages: + msg278903
2016-10-18 16:18:35 | fijall | set | nosy: - fijall
2016-10-18 16:16:12 | vstinner | set | messages: + msg278901
2016-10-18 16:05:44 | vstinner | set | messages: + msg278898
2016-10-18 16:02:37 | vstinner | set | messages: + msg278896
2016-10-18 15:59:37 | python-dev | set | messages: + msg278895
2016-10-18 15:21:13 | python-dev | set | messages: + msg278891
2016-10-18 15:14:28 | python-dev | set | nosy: + python-dev; messages: + msg278890
2016-10-05 15:39:18 | vstinner | set | messages: + msg278133
2016-09-23 21:57:03 | ned.deily | set | messages: + msg277303; versions: - Python 3.6
2016-09-23 19:03:41 | rhettinger | set | nosy: + rhettinger; messages: + msg277292
2016-09-23 12:19:27 | lemburg | set | messages: + msg277271
2016-09-23 09:11:03 | pitrou | set | messages: + msg277259
2016-09-23 08:21:40 | vstinner | set | messages: + msg277251
2016-09-22 00:12:00 | steven.daprano | set | messages: + msg277186
2016-09-21 18:35:30 | lemburg | set | nosy: + lemburg; messages: + msg277177; title: Enhance the timeit module -> Enhance the timeit module: display average +- std dev instead of minimum
2016-09-21 15:28:29 | vstinner | set | messages: + msg277161
2016-09-21 15:17:35 | serhiy.storchaka | set | messages: + msg277157
2016-09-21 14:42:20 | yselivanov | set | nosy: - yselivanov
2016-09-21 14:41:42 | vstinner | set | messages: + msg277149
2016-09-21 14:33:27 | steven.daprano | set | nosy: + steven.daprano; messages: + msg277148; title: Enhance the timeit module: display average +- std dev instead of minimum -> Enhance the timeit module
2016-09-21 14:24:30 | pitrou | set | messages: + msg277147
2016-09-21 14:20:43 | vstinner | set | messages: + msg277146
2016-09-21 14:19:11 | pitrou | set | messages: + msg277145
2016-09-21 14:17:20 | pitrou | set | nosy: + tim.peters
2016-09-21 14:16:39 | vstinner | set | messages: + msg277144
2016-09-21 14:13:53 | vstinner | set | messages: + msg277143
2016-09-21 14:09:08 | vstinner | set | title: Enhance the timeit module: diplay average +- std dev instead of minimum -> Enhance the timeit module: display average +- std dev instead of minimum
2016-09-21 14:09:05 | vstinner | set | title: Enhance the timeit module -> Enhance the timeit module: diplay average +- std dev instead of minimum
2016-09-21 14:08:50 | vstinner | create |