timeit accuracy could be better #67881
Comments
In bpo-6422, Haypo suggested making timeit's reports much better. This is a new ticket just for that. See https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py and http://bugs.python.org/issue6422#msg164216
See also bpo-21988.
Not only am I too lazy to compute the number of loops and repeats manually, but I also don't trust myself. It's even worse when someone publishes the results of a micro-benchmark: I don't trust how the benchmark was calibrated. In my experience, micro-benchmarks are polluted by noise in the timings, so the results are not reliable. benchmark.py's calibration is based on time, whereas timeit uses hardcoded constants (loops=1000000, repeat=3) which can be modified on the command line. benchmark.py has 3 main parameters:
- min_time: the minimum duration of a single run. This minimum is increased if the clock resolution is bad (1 ms or more), as is the case for time.clock() on Windows under Python 2, for example. Extract of benchmark.py: `min_time = max(self.config.min_time, timer_precision * 100)`
- max_time: the maximum total duration of the benchmark. This is my real constraint, because the tested function may not have a linear duration (time = time_one_iteration * loops).
- repeat: repeating a test at least 5 times is a compromise between the stability of the result and the total duration of the benchmark.

The estimation of the number of loops is not reliable, but it is written to be fast: since I run micro-benchmarks many times, I don't want to wait too long. The loop count is not a power of 10 but an arbitrary integer, so running benchmark.py multiple times usually gives a different number of loops each time. It's not really a big issue, but it probably makes results more difficult to compare. Feel free to reuse my code to enhance timeit.py; a sketch of the calibration idea follows below.
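To make the calibration idea concrete, here is a minimal sketch (not the actual benchmark.py code; `calibrate_loops` and its default min_time are hypothetical): it doubles the loop count until a single run lasts at least min_time, instead of relying on timeit's hardcoded loops=1000000.

```python
import timeit

def calibrate_loops(stmt, setup="pass", min_time=0.1):
    """Double the loop count until one run takes at least min_time seconds."""
    timer = timeit.Timer(stmt, setup)
    loops = 1
    while True:
        elapsed = timer.timeit(number=loops)
        if elapsed >= min_time:
            return loops
        loops *= 2  # a power of 2, unlike benchmark.py's arbitrary integer

loops = calibrate_loops("sorted(s)", setup="s = list(range(1000))")
print("calibrated loops:", loops)
```

Doubling keeps the calibration fast (a handful of runs) at the cost of overshooting min_time by at most a factor of two.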
Hi, I am developing a new implementation of timeit which should be more reliable:
Using the minimum and disabling the garbage collector is a bad practice; it is not reliable:
My blog post "My journey to stable benchmark, part 3 (average)" explains in depth the multiple issues with using the minimum. My perf module is very young and still a work in progress, but it should already be better than timeit. It works on Python 2.7 and 3 (I tested 3.4). We may pick the best ideas into the timeit module. See also my article explaining how to tune Linux to reduce the "noise" of the operating system on microbenchmarks:
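As an illustration of the alternative the blog post argues for, here is a minimal sketch (the statement and loop count are arbitrary) that reports the mean and standard deviation across several runs instead of the minimum:

```python
import statistics
import timeit

timer = timeit.Timer("sorted(s)", setup="s = list(range(1000))")
loops = 1000
raw = timer.repeat(repeat=5, number=loops)   # total seconds for each run
per_loop = [dt / loops for dt in raw]        # seconds per iteration

print("mean %.3g s +- %.3g s per loop"
      % (statistics.mean(per_loop), statistics.stdev(per_loop)))
```

The standard deviation makes the noise visible, which reporting only the minimum hides.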
I wrote a whole new project, "perf", to fix the root causes of this issue. It includes a timeit command. I suggest using "perf timeit" rather than "timeit", because perf is more reliable:
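As a usage sketch, assuming the Python API documented for the perf project (since renamed pyperf):

```python
import pyperf

# Runner calibrates the loop count, runs the benchmark in separate
# worker processes, and reports the mean +- standard deviation.
runner = pyperf.Runner()
runner.timeit(name="sort a list",
              stmt="sorted(s)",
              setup="s = list(range(1000))")
```

On the command line, the equivalent is the timeit subcommand, e.g. `python3 -m pyperf timeit -s "s = list(range(1000))" "sorted(s)"`.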