This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Deprecate the regex_v8, telco, and spectral_norm benchmarks
Type: enhancement Stage: needs patch
Components: Benchmarks Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, pitrou, skrah, vstinner, yselivanov
Priority: normal Keywords: easy

Created on 2016-02-22 23:06 by brett.cannon, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (7)
msg260705 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2016-02-22 23:06
In the thread at https://mail.python.org/pipermail/speed/2016-February/000272.html it came up that the regex_v8, telco, and spectral_norm benchmarks are all very inconsistent. That means they should be deprecated.
msg260711 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2016-02-23 08:27
I'm not sure what "inconsistent" means. If the results are unstable between runs, it may mean the operations being measured themselves are unstable (for example because of hashing differences or cache aliasing effects from run to run).
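As a minimal sketch of the hashing effect mentioned above: CPython randomizes `str` hashes per process by default (since Python 3.3), so set iteration order and hash-table collision patterns can change from one run to the next. Pinning the `PYTHONHASHSEED` environment variable makes that part reproducible. The child program and the `set_order` helper are hypothetical illustrations, not part of the benchmark suite.

```python
import os
import subprocess
import sys

# Hypothetical child program: prints the iteration order of a set of strings,
# which depends on the per-process string hash seed.
CHILD = "print(list({'telco', 'regex_v8', 'spectral_norm', 'nbody', 'chaos'}))"


def set_order(seed=None):
    """Run CHILD in a fresh interpreter, optionally pinning PYTHONHASHSEED."""
    env = dict(os.environ)
    if seed is None:
        env.pop("PYTHONHASHSEED", None)      # default: randomized per process
    else:
        env["PYTHONHASHSEED"] = str(seed)    # fixed seed: reproducible hashes
    result = subprocess.run([sys.executable, "-c", CHILD], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print("random seed:", {set_order() for _ in range(5)})    # may show several orders
    print("seed 0:     ", {set_order(0) for _ in range(5)})   # a single order
```

With a fixed seed every process sees the same hashes, which removes one source of run-to-run variation, at the cost of measuring only one particular hash layout.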

I'd rather like benchmarks to be judged on their usefulness:
- spectral_norm really looks pointless, as nobody would write scientific code in Python like that
- telco, AFAIU, is a widely-used benchmark for decimals (but perhaps Stefan can shed some light)
- regex_v8 claims to be drawn from real-world use of regular expressions by popular Web pages, so it sounds useful as well

(note that telco apparently loads a file in the main loop, perhaps that can be pulled out of the loop and into the init phase)
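A minimal sketch of the restructuring suggested above, assuming a benchmark that reads a file of decimal numbers: parse the input once during setup and time only the arithmetic. `load_input`, `workload`, and `bench` are hypothetical names, and the rate and rounding are placeholders, not the actual telco computation.

```python
import decimal
import time
from pathlib import Path

# Hypothetical placeholders, not the real telco rates or rounding rules.
RATE = decimal.Decimal("0.0013")
CENT = decimal.Decimal("0.01")


def load_input(path):
    """Setup phase: read and parse the input file once (not timed)."""
    return [decimal.Decimal(tok) for tok in Path(path).read_text().split()]


def workload(values):
    """Timed phase: decimal arithmetic only, no file I/O."""
    total = decimal.Decimal(0)
    for value in values:
        total += (value * RATE).quantize(CENT)
    return total


def bench(values, loops=10):
    """Time `workload` on already-loaded values; returns per-loop seconds."""
    timings = []
    for _ in range(loops):
        start = time.perf_counter()
        workload(values)
        timings.append(time.perf_counter() - start)
    return timings


# Usage (hypothetical file name):
#     timings = bench(load_input("telco-input.txt"))
```

Keeping I/O out of the timed loop means disk and page-cache effects no longer show up as noise in a benchmark that is meant to measure `decimal` performance.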
msg260713 - Author: Stefan Krah (skrah) (Python committer) Date: 2016-02-23 09:43
Telco is a real world workload devised by Mike Cowlishaw. Some fixes
need to be made for the version in the benchmark suite; in particular,
the amount of input seems insufficient for _decimal (#26284).

I'm not a fan of weeding out real world benchmarks until our test suite
looks stable -- does Intel for example apply the techniques described
by Victor in another issue?
msg260714 - Author: STINNER Victor (vstinner) (Python committer) Date: 2016-02-23 09:46
I opened the issue #26275 to try to make benchmarks more reliable.

My notes on tuning the Linux kernel to reduce the "noise" from the operating system:
http://haypo-notes.readthedocs.org/microbenchmark.html#reliable-micro-benchmarks

On the speed mailing list, it was also suggested to use the geometric mean rather than the minimum or the average.
msg260719 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2016-02-23 10:01
> On the speed mailing list, it was also suggested to use the geometric mean rather than the minimum or the average.

This should be considered a bit more carefully.

First, the geometric mean is only useful when you are aggregating heterogeneous numbers. Here, we are aggregating homogeneous numbers (results from a single benchmark), so the arithmetic mean should be preferred.
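To illustrate the distinction drawn above, a sketch using the standard `statistics` module (Python 3.8+ for `fmean` and `geometric_mean`). All numbers are made up: speedup ratios from different benchmarks are heterogeneous and are combined with the geometric mean, while repeated timings of one benchmark are homogeneous and are averaged arithmetically.

```python
import statistics

# Made-up speedups (old_time / new_time) for two different benchmarks:
# one got 2x faster, the other 2x slower.
speedups = [2.0, 0.5]

# Heterogeneous ratios: the geometric mean treats the two changes as cancelling.
overall_geo = statistics.geometric_mean(speedups)   # approximately 1.0
# The arithmetic mean of the ratios would suggest a net 25% speedup instead.
overall_arith = statistics.fmean(speedups)           # 1.25

# Homogeneous samples: repeated timings (seconds) of a single benchmark.
runs = [0.101, 0.099, 0.100, 0.102]
per_benchmark = statistics.fmean(runs)               # about 0.1005
```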

Second, there is still the issue of weeding out outliers (due to e.g. background activity). So perhaps the 20% slowest runs should be discarded.
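A minimal sketch of the "discard the 20% slowest runs" idea, assuming the timings of one benchmark are already collected in a list. `mean_without_slowest` is a hypothetical helper, and the timings are invented for illustration.

```python
import statistics


def mean_without_slowest(timings, fraction=0.2):
    """Arithmetic mean after discarding the slowest `fraction` of the runs.

    Only the slow tail is trimmed, on the assumption that background
    activity typically slows a run down rather than speeding it up.
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    ordered = sorted(timings)
    n_drop = int(len(ordered) * fraction)
    return statistics.fmean(ordered[:len(ordered) - n_drop])


# Made-up timings in seconds, with two runs disturbed by background activity.
timings = [1.0, 1.1, 1.0, 1.2, 5.0, 1.0, 1.1, 0.9, 1.0, 3.0]
print(statistics.fmean(timings))       # about 1.63
print(mean_without_slowest(timings))   # about 1.04 (drops 5.0 and 3.0)
```

Trimming one side only is a design choice: a symmetric trimmed mean would also throw away the fastest runs, which are the least likely to be disturbed.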

Third, to get enough precision in the arithmetic mean, the number of individual runs (separate process invocations, to smooth out variability due to cache aliasing, etc.) should be made sufficiently large. See the central limit theorem.
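A sketch of the third point, assuming a toy workload: each timing comes from a separate interpreter process, and the precision of the arithmetic mean is reported as its standard error, which by the central limit theorem shrinks like 1/sqrt(n). The child program, `run_once`, and `summarize` are hypothetical; the 1.96 factor is a rough normal approximation.

```python
import statistics
import subprocess
import sys

# Hypothetical micro-workload, timed inside a fresh interpreter each time so
# per-process effects (hash seed, memory layout, cache aliasing) are resampled.
CHILD = (
    "import time\n"
    "t0 = time.perf_counter()\n"
    "sum(i * i for i in range(200_000))\n"
    "print(time.perf_counter() - t0)\n"
)


def run_once():
    """Run the workload in a separate process and return its timing in seconds."""
    result = subprocess.run([sys.executable, "-c", CHILD],
                            capture_output=True, text=True, check=True)
    return float(result.stdout)


def summarize(timings):
    """Return (mean, standard error of the mean) for a list of timings."""
    mean = statistics.fmean(timings)
    # Standard error shrinks like 1/sqrt(n): 4x the runs halves the uncertainty.
    sem = statistics.stdev(timings) / len(timings) ** 0.5
    return mean, sem


if __name__ == "__main__":
    mean, sem = summarize([run_once() for _ in range(20)])
    # 1.96 is the normal-approximation factor for a roughly 95% interval.
    print(f"{mean * 1e3:.3f} ms +/- {1.96 * sem * 1e3:.3f} ms")
```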
msg260739 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2016-02-23 15:25
We now have speed.python.org up, so I'd keep spectral_norm to make sure we don't accidentally harm the performance of int/float operations. It also helped me discover that PyLong_AsDouble was unnecessarily slow, etc.
msg260744 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2016-02-23 20:57
OK, so everyone says to keep what we have.
History
Date                 User          Action  Args
2022-04-11 14:58:27  admin         set     github: 70604
2016-02-23 21:36:52  brett.cannon  set     resolution: postponed -> rejected
2016-02-23 20:57:12  brett.cannon  set     status: open -> closed
                                           resolution: postponed
                                           messages: + msg260744
2016-02-23 15:25:09  yselivanov    set     messages: + msg260739
2016-02-23 10:01:52  pitrou        set     messages: + msg260719
2016-02-23 09:46:56  vstinner      set     messages: + msg260714
2016-02-23 09:43:40  skrah         set     messages: + msg260713
2016-02-23 08:27:16  pitrou        set     nosy: + vstinner, yselivanov
                                           messages: + msg260711
2016-02-23 08:21:20  pitrou        set     nosy: + skrah
2016-02-22 23:12:38  brett.cannon  set     keywords: + easy
2016-02-22 23:06:45  brett.cannon  create