Issue 26416: Deprecate the regex_v8, telco, and spectral_norm benchmarks

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/70604

classification

Title:	Deprecate the regex_v8, telco, and spectral_norm benchmarks
Type:	enhancement	Stage:	needs patch
Components:	Benchmarks	Versions:

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	brett.cannon, pitrou, skrah, vstinner, yselivanov
Priority:	normal	Keywords:	easy

Created on 2016-02-22 23:06 by brett.cannon, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (7)
msg260705 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2016-02-22 23:06
In the thread at https://mail.python.org/pipermail/speed/2016-February/000272.html it came up that the regex_v8, telco, and spectral_norm benchmarks are all very inconsistent. That means they should be deprecated.
msg260711 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2016-02-23 08:27
I'm not sure what "inconsistent" means. If the results are unstable between runs, it may mean the operations being measured themselves are unstable (for example because of hashing differences or cache aliasing effects from run to run).

I'd rather like benchmarks to be judged on their usefulness:
- spectral_norm really looks pointless as nobody would write scientific code in Python like that
- telco, AFAIU, is a widely-used benchmark for decimals (but perhaps Stefan can shed some light)
- regex_v8 claims to be drawn from real-world use of regular expressions by popular Web pages, so it sounds useful as well

(note that telco apparently loads a file in the main loop, perhaps that can be pulled out of the loop and into the init phase)
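
The suggestion to pull the file load out of the main loop can be sketched as follows. This is a minimal illustration, not the telco benchmark's actual code: `run_benchmark`, `load_input`, and `process` are hypothetical names, and the point is only that the load happens once in an init phase while the timed loop measures the processing alone.

```python
import time


def run_benchmark(load_input, process, loops=5):
    """Time `process` over `loops` iterations, loading the input only once.

    `load_input` and `process` are hypothetical callables supplied by the
    caller; they stand in for the file read and the measured work.
    """
    # Init phase: not timed, so I/O cost and I/O noise stay out of the results.
    data = load_input()

    timings = []
    for _ in range(loops):
        start = time.perf_counter()
        process(data)
        timings.append(time.perf_counter() - start)
    return timings
```

With this shape, variation in disk or page-cache behaviour affects only the untimed init phase.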
msg260713 - (view)	Author: Stefan Krah (skrah) *	Date: 2016-02-23 09:43
Telco is a real-world workload devised by Mike Cowlishaw. Some fixes need to be made for the version in the benchmark suite; in particular, the amount of input seems insufficient for _decimal (#26284). I'm not a fan of weeding out real-world benchmarks until our test suite looks stable -- does Intel for example apply the techniques described by Victor in another issue?
msg260714 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-02-23 09:46
I opened issue #26275 to try to make benchmarks more reliable. My notes on tuning the Linux kernel to reduce the "noise" from the operating system: http://haypo-notes.readthedocs.org/microbenchmark.html#reliable-micro-benchmarks

On the speed mailing list, it was also suggested to use the geometric mean rather than the minimum or the average.
msg260719 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2016-02-23 10:01
> On the speed mailing list, it was also suggested to use the geometric mean rather than the minimum or the average.

This should be considered a bit more carefully.

First, geometric mean is only useful when you are aggregating heterogeneous numbers. Here, we are aggregating homogeneous numbers (results from a single benchmark), so the arithmetic mean should be preferred.

Second, there still is the issue of weeding out outliers (due to e.g. background activity). So perhaps the 20% slowest runs should be discarded.

Third, to get enough precision in the arithmetic mean, the number of individual runs (separate process invocations, to smooth out variability due to cache aliasing etc.) should be raised to a sufficient number. See the central limit theorem.
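
The aggregation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `trimmed_mean` and `standard_error`, the 20% default, and the sample timings are hypothetical and are not taken from the benchmark suite.

```python
import math
import statistics


def trimmed_mean(timings, discard_fraction=0.2):
    """Arithmetic mean of the timings after dropping the slowest runs."""
    ordered = sorted(timings)
    n_discard = int(len(ordered) * discard_fraction)
    # Keep the fastest runs; always keep at least one value.
    kept = ordered[:max(1, len(ordered) - n_discard)]
    return statistics.mean(kept)


def standard_error(timings):
    """Standard error of the mean; it shrinks roughly as 1/sqrt(n)."""
    return statistics.stdev(timings) / math.sqrt(len(timings))


# Hypothetical per-process timings in seconds; the last two stand in for
# runs disturbed by background activity.
timings = [1.00, 1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97, 1.50, 2.00]

print("plain mean:  ", statistics.mean(timings))
print("trimmed mean:", trimmed_mean(timings))
print("std. error:  ", standard_error(timings))
```

Dropping only the slowest runs assumes that disturbances can only make a run slower, which matches the background-activity case mentioned above. The standard error gives a rough way to judge whether the number of runs is "sufficient" in the sense of the third point.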
msg260739 - (view)	Author: Yury Selivanov (yselivanov) *	Date: 2016-02-23 15:25
We now have speed.python.org up, so I'd keep spectral_norm to make sure we don't accidentally harm the performance of int/float operations. It also helped me to discover that PyLong_AsDouble was unnecessarily slow, etc.
msg260744 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2016-02-23 20:57
OK, so everyone says to keep what we have.

History
Date	User	Action	Args
2022-04-11 14:58:27	admin	set	github: 70604
2016-02-23 21:36:52	brett.cannon	set	resolution: postponed -> rejected
2016-02-23 20:57:12	brett.cannon	set	status: open -> closed resolution: postponed messages: + msg260744
2016-02-23 15:25:09	yselivanov	set	messages: + msg260739
2016-02-23 10:01:52	pitrou	set	messages: + msg260719
2016-02-23 09:46:56	vstinner	set	messages: + msg260714
2016-02-23 09:43:40	skrah	set	messages: + msg260713
2016-02-23 08:27:16	pitrou	set	nosy: + vstinner, yselivanov messages: + msg260711
2016-02-23 08:21:20	pitrou	set	nosy: + skrah
2016-02-22 23:12:38	brett.cannon	set	keywords: + easy
2016-02-22 23:06:45	brett.cannon	create