This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Improve pybench
Type: behavior
Stage:
Components: Demos and Tools
Versions: Python 2.7

process
Status: closed
Resolution: later
Dependencies:
Superseder:
Assigned To:
Nosy List: kristjan.jonsson, lemburg
Priority: normal
Keywords: patch

Created on 2009-10-01 17:16 by kristjan.jonsson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
pybench.patch kristjan.jonsson, 2009-10-01 17:16
pybench.patch kristjan.jonsson, 2009-10-06 17:14
Messages (8)
msg93416 - Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-10-01 17:16
The attached patch contains suggested fixes to pybench.py:
1) add a process-time timer for Windows
2) fix a bug in timer selection: the timer wasn't stored on 'self'
3) collect 'gross' times for each round
4) show statistics for the per-round sums, both for the sum of 
"adjusted" times per round and for the 'gross' time per round.

The last one is important.  Traditionally, the focus has been on 
individual tests: the series of individual timings for each test would 
be examined, and the minimum and average found.  These per-test minima 
and averages would then be summed to get a total time.  Such results 
are very noisy.  In addition, since the minimum is not a linear 
operator, a sum of individual minima does not correspond to any round 
that actually ran, and is therefore extremely noisy as well.

Looking at the minimum and average of the per-round sums is a much more 
stable indication of total benchmark performance.
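
For illustration (a toy sketch with made-up numbers, not the actual 
pybench code), the difference between the two aggregations:

    # times[test] = list of per-round timings for that test
    times = {
        'StringConcat': [0.012, 0.011, 0.013],
        'DictCreation': [0.021, 0.020, 0.024],
        'TryExcept':    [0.008, 0.009, 0.008],
    }

    # traditional figure: sum the per-test minima -- no single round
    # ever produced this total, so it is not an observed quantity
    sum_of_minima = sum(min(t) for t in times.values())

    # proposed figure: sum each round first, then take min/average --
    # every per-round sum is a real, observed round time
    round_sums = [sum(r) for r in zip(*times.values())]
    min_of_sums = min(round_sums)
    avg_of_sums = sum(round_sums) / len(round_sums)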

Another thing I found while working with this: using "calibration" 
significantly adds to the noise and can produce incorrect results.  It 
introduces a level of non-linearity into the results that can look very 
strange.  Typically the 'overhead' is so low that we should consider 
skipping the calibration entirely.  The purpose of the benchmark should 
be to measure the performance of Python in context; timings of 
individual operations in isolation are pretty meaningless.  Although I 
didn't change the default number of calibration runs from 10, I would 
suggest documenting that setting it to 0 may increase predictability 
and give numbers indicative of real performance.
msg93417 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-10-01 18:12
Kristján Valur Jónsson wrote:

Thanks for the patch. Here's a quick review (a bit terse, but
hope you don't mind)...

> The attached patch contains suggested fixes to pybench.py:
> 1) add a process-time timer for Windows

I'm not sure why you added this: the systimes.py module
already has a ctypes wrapper for kernel32.GetProcessTimes()?!

All you have to do is use this command line option to
enable it: --timer systimes.processtime
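
I.e. (invocation sketch; adjust the path to pybench.py as needed):

    python pybench.py --timer systimes.processtime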

> 2) fix a bug in timer selection: the timer wasn't stored on 'self'

Thanks for spotting this one.

> 3) collect 'gross' times for each round

I'm assuming that "gross" = test time + overhead. Is that right?

I like the idea, but not the implementation :-) I don't like my own
pybench statistics implementation much either; I plan to rewrite it
for Python 2.7.

> 4) show statistics for the per-round sums, both for the sum of 
> "adjusted" times per round and for the 'gross' time per round.
> 
> The last one is important.  Traditionally, the focus has been on 
> individual tests: the series of individual timings for each test would 
> be examined, and the minimum and average found.  These per-test minima 
> and averages would then be summed to get a total time.  Such results 
> are very noisy.  In addition, since the minimum is not a linear 
> operator, a sum of individual minima does not correspond to any round 
> that actually ran, and is therefore extremely noisy as well.
> 
> Looking at the minimum and average of the per-round sums is a much more 
> stable indication of total benchmark performance.

I agree. For the minimum times, the minimum over all tests
should be used.

> Another thing I found while working with this: using "calibration" 
> significantly adds to the noise and can produce incorrect results.  It 
> introduces a level of non-linearity into the results that can look very 
> strange.  Typically the 'overhead' is so low that we should consider 
> skipping the calibration entirely.  The purpose of the benchmark should 
> be to measure the performance of Python in context; timings of 
> individual operations in isolation are pretty meaningless.  Although I 
> didn't change the default number of calibration runs from 10, I would 
> suggest documenting that setting it to 0 may increase predictability 
> and give numbers indicative of real performance.

The idea behind the calibration is to hide the overhead of setting up
each test. You're right that nowadays the tests run so fast that the
calibration adds more noise than the overhead it corrects for.

This is because the number of iteration rounds and test "packs" was
chosen in 2006, when both Python and CPUs were a lot slower.

OTOH, timers have not gotten a lot more accurate.

As a result, pybench measurements on today's machines have far more
noise than they did in 2006.

The only way to resolve this is to adjust all tests - something
which I also plan to do in time for Python 2.7.
msg93422 - Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-10-01 20:53
Hello Marc-Andre.
1) Yes, this is nonsense.  I thought that "systimes" was some archaic 
module that had been discontinued, since I didn't find it among the 
standard modules (where "time" is).  I didn't realize that it is a 
special helper module that sits alongside pybench.

3) 'gross', yes, as opposed to 'net': the total time of each round.  
For timers which don't update frequently, such as GetProcessTimes(), 
summing up many small contributions compounds the error (in this case, 
±10 ms per measurement), whereas a single gross measurement per round 
carries only one such error; see the toy example below.  Can you be 
more specific about what it is that you don't like?
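
To illustrate the compounding with a toy simulation (made-up numbers, 
not pybench code):

    TICK = 0.010                 # hypothetical 10 ms timer granularity
    true_times = [0.003] * 50    # 50 tests of 3 ms each; true total 0.15 s

    def quantize(t):
        # a coarse timer reports in whole ticks only
        return round(t / TICK) * TICK

    net = sum(quantize(t) for t in true_times)  # 0.0 -- each 3 ms test
                                                # rounds to zero ticks
    gross = quantize(sum(true_times))           # 0.15 -- off by at most
                                                # half a tick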

About the calibration: I ran into problems while tuning the memory 
allocator.  I put HeapAlloc in place of malloc() and found a consistent 
10% performance degradation, even when using the LFH (low-fragmentation 
heap).  I was puzzled by this until I turned off calibration and the 
effect went away.

The calibration technique is not a bad idea, but it would be helpful, 
when comparing runs, to be able to reuse the calibration times of the 
previous run, so as not to introduce a random element into the 
comparison.  Alternatively, document that when doing successive tests, 
the lowest variance in the results is to be expected with -C 0.
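
That is, for run-to-run comparisons, something like (using the existing 
options):

    python pybench.py -C 0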

So, how can I improve this?
msg93655 - Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-10-06 17:13
Here is an updated patch with the superfluous timing function removed.
Please advise me on what you "don't like" about the implementation.

I'm also considering adding a '--quiet' flag that causes pybench to 
emit only the 'total' line.  This would be useful for automated testing 
of builds.  Any thoughts on that?
msg93656 - Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-10-06 17:14
And here is the actual patch.
msg93661 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-10-06 20:57
Kristján Valur Jónsson wrote:
> Here is an updated patch with the superfluous timing function removed.
> Please advise me on what you "don't like" about the implementation.

The reporting methods are a complete mess (not your patch, my code).
They need a rewrite; once that's done I can add the extra gross output.

Something that would need to change is the .stat() patch: why
didn't you just add 3 new fields to the tuple instead of using
a list with 2 tuples?
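
Schematically (with made-up names, not the actual pybench .stat() 
signature), the two shapes are:

    def stat_as_list_of_tuples(adj, gross):
        # the patch: a list holding two (min, avg) tuples
        return [(min(adj), sum(adj) / len(adj)),
                (min(gross), sum(gross) / len(gross))]

    def stat_as_flat_tuple(adj, gross):
        # the suggestion: one flat tuple with extra fields appended
        return (min(adj), sum(adj) / len(adj),
                min(gross), sum(gross) / len(gross))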

In any case, please don't invest too much time into this. I'm going
to rewrite the statistics part anyway and can then add the new
gross values while I'm at it.

> I'm also considering adding a '--quiet' flag that causes pybench to 
> emit only the 'total' line.  This would be useful for automated testing 
> of builds.  Any thoughts on that?

Could be useful, yes.
msg93699 - Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-10-07 14:16
OK, thanks for the info.
We are currently using the modified version in-house for monitoring our 
builds, and that's fine for my purposes.  I have set -C 0 and --timer 
systimes.processtime as the defaults for us.
I'll close this now and leave the rewrite to you.
msg93796 - Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-10-09 14:32
Fixed the "timer = timer" bug in trunk in revision 75293.
History
Date                 User              Action  Args
2022-04-11 14:56:53  admin             set     github: 51278
2009-10-09 14:32:55  kristjan.jonsson  set     messages: + msg93796
2009-10-07 14:16:37  kristjan.jonsson  set     status: open -> closed; resolution: later; messages: + msg93699
2009-10-06 20:57:03  lemburg           set     messages: + msg93661
2009-10-06 17:14:28  kristjan.jonsson  set     files: + pybench.patch; messages: + msg93656
2009-10-06 17:13:59  kristjan.jonsson  set     messages: + msg93655
2009-10-01 20:53:40  kristjan.jonsson  set     messages: + msg93422
2009-10-01 18:12:45  lemburg           set     nosy: + lemburg; messages: + msg93417
2009-10-01 17:16:42  kristjan.jonsson  create