
Author lemburg
Recipients kristjan.jonsson, lemburg
Date 2009-10-01.18:12:42
Message-id <4AC4F118.4050304@egenix.com>
In-reply-to <1254417403.92.0.577059221924.issue7029@psf.upfronthosting.co.za>
Content
Kristján Valur Jónsson wrote:
>

Thanks for the patch. Here's a quick review (a bit terse, but
I hope you don't mind)...

> The attached patch contains suggested fixes to pybench.py:
> 1) add a processtime timer for windows

I'm not sure why you added this: the systimes.py module
already has a ctypes wrapper for kernel32.GetProcessTimes()?!

All you have to do is use this command line option to
enable it: --timer systimes.processtime
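
For reference, such a wrapper boils down to something along
these lines (a rough sketch only, not a copy of the systimes.py
code; the structure and helper names here are my own):

    # Sketch of a GetProcessTimes()-based process-time timer, similar
    # in spirit to systimes.processtime(); Windows only, illustrative.
    import ctypes
    from ctypes import wintypes

    class FILETIME(ctypes.Structure):
        # 64-bit tick count split into two DWORDs, in 100-nanosecond units
        _fields_ = [("dwLowDateTime", wintypes.DWORD),
                    ("dwHighDateTime", wintypes.DWORD)]

    def _filetime_to_seconds(ft):
        return ((ft.dwHighDateTime << 32) | ft.dwLowDateTime) * 1e-7

    def processtime():
        kernel32 = ctypes.windll.kernel32
        creation, exit_, kernel, user = (FILETIME() for _ in range(4))
        ok = kernel32.GetProcessTimes(kernel32.GetCurrentProcess(),
                                      ctypes.byref(creation),
                                      ctypes.byref(exit_),
                                      ctypes.byref(kernel),
                                      ctypes.byref(user))
        if not ok:
            raise ctypes.WinError()
        # Process CPU time = kernel (system) time + user time
        return _filetime_to_seconds(kernel) + _filetime_to_seconds(user)

GetProcessTimes() reports kernel plus user CPU time for the
current process, which is what makes it attractive as a
benchmark timer on Windows.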

> 2) fix a bug in timer selection: timer wasn't put in 'self'

Thanks for spotting this one.

> 3) Collect 'gross' times for each round

I'm assuming that "gross" = test time + overhead. Is that right?

I like the idea, but not the implementation :-) I don't like my
own pybench statistics implementation much either. I plan to
rewrite it for Python 2.7.

> 4) show statistics for per-round sums, both for sum of "adjusted" times 
> for each round, and 'gross' time per round.
> 
> The last one is important.  Traditionally, the focus would be on 
> individual tests.  The series of individual timings for each test would 
> be examined, and the minimum and average found.  These minima and 
> averages would then be summed up to get a total time.  These results 
> are very noisy.  In addition, the sum of individual minima is not a 
> meaningful quantity and is thus extremely noisy.
> 
> Looking at the minimum and average of the sum is a much more stable 
> indication of total benchmark performance.

I agree. For the minimum times, the minimum over all tests
should be used.
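
To illustrate the difference with some invented numbers (a
quick sketch, not code from pybench; the test names and the
timings are made up):

    # Invented per-test timings (seconds) for three rounds:
    timings = {
        "StringConcat": [0.12, 0.10, 0.15],
        "DictLookup":   [0.20, 0.25, 0.18],
    }
    rounds = list(zip(*timings.values()))

    # Summing per-test minima mixes best cases from different rounds:
    sum_of_minima = sum(min(t) for t in timings.values())

    # The minimum of the per-round sums corresponds to a complete round
    # that was actually observed:
    min_of_round_sums = min(sum(r) for r in rounds)

    print(round(sum_of_minima, 2))      # 0.28 (no single round ran this fast)
    print(round(min_of_round_sums, 2))  # 0.32 (fastest complete round)

The minimum of the round sums only moves when a complete round
actually gets faster or slower, which is why it is the more
stable of the two figures.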

> Another thing that I found when working with this is that using 
> "calibration" significantly adds to noise and can produce incorrect 
> results.  It adds a level of non-linearity to the results that can 
> appear very strange.  Typically the 'overhead' is so low that we should 
> consider skipping the calibration.  The purpose of the benchmark must be 
> to measure the performance of Python in context; the timings of 
> individual operations in isolation are pretty meaningless.  Although I 
> didn't change the default number of calibration runs from 10, I would 
> suggest documenting that it may be useful to set it to 0 to increase 
> predictability and get numbers indicative of real performance.

The idea behind the calibration is to hide away the overhead
of setting up the test. You're right that nowadays the tests
run so fast that the calibration causes more noise than it
does good.

This is due to the fact that the number of iteration rounds
and test "packs" were chosen in 2006. Back then, both Python
and the CPUs were a lot slower.

OTOH, timers have not gotten a lot more accurate.

As a result, the measurements of pybench on today's machines
have far more noise than they did in 2006.
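
As a rough illustration of how the overhead subtraction
amplifies that noise (a simplified sketch with invented
numbers, not an actual pybench run; it just subtracts a
per-round overhead estimate from each round's timing):

    # Invented numbers: measured test times and the overhead estimated
    # by the calibration runs, both in seconds, over three rounds.
    measured = [0.00510, 0.00490, 0.00530]
    overhead = [0.00040, 0.00080, 0.00020]   # jitter comparable to its size

    # "Adjusted" time = measured time minus calibration overhead.
    adjusted = [m - o for m, o in zip(measured, overhead)]

    def spread(values):
        return max(values) - min(values)

    print(round(spread(measured), 5))   # 0.0004 (raw spread across rounds)
    print(round(spread(adjusted), 5))   # 0.001  (wider after subtraction)

Once the overhead is in the same range as the timer jitter,
the adjustment mostly widens the spread instead of removing
any real bias.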

The only way to resolve this is to adjust all tests - something
which I also plan to do in time for Python 2.7.