
Author vstinner
Recipients brett.cannon, pitrou, vstinner, yselivanov
Date 2016-02-03.10:38:49
Message-id <1454495929.85.0.601853644385.issue26275@psf.upfronthosting.co.za>
In-reply-to
Content
Hi, I'm working on some optimization projects like FAT Python (PEP 509: issue #26058, PEP 510: issue #26098, and PEP 511: issue #26145) and faster memory allocators (issue #26249).

I have the *feeling* that perf.py output is not reliable, even though it takes more than 20 minutes to run :-/ Maybe that's because Yury told me that I must use -r (--rigorous) :-)

Example with 5 runs of "python3 perf.py ../default/python ../default/python.orig -b regex_v8":

---------------
Report on Linux smithers 4.3.3-300.fc23.x86_64 #1 SMP Tue Jan 5 23:31:01 UTC 2016 x86_64 x86_64
Total CPU cores: 8

### regex_v8 ###
Min: 0.043237 -> 0.050196: 1.16x slower
Avg: 0.043714 -> 0.050574: 1.16x slower
Significant (t=-19.83)
Stddev: 0.00171 -> 0.00174: 1.0178x larger

### regex_v8 ###
Min: 0.042774 -> 0.051420: 1.20x slower
Avg: 0.043843 -> 0.051874: 1.18x slower
Significant (t=-14.46)
Stddev: 0.00351 -> 0.00176: 2.0009x smaller

### regex_v8 ###
Min: 0.042673 -> 0.048870: 1.15x slower
Avg: 0.043726 -> 0.050474: 1.15x slower
Significant (t=-8.74)
Stddev: 0.00283 -> 0.00467: 1.6513x larger

### regex_v8 ###
Min: 0.044029 -> 0.049445: 1.12x slower
Avg: 0.044564 -> 0.049971: 1.12x slower
Significant (t=-13.97)
Stddev: 0.00175 -> 0.00211: 1.2073x larger

### regex_v8 ###
Min: 0.042692 -> 0.049084: 1.15x slower
Avg: 0.044295 -> 0.050725: 1.15x slower
Significant (t=-7.00)
Stddev: 0.00421 -> 0.00494: 1.1745x larger
---------------

I only care about the "Min"; IMHO it's the most interesting information here.

The slowdown is between 12% and 20%; for me that's a big difference.


It looks like some benchmarks have very short iterations compared to others. For example, one iteration of bm_json_v2 takes around 3 seconds, whereas one iteration of bm_regex_v8 takes less than 0.050 seconds (50 ms).

$ python3 performance/bm_json_v2.py -n 3 --timer perf_counter
3.310384973010514
3.3116717970115133
3.3077902760123834

$ python3 performance/bm_regex_v8.py -n 3 --timer perf_counter
0.0670697659952566
0.04515827298746444
0.045114840992027894

Do you think that bm_regex_v8 is reliable? I see that there is an "iteration scaling" mechanism to run the benchmarks with more iterations. Maybe we can start by increasing the "iteration scaling" for bm_regex_v8?

Instead of a fixed number of iterations, we should redesign the benchmarks to be time-based. For example, one iteration must take at least 100 ms and should not take much more than 1 second (but it may take longer to get more reliable results). The benchmark is then responsible for adjusting its internal parameters, as in the sketch below.
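
Here is a minimal sketch of the kind of calibration loop I have in mind (the calibrate() helper and its parameters are just an illustration, not an existing perf.py API):

import time

def calibrate(func, min_time=0.1):
    # Double the number of inner loops until one timed run takes at
    # least min_time seconds, so the timer resolution becomes negligible.
    loops = 1
    while True:
        t0 = time.perf_counter()
        for _ in range(loops):
            func()
        dt = time.perf_counter() - t0
        if dt >= min_time:
            return loops, dt
        loops *= 2

With bm_regex_v8 taking roughly 50 ms per iteration, such a loop would settle on about 2 iterations per timed run, which already brings each run into the 100 ms - 1 s window.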

I used this design for my "benchmark.py" script, which I wrote to get "reliable" microbenchmarks:
https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py?fileviewer=file-view-default

The script is time-based and calibrates each benchmark. It also uses the *effective* resolution of the clock used by the benchmark to calibrate it.
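
For reference, here is a simplified sketch of how the effective clock resolution can be measured (this is the spirit of the approach, not a copy of benchmark.py's code):

import time

def effective_resolution(clock=time.perf_counter, samples=100):
    # Smallest observed difference between two successive clock reads;
    # spin until the clock actually ticks to avoid measuring zero.
    best = float('inf')
    for _ in range(samples):
        t1 = clock()
        t2 = clock()
        while t2 == t1:
            t2 = clock()
        best = min(best, t2 - t1)
    return best

The minimum duration of one iteration can then be chosen as a large multiple of this resolution (e.g. 100x or 1000x) so that the clock error stays well below 1%.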

I may work on such a patch, but it would be good to first know your opinion on such a change.

I guess that we should use the base python to calibrate the benchmark and then pass the same parameters to the modified python.