
Author vstinner
Recipients methane, python-dev, serhiy.storchaka, vstinner, yselivanov
Date 2017-02-02.11:20:58
Message-id <1486034459.05.0.0806431357324.issue29263@psf.upfronthosting.co.za>
Content
Naoki> I confirmed bm_mako performance degradation is caused by L1 cache misses.

I know this performance instability very well; the issue is called "code placement":
https://haypo.github.io/journey-to-stable-benchmark-deadcode.html

I tried to fight it with GCC __attribute__((hot)) in issue #28618, but it didn't fix the problem (at least, not completely):
http://bugs.python.org/issue28618#msg281459

In my experience, the best fix is PGO (profile-guided optimization). More and more, I consider it worthless to try to "fight" code placement: benchmark results are only reliable when PGO compilation was used. Otherwise, you should ignore small performance differences. The problem: what is the threshold? 5%? 10%? I have already seen a difference of up to 70% caused only by code placement!
https://haypo.github.io/analysis-python-performance-issue.html

--

I ran benchmarks of loadmethod-methoddescr.patch on haypo@speed-python with LTO+PGO. If you only show performance differences of at least 5%, only 3 benchmarks are significant, and all three are faster:
---
haypo@speed-python$ python3 -m perf compare_to ~/benchmarks/*762a93935afd*json loadmethod-methoddesc_ref_762a93935afd.json  -G --min-speed=5
Faster (3):
- regex_v8: 50.3 ms +- 0.4 ms -> 43.2 ms +- 0.3 ms: 1.17x faster (-14%)
- scimark_monte_carlo: 230 ms +- 6 ms -> 208 ms +- 4 ms: 1.11x faster (-10%)
- scimark_lu: 390 ms +- 17 ms -> 370 ms +- 13 ms: 1.05x faster (-5%)

Benchmark hidden because not significant (61): (...)
---
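The `--min-speed=5` option hides benchmarks whose mean changed by less than 5%. Below is a simplified, hypothetical reimplementation of that threshold in plain Python, not perf's actual code. It reuses the three means shown above plus one invented `hypothetical_noise` entry. It ignores any statistical significance test perf applies on top of the threshold. Because it starts from the rounded means in the output, its ratios can differ slightly from perf's; for example, it gives 1.16x rather than 1.17x for regex_v8.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    ref_ms: float  # mean of the reference build, in milliseconds
    new_ms: float  # mean of the patched build, in milliseconds


def speed_change(r: Result) -> float:
    """Relative change of the mean: negative means the patched build is faster."""
    return (r.new_ms - r.ref_ms) / r.ref_ms


def filter_by_min_speed(results, min_speed_pct):
    """Keep only results whose absolute change is at least min_speed_pct percent."""
    return [r for r in results if abs(speed_change(r)) * 100 >= min_speed_pct]


results = [
    Result("regex_v8", 50.3, 43.2),
    Result("scimark_monte_carlo", 230.0, 208.0),
    Result("scimark_lu", 390.0, 370.0),
    Result("hypothetical_noise", 100.0, 98.0),  # invented: a 2% change, hidden at 5%
]

if __name__ == "__main__":
    for r in filter_by_min_speed(results, 5):
        change = speed_change(r)
        label = "faster" if change < 0 else "slower"
        ratio = max(r.ref_ms, r.new_ms) / min(r.ref_ms, r.new_ms)
        print(f"{r.name}: {ratio:.2f}x {label} ({change:+.0%})")
```

With a 5% threshold, a 2% change like the invented entry is simply hidden. Code placement alone can produce differences larger than that, which is why the threshold only helps when the builds use PGO.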

In my experience, the regex_v8 and scimark_* benchmarks are not very reliable. I'm not surprised that the benchmark suite shows no major speedup, since the speedup on microbenchmarks is only 10%.

IMHO, making method calls 10% faster is significant enough, since method calls are a core, very common, and widely used Python feature.
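A quick way to look at a method-call microbenchmark yourself is sketched below with the standard `timeit` and `statistics` modules. The choice of `str.startswith` as a representative method implemented in C is an assumption; it is not the benchmark behind the 10% figure above. The helper name `ns_per_call` is hypothetical. It reports mean and standard deviation, as perf does. However, it runs in a single process and will be noisier than perf, which spawns several worker processes.

```python
import statistics
import timeit


def ns_per_call(stmt, setup, number=200_000, repeat=5):
    """Time `stmt` `repeat` times and return (mean, stdev) in nanoseconds per call."""
    if repeat < 2:
        raise ValueError("repeat must be >= 2 to compute a standard deviation")
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    per_call = [t / number * 1e9 for t in timings]
    return statistics.mean(per_call), statistics.stdev(per_call)


if __name__ == "__main__":
    # A method implemented in C, called through an attribute lookup on an instance.
    mean, std = ns_per_call("s.startswith('a')", setup="s = 'abc'")
    print(f"s.startswith('a'): {mean:.1f} ns +- {std:.1f} ns per call")
```

To compare two builds, run the same script with each interpreter. Both builds should use PGO, for the code placement reasons explained above.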

--

To be more explicit: loadmethod-methoddescr.patch LGTM, except for the minor comments in the review.