Message 409383 - Python tracker



Author	mark.dickinson
Recipients	PedanticHacker, Stefan Pochmann, mark.dickinson, mcognetta, rhettinger, serhiy.storchaka, tim.peters
Date	2021-12-30.19:38:10
Message-id	<1640893091.04.0.166293247924.issue37295@roundup.psfhosted.org>

Content

Thanks Tim for spotting the stupid mistake. The reworked timings are a bit more ... plausible.

tl;dr: On my machine, Raymond's suggestion gives a 2.2% speedup in the case where POPCNT is not available, and a 0.45% slowdown in the case that it _is_ available. Given that, and the fact that a single-instruction population count is not as readily available as I thought it was, I'd be happy to change the implementation to use the trailing zero counts as suggested.
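For readers without the rest of the thread, here is a rough Python illustration of the two quantities being compared. It assumes (this is not stated in this message) that both variants compute the power of two dividing a binomial coefficient: one from population counts, the other from the trailing zero counts of the factorials involved. The helper names are hypothetical and this is not the C code under discussion.

```python
from math import comb, factorial


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def trailing_zeros(x):
    """Number of trailing zero bits of a positive int."""
    return (x & -x).bit_length() - 1


def shift_via_popcount(n, k):
    # Since n! has exactly n - popcount(n) trailing zero bits,
    # the exponent of 2 in comb(n, k) reduces to this expression.
    return popcount(k) + popcount(n - k) - popcount(n)


def shift_via_trailing_zeros(n, k):
    # Same exponent, read off the trailing zero counts of the factorials.
    # A C implementation would presumably use a precomputed table instead
    # of calling factorial() (assumption).
    return (trailing_zeros(factorial(n))
            - trailing_zeros(factorial(k))
            - trailing_zeros(factorial(n - k)))
```

Both functions return the same value for every valid `n` and `k`, so the choice between them is purely a question of speed.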

I'll attach the scripts I used for timing and analysis. There are two of them: "timecomb.py" produces a single timing. "driver.py" repeatedly switches branches, re-runs make, runs "timecomb.py", then assembles the results.
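The attached scripts are not reproduced in this message. As a rough idea of what producing "a single timing" could look like, here is a minimal sketch using the standard `timeit` module. It assumes the function being timed is `math.comb`; the function name `time_comb_ns`, the argument pairs, and the repeat counts are all hypothetical and not taken from "timecomb.py".

```python
import timeit
from math import comb


def time_comb_ns(pairs, number=10_000, repeat=5):
    """Best-of-`repeat` time per comb() call, in nanoseconds.

    The figure includes the overhead of the Python loop around the calls.
    """
    def run():
        for n, k in pairs:
            comb(n, k)

    # timeit.repeat returns total seconds for `number` executions of run(),
    # once per repetition; take the fastest repetition.
    best = min(timeit.repeat(run, number=number, repeat=repeat))
    return best / (number * len(pairs)) * 1e9
```

For example, `time_comb_ns([(20, 10), (62, 31)])` returns a single float, which a driver could collect across repeated builds of each branch.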

I ran the driver.py script twice: once with a regular `./configure` step, and once with `./configure CFLAGS="-march=haswell"`. Below, "base" refers to the code currently in master; "alt" is the branch with Raymond's suggested change on it.

Output from the script for the normal `./configure`:

    Mean time for base: 40.130ns
    Mean for alt: 39.268ns
    Speedup: 2.19%
    Ttest_indResult(statistic=7.9929245698581415, pvalue=1.4418376402220854e-14)

Output for CFLAGS="-march=haswell":

    Mean time for base: 39.612ns
    Mean for alt: 39.791ns
    Speedup: -0.45%
    Ttest_indResult(statistic=-6.75385578636895, pvalue=5.119724894191512e-11)
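The `Ttest_indResult` lines look like the output of SciPy's `scipy.stats.ttest_ind`. The sketch below shows how summary figures of this kind could be computed from two lists of timings using only the standard library. It makes two assumptions: the speedup is defined as `mean(base) / mean(alt) - 1`, which matches the figures above up to rounding but may not be the script's definition, and the test is the equal-variance two-sample t-test. The function name `summarize` is hypothetical, and the p-value is omitted because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def summarize(base, alt):
    """Return (mean_base, mean_alt, speedup, t_statistic) for two samples."""
    m_base, m_alt = mean(base), mean(alt)
    n_base, n_alt = len(base), len(alt)

    # Assumed definition: positive when alt is faster than base.
    speedup = m_base / m_alt - 1

    # Pooled (equal-variance) two-sample t statistic.
    pooled_var = ((n_base - 1) * variance(base)
                  + (n_alt - 1) * variance(alt)) / (n_base + n_alt - 2)
    t_stat = (m_base - m_alt) / sqrt(pooled_var * (1 / n_base + 1 / n_alt))
    return m_base, m_alt, speedup, t_stat
```

With this sign convention, a positive statistic corresponds to "alt" being faster and a negative one to "alt" being slower, as in the two outputs above.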

History
Date	User	Action	Args
2021-12-30 19:38:11	mark.dickinson	set	recipients: + mark.dickinson, tim.peters, rhettinger, serhiy.storchaka, PedanticHacker, mcognetta, Stefan Pochmann
2021-12-30 19:38:11	mark.dickinson	set	messageid: <1640893091.04.0.166293247924.issue37295@roundup.psfhosted.org>
2021-12-30 19:38:11	mark.dickinson	link	issue37295 messages
2021-12-30 19:38:10	mark.dickinson	create