Message 409373 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	PedanticHacker, Stefan Pochmann, mark.dickinson, mcognetta, rhettinger, serhiy.storchaka, tim.peters
Date	2021-12-30.16:51:29
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1640883089.36.0.741603405852.issue37295@roundup.psfhosted.org>
In-reply-to

Content
[Mark] > I ran some timings for comb(k, 67) on my macOS / Intel MacBook Pro, > using timeit to time calls to a function that looked like this: > > def f(comb): > for k in range(68): > for _ in range(256): > comb(k, 67) > comb(k, 67) > ... # 64 repetitions of comb(k, 67) in all I'm assuming you meant to write comb(67, k) instead, since the comb(k, 67) given is 0 at all tested k values except for k=67, and almost never executes any of the code in question. It's surprising to me that even the long-winded popcount code was faster! The other way needs to read up 3 1-byte values from a trailing zero table, but the long-winded popcount emulation needs to read up 4 4-byte mask constants (or are they embedded in the instruction stream?), in addition to doing many more bit-fiddling operations (4 shifts, 4 "&" masks, 3 add/subtract, and a multiply - compared to just 2 add/subtract). So if the results are right, Intel timings make no sense to me at all ;-)

[Mark]
> I ran some timings for comb(k, 67) on my macOS / Intel MacBook Pro,
> using timeit to time calls to a function that looked like this:
>
> def f(comb):
>     for k in range(68):
>         for _ in range(256):
>             comb(k, 67)
>             comb(k, 67)
>            ... # 64 repetitions of comb(k, 67) in all

I'm assuming you meant to write comb(67, k) instead, since the comb(k, 67) given is 0 at all tested k values except for k=67, and almost never executes any of the code in question.

It's surprising to me that even the long-winded popcount code was faster! The other way needs to read up 3 1-byte values from a trailing zero table, but the long-winded popcount emulation needs to read up 4 4-byte mask constants (or are they embedded in the instruction stream?), in addition to doing many more bit-fiddling operations (4 shifts, 4 "&" masks, 3 add/subtract, and a multiply - compared to just 2 add/subtract).

So if the results are right, Intel timings make no sense to me at all ;-)

History
Date	User	Action	Args
2021-12-30 16:51:29	tim.peters	set	recipients: + tim.peters, rhettinger, mark.dickinson, serhiy.storchaka, PedanticHacker, mcognetta, Stefan Pochmann
2021-12-30 16:51:29	tim.peters	set	messageid: <1640883089.36.0.741603405852.issue37295@roundup.psfhosted.org>
2021-12-30 16:51:29	tim.peters	link	issue37295 messages
2021-12-30 16:51:29	tim.peters	create