Author tim.peters
Recipients PedanticHacker, Stefan Pochmann, mark.dickinson, mcognetta, rhettinger, serhiy.storchaka, tim.peters
Date 2021-12-30.16:51:29
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1640883089.36.0.741603405852.issue37295@roundup.psfhosted.org>
In-reply-to
Content
[Mark]
> I ran some timings for comb(k, 67) on my macOS / Intel MacBook Pro,
> using timeit to time calls to a function that looked like this:
>
> def f(comb):
>     for k in range(68):
>         for _ in range(256):
>             comb(k, 67)
>             comb(k, 67)
>            ... # 64 repetitions of comb(k, 67) in all

I'm assuming you meant to write comb(67, k) instead, since the comb(k, 67) given is 0 at all tested k values except for k=67, and almost never executes any of the code in question.

It's surprising to me that even the long-winded popcount code was faster! The other way needs to read up 3 1-byte values from a trailing zero table, but the long-winded popcount emulation needs to read up 4 4-byte mask constants (or are they embedded in the instruction stream?), in addition to doing many more bit-fiddling operations (4 shifts, 4 "&" masks, 3 add/subtract, and a multiply - compared to just 2 add/subtract).

So if the results are right, Intel timings make no sense to me at all ;-)
History
Date User Action Args
2021-12-30 16:51:29tim.peterssetrecipients: + tim.peters, rhettinger, mark.dickinson, serhiy.storchaka, PedanticHacker, mcognetta, Stefan Pochmann
2021-12-30 16:51:29tim.peterssetmessageid: <1640883089.36.0.741603405852.issue37295@roundup.psfhosted.org>
2021-12-30 16:51:29tim.peterslinkissue37295 messages
2021-12-30 16:51:29tim.peterscreate