Message409373
[Mark]
> I ran some timings for comb(k, 67) on my macOS / Intel MacBook Pro,
> using timeit to time calls to a function that looked like this:
>
> def f(comb):
> for k in range(68):
> for _ in range(256):
> comb(k, 67)
> comb(k, 67)
> ... # 64 repetitions of comb(k, 67) in all
I'm assuming you meant to write comb(67, k) instead, since the comb(k, 67) given is 0 at all tested k values except for k=67, and almost never executes any of the code in question.
It's surprising to me that even the long-winded popcount code was faster! The other way needs to read up 3 1-byte values from a trailing zero table, but the long-winded popcount emulation needs to read up 4 4-byte mask constants (or are they embedded in the instruction stream?), in addition to doing many more bit-fiddling operations (4 shifts, 4 "&" masks, 3 add/subtract, and a multiply - compared to just 2 add/subtract).
So if the results are right, Intel timings make no sense to me at all ;-) |
|
Date |
User |
Action |
Args |
2021-12-30 16:51:29 | tim.peters | set | recipients:
+ tim.peters, rhettinger, mark.dickinson, serhiy.storchaka, PedanticHacker, mcognetta, Stefan Pochmann |
2021-12-30 16:51:29 | tim.peters | set | messageid: <1640883089.36.0.741603405852.issue37295@roundup.psfhosted.org> |
2021-12-30 16:51:29 | tim.peters | link | issue37295 messages |
2021-12-30 16:51:29 | tim.peters | create | |
|