No problem, Mark! I just prefer the shallowest approaches that are "good enough". If it's materially faster to use xors and a popcount instead, fine by me, just provided a comment points to a clue about why that works.
BTW, the later xor version was clearer to me at first glance than what it replaced, the older
k.bit_count() + (n-k).bit_count() - n.bit_count()
The connection to "carries" is quite obscured there. Instead it's a straightforward coding of one statement of Kummer's theorem for multinomial coefficients: the highest power of a prime p dividing the multinomial coefficient M(n; k1, k2, k3, ...), where sum(k_i)=n, is the sum of the digits of k1, k2, ... when expressed in base p, less n, then divided by p-1. So, for p=2 and M(n; k, n-k), that's exactly the same (and leaving out the no-op of dividing by p-1=1 in the p=2 case).
Which in turn is, I think, easiest derived not from thinking about carries, but from mechanically plugging in 3 instances of that the highest power of p dividing i! is i minus the sum of the digits of i in base p, then divided by p-1. That in turn is easy to show by considering what Legendre's formula does in each digit position (and "carries" don't come up in that line of proof). |