One approach that avoids the use of floating-point arithmetic is to precompute the odd part of the factorial of n modulo 2**64, for all small n. If we also precompute the inverses, then three lookups and two 64x64->64 unsigned integer multiplications gets us the odd part of the combinations modulo 2**64, hence for small enough n and k gets us the actual odd part of the combinations.
Then a shift by a suitable amount gives comb(n, k).
Here's what that looks like in Python. The "% 2**64" operation obviously wouldn't be needed in C: we'd just do the computation with uint64_t and rely on the normal wrapping semantics. We could also precompute the bit_count values if that's faster.
import math
# Max n to compute comb(n, k) for.
Nmax = 67
# Precomputation
def factorial_odd_part(n):
f = math.factorial(n)
return f // (f & -f)
F = [factorial_odd_part(n) % 2**64 for n in range(Nmax+1)]
Finv = [pow(f, -1, 2**64) for f in F]
PC = [n.bit_count() for n in range(Nmax+1)]
# Fast comb for small values.
def comb_small(n, k):
if not 0 <= k <= n <= Nmax:
raise ValueError("k or n out of range")
return (F[n] * Finv[k] * Finv[n-k] % 2**64) << k.bit_count() + (n-k).bit_count() - n.bit_count()
# Exhaustive test
for n in range(Nmax+1):
for k in range(0, n+1):
assert comb_small(n, k) == math.comb(n, k) |