The implementation of math.comb() is nice in that it limits the size of intermediate values as much as possible and it avoids unnecessary steps when min(k,n-k) is small with respect to k.
There are some ways it can be made better or faster:
1) For small values of n, there is no need to use PyLong objects at every step. Plain C integers would suffice and we would no longer need to make repeated allocations for all the intermediate values. For 32-bit builds, this could be done for n<=30. For 64-bit builds, it applies to n<=62. Adding this fast path would likely give a 10x to 50x speed-up.
2) For slightly larger values of n, the computation could be sped-up by precomputing one or more rows of binomial coefficients (perhaps for n=75, n=150, and n=225). The calculation could then start from that row instead of from higher rows on Pascal's triangle.
For example comb(100, 55) is currently computed as:
comb(55,1) * 56 // 2 * 57 // 3 ... 100 // 45 <== 45 steps
Instead, it could be computed as:
comb(75,20) * 76 // 21 * 77 // 22 ... 100 / 45 <== 25 steps
^-- found by table lookup
This gives a nice speed-up in exchange for a little memory in an array of constants (for n=75, we would need an array of length 75//2 after exploiting symmetry). Almost all cases would should show some benefit and in favorable cases like comb(76, 20) the speed-up would be nearly 75x.
3) When k is close to n/2, the current algorithm is slower than just computing (n!) / (k! * (n-k)!). However, the factorial method comes at the cost of more memory usage for large n. The factorial method consumes memory proportional to n*log2(n) while the current early-cancellation method uses memory proportional to n+log2(n). Above some threshold for memory pain, the current method should always be preferred. I'm not sure the factorial method should be used at all, but it is embarrassing that factorial calls sometimes beat the current C implementation:
$ python3.8 -m timeit -r 11 -s 'from math import comb, factorial as fact' -s 'n=100_000' -s 'k = n//2' 'comb(n, k)'
1 loop, best of 11: 1.52 sec per loop
$ python3.8 -m timeit -r 11 -s 'from math import comb, factorial as fact' -s 'n=100_000' -s 'k = n//2' 'fact(n) // (fact(k) * fact(n-k))'
1 loop, best of 11: 518 msec per loop
4) For values such as n=1_000_000 and k=500_000, the running time is very long and the routine doesn't respond to SIGINT. We could add checks for keyboard interrupts for large n. Also consider releasing the GIL.
5) The inner-loop current does a pure python subtraction than could in many cases be done with plain C integers. When n is smaller than maxsize, we could have a code path that replaces "PyNumber_Subtract(factor, _PyLong_One)" with something like "PyLong_FromUnsignedLongLong((unsigned long long)n - i)". |