I've been experimenting with a modification of Serhiy's recurrence relation, using a different value of j rather than j=k//2.
The current design splits off three ways when it recurses, so the number of calls grows quickly. For C(200, 100), C(225, 112), and C(250, 125), the underlying 64-bit modular arithmetic routine is called 115, 150, and 193 times respectively.
But with another 2 KB of precomputed values, that drops to 3, 16, and 26 calls.
The main idea is to precompute one diagonal of Pascal's triangle, starting where the 64-bit modular arithmetic version leaves off and going up to whatever limit we choose, depending on our tolerance for table size. A table for C(n, 20) where 67 < n <= 225 takes 2101 bytes.
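Assuming math.comb is available to populate it, the table might be built along these lines (the 67 cutoff and the 225 limit are taken from the figures above; the dict representation matches the proof of concept mentioned below):

```python
from math import comb

FixedJ = 20   # the precomputed diagonal C(n, 20)
Jlim = 225    # highest n covered by the table

# One diagonal of Pascal's triangle: C(n, 20) for 67 < n <= 225,
# starting just past where the 64-bit routine leaves off.
known = {n: comb(n, FixedJ) for n in range(68, Jlim + 1)}

def lookup_known(n):
    return known[n]
```

A C version would presumably flatten this into a static array indexed by n - 68.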
The new routine adds one line and modifies one line from the current code:
def C(n, k):
    k = min(k, n - k)
    if k == 0: return 1
    if k == 1: return n
    if k < len(k2n) and n <= k2n[k]: return ModArith64bit(n, k)
    if k == FixedJ and n <= Jlim: return lookup_known(n)   # New line
    j = min(k // 2, FixedJ)                                # Modified
    return C(n, j) * C(n-j, k-j) // C(k, j)
The benefit of pinning j to match the precomputed diagonal is that two of the three splits land on known values where no further work is necessary. Given a table for C(n, 20), we get:
C(200, 100) = C(200, 20) * C(180, 80) // C(100, 20)
              \_known_/    \_recurse_/   \_known_/
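The split can be sanity-checked directly against math.comb, since C(n, k) = C(n, j) * C(n-j, k-j) // C(k, j) holds exactly for any 0 <= j <= k:

```python
from math import comb

# The j = 20 split for C(200, 100): two known table lookups and one
# smaller recursive subproblem, C(180, 80).
assert comb(200, 100) == comb(200, 20) * comb(180, 80) // comb(100, 20)
```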
A proof of concept is attached. To make it easy to experiment with, the precomputed diagonal is stored in a dictionary. At the bottom, I show an equivalent function to be used in a C version.
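For anyone who wants to experiment without the attachment, here's a rough self-contained sketch of the same idea. math.comb stands in for the 64-bit modular routine, and the k2n cutoffs are placeholders derived from a simple fits-in-64-bits rule rather than the real routine's limits (which give 67 for k = 20, per the table described above):

```python
from math import comb

FixedJ, Jlim = 20, 225

# Placeholder cutoff table: k2n[k] is the largest n (capped at 1030)
# for which C(n, k) fits in a signed 64-bit word. The real k2n comes
# from the modular routine's own limits.
k2n = [0, 0]
for k in range(2, 33):
    n = 2 * k
    while n < 1030 and comb(n + 1, k) < 2**63:
        n += 1
    k2n.append(n)

# One diagonal of Pascal's triangle, starting past the cutoff.
known = {n: comb(n, FixedJ) for n in range(k2n[FixedJ] + 1, Jlim + 1)}

def lookup_known(n):
    return known[n]

def ModArith64bit(n, k):
    return comb(n, k)   # stand-in for the fast 64-bit routine

def C(n, k):
    k = min(k, n - k)
    if k == 0: return 1
    if k == 1: return n
    if k < len(k2n) and n <= k2n[k]: return ModArith64bit(n, k)
    if k == FixedJ and n <= Jlim: return lookup_known(n)
    j = min(k // 2, FixedJ)
    return C(n, j) * C(n-j, k-j) // C(k, j)
```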
It looks promising at this point, but I haven't run timings, so I'm not sure it's a net win.