> I'm not sure that the C implementation of factorial() is much faster

The inner loop (see the for loop in factorial_partial_product) operates on C unsigned longs, so I'd be surprised if a Python implementation were competitive.  But it would be interesting to see timings.
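
A rough way to get such timings is sketched below. It only compares math.factorial against a naive pure-Python loop; py_factorial and compare are hypothetical names made up for this sketch, not code from any patch, and a smarter Python algorithm could well behave differently.

```python
import math
import timeit


def py_factorial(n):
    """Naive pure-Python factorial (hypothetical baseline, not from any patch)."""
    result = 1
    for i in range(2, n + 1):
        # Each step is an arbitrary-precision Python int multiplication.
        result *= i
    return result


def compare(n, repeat=5, number=100):
    """Return best-of-`repeat` per-call times in seconds: (math.factorial, py_factorial)."""
    c_time = min(timeit.repeat(lambda: math.factorial(n),
                               repeat=repeat, number=number)) / number
    py_time = min(timeit.repeat(lambda: py_factorial(n),
                                repeat=repeat, number=number)) / number
    return c_time, py_time


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        # Sanity check: both versions must agree before timing them.
        assert py_factorial(n) == math.factorial(n)
        c_time, py_time = compare(n)
        print(f"n={n}: math.factorial {c_time:.3e}s, pure Python {py_time:.3e}s")
```

The absolute numbers will depend on the machine and the Python build, so the ratio between the two columns is the more useful thing to look at.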
