With the latest patch, the decimal benchmark that performs many small
allocations is consistently 2% slower. Large factorials (where
the operands are initialized to zero for the number-theoretic
transform) perform the same with and without the patch.
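
For anyone who wants to repeat the comparison, here is a rough sketch of the two kinds of workload, run once against each build. It is not the benchmark script used for the numbers above; the function names, loop counts and precision are assumptions chosen for illustration, and whether the multiplications actually reach the number-theoretic transform depends on the operand sizes and the decimal implementation in use.

```python
import timeit
from decimal import Decimal, localcontext


def many_small_ops(n=10_000):
    """Hypothetical workload: many short-lived, small Decimal objects."""
    total = Decimal(0)
    step = Decimal("1.1")
    for _ in range(n):
        total += step * step  # each iteration allocates small temporaries
    return total


def _product(lo, hi):
    """Multiply lo * (lo+1) * ... * hi by splitting the range in half,
    so that both operands of the later multiplications are large."""
    if lo == hi:
        return Decimal(lo)
    mid = (lo + hi) // 2
    return _product(lo, mid) * _product(mid + 1, hi)


def large_factorial(n=1_000, prec=10_000):
    """Hypothetical workload: exact n! with a precision large enough
    to hold every digit (assumed values, adjust as needed)."""
    with localcontext() as ctx:
        ctx.prec = prec
        return _product(1, n)


if __name__ == "__main__":
    for func in (many_small_ops, large_factorial):
        # Take the best of several repeats to reduce timing noise.
        best = min(timeit.repeat(func, number=10, repeat=5))
        print(f"{func.__name__}: {best:.4f} s for 10 calls")
```

A 2% difference is small, so the script needs to be run several times on an otherwise idle machine before the results mean much.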

It would be interesting to see some NumPy benchmarks (Nathaniel?).
