> Serhiy, I suggest you look at the code that Cython generates for its functions. It has been extensively profiled and optimised (years ago), so generating the same code for the argument clinic should yield the same performance.

Thanks, I'll look at it.

> And while I don't have exact numbers at hand, avoiding the tuple packing for the call by passing it into a METH_O function can make a substantial difference.

Good idea. Here are some sample timings:

$ ./python -m timeit "chr(0x20ac)"
Unpatched: 1000000 loops, best of 3: 0.976 usec per loop
Patched:   1000000 loops, best of 3: 0.752 usec per loop

$ ./python -m timeit -s "from cmath import isnan; x = 1j" -- "isnan(x)"
Unpatched: 1000000 loops, best of 3: 0.62 usec per loop
Patched:   1000000 loops, best of 3: 0.386 usec per loop

Of course, for more complex functions the effect is smaller, since the call overhead is a smaller share of the total running time.
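
For anyone who wants to repeat these measurements from a script instead of the command line, a minimal sketch using the standard `timeit` module might look like this. The helper name `best_usec_per_call` and the loop count are my own assumptions for illustration (the runs above used 1000000 loops); it would need to be run once under each build and the outputs compared by hand.

```python
import timeit


def best_usec_per_call(stmt, setup="pass", number=100_000, repeat=3):
    """Return the best-of-`repeat` time per execution of `stmt`, in microseconds.

    Hypothetical helper: it mirrors what `python -m timeit` reports,
    but with a fixed, smaller loop count chosen for illustration.
    """
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    # Each entry in `timings` is the total time for `number` executions.
    return min(timings) / number * 1e6


if __name__ == "__main__":
    # The same two statements as in the command-line runs above.
    cases = [
        ("chr(0x20ac)", "pass"),
        ("isnan(x)", "from cmath import isnan; x = 1j"),
    ]
    for stmt, setup in cases:
        usec = best_usec_per_call(stmt, setup)
        print(f"{stmt}: best of 3: {usec:.3f} usec per loop")
```

The absolute numbers will vary by machine and build options, so only the ratio between the unpatched and patched interpreters is meaningful.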
