Regarding: "I hesitate between the C types 'int' and 'Py_ssize_t' for nargs. I read once that using 'int' can cause performance issues on a loop using 'i++' and 'data[i]' because the compiler has to handle integer overflow of the int type."

This is true because of -fwrapv, but I believe it is also true for Py_ssize_t, which is a signed type as well. However, a speed-up would be achievable by disabling -fwrapv: only then is signed overflow undefined, so the compiler can assume the index never wraps and safely optimize i++; data[i] into *(++data).
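
To make the comparison concrete, here is a minimal sketch of the three loop shapes under discussion. The function names are hypothetical, and ptrdiff_t is used as a stand-in for Py_ssize_t so the snippet compiles without Python.h (an assumption: the two have the same width on common platforms). Whether the compiler actually emits different code for them depends on the compiler, its version, and the target, so this is only something to inspect, not a measured result.

```c
#include <stddef.h>

/* Index typed as int. With -fwrapv the compiler has to preserve
 * 32-bit wrap-around semantics for i. */
long sum_int_index(const long *data, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Index typed as ptrdiff_t (stand-in for Py_ssize_t, an assumption).
 * Still a signed type, so -fwrapv applies to it as well. */
long sum_ssize_index(const long *data, ptrdiff_t n)
{
    long total = 0;
    for (ptrdiff_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Hand-written pointer-stepping form: the shape the optimizer is
 * hoped to reach on its own when it may assume no overflow. */
long sum_pointer(const long *data, ptrdiff_t n)
{
    long total = 0;
    const long *end = data + n;
    while (data < end)
        total += *data++;
    return total;
}
```

One way to check the claim on a given toolchain would be to compile this file with "gcc -O2 -S" once with and once without -fwrapv and compare the generated loops.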
