A more recent discussion of this on python-dev:

The situation there appears to be a case of "Hand off an OS level thread from the creating interpreter to a different subinterpreter. As far as I can tell, calling GILState_Ensure in such a thread will still acquire the GIL of the creating interpreter (or something equally nonsensical)."

The application in question is single-threaded and uses subinterpreters, but every callback from the NumPy code ends up hitting the original interpreter, the one that initialised the thread-local state in the main thread.
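
As a rough illustration of the bookkeeping involved, the sketch below uses ctypes to ask which interpreter the PyGILState machinery associates with the current thread. It assumes CPython 3.9 or later (for PyThreadState_GetInterpreter), and the helper name gilstate_interpreter_is_main is made up for this example. It only inspects the state from the main interpreter; it does not create a subinterpreter or reproduce the NumPy callbacks.

```python
import ctypes

# ctypes.pythonapi is a PyDLL, so these C API calls run with the GIL held.
api = ctypes.pythonapi

# Declare pointer-sized return values; the default (C int) could truncate them.
api.PyGILState_GetThisThreadState.restype = ctypes.c_void_p
api.PyGILState_GetThisThreadState.argtypes = []
api.PyThreadState_GetInterpreter.restype = ctypes.c_void_p
api.PyThreadState_GetInterpreter.argtypes = [ctypes.c_void_p]
api.PyInterpreterState_Main.restype = ctypes.c_void_p
api.PyInterpreterState_Main.argtypes = []


def gilstate_interpreter_is_main():
    """Hypothetical helper, not part of any library.

    Returns True if the thread state that the PyGILState API has recorded
    for the calling OS thread belongs to the main interpreter, False if it
    belongs to another interpreter, and None if nothing is recorded.
    """
    tstate = api.PyGILState_GetThisThreadState()
    if tstate is None:
        # No thread state registered with the PyGILState API for this thread.
        return None
    owner = api.PyThreadState_GetInterpreter(tstate)
    return owner == api.PyInterpreterState_Main()


if __name__ == "__main__":
    print(gilstate_interpreter_is_main())
```

If the PyGILState API keeps a single recorded thread state per OS thread, as the behaviour described above suggests, then code running in a subinterpreter on the main thread would see the same answer, which would explain why PyGILState_Ensure in the callbacks lands in the original interpreter.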
