I quite like the last idea. Something like:

_PyEval_SuspendOtherThreads(PyThreadState *tstate, PyThread_type_lock lock);

  All threads other than tstate will be prevented from executing further interpreter bytecodes until "lock" is released.
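A minimal sketch of these semantics in plain pthreads, assuming every thread polls a shared gate at each bytecode boundary. All names here (`suspend_gate`, `thread_state`, `suspend_other_threads`, `resume_other_threads`, `check_suspend`, `worker`) are hypothetical stand-ins, not CPython internals. The suspending thread holds the gate's mutex, and any other thread that reaches a boundary while the gate is active blocks on that mutex until it is released. This sketch does not wait for other threads to reach a boundary, so a thread already inside a bytecode may finish that one.

```c
/* Expose POSIX declarations under strict ISO C modes. */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState: just an identity. */
typedef struct { int id; } thread_state;

typedef struct {
    pthread_mutex_t lock;  /* held by the suspending thread while others pause */
    atomic_int active;     /* 1 while a suspension is in effect */
    atomic_int owner_id;   /* id of the thread that requested the suspension */
} suspend_gate;

void gate_init(suspend_gate *g) {
    pthread_mutex_init(&g->lock, NULL);
    atomic_init(&g->active, 0);
    atomic_init(&g->owner_id, -1);
}

/* Hypothetical analogue of the proposed call: take the lock, then publish. */
void suspend_other_threads(suspend_gate *g, const thread_state *ts) {
    pthread_mutex_lock(&g->lock);
    atomic_store(&g->owner_id, ts->id);
    atomic_store(&g->active, 1);
}

/* Must be called by the same thread that called suspend_other_threads. */
void resume_other_threads(suspend_gate *g) {
    atomic_store(&g->active, 0);
    pthread_mutex_unlock(&g->lock);
}

/* Polled by every thread between bytecodes. The owner skips the mutex,
 * so it cannot deadlock on the non-recursive lock it already holds. */
void check_suspend(suspend_gate *g, const thread_state *ts) {
    if (atomic_load(&g->active) && atomic_load(&g->owner_id) != ts->id) {
        pthread_mutex_lock(&g->lock);   /* blocks until the owner releases */
        pthread_mutex_unlock(&g->lock);
    }
}

/* A simulated interpreter thread: each loop iteration is one "bytecode". */
typedef struct {
    suspend_gate *gate;
    thread_state ts;
    atomic_int executed;
    int steps;
} worker_args;

void *worker(void *arg) {
    worker_args *w = arg;
    for (int i = 0; i < w->steps; i++) {
        check_suspend(w->gate, &w->ts);      /* bytecode boundary */
        atomic_fetch_add(&w->executed, 1);   /* stand-in for executing a bytecode */
    }
    return NULL;
}
```

Because the worker is started after the suspension is published, it blocks at its first boundary and runs nothing until the owner resumes.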

Offering this API might pose a problem for various "superinstruction" concepts in the future, though.
