I think two things can trigger this problem, and both have to do with how the interpreter handles signals.
Contrary to what you might expect, when a signal is received, its Python handler is _not_ called right away. Instead, signal_handler() in Modules/signalmodule.c is called. It records the reception of the signal in a table and schedules the execution of the associated handler for later:
signal_handler(int sig_num)
{
    [...]
    Handlers[sig_num].tripped = 1;
    /* Set is_tripped after setting .tripped, as it gets
       cleared in PyErr_CheckSignals() before .tripped. */
    is_tripped = 1;
    Py_AddPendingCall(checksignals_witharg, NULL);
    [...]
}
checksignals_witharg() calls PyErr_CheckSignals(), which in turn calls the Python handler.
The pending calls are checked periodically from the interpreter main loop in Python/ceval.c: when _Py_Ticker reaches 0, the loop checks for pending calls and, if there are any, runs them. That runs checksignals_witharg, and therefore the handler.
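To make the deferral concrete, here is a toy model of it in pure Python. It is only a sketch: the names mirror the C code quoted above, but `run`, `CHECK_INTERVAL` and the data structures are made up for illustration and are not CPython's implementation.

```python
CHECK_INTERVAL = 100  # assumed value; stands in for the reset value of _Py_Ticker


def run(instructions, signal_at, sig_num=2):
    """Simulate `instructions` interpreter steps with a signal arriving at
    step `signal_at`. Return the step at which the Python-level handler ran,
    or None if the loop stopped (e.g. blocked) before the next check."""
    tripped = {}        # stands in for Handlers[sig_num].tripped
    pending_calls = []  # stands in for the Py_AddPendingCall() queue
    handled_at = []     # steps at which a Python-level handler ran

    def check_signals(step):
        # Stand-in for PyErr_CheckSignals(): run handlers of tripped signals.
        for num, is_set in tripped.items():
            if is_set:
                tripped[num] = False
                handled_at.append(step)

    def signal_handler(num):
        # What runs at delivery time: record and schedule, nothing more.
        tripped[num] = True
        pending_calls.append(check_signals)

    ticker = CHECK_INTERVAL
    for step in range(instructions):
        if step == signal_at:
            signal_handler(sig_num)
        ticker -= 1
        if ticker == 0:
            # Only here do pending calls, and hence handlers, get to run.
            ticker = CHECK_INTERVAL
            while pending_calls:
                pending_calls.pop(0)(step)
    return handled_at[0] if handled_at else None


print(run(1000, 42))  # 99: the handler waits for the next periodic check
print(run(50, 42))    # None: the loop stopped before any check happened
```

The second call is the interesting one: if the loop stops running before the next check (which is what a blocking call amounts to), the scheduled handler never runs.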
This deferred execution is actually documented behaviour. Quoting the signal module documentation:
"Although Python signal handlers are called asynchronously as far as the Python user is concerned, they can only occur between the “atomic” instructions of the Python interpreter. This means that signals arriving during long calculations implemented purely in C (such as regular expression matches on large bodies of text) may be delayed for an arbitrary amount of time."
But there's a race. Imagine the following sequence:
- a thread (or a process for that matter) receives a signal
- signal_handler schedules the associated handler
- before _Py_Ticker reaches 0 and is checked from the interpreter main loop, a blocking call is made
- since the process is blocked in the call, the main eval loop doesn't run, and the handler doesn't get called until the process leaves the call and enters the main eval loop again. If the call doesn't return (e.g. select without timeout), then the process remains stuck forever.
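One possible mitigation for this first scenario is the wakeup file descriptor offered by the signal module: the C-level handler writes a byte to it, so a select() that also watches the read end of a pipe returns even though the Python handler has not run yet. The following is a minimal sketch under assumptions: POSIX only, SIGUSR1 picked arbitrarily, run from the main thread, and a 5 second timeout kept purely as a safety net.

```python
import os
import select
import signal

received = []


def handler(signum, frame):
    # Python-level handler: runs later, from the interpreter main loop.
    received.append(signum)


read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)
os.set_blocking(write_fd, False)  # the wakeup fd must be non-blocking

old_handler = signal.signal(signal.SIGUSR1, handler)
old_wakeup_fd = signal.set_wakeup_fd(write_fd)
try:
    # The signal arrives *before* the blocking call, as in the race above.
    os.kill(os.getpid(), signal.SIGUSR1)

    # Because read_fd is in the select set, the byte written at signal
    # delivery makes select() return instead of blocking.
    readable, _, _ = select.select([read_fd], [], [], 5.0)
    wakeup_bytes = os.read(read_fd, 16) if readable else b""
finally:
    signal.set_wakeup_fd(old_wakeup_fd)
    signal.signal(signal.SIGUSR1, old_handler)
    os.close(read_fd)
    os.close(write_fd)
```

Once select() has returned, the main eval loop runs again and the Python handler gets its turn.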
This problem can also happen even if the signal is sent after select is called:
- the main thread calls select
- the second thread runs, and sends a signal to the process
- the signal is not received by the main thread, but by the second thread
- the second thread schedules execution of the handler
- since the main thread is blocked in select, the handler never gets called
But this second scenario is on shaky ground anyway, because the documentation warns you:
"Some care must be taken if both signals and threads are used in the same program. The fundamental thing to remember in using signals and threads simultaneously is: always perform signal() operations in the main thread of execution. Any thread can perform an alarm(), getsignal(), pause(), setitimer() or getitimer(); only the main thread can set a new signal handler, and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads). This means that signals can’t be used as a means of inter-thread communication. Use locks instead."
Sending signals to a process with multiple threads is risky. For communication between threads, you should use locks instead, as the documentation says.
Finally, I think that the documentation should be rephrased:
"and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads)."
That statement is false. What's guaranteed is that the signal handler will only be executed by the main thread, but any thread can _receive_ a signal.
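The part that is guaranteed can be checked with a small experiment. This is a sketch, assuming a POSIX system, SIGUSR1 as an arbitrary signal, and that the script itself runs in the main thread. Which thread the OS actually delivers the signal to cannot be observed from Python, so this only shows where the Python handler runs.

```python
import os
import signal
import threading
import time

handler_threads = []


def handler(signum, frame):
    # Record which thread ends up executing the Python-level handler.
    handler_threads.append(threading.current_thread())


old_handler = signal.signal(signal.SIGUSR1, handler)
try:
    # A second thread sends the signal to the whole process.
    sender = threading.Thread(target=os.kill,
                              args=(os.getpid(), signal.SIGUSR1))
    sender.start()
    sender.join()

    # Keep the main thread going through the eval loop so that the
    # pending handler gets a chance to run (bounded wait, 5 s at most).
    deadline = time.monotonic() + 5.0
    while not handler_threads and time.monotonic() < deadline:
        time.sleep(0.01)
finally:
    signal.signal(signal.SIGUSR1, old_handler)

print(handler_threads)
```

Note that the short sleeps matter here: a main thread stuck in a single blocking call with no timeout would not get to run the handler.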
The comments in Modules/signalmodule.c are also misleading:
We still have the problem that in some implementations signals
generated by the keyboard (e.g. SIGINT) are delivered to all
threads (e.g. SGI), while in others (e.g. Solaris) such signals are
delivered to one random thread (an intermediate possibility would
be to deliver it to the main thread -- POSIX?). For now, we have
a working implementation that works in all three cases -- the
handler ignores signals if getpid() isn't the same as in the main
thread. XXX This is a hack.
Sounds strange. If only a thread other than the main thread receives the signal and you ignore it, then it's lost, isn't it?
Furthermore, under Linux 2.6 and NPTL, getpid() returns the main thread PID even from another thread.
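The same observation can be made from Python. This is a quick sketch, assuming a platform where the PID is process-wide (as with NPTL); the `record` helper and the dictionary names are mine.

```python
import os
import threading

pids = {}
idents = {}


def record(name):
    # Store the PID and the Python thread identifier seen by this thread.
    pids[name] = os.getpid()
    idents[name] = threading.get_ident()


record("main")
worker = threading.Thread(target=record, args=("worker",))
worker.start()
worker.join()

print(pids["main"] == pids["worker"])      # same PID in both threads
print(idents["main"] != idents["worker"])  # but distinct thread identifiers
```

So on such a platform a getpid() comparison cannot tell the main thread apart from any other thread.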
Thoughts?