Title: Problem with signals in a single-threaded application
Type: Stage:
Components: Library (Lib) Versions: Python 2.5
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: Rhamphoryncus, gvanrossum, loewis, tim.peters, tony_nelson, ulissesf
Priority: normal Keywords:

Created on 2007-01-24 18:14 by ulissesf, last changed 2007-12-10 23:04 by gvanrossum. This issue is now closed.

File name Uploaded Description Edit ulissesf, 2007-01-24 18:14 Test program to reproduce the probem with signals in a single-threaded application.
signals-v0.patch ulissesf, 2007-01-24 19:46 First attempt to solve the problem.
signals-v1.patch ulissesf, 2007-01-24 21:09 Second attempt to solve the problem.
Messages (13)
msg31080 - (view) Author: Ulisses Furquim (ulissesf) Date: 2007-01-24 18:14
I'm aware of the problems with signals in a multithreaded application, but I was using signals in a single-threaded application and noticed something that seemed wrong. Some signals were apparently being lost, but when another signal came in the python handler for that "lost" signal was being called.

The problem seems to be inside the signal module. The global variable is_tripped is incremented every time a signal arrives. Then, inside PyErr_CheckSignals() (the pending call that calls all python handlers for signals that arrived) we can return immediately if is_tripped is zero. If is_tripped is different than zero, we loop through all signals calling the registered python handlers and after that we zero is_tripped. This seems to be ok, but what happens if a signal arrives after we've returned from its handler (or even after we've checked if that signal arrived) and before we zero is_tripped? I guess we can have a situation where is_tripped is zero but some Handlers[i].tripped are not. In fact, I've inserted some debugging output and could see that this actually happens and then I've written the attached test program to reproduce the problem.

When we run this program, the handler for the SIGALRM isn't called after we return from the  SIGIO handler. We return to our main loop and print 'Loop!' every 3 seconds aprox. and the SIGALRM handler is called only when another signal arrives (like when we hit Ctrl-C).
msg31081 - (view) Author: Ulisses Furquim (ulissesf) Date: 2007-01-24 19:46
This patch is very simple. We didn't want to remove the is_tripped variable because PyErr_CheckSignals() is called several times directly so it would be nice if we could return immediately if no signals arrived. We also didn't want to run the registered handlers with any set of signals blocked. Thus, we thought of zeroing is_tripped as soon as we know there are signals to be handled (after we test is_tripped). This way most of the times we can return immediately because is_tripped is zero and we also don't need to block any signals. However, with this approach we can have a situation where is_tripped isn't zero but we have no signals to handle, so we'll loop through all signals and no registered handler will be called. This happens when we receive a signal after we zero is_tripped and before we check Handlers[i].tripped for that signal. Any comments?
File Added: signals-v0.patch
msg31082 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2007-01-24 20:19
Very nice!  I'd add a description of the minor pathology remaining you described here as a code comment, at the point is_tripped is set to 0.  If this stuff were screamingly obvious, the bug you fixed wouldn't have persisted for 15 years ;-)
msg31083 - (view) Author: Tony Nelson (tony_nelson) Date: 2007-01-24 20:24
ISTM that is_tripped should be zeroed after the test for threading, so that signals will finally get handled when the proper thread is running.
msg31084 - (view) Author: Ulisses Furquim (ulissesf) Date: 2007-01-24 21:09
Yep, you're right, Tony Nelson. We overlooked this case but we can zero is_tripped after the test for threading as you've already said. The patch was updated and it also includes the code comment Tim Peters suggested. Please, I don't know if the wording is right so feel free to comment on it. I still plan to write a test case for the problem being solved (as soon as I understand how work :-).

File Added: signals-v1.patch
msg31085 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2007-01-28 11:57
Your PyErr_SetInterrupt needs to set is_tripped twice, like so:

    is_tripped = 1;
    Handlers[SIGINT].tripped = 1;
    Py_AddPendingCall((int (*)(void *))PyErr_CheckSignals, NULL);

    is_tripped = 1;

The reason is that the signal handler run in a thread while the main thread goes through check
msg31086 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2007-01-28 12:02
Augh, bloody firefox messed up my focus.

Your PyErr_SetInterrupt needs to set the flags after, like so:

    Py_AddPendingCall((int (*)(void *))PyErr_CheckSignals, NULL);
    Handlers[SIGINT].tripped = 1;
    is_tripped = 1;

The reason is that the signal handler run in a thread while the main thread goes through PyErr_CheckSignals, the main thread may notice the flags, clear them flags, find nothing, then exit.  You need the signal handler to supply all the data before setting the flags.

Really though, if you fix enough signal problems you'll converge with the patch at
No need for two patches that do the same thing.
msg31087 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-01-29 08:13
What I dislike about #1564547 is the introduction of the pipe. I don't think this is an appropriate change, and unnecessary to fix the problems discussed here. So if one of the patches is dropped, I'd rather drop #1564547.

Also, I don't think it is necessary to set .tripped after Py_AddPendingCall. If there is a CheckSignals invocation already going on, it will invoke the handler just fine. What *is* necessary (IMO) is to set is_tripped after setting .tripped: Otherwise, an in-progress CheckSignals call might clear is_tripped before .tripped gets set, and thus not invoke the signal handler. The subsequent CheckSignals would quit early because is_tripped is not set.

So I think "a" right sequence is

  Handlers[SIGINT].tripped = 1;
  is_tripped = 1; /* Set is_tripped after setting .tripped, as it gets cleared before .tripped. */
  Py_AddPendingCall((int (*)(void *))PyErr_CheckSignals, NULL);

msg31088 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2007-01-29 21:45
To my knowledge, a pipe is the *only* way to reliably wakeup the main thread from a signal handler in another thread.  It's not necessary here simply because this bug only names a subset of the signal problems, whereas #1564547 attempts to fix all of them.  Dropping it would be silly unless it were officially declared that the signal module and the threading module were incompatible.

You're right about the .tripped/Py_AddPendingCall order.  I got myself confused as to what Py_AddPendingCall did.
msg31089 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-01-29 22:04
rhamphoryncus, see the discussion on #1564547 about that patch. I believe there are better ways to address the issues it raises, in particular by means of pthread_kill. It's certainly more reliable than a pipe (which wakes up the main thread only if it was polling the pipe).
msg58377 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-10 22:30
Discussing this with Rhamporyncus (Adam Olson) on #python-dev now.
msg58380 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-10 23:00
Committed revision 59460 to 2.6.

Will backport to 2.5 as well.
msg58381 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-10 23:04
Committed revision 59461 to 2.5.

Thanks, Ulisses!!
Date User Action Args
2007-12-10 23:04:12gvanrossumsetstatus: open -> closed
messages: + msg58381
2007-12-10 23:00:41gvanrossumsetresolution: accepted
messages: + msg58380
2007-12-10 22:30:10gvanrossumsetassignee: gvanrossum
messages: + msg58377
nosy: + gvanrossum
2007-01-24 18:14:14ulissesfcreate