classification
Title: siginterrupt with flag=False is reset when signal received
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.1, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: exarkun, gz, neologix, r.david.murray, spiv, vila
Priority: normal Keywords: needs review, patch

Created on 2010-04-09 06:50 by spiv, last changed 2010-05-23 15:34 by vila. This issue is now closed.

Files
File name Uploaded Description Edit
sig-test.py spiv, 2010-04-09 06:50 Sample program to demonstrate loss of siginterrupt flag
test_signal_siginterrupt.diff neologix, 2010-04-09 19:56 Patch to add check to test_signal
signal_noreinstall.diff neologix, 2010-04-09 19:56 Patch to remove signal reinstall from signal_handler when sigaction is used
Messages (16)
msg102688 - (view) Author: Andrew Bennetts (spiv) Date: 2010-04-09 06:50
The effect of signal.siginterrupt(somesig, False) is reset the first time a that signal is received.  This is not the documented behaviour, and I do not think this is a desireable behaviour.  It renders siginterrupt effectively useless at providing the robustness against EINTR it is intended to provide.

Attached is a fairly simple program to show this using SIGWINCH: run it in a resizeable terminal, and resize it twice.  Notice that on the second terminal resize (i.e. the second SIGWINCH signal) the program crashes with an EINTR from the os.read.

A partial workaround for the problem is to call signal.siginterrupt(somesig, False) again inside your signal handler, but it's very fragile.  It depends on Python getting a chance to run the Python function registered by the signal.signal call, but this is not guaranteed.  If there's frequent IO, that workaround might suffice.  For the sig-test.py example attached to this bug, it doesn't (try it).

The cause seems to be that signal_handler in signalmodule.c unconditionally does PyOS_setsig(sig_num, signal_handler) [except for SIGCHLD], which unconditionally invokes siginterrupt(sig, 1).  A possible fix would be to add a 'int siginterrupt_flag;' to the Handlers array, and arrange for that value to be passed instead of the hard-coded 1.  Another might be to not call PyOS_setsig from signal_handler at all -- I'm not sure why it is trying to reinstall itself, but perhaps there's some issue there I'm not aware of.
msg102725 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-09 15:16
> The effect of signal.siginterrupt(somesig, False) is reset the first time a that signal is received.  This is not the documented behaviour, and I do not think this is a desireable behaviour.  It renders siginterrupt effectively useless at providing the robustness against EINTR it is intended to provide.

Actually, siginterrupt shouldn't be used.
The proper way is to use sigaction with SA_RESTART flag (and still, don't rely on SA_RESTART too much, certain syscalls are non restartable and this isn't realy portable).

> Another might be to not call PyOS_setsig from signal_handler at all -- I'm not sure why it is trying to reinstall itself, but perhaps there's some issue there I'm not aware of.

Because signal.signal might be implemented with sigaction() or signal() and the latter resets the default signal handler when the handler is called. This means that if your system doesn't support sigaction and and you don't reinstall it, then the handler will only get called the first time.
However, reinstalling the signal handler contains a race, because if a second signal comes before you reinstall it, it's handled by the default handler. That's why sigaction is much better (and calling PyOS_setsig unecessary when sigaction is available).

The problem you describe can happen with both sigaction and signal :

sigaction:
- you set your handler with signal.signal()
- sigaction() is called, and by default syscalls are not restarted (SA_RESTART is false)
- you call siginterrupt() with False, which juste reinstalls the handler with SA_RESTART to true
- the first signal arrives: signal_handler() schedules the call of your handler, and calls PyOS_setsig() 
- PyOS_setsig() reinstalls your handler (again, it's neither a good idea nor necessary with sigaction) _without_ SA_RESTART
- the second signal comes in
- you get a EINTR, game over

signal:
- you set your handler with signal.signal()
- signal() is called, and syscalls are not restarted by default
- you call siginterrupt() with False, which juste reinstalls the handler with SA_RESTART to true
- the first signal arrives: signal_handler() schedules the call of your handler, and calls PyOS_setsig() 
- PyOS_setsig() reinstalles your handler _without_ SA_RESTART (I think the flag is lost even before calling siginterrupt)
- the second signal comes in
- you get a EINTR, game over

So the simple fix when sigaction is available is simply to not call PyOS_setsig() from signal_handler.
When sigaction is not available, well, you have to recall that you want restartable syscalls, and call siginterrupt again with that value. But I think if the OS doesn't support sigaction, there's little chance it'll support siginterrupt.
(1) I just found out that Windows doesn't have sigaction, but I don't know Window much, so if someone could confirm that it doesn't support siginterrupt, then the fix would simply be to not reinstall handler when sigaction is available.
msg102742 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-09 19:56
Attached are two patches:
- test_signal_siginterrupt.diff is a patch for Lib/test/test_signal.py to check for this problem (more than one signal received after calling signal.siginterrupt())
before:
$ ./python Lib/test/regrtest.py test_signal
test_signal
1 test OK.

after:
$ ./python Lib/test/regrtest.py test_signal
test_signal
test test_signal failed -- Traceback (most recent call last):
  File "/home/cf/python/trunk/Lib/test/test_signal.py", line 299, in test_siginterrupt_off
    self.assertEquals(i, False)
AssertionError: True != False

1 test failed:
    test_signal

- signal_noreinstall.diff is a patch against Modules/signalmodule.c which modifies signal_handler() to call PyOS_setsig() only when sigaction is not available, since in that case the signal handler doesn't need to be reinstalled. This solves this problem, and also saves a call to sigaction every time a signal is received (even if its probably doesn't cost much).

with patch and updated test:
$ ./python Lib/test/regrtest.py test_signal
test_signal
1 test OK.

Of course, this also corrects the problem with sig-test.py, the terminal can be resized indefinitely.
I also passed test_subprocess and test_socketserver just to be sure, but reviews are more than welcome.
msg103204 - (view) Author: Andrew Bennetts (spiv) Date: 2010-04-15 12:42
Your patches look good to me.

(They don't fix platforms without sigaction, but as you say they probably don't have siginterrupt, and even if they do they will still have an unfixable race.)

What's the next step?  I can't see an button to add this issue to the "Show Needing Review" queue.
msg103205 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-15 12:54
Will the modified test fail on platforms that don't define HAVE_SIGACTION?
msg103207 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-15 13:36
> Will the modified test fail on platforms that don't define HAVE_SIGACTION?

Well, in theory, if the system has siginterrupt but not sigaction, it will fail. But as said, I don't think it's possible, see man siginterrupt:
"      This library routine uses an extension of the sigaction(2) system call
     that is not available in 4.2BSD, hence it should not be used if backward
     compatibility is needed."

and the test right now has this at the beginning:
if sys.platform[:3] in ('win', 'os2') or sys.platform == 'riscos':
    raise unittest.SkipTest("Can't test signal on %s" % \
                                   sys.platform)

So it pretty much assumes that any Unix system has siginterrupt, hence sigaction, so it should be safe.
So I'd say in theory, it will, but I don't think that siginterrupt can be available without sigaction anyway.
So feel free to modify the test, or not ;-)
msg103208 - (view) Author: Andrew Bennetts (spiv) Date: 2010-04-15 13:38
Are there any platforms that define HAVE_SIGINTERRUPT but that do not define HAVE_SIGACTION?  If there are, then yes I expect they would fail that test.

It would be a shame to delay this fix just because it doesn't fix all platforms... is it possible to mark that test as a known failure on a platform with siginterrupt but without sigaction?

My initial comment outlined a change that would work for all platforms, but would be more complex.  (I think the signal_noreinstall.diff change would be good to make even with the more complex approach, though.)
msg103209 - (view) Author: Andrew Bennetts (spiv) Date: 2010-04-15 13:45
FWIW, comparing the "change history" sections of <http://www.opengroup.org/onlinepubs/000095399/functions/siginterrupt.html> and <http://www.opengroup.org/onlinepubs/000095399/functions/sigaction.html> suggests that sigaction predates siginterrupt, but it's a wild and varied world out there.
msg103212 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-15 14:01
Well, I just think that the probability of having siginterrupt without sigaction is far less than having a Unix system without siginterrupt (which the current test_signal assumes). Or just drop the patch for the test, it honestly doesn't bother me ;-)
msg103213 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-15 14:01
Yes, my question was directed at finding out if there were any platforms on which we'd have to add an additional skip (which would mean refactoring that test into two tests).  But if the buildbots are all happy after it is applied we can probably choose to not worry about it until/if we get a bug report.
msg103214 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-04-15 14:06
> Will the modified test fail on platforms that don't define HAVE_SIGACTION?

Only if they also have siginterrupt, which seems unlikely (as neologix explained).  The implemented behavior on such platforms is unmodified from current trunk, while the test requires new behavior.

I think it's probably worth testing this new behavior separately from the test for the existing behavior anyway, for the sake of clarity.  If it turns out there's a platform with siginterrupt but not sigaction, then that'll also let the test be skipped there, which is nice (though this case seems unlikely to come up).

I'm not exactly sure how we will know if it is expected to fail, though.  I don't think `HAVE_SIGACTION` is exposed nicely to Python right now.  Perhaps the signal module should make this information available (it's somewhat important now, as it will mean the difference between a nicely working signal.siginterrupt and a not-so-nicely working one).

This quirk probably also bears a mention in the siginterrupt documentation.

A few other thoughts on the code nearby:

  * There seems to be no good reason to special case SIGCHLD in signal_handler.  The comment about infinite recursion has no obvious interpretation to me.  Fortunately, this is irrelevant on platforms with sigaction, because the handler remains installed anyway!

  * The distance between the new #ifndef HAVE_SIGACTION check and the corresponding #ifdef HAVE_SIGACTION in pythonrun.c is somewhat unfortunate.  Someone could very easily change the logic in PyOS_setsig and disturb this relationship.  I'm not sure how this could best be fixed.  A general idea which presents itself is that both of these pieces of code could use a new identifier to make this choice, and that identifier could be named/documented such that it is obvious that it is used in two different places which must be kept synchronized.  Or, some more comments in both places might help.  It seems likely that the remaining SIGCHLD check in handle_signal is only there because whoever updated PyOS_setsig to use sigaction didn't know about it/remember it.

Aside from those points, the fix seems correct and good.
msg103243 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-15 18:25
> * There seems to be no good reason to special case SIGCHLD in signal_handler.  The comment about infinite recursion has no obvious interpretation to me.  Fortunately, this is irrelevant on platforms with sigaction, because the handler remains installed anyway!

I think that it's possible, on certain systems, that setting a signal handler for SIGCHLD while there is un unwaited child (zombie) process triggers a new sending of SIGCHLD.
See http://standards.ieee.org/reading/ieee/interp/1003-1-90_int/pasc-1003.1-132.html:
"The SIGCHLD or SIGCLD were used for implementing job control.
Since I had implemented job control for both BSD and System V,
I pointed out to the standards group that except for SIG_DFL,
these signals had different semantics.

If a signal handler was set for SIGCLD, then a signal would
be generated if there were any unreaped child processes.
When the signal handler was caught in System V, it was reset
by default to SIG_DFL.  However, if a process did not
reap a child and instead reestablished the signal handler,
it would go into an infinite loop since the signal would
be generated again.  The SIGCLD SIG_IGN behavior was that
the system reaped the child when it completed so
that the application didn't have to deal with it.
However, I believe that a process blocked in wait() would
be awakened, but I am not certain of this.

The SIGCHLD signal on the other hand was generated when
a child completed if a signal handler was set at that time.
No signal would be generated if a signal handler was
established while there was waiting children.
The SIGCHLD signal was also generated when a child process stopped.
I believe that BSD treated SIGCHLD SIG_IGN the same way
that it treated SIGCHLD SIG_DFL."

So, there might exist out there a couple systems that merrily mix SIGCLD and SIGCHLD, and re-raise a SIGCHLD when setting a new handler for it when unwaited children exist (which is the case in signal_handle, since child processes can't be waited for before the main thread gets to execute the handler it set up).
msg103276 - (view) Author: Andrew Bennetts (spiv) Date: 2010-04-15 23:29
> I'm not exactly sure how we will know if it is expected to fail,
> though.  I don't think `HAVE_SIGACTION` is exposed nicely to Python
> right now.

It might be useful to have the contents of pyconfig.h exposed as a dict somehow.  Maybe call it sys._pyconfig_h?

A less ambitious change would be to expose just HAVE_SIGACTION as e.g. signal._have_sigaction.

I agree with r.david.murray that we probably don't need to bother unless a buildbot or someone reports that this test fails.
msg105136 - (view) Author: Martin (gz) Date: 2010-05-06 11:27
This patch has been reviewed by both Andrew and myself, it would be nice if someone made the time to land it. The test change is unlikely to break anything, and hey, that's what buildbots are for.
msg105185 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-05-07 07:44
I agree that this should be landed (for 2.6 and 2.7).  I think I can do it.  I made some changes to the tests, though.  It would be nice for someone to look those over and make sure the change still looks good.

I checked everything in to the siginterrupt-reset-issue8354 branch.  You can also find the diff at http://codereview.appspot.com/1134042/show
msg105369 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-05-09 03:30
Should be resolved in, oh, let's see, r81007, r81011, r81016, and r81018.  Thanks to everyone who helped out.
History
Date User Action Args
2010-05-23 15:34:57vilasetnosy: + vila
2010-05-09 03:30:06exarkunsetstatus: open -> closed
resolution: fixed
messages: + msg105369

versions: + Python 3.1
2010-05-07 07:44:53exarkunsetmessages: + msg105185
versions: + Python 2.7
2010-05-06 11:27:28gzsetnosy: + gz
messages: + msg105136
2010-04-15 23:29:13spivsetmessages: + msg103276
2010-04-15 18:25:29neologixsetmessages: + msg103243
2010-04-15 14:06:28exarkunsetmessages: + msg103214
2010-04-15 14:01:18r.david.murraysetmessages: + msg103213
2010-04-15 14:01:01neologixsetmessages: + msg103212
2010-04-15 13:45:56spivsetmessages: + msg103209
2010-04-15 13:38:54spivsetmessages: + msg103208
2010-04-15 13:36:31neologixsetmessages: + msg103207
2010-04-15 12:54:18r.david.murraysetnosy: + r.david.murray
messages: + msg103205
2010-04-15 12:42:17spivsetmessages: + msg103204
2010-04-15 12:26:52ezio.melottisetpriority: normal
keywords: + needs review
stage: patch review
2010-04-09 19:56:55neologixsetfiles: + signal_noreinstall.diff
2010-04-09 19:56:16neologixsetfiles: + test_signal_siginterrupt.diff
keywords: + patch
messages: + msg102742
2010-04-09 15:16:14neologixsetnosy: + neologix
messages: + msg102725
2010-04-09 12:38:29pitrousetnosy: + exarkun
2010-04-09 06:50:36spivcreate