Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

siginterrupt with flag=False is reset when signal received #52601

Closed
spiv mannequin opened this issue Apr 9, 2010 · 16 comments
Closed

siginterrupt with flag=False is reset when signal received #52601

spiv mannequin opened this issue Apr 9, 2010 · 16 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@spiv
Copy link
Mannequin

spiv mannequin commented Apr 9, 2010

BPO 8354
Nosy @bitdancer, @bz2
Files
  • sig-test.py: Sample program to demonstrate loss of siginterrupt flag
  • test_signal_siginterrupt.diff: Patch to add check to test_signal
  • signal_noreinstall.diff: Patch to remove signal reinstall from signal_handler when sigaction is used
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-05-09.03:30:06.407>
    created_at = <Date 2010-04-09.06:50:36.754>
    labels = ['type-bug', 'library']
    title = 'siginterrupt with flag=False is reset when signal received'
    updated_at = <Date 2010-05-23.15:34:57.197>
    user = 'https://bugs.python.org/spiv'

    bugs.python.org fields:

    activity = <Date 2010-05-23.15:34:57.197>
    actor = 'vila'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-05-09.03:30:06.407>
    closer = 'exarkun'
    components = ['Library (Lib)']
    creation = <Date 2010-04-09.06:50:36.754>
    creator = 'spiv'
    dependencies = []
    files = ['16837', '16847', '16848']
    hgrepos = []
    issue_num = 8354
    keywords = ['patch', 'needs review']
    message_count = 16.0
    messages = ['102688', '102725', '102742', '103204', '103205', '103207', '103208', '103209', '103212', '103213', '103214', '103243', '103276', '105136', '105185', '105369']
    nosy_count = 6.0
    nosy_names = ['spiv', 'exarkun', 'vila', 'r.david.murray', 'gz', 'neologix']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue8354'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7']

    @spiv
    Copy link
    Mannequin Author

    spiv mannequin commented Apr 9, 2010

    The effect of signal.siginterrupt(somesig, False) is reset the first time a that signal is received. This is not the documented behaviour, and I do not think this is a desireable behaviour. It renders siginterrupt effectively useless at providing the robustness against EINTR it is intended to provide.

    Attached is a fairly simple program to show this using SIGWINCH: run it in a resizeable terminal, and resize it twice. Notice that on the second terminal resize (i.e. the second SIGWINCH signal) the program crashes with an EINTR from the os.read.

    A partial workaround for the problem is to call signal.siginterrupt(somesig, False) again inside your signal handler, but it's very fragile. It depends on Python getting a chance to run the Python function registered by the signal.signal call, but this is not guaranteed. If there's frequent IO, that workaround might suffice. For the sig-test.py example attached to this bug, it doesn't (try it).

    The cause seems to be that signal_handler in signalmodule.c unconditionally does PyOS_setsig(sig_num, signal_handler) [except for SIGCHLD], which unconditionally invokes siginterrupt(sig, 1). A possible fix would be to add a 'int siginterrupt_flag;' to the Handlers array, and arrange for that value to be passed instead of the hard-coded 1. Another might be to not call PyOS_setsig from signal_handler at all -- I'm not sure why it is trying to reinstall itself, but perhaps there's some issue there I'm not aware of.

    @spiv spiv mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Apr 9, 2010
    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 9, 2010

    The effect of signal.siginterrupt(somesig, False) is reset the first time a that signal is received. This is not the documented behaviour, and I do not think this is a desireable behaviour. It renders siginterrupt effectively useless at providing the robustness against EINTR it is intended to provide.

    Actually, siginterrupt shouldn't be used.
    The proper way is to use sigaction with SA_RESTART flag (and still, don't rely on SA_RESTART too much, certain syscalls are non restartable and this isn't realy portable).

    Another might be to not call PyOS_setsig from signal_handler at all -- I'm not sure why it is trying to reinstall itself, but perhaps there's some issue there I'm not aware of.

    Because signal.signal might be implemented with sigaction() or signal() and the latter resets the default signal handler when the handler is called. This means that if your system doesn't support sigaction and and you don't reinstall it, then the handler will only get called the first time.
    However, reinstalling the signal handler contains a race, because if a second signal comes before you reinstall it, it's handled by the default handler. That's why sigaction is much better (and calling PyOS_setsig unecessary when sigaction is available).

    The problem you describe can happen with both sigaction and signal :

    sigaction:

    • you set your handler with signal.signal()
    • sigaction() is called, and by default syscalls are not restarted (SA_RESTART is false)
    • you call siginterrupt() with False, which juste reinstalls the handler with SA_RESTART to true
    • the first signal arrives: signal_handler() schedules the call of your handler, and calls PyOS_setsig()
    • PyOS_setsig() reinstalls your handler (again, it's neither a good idea nor necessary with sigaction) _without_ SA_RESTART
    • the second signal comes in
    • you get a EINTR, game over

    signal:

    • you set your handler with signal.signal()
    • signal() is called, and syscalls are not restarted by default
    • you call siginterrupt() with False, which juste reinstalls the handler with SA_RESTART to true
    • the first signal arrives: signal_handler() schedules the call of your handler, and calls PyOS_setsig()
    • PyOS_setsig() reinstalles your handler _without_ SA_RESTART (I think the flag is lost even before calling siginterrupt)
    • the second signal comes in
    • you get a EINTR, game over

    So the simple fix when sigaction is available is simply to not call PyOS_setsig() from signal_handler.
    When sigaction is not available, well, you have to recall that you want restartable syscalls, and call siginterrupt again with that value. But I think if the OS doesn't support sigaction, there's little chance it'll support siginterrupt.
    (1) I just found out that Windows doesn't have sigaction, but I don't know Window much, so if someone could confirm that it doesn't support siginterrupt, then the fix would simply be to not reinstall handler when sigaction is available.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 9, 2010

    Attached are two patches:

    • test_signal_siginterrupt.diff is a patch for Lib/test/test_signal.py to check for this problem (more than one signal received after calling signal.siginterrupt())
      before:
      $ ./python Lib/test/regrtest.py test_signal
      test_signal
      1 test OK.
    after:
    $ ./python Lib/test/regrtest.py test_signal
    test_signal
    test test_signal failed -- Traceback (most recent call last):
      File "/home/cf/python/trunk/Lib/test/test_signal.py", line 299, in test_siginterrupt_off
        self.assertEquals(i, False)
    AssertionError: True != False

    1 test failed:
    test_signal

    • signal_noreinstall.diff is a patch against Modules/signalmodule.c which modifies signal_handler() to call PyOS_setsig() only when sigaction is not available, since in that case the signal handler doesn't need to be reinstalled. This solves this problem, and also saves a call to sigaction every time a signal is received (even if its probably doesn't cost much).
    with patch and updated test:
    $ ./python Lib/test/regrtest.py test_signal
    test_signal
    1 test OK.

    Of course, this also corrects the problem with sig-test.py, the terminal can be resized indefinitely.
    I also passed test_subprocess and test_socketserver just to be sure, but reviews are more than welcome.

    @spiv
    Copy link
    Mannequin Author

    spiv mannequin commented Apr 15, 2010

    Your patches look good to me.

    (They don't fix platforms without sigaction, but as you say they probably don't have siginterrupt, and even if they do they will still have an unfixable race.)

    What's the next step? I can't see an button to add this issue to the "Show Needing Review" queue.

    @bitdancer
    Copy link
    Member

    Will the modified test fail on platforms that don't define HAVE_SIGACTION?

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 15, 2010

    Will the modified test fail on platforms that don't define HAVE_SIGACTION?

    Well, in theory, if the system has siginterrupt but not sigaction, it will fail. But as said, I don't think it's possible, see man siginterrupt:
    " This library routine uses an extension of the sigaction(2) system call
    that is not available in 4.2BSD, hence it should not be used if backward
    compatibility is needed."

    and the test right now has this at the beginning:
    if sys.platform[:3] in ('win', 'os2') or sys.platform == 'riscos':
    raise unittest.SkipTest("Can't test signal on %s" % \
    sys.platform)

    So it pretty much assumes that any Unix system has siginterrupt, hence sigaction, so it should be safe.
    So I'd say in theory, it will, but I don't think that siginterrupt can be available without sigaction anyway.
    So feel free to modify the test, or not ;-)

    @spiv
    Copy link
    Mannequin Author

    spiv mannequin commented Apr 15, 2010

    Are there any platforms that define HAVE_SIGINTERRUPT but that do not define HAVE_SIGACTION? If there are, then yes I expect they would fail that test.

    It would be a shame to delay this fix just because it doesn't fix all platforms... is it possible to mark that test as a known failure on a platform with siginterrupt but without sigaction?

    My initial comment outlined a change that would work for all platforms, but would be more complex. (I think the signal_noreinstall.diff change would be good to make even with the more complex approach, though.)

    @spiv
    Copy link
    Mannequin Author

    spiv mannequin commented Apr 15, 2010

    FWIW, comparing the "change history" sections of <http://www.opengroup.org/onlinepubs/000095399/functions/siginterrupt.html\> and <http://www.opengroup.org/onlinepubs/000095399/functions/sigaction.html\> suggests that sigaction predates siginterrupt, but it's a wild and varied world out there.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 15, 2010

    Well, I just think that the probability of having siginterrupt without sigaction is far less than having a Unix system without siginterrupt (which the current test_signal assumes). Or just drop the patch for the test, it honestly doesn't bother me ;-)

    @bitdancer
    Copy link
    Member

    Yes, my question was directed at finding out if there were any platforms on which we'd have to add an additional skip (which would mean refactoring that test into two tests). But if the buildbots are all happy after it is applied we can probably choose to not worry about it until/if we get a bug report.

    @exarkun
    Copy link
    Mannequin

    exarkun mannequin commented Apr 15, 2010

    Will the modified test fail on platforms that don't define HAVE_SIGACTION?

    Only if they also have siginterrupt, which seems unlikely (as neologix explained). The implemented behavior on such platforms is unmodified from current trunk, while the test requires new behavior.

    I think it's probably worth testing this new behavior separately from the test for the existing behavior anyway, for the sake of clarity. If it turns out there's a platform with siginterrupt but not sigaction, then that'll also let the test be skipped there, which is nice (though this case seems unlikely to come up).

    I'm not exactly sure how we will know if it is expected to fail, though. I don't think HAVE_SIGACTION is exposed nicely to Python right now. Perhaps the signal module should make this information available (it's somewhat important now, as it will mean the difference between a nicely working signal.siginterrupt and a not-so-nicely working one).

    This quirk probably also bears a mention in the siginterrupt documentation.

    A few other thoughts on the code nearby:

    • There seems to be no good reason to special case SIGCHLD in signal_handler. The comment about infinite recursion has no obvious interpretation to me. Fortunately, this is irrelevant on platforms with sigaction, because the handler remains installed anyway!

    • The distance between the new #ifndef HAVE_SIGACTION check and the corresponding #ifdef HAVE_SIGACTION in pythonrun.c is somewhat unfortunate. Someone could very easily change the logic in PyOS_setsig and disturb this relationship. I'm not sure how this could best be fixed. A general idea which presents itself is that both of these pieces of code could use a new identifier to make this choice, and that identifier could be named/documented such that it is obvious that it is used in two different places which must be kept synchronized. Or, some more comments in both places might help. It seems likely that the remaining SIGCHLD check in handle_signal is only there because whoever updated PyOS_setsig to use sigaction didn't know about it/remember it.

    Aside from those points, the fix seems correct and good.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 15, 2010

    • There seems to be no good reason to special case SIGCHLD in signal_handler. The comment about infinite recursion has no obvious interpretation to me. Fortunately, this is irrelevant on platforms with sigaction, because the handler remains installed anyway!

    I think that it's possible, on certain systems, that setting a signal handler for SIGCHLD while there is un unwaited child (zombie) process triggers a new sending of SIGCHLD.
    See http://standards.ieee.org/reading/ieee/interp/1003-1-90_int/pasc-1003.1-132.html:
    "The SIGCHLD or SIGCLD were used for implementing job control.
    Since I had implemented job control for both BSD and System V,
    I pointed out to the standards group that except for SIG_DFL,
    these signals had different semantics.

    If a signal handler was set for SIGCLD, then a signal would
    be generated if there were any unreaped child processes.
    When the signal handler was caught in System V, it was reset
    by default to SIG_DFL. However, if a process did not
    reap a child and instead reestablished the signal handler,
    it would go into an infinite loop since the signal would
    be generated again. The SIGCLD SIG_IGN behavior was that
    the system reaped the child when it completed so
    that the application didn't have to deal with it.
    However, I believe that a process blocked in wait() would
    be awakened, but I am not certain of this.

    The SIGCHLD signal on the other hand was generated when
    a child completed if a signal handler was set at that time.
    No signal would be generated if a signal handler was
    established while there was waiting children.
    The SIGCHLD signal was also generated when a child process stopped.
    I believe that BSD treated SIGCHLD SIG_IGN the same way
    that it treated SIGCHLD SIG_DFL."

    So, there might exist out there a couple systems that merrily mix SIGCLD and SIGCHLD, and re-raise a SIGCHLD when setting a new handler for it when unwaited children exist (which is the case in signal_handle, since child processes can't be waited for before the main thread gets to execute the handler it set up).

    @spiv
    Copy link
    Mannequin Author

    spiv mannequin commented Apr 15, 2010

    I'm not exactly sure how we will know if it is expected to fail,
    though. I don't think HAVE_SIGACTION is exposed nicely to Python
    right now.

    It might be useful to have the contents of pyconfig.h exposed as a dict somehow. Maybe call it sys._pyconfig_h?

    A less ambitious change would be to expose just HAVE_SIGACTION as e.g. signal._have_sigaction.

    I agree with r.david.murray that we probably don't need to bother unless a buildbot or someone reports that this test fails.

    @bz2
    Copy link
    Mannequin

    bz2 mannequin commented May 6, 2010

    This patch has been reviewed by both Andrew and myself, it would be nice if someone made the time to land it. The test change is unlikely to break anything, and hey, that's what buildbots are for.

    @exarkun
    Copy link
    Mannequin

    exarkun mannequin commented May 7, 2010

    I agree that this should be landed (for 2.6 and 2.7). I think I can do it. I made some changes to the tests, though. It would be nice for someone to look those over and make sure the change still looks good.

    I checked everything in to the siginterrupt-reset-issue8354 branch. You can also find the diff at http://codereview.appspot.com/1134042/show

    @exarkun
    Copy link
    Mannequin

    exarkun mannequin commented May 9, 2010

    Should be resolved in, oh, let's see, r81007, r81011, r81016, and r81018. Thanks to everyone who helped out.

    @exarkun exarkun mannequin closed this as completed May 9, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant