Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_multiprocessing_fork: test_semaphore_tracker_sigint() fails with -W error #77794

Closed
vstinner opened this issue May 23, 2018 · 31 comments
Closed
Assignees
Labels
3.8 only security fixes tests Tests in the Lib/test dir

Comments

@vstinner
Copy link
Member

BPO 33613
Nosy @pitrou, @vstinner, @taleinat, @serhiy-storchaka, @applio, @pablogsal
PRs
  • bpo-33613, test_semaphore_tracker_sigint: fix race condition #7850
  • bpo-33613: multiprocessing semtracker uses sigmask #8500
  • [3.7] bpo-33613, test_semaphore_tracker_sigint: fix race condition (GH-7850) #9055
  • bpo-33613: Fix test_semaphore_tracker signal tests when using -Werror #9778
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pablogsal'
    closed_at = <Date 2018-10-10.08:18:02.308>
    created_at = <Date 2018-05-23.10:04:09.411>
    labels = ['3.8', 'tests']
    title = 'test_multiprocessing_fork: test_semaphore_tracker_sigint() fails with -W error'
    updated_at = <Date 2018-10-10.08:18:02.307>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2018-10-10.08:18:02.307>
    actor = 'pablogsal'
    assignee = 'pablogsal'
    closed = True
    closed_date = <Date 2018-10-10.08:18:02.308>
    closer = 'pablogsal'
    components = ['Tests']
    creation = <Date 2018-05-23.10:04:09.411>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33613
    keywords = ['patch']
    message_count = 31.0
    messages = ['317372', '317374', '317378', '317380', '320062', '320063', '320113', '320364', '320370', '320371', '320372', '320373', '320374', '320376', '320379', '320381', '320382', '320384', '320614', '324564', '324565', '324567', '324568', '324569', '324570', '324571', '324572', '324573', '324574', '327244', '327459']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'vstinner', 'taleinat', 'serhiy.storchaka', 'davin', 'pablogsal']
    pr_nums = ['7850', '8500', '9055', '9778']
    priority = 'normal'
    resolution = None
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue33613'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    test_semaphore_tracker_sigint() emits a warning. If the test is run with -W error, the test fails.

    vstinner@apu$ ./python -m test test_multiprocessing_fork -v -m test_semaphore_tracker_sigint
    (...)
    test_semaphore_tracker_sigint (test.test_multiprocessing_fork.TestSemaphoreTracker) ... /home/vstinner/prog/python/master/Lib/multiprocessing/semaphore_tracker.py:55: UserWarning: semaphore_tracker: process died unexpectedly, relaunching. Some semaphores might leak.
    warnings.warn('semaphore_tracker: process died unexpectedly, '
    ok
    (...)
    Tests result: SUCCESS

    vstinner@apu$ ./python -Werror -m test test_multiprocessing_fork -v -m test_semaphore_tracker_sigint
    (...)
    ======================================================================
    ERROR: test_semaphore_tracker_sigint (test.test_multiprocessing_fork.TestSemaphoreTracker)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/vstinner/prog/python/master/Lib/test/_test_multiprocessing.py", line 4533, in test_semaphore_tracker_sigint
        self.check_semaphore_tracker_death(signal.SIGINT, False)
      File "/home/vstinner/prog/python/master/Lib/test/_test_multiprocessing.py", line 4521, in check_semaphore_tracker_death
        sem = ctx.Semaphore()
      File "/home/vstinner/prog/python/master/Lib/multiprocessing/context.py", line 82, in Semaphore
        return Semaphore(value, ctx=self.get_context())
      File "/home/vstinner/prog/python/master/Lib/multiprocessing/synchronize.py", line 127, in __init__
        SemLock.__init__(self, SEMAPHORE, value, SEM_VALUE_MAX, ctx=ctx)
      File "/home/vstinner/prog/python/master/Lib/multiprocessing/synchronize.py", line 81, in __init__
        register(self._semlock.name)
      File "/home/vstinner/prog/python/master/Lib/multiprocessing/semaphore_tracker.py", line 83, in register
        self._send('REGISTER', name)
      File "/home/vstinner/prog/python/master/Lib/multiprocessing/semaphore_tracker.py", line 90, in _send
        self.ensure_running()
      File "/home/vstinner/prog/python/master/Lib/multiprocessing/semaphore_tracker.py", line 55, in ensure_running
        warnings.warn('semaphore_tracker: process died unexpectedly, '
    UserWarning: semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.
    (...)
    Tests result: FAILURE

    @vstinner vstinner added 3.8 only security fixes tests Tests in the Lib/test dir labels May 23, 2018
    @vstinner
    Copy link
    Member Author

    Note: I found this bug while working on bpo-33612.

    @vstinner
    Copy link
    Member Author

    (Oops, I attached a file and added two comments to this issue, whereas I wanted to comment bpo-33612.)

    @serhiy-storchaka
    Copy link
    Member

    See also some information about this in bpo-31463.

    @pablogsal pablogsal self-assigned this Jun 20, 2018
    @pablogsal
    Copy link
    Member

    I would like to work on this issue and on bpo-31463

    @vstinner
    Copy link
    Member Author

    I would like to work on this issue and on bpo-31463

    Say it on bpo-31463 as well on that case ;-)

    @pablogsal
    Copy link
    Member

    There are two problems in test_semaphore_tracker_sigint. One of the problems is that if should_die is False, the code does not check
    that no warning is raised (and the warning is being raised, so this
    the error has pass silently.) The other problem is a race condition
    between the parent calls _semaphore_tracker.ensure_running() (which in turn spawns the semaphore_tracker using _posixsubprocess.fork_exec) and the child registering the signals. So basically, the parent sends the signal before the child protects against the signal. Modifying:

            _semaphore_tracker.ensure_running()
            pid = _semaphore_tracker._pid
            os.kill(pid, signum)
            time.sleep(1.0)  # give it time to die

    to

            _semaphore_tracker.ensure_running()
            pid = _semaphore_tracker._pid
            time.sleep(1.0) # Give time for the child to register the signal handlers
            os.kill(pid, signum)
            time.sleep(1.0)  # give it time to die

    fix the issue. I have tested this on one of the most grumpy and slow buildbots (gcc110) and it works.

    I cannot think of a different way of waiting for the child to register the signals without modifying the child code for testing so I am not sure that we can do anything rather than the sleep.

    @taleinat
    Copy link
    Contributor

    You can know that the semaphore tracker is running by sending it an invalid command and seeing the error written to stderr. The semaphore tracker either inherits sys.stderr from the parent process (if possible) or uses its own. This is a bit of a hack, but might be worth it to avoid another wasted second on every run of the test suite.

    @pablogsal
    Copy link
    Member

    @taleinat Sadly, if the semaphore_tracker receives an invalid command, it raises and dies, making the test impossible:

                     cmd, name = line.strip().split(b':')
                     if cmd == b'REGISTER':
                         cache.add(name)
                     elif cmd == b'UNREGISTER':
                         cache.remove(name)
                     else:
                         raise RuntimeError('unrecognized command %r' % cmd)
    

    @pitrou
    Copy link
    Member

    pitrou commented Jun 24, 2018

    We can easily add a 'NOOP' command to the semaphore tracker if it helps solve that issue.

    @pablogsal
    Copy link
    Member

    Sorry, I was wrong, it does not die. I am investigating this solution using stderr.

    @pablogsal
    Copy link
    Member

    Additionally, as I commented in the PR, apart from the race condition I think the test should be modified so it fails if the warning is raised and not expected:

    The problem is that the test was silently failing when SIGINT was >being delivered as the test does not check that no warnings are >raised. Notice that the presence of a warning means that the process >died and therefore SIGINT did kill the process, which is precisely >what the test checks that does not happen.

    I think in order to check that the test works as intended without the >need to run the suite with -Wall, we need to modify the test.

    @pablogsal
    Copy link
    Member

    @pitrou I think that should be ideal. The reason is that if we send an invalid command the cache is not cleared and the semaphore tracker will warn when the process finishes which is precisely what the test is avoiding.

    @taleinat
    Copy link
    Contributor

    For the "no-op" we'll need a way to know that it was processed. To that end, I'd make the semaphore tracker recognize a new command e.g. "PING", upon which it would write something e.g. "PONG" into its end of the pipe.

    @pablogsal
    Copy link
    Member

    I have updated the PR implementing a PONG command in the semaphore and updating the test. I have tested on the failing buildbots (gcc110.fsffrance.org) and it works. Notice that a new pipe for the child to write was needed as the child is continuously reading from the original one and therefore we cannot use the same one.

    @pitrou
    Copy link
    Member

    pitrou commented Jun 24, 2018

    Is it possible to notify the ping / noop reply on stderr instead of introducing a new pipe?

    @pablogsal
    Copy link
    Member

    The process seems to use its own stderr and it cannot be captured from the parent without modifying it AFAIK.

    @pablogsal
    Copy link
    Member

    I have modified the PR to avoid creating an extra pipe in the tracker (the pipe is created as part of the test). To allow testing with the -R option:

    ./python -m test test_multiprocessing_fork -v -m test_semaphore_tracker_sigint -R 3:3

    I had to assure that we restart the tracker in the test so it does not reuse a closed file descriptor.

    @vstinner
    Copy link
    Member Author

    See also bpo-31687.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 4, 2018

    New changeset ec74d18 by Antoine Pitrou (Pablo Galindo) in branch 'master':
    bpo-33613, test_semaphore_tracker_sigint: fix race condition (bpo-7850)
    ec74d18

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 4, 2018

    IMHO the fix should be backported to Python 3.6 and 3.7.

    It cannot be backported to 2.7 since 2.7 doesn't have signal.pthread_sigmask().

    @vstinner vstinner added the 3.7 (EOL) end of life label Sep 4, 2018
    @pitrou
    Copy link
    Member

    pitrou commented Sep 4, 2018

    As I wrote on Github: no user-visible bug is fixed here, and we shouldn't risk introducing regressions by backporting those changes.

    If buildbots hurt, I suggest skipping the tests on the buildbots.

    @pitrou pitrou removed the 3.7 (EOL) end of life label Sep 4, 2018
    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 4, 2018

    If buildbots hurt, I suggest skipping the tests on the buildbots.

    Let's do that. Pablo: do you want to write a PR to always skip TestSemaphoreTracker with a reference to this issue? Example: skipIf(True, "bpo-33613: the test has a race condition").

    @pitrou
    Copy link
    Member

    pitrou commented Sep 4, 2018

    Please only skip those tests on buildbots. They work fine otherwise.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 4, 2018

    Please only skip those tests on buildbots.

    There is no easy wait to only skip a test on buildbots.

    They work fine otherwise.

    The race condition impacts everyone. It's just less likely if your computer is fast enough. The bug is much more likely if you "force" the bad path:

    diff --git a/Lib/multiprocessing/semaphore_tracker.py b/Lib/multiprocessing/semaphore_tracker.py
    index 3b50a46ddc..7261b43725 100644
    --- a/Lib/multiprocessing/semaphore_tracker.py
    +++ b/Lib/multiprocessing/semaphore_tracker.py
    @@ -107,6 +107,8 @@ getfd = _semaphore_tracker.getfd
     def main(fd):
         '''Run semaphore tracker.'''
         # protect the process from ^C and "killall python" etc
    +    import time
    +    time.sleep(0.5)
         signal.signal(signal.SIGINT, signal.SIG_IGN)
         signal.signal(signal.SIGTERM, signal.SIG_IGN)
     
    diff --git a/Lib/test/_test_multiprocessing.py b/Lib/test/_test_multiprocessing.py
    index 5c625dd495..6f9f7583e2 100644
    --- a/Lib/test/_test_multiprocessing.py
    +++ b/Lib/test/_test_multiprocessing.py
    @@ -4193,7 +4193,6 @@ class TestSemaphoreTracker(unittest.TestCase):
             _multiprocessing.sem_unlink(name1)
             p.terminate()
             p.wait()
    -        time.sleep(2.0)
             with self.assertRaises(OSError) as ctx:
                 _multiprocessing.sem_unlink(name2)
             # docs say it should be ENOENT, but OSX seems to give EINVAL

    @pitrou
    Copy link
    Member

    pitrou commented Sep 4, 2018

    Le 04/09/2018 à 11:16, STINNER Victor a écrit :

    > They work fine otherwise.

    The race condition impacts everyone. It's just less likely if your computer is fast enough. The bug is much more likely if you "force" the bad path:

    Which bug? It's only a failing test.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 4, 2018

    By the way, "no user-visible bug is fixed here": I agree that it's not easy to trigger manually the bug (when pressing CTRL+C), but I don't see why an user couldn't hit this bug. The race condition is now obvious to me.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 4, 2018

    I don't want to spend hours arguing. This issue is sufficiently rare and unlikely in normal conditions that I don't think we should risk regressions by trying to fix it.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 4, 2018

    Which bug? It's only a failing test.

    Linux vendors run the full test suite. If the test suite fails, the build of the package fails as well. It's annoying.

    @serhiy-storchaka
    Copy link
    Member

    After this change tests are failed when ran with -Werror.

    $ ./python -Werror -m test -vuall -m 'test_semaphore_tracker_sig*' test_multiprocessing_fork test_multiprocessing_forkserver test_multiprocessing_spawn
    ...

    ======================================================================
    ERROR: test_semaphore_tracker_sigkill (test.test_multiprocessing_fork.TestSemaphoreTracker)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/test/_test_multiprocessing.py", line 4562, in test_semaphore_tracker_sigkill
        self.check_semaphore_tracker_death(signal.SIGKILL, True)
      File "/home/serhiy/py/cpython/Lib/test/_test_multiprocessing.py", line 4526, in check_semaphore_tracker_death
        _semaphore_tracker.ensure_running()
      File "/home/serhiy/py/cpython/Lib/multiprocessing/semaphore_tracker.py", line 64, in ensure_running
        warnings.warn('semaphore_tracker: process died unexpectedly, '
    UserWarning: semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.

    ...
    ======================================================================
    ERROR: test_semaphore_tracker_sigint (test.test_multiprocessing_forkserver.TestSemaphoreTracker)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/test/_test_multiprocessing.py", line 4554, in test_semaphore_tracker_sigint
        self.check_semaphore_tracker_death(signal.SIGINT, False)
      File "/home/serhiy/py/cpython/Lib/test/_test_multiprocessing.py", line 4526, in check_semaphore_tracker_death
        _semaphore_tracker.ensure_running()
      File "/home/serhiy/py/cpython/Lib/multiprocessing/semaphore_tracker.py", line 64, in ensure_running
        warnings.warn('semaphore_tracker: process died unexpectedly, '
    UserWarning: semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.

    ======================================================================
    ERROR: test_semaphore_tracker_sigkill (test.test_multiprocessing_forkserver.TestSemaphoreTracker)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/test/_test_multiprocessing.py", line 4562, in test_semaphore_tracker_sigkill
        self.check_semaphore_tracker_death(signal.SIGKILL, True)
      File "/home/serhiy/py/cpython/Lib/test/_test_multiprocessing.py", line 4534, in check_semaphore_tracker_death
        sem = ctx.Semaphore()
      File "/home/serhiy/py/cpython/Lib/multiprocessing/context.py", line 82, in Semaphore
        return Semaphore(value, ctx=self.get_context())
      File "/home/serhiy/py/cpython/Lib/multiprocessing/synchronize.py", line 126, in __init__
        SemLock.__init__(self, SEMAPHORE, value, SEM_VALUE_MAX, ctx=ctx)
      File "/home/serhiy/py/cpython/Lib/multiprocessing/synchronize.py", line 80, in __init__
        register(self._semlock.name)
      File "/home/serhiy/py/cpython/Lib/multiprocessing/semaphore_tracker.py", line 104, in register
        self._send('REGISTER', name)
      File "/home/serhiy/py/cpython/Lib/multiprocessing/semaphore_tracker.py", line 111, in _send
        self.ensure_running()
      File "/home/serhiy/py/cpython/Lib/multiprocessing/semaphore_tracker.py", line 64, in ensure_running
        warnings.warn('semaphore_tracker: process died unexpectedly, '
    UserWarning: semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.

    ...
    ======================================================================
    ERROR: test_semaphore_tracker_sigint (test.test_multiprocessing_spawn.TestSemaphoreTracker)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/test/_test_multiprocessing.py", line 4554, in test_semaphore_tracker_sigint
        self.check_semaphore_tracker_death(signal.SIGINT, False)
      File "/home/serhiy/py/cpython/Lib/test/_test_multiprocessing.py", line 4526, in check_semaphore_tracker_death
        _semaphore_tracker.ensure_running()
      File "/home/serhiy/py/cpython/Lib/multiprocessing/semaphore_tracker.py", line 64, in ensure_running
        warnings.warn('semaphore_tracker: process died unexpectedly, '
    UserWarning: semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.

    ======================================================================
    ERROR: test_semaphore_tracker_sigkill (test.test_multiprocessing_spawn.TestSemaphoreTracker)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/test/_test_multiprocessing.py", line 4562, in test_semaphore_tracker_sigkill
        self.check_semaphore_tracker_death(signal.SIGKILL, True)
      File "/home/serhiy/py/cpython/Lib/test/_test_multiprocessing.py", line 4534, in check_semaphore_tracker_death
        sem = ctx.Semaphore()
      File "/home/serhiy/py/cpython/Lib/multiprocessing/context.py", line 82, in Semaphore
        return Semaphore(value, ctx=self.get_context())
      File "/home/serhiy/py/cpython/Lib/multiprocessing/synchronize.py", line 126, in __init__
        SemLock.__init__(self, SEMAPHORE, value, SEM_VALUE_MAX, ctx=ctx)
      File "/home/serhiy/py/cpython/Lib/multiprocessing/synchronize.py", line 80, in __init__
        register(self._semlock.name)
      File "/home/serhiy/py/cpython/Lib/multiprocessing/semaphore_tracker.py", line 104, in register
        self._send('REGISTER', name)
      File "/home/serhiy/py/cpython/Lib/multiprocessing/semaphore_tracker.py", line 111, in _send
        self.ensure_running()
      File "/home/serhiy/py/cpython/Lib/multiprocessing/semaphore_tracker.py", line 64, in ensure_running
        warnings.warn('semaphore_tracker: process died unexpectedly, '
    UserWarning: semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.

    And I'm puzzled by the line

        issubclass(the_warn.category, UserWarning)

    @pablogsal
    Copy link
    Member

    New changeset 3058b7d by Pablo Galindo in branch 'master':
    bpo-33613: Fix test_semaphore_tracker signal tests when using -Werror (GH-9778)
    3058b7d

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes tests Tests in the Lib/test dir
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants