classification
Title: test_eintr.test_sigwaitinfo() hangs on "AMD64 FreeBSD CURRENT 3.x" buildbot
Type: Stage: resolved
Components: Tests Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder: test_eintr.test_sigwaitinfo(): race condition on AMD64 FreeBSD 10.x Shared 3.6
View: 30320
Assigned To: Nosy List: bapt, haypo, koobs, martin.panter, python-dev
Priority: normal Keywords: buildbot, patch

Created on 2015-12-15 10:20 by haypo, last changed 2017-06-28 01:16 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
sigwaitinfo-block.patch martin.panter, 2016-01-28 03:30 review
sigwaitinfo-block.v2.patch martin.panter, 2016-11-05 02:26 review
Messages (14)
msg256449 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-12-15 10:20
Re-running test 'test_eintr' in verbose mode
test_all (test.test_eintr.EINTRTests) ... test test_eintr failed
FAIL

======================================================================
FAIL: test_all (test.test_eintr.EINTRTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/test_eintr.py", line 20, in test_all
    script_helper.assert_python_ok("-u", tester)
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/support/script_helper.py", line 135, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/support/script_helper.py", line 121, in _assert_python
    err))
AssertionError: Process return code is 1
command line: ['/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/python', '-X', 'faulthandler', '-I', '-u', '/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/eintrdata/eintr_tester.py']

stdout:
---
test_read (__main__.OSEINTRTest) ... ok
test_wait (__main__.OSEINTRTest) ... ok
test_wait3 (__main__.OSEINTRTest) ... ok
test_wait4 (__main__.OSEINTRTest) ... ok
test_waitpid (__main__.OSEINTRTest) ... ok
test_write (__main__.OSEINTRTest) ... ok
test_accept (__main__.SocketEINTRTest) ... ok
test_open (__main__.SocketEINTRTest) ... ok
test_os_open (__main__.SocketEINTRTest) ... ok
test_recv (__main__.SocketEINTRTest) ... ok
test_recvmsg (__main__.SocketEINTRTest) ... ok
test_send (__main__.SocketEINTRTest) ... ok
test_sendall (__main__.SocketEINTRTest) ... ok
test_sendmsg (__main__.SocketEINTRTest) ... ok
test_sleep (__main__.TimeEINTRTest) ... ok
test_sigtimedwait (__main__.SignalEINTRTest) ... ok
test_sigwaitinfo (__main__.SignalEINTRTest) ...
---

stderr:
---
Timeout (0:10:00)!
Thread 0x0000000801815000 (most recent call first):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/eintrdata/eintr_tester.py", line 404 in test_sigwaitinfo
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/case.py", line 600 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/case.py", line 648 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/runner.py", line 176 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/support/__init__.py", line 1780 in _run_suite
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/support/__init__.py", line 1814 in run_unittest
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/eintrdata/eintr_tester.py", line 472 in test_main
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/eintrdata/eintr_tester.py", line 476 in <module>
---

----------------------------------------------------------------------
Ran 1 test in 607.737s

FAILED (failures=1)
1 test failed again:
    test_eintr
*** Error code 1
msg256450 - (view) Author: Roundup Robot (python-dev) Date: 2015-12-15 10:31
New changeset 7f49af7046e4 by Victor Stinner in branch 'default':
Issue #25868: Try to make test_eintr.test_sigwaitinfo() more reliable
https://hg.python.org/cpython/rev/7f49af7046e4
msg256451 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-15 10:38
With the pipe, there is still a potential race after the parent writes to the pipe and before sigwaitinfo() is invoked, versus the child sleep() call.

What do you think of my suggestion to block the signal? Then (in theory) it should be robust, rather than relying on timing.

https://bugs.python.org/issue25277#msg256278
msg256452 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-12-15 10:42
> With the pipe, there is still a potential race after the parent writes to the pipe and before sigwaitinfo() is invoked, versus the child sleep() call.

Yeah, I know, that's why I kept the ugly 100 ms sleep.

> What do you think of my suggestion to block the signal? Then (in theory) it should be robust, rather than relying on timing.
> https://bugs.python.org/issue25277#msg256278

I read your comment, but with this change, I'm not sure anymore that we test that sigwaitinfo() retries when it's interrupted by a signal (EINTR).
msg256453 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-12-15 10:44
I used a short sleep of 100 ms to test if the pipe is enough to synchronize the parent and the child, but this delay is too short to test that sigwaitinfo() is interrupted by EINTR.

"self.sleep_time = 0.100" must be removed to use the default sleep time (currently 200 ms, signal_period *  2).

By the way, we should maybe use sleep_time = signal_period * 3 to test more signals?
msg256456 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-15 10:58
You would only block SIGUSR1 (the signal being waited for). SIGALRM would remain unblocked, so the syscall should be interrupted by it.

If everything is working smoothly, you should generally see two SIGALRM interrupts per sleep time, so I don’t think it is worth changing to three.

I guess you could increment a counter in the SIGALRM handler to verify that it gets called multiple times in the vicinity of the sigwaitinfo() call.
msg256457 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-15 11:00
BTW I definitely think your pipe solution is better than before, which was relying on the whole Python process startup time being fast. The race window should be much smaller now.
msg259018 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-27 11:55
(Wrote this ages ago but never hit send:)

Here is another theory to explain the hang: When sigwaitinfo() is being called, and the SIGALRM signal arrives, it executes the signal handler (both the C handler and later the Python handler). If the SIGUSR1 signal arrives while either of these signal handlers are executing, it will be missed, and when sigwaitinfo() resumes it will hang. This would also be avoided by blocking SIGUSR1.

I wonder why the hang only seems to happen with BSD though.
msg259030 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-01-27 16:20
> This would also be avoided by blocking SIGUSR1.

Sorry, I'm lost on this old and complex issue. Can you please propose a patch?

If it doesn't break test_eintr on Linux and FreeBSD, we can just push
it and take a look sometimes at FreeBSD buildbots ;-)
msg259087 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-28 03:30
Here is my patch which blocks SIGUSR1. I have only tested it on Linux.

I also added a counter to the SIGALRM handler, so we can verify that it does get called by the Python sigwaitinfo() implementation.
msg259149 - (view) Author: Roundup Robot (python-dev) Date: 2016-01-28 17:15
New changeset ddce15b21c21 by Victor Stinner in branch '3.5':
Backport fixes on test_eintr
https://hg.python.org/cpython/rev/ddce15b21c21
msg276366 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-09-14 00:37
Happened again:
http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.x/builds/4990/steps/test/logs/stdio
Timeout (0:10:00)!
Thread 0x0000000801807400 (most recent call first):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/eintrdata/eintr_tester.py", line 420 in test_sigwaitinfo

Perhaps I will have another look at committing my patch. As I recall, it should make the test more robust.
msg280100 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-11-05 02:26
My patch doesn’t help answer the question of why this only fails on BSD, so I will hold off on committing it. However v2 has a minor update to skip the test when pthread_sigmask() is unavailable.
msg297107 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-06-28 01:16
It seems like the root issue was fixed by myself in the bpo-30320. Sorry Martin, I forgot this issue and so forgot to review your patch.

But it seems like we took a similar approach: using signal.pthread_sigmask() seems to be the only right way to write a *reliable* test for sigwaitinfo() ;-)
History
Date User Action Args
2017-06-28 01:16:29hayposetstatus: open -> closed
superseder: test_eintr.test_sigwaitinfo(): race condition on AMD64 FreeBSD 10.x Shared 3.6
messages: + msg297107

resolution: duplicate
stage: patch review -> resolved
2016-11-05 02:26:19martin.pantersetfiles: + sigwaitinfo-block.v2.patch

messages: + msg280100
versions: + Python 3.7
2016-09-14 00:37:28martin.pantersetmessages: + msg276366
2016-01-28 17:15:12python-devsetmessages: + msg259149
2016-01-28 03:30:09martin.pantersetfiles: + sigwaitinfo-block.patch
keywords: + patch
messages: + msg259087

stage: patch review
2016-01-27 16:20:06hayposetmessages: + msg259030
2016-01-27 11:55:07martin.pantersetkeywords: + buildbot

messages: + msg259018
components: + Tests
versions: + Python 3.5
2015-12-15 11:00:39martin.pantersetmessages: + msg256457
2015-12-15 10:58:03martin.pantersetmessages: + msg256456
2015-12-15 10:44:53hayposetmessages: + msg256453
2015-12-15 10:42:36hayposetmessages: + msg256452
2015-12-15 10:38:38martin.pantersetnosy: + martin.panter
messages: + msg256451
2015-12-15 10:31:25python-devsetnosy: + python-dev
messages: + msg256450
2015-12-15 10:20:56haypocreate