This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: test_signal.test_without_siginterrupt() sporadic failures on FreeBSD 6.4
Type: Stage: resolved
Components: Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: neologix, pitrou, python-dev, vstinner
Priority: normal Keywords: patch

Created on 2011-06-19 17:21 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test_siginterrupt.patch vstinner, 2011-06-19 21:08 review
test_siginterrupt.diff neologix, 2011-06-21 21:08 review
Messages (13)
msg138647 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-19 17:21
======================================================================
FAIL: test_siginterrupt_on (test.test_signal.SiginterruptTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_signal.py", line 396, in test_siginterrupt_on
    self.assertTrue(i)
AssertionError: False is not true

http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%206.4%203.x/builds/1586/steps/test/logs/stdio
http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%206.4%203.x/builds/1587/steps/test/logs/stdio
msg138652 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-19 17:58
Seen also on OpenSolaris:

test test_signal failed -- Traceback (most recent call last):
  File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_signal.py", line 399, in test_siginterrupt_on
    self.assertTrue(i)
AssertionError: False is not true

http://www.python.org/dev/buildbot/all/builders/AMD64%20OpenIndiana%203.x/builds/1443/steps/test/logs/stdio
msg138664 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-19 21:08
I tried to patch the test to use a semaphore, but my patch was not reliable (don't remove completly the race condition).

Here is a patch using a subprocess to:
 - have only one thread
 - have a timeout on the blocking read (select cannot be used in the test, select always fail with EINTR, the kernel doesn't restart it)
 - not touch signal handling of the parent process

It is also based on time: it uses alarm() to raise a signal in one second, and use an hardcoded timeout of 3 seconds. But it doesn't need tricky synchronization between two processes.
msg138761 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-06-20 20:51
The patch looks good to me.
In the subprocess, why not use the standard 0 exit code in case of success?
Also, points 1 and 3 could be handled simply by having the parent
process send a signal to the child (but this wouldn't address the
timeout issue).
msg138762 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-20 21:28
New changeset 968b9ff9a059 by Victor Stinner in branch 'default':
Close #12363: fix a race condition in siginterrupt() tests
http://hg.python.org/cpython/rev/968b9ff9a059
msg138763 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-20 21:28
> In the subprocess, why not use the standard 0 exit code
> in case of success?

Something outside my code may exit Python with the code 0. Even if it unlikely, I prefer to use uncommon exit codes, to ensure that the child process executed "correctly" my code. A better check would be to write a specific pattern to stdout, and check stdout, but it would be overkill.

> Also, points 1 and 3 could be handled simply by having
> the parent process send a signal to the child
> (but this wouldn't address the timeout issue).

(Hum, points 1 and 3: "have only one thread" and "not touch signal handling of the parent process".)

True, but I would like to write a more reliable test, and I don't know how to synchronize two processes for this test case.

Because your first sentence was "The patch looks good to me.", let's try this new test in our buildbots.
msg138793 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-06-21 19:09
The test still fails on FreeBSD 7.2, Tiger and Debian parallel:

"""
======================================================================
FAIL: test_siginterrupt_on (test.test_signal.SiginterruptTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_signal.py",
line 379, in test_siginterrupt_on
    self.assertTrue(interrupted)
AssertionError: False is not true

======================================================================
FAIL: test_without_siginterrupt (test.test_signal.SiginterruptTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_signal.py",
line 372, in test_without_siginterrupt
    self.assertTrue(interrupted)
AssertionError: False is not true
"""

Here's a sample strace output on my Linux box when it fails:

2120  21:00:18.755448 rt_sigaction(SIGALRM, {0x810b332, [], 0}, NULL,
8) = 0 <0.000019>
2120  21:00:18.755690 alarm(1)          = 0 <0.000024>
2120  21:00:18.755803 read(3,  <unfinished ...>
[...]
2120  21:00:19.755811 <... read resumed> 0xb72cad90, 1) = ?
ERESTARTSYS (To be restarted) <0.999973>
2120  21:00:19.755864 --- SIGALRM (Alarm clock) @ 0 (0) ---
2120  21:00:19.756094 sigreturn()       = ? (mask now []) <0.000019>
2120  21:00:19.756434 alarm(1)          = 0 <0.000023>
2120  21:00:19.756547 read(3,  <unfinished ...>

sigaction is called without SA_RESTART, but the syscall is restarted anyway...
Apart from removing those tests, I don't see what we can do here.
msg138803 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-06-21 21:05
Duh, don't know what I was thinking: the syscall is not restarted
(even though ERESTARTSYS is displayed by strace): the real problem is
that the 3s timeout to communicate is not enough, because spawning a
new interpreter can take a long time (Antoine created an issue some
time ago about the high number of syscalls per import). If it takes
more than 1s, the test will fail.
Also, I guess it's worst on FreeBSD because subprocess.Popen uses
close_fds=True by default, and FreeBSD has a huge default
_SC_OPEN_MAX.
I've attached a patch passing close_fds=False to spawn_python, and
running the test only once, because otherwise this would require a
timeout quite large.
msg138832 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-22 20:16
New changeset aff0a7b0cb12 by Victor Stinner in branch 'default':
Issue #12363: improve siginterrupt() tests
http://hg.python.org/cpython/rev/aff0a7b0cb12
msg138833 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-22 20:20
> Apart from removing those tests, I don't see what we can do here.

The previous version of the test rarely failed (only sometimes on the FreeBSD 6.4 buildbox). We may revert my commits to restore the previous test if the new tests fail more often.

> the real problem is that the 3s timeout to communicate is not
> enough, because spawning a new interpreter can take a long time

Right, I added a basic synchronization code to wait until the beginning of the test. Let see if it is better or worse ;-)
msg138861 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-23 09:49
All Python 3.x buildbots are green (except FreeBSD 7.2, but it's not related to this issue). Let close this issue.
msg139579 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-01 13:26
New changeset 8250f04d5a41 by Victor Stinner in branch '3.2':
Issue #12363: improve siginterrupt() tests
http://hg.python.org/cpython/rev/8250f04d5a41
msg139582 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-01 14:00
New changeset 3f30cfe51315 by Victor Stinner in branch '3.2':
Issue #12363: increase the timeout of siginterrupt() tests
http://hg.python.org/cpython/rev/3f30cfe51315

New changeset 423268537083 by Victor Stinner in branch 'default':
(merge 3.2) Issue #12363: increase the timeout of siginterrupt() tests
http://hg.python.org/cpython/rev/423268537083
History
Date User Action Args
2022-04-11 14:57:18adminsetgithub: 56572
2011-07-01 14:00:02python-devsetmessages: + msg139582
2011-07-01 13:26:03python-devsetmessages: + msg139579
2011-06-23 09:49:14vstinnersetstatus: open -> closed

messages: + msg138861
2011-06-22 20:20:43vstinnersetmessages: + msg138833
2011-06-22 20:16:36python-devsetmessages: + msg138832
2011-06-21 21:09:23neologixsetstatus: closed -> open
2011-06-21 21:09:06neologixsetfiles: - test_siginterrupt.diff
2011-06-21 21:08:52neologixsetfiles: + test_siginterrupt.diff
2011-06-21 21:05:56neologixsetfiles: + test_siginterrupt.diff

messages: + msg138803
2011-06-21 19:09:54neologixsetmessages: + msg138793
2011-06-20 21:28:17vstinnersetmessages: + msg138763
2011-06-20 21:28:05python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg138762

resolution: fixed
stage: resolved
2011-06-20 20:51:44neologixsetmessages: + msg138761
2011-06-19 21:08:51vstinnersetfiles: + test_siginterrupt.patch

nosy: + pitrou
messages: + msg138664

keywords: + patch
2011-06-19 17:58:00vstinnersetmessages: + msg138652
2011-06-19 17:21:45vstinnercreate