classification
Title: test_sendall_interrupted() of test_socket hangs on FreeBSD
Type: Stage:
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: neologix, python-dev, vstinner
Priority: normal Keywords:

Created on 2011-04-03 19:44 by vstinner, last changed 2011-04-04 09:08 by vstinner. This issue is now closed.

Messages (7)
msg132871 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-03 19:44
I added a timeout of 30 minutes to regrtest. On "x86 FreeBSD 7.2 3.x" and "x86 FreeBSD 3.x" buildbot, test_sendall_interrupted() of test_socket does timeout after 30 minutes:
----------
...
[201/354] test_socket
Thread 0x28401040:
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_socket.py", line 729 in check_sendall_interrupted
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_socket.py", line 740 in test_sendall_interrupted
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/case.py", line 387 in _executeTestPart
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/case.py", line 442 in run
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/case.py", line 494 in __call__
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/suite.py", line 105 in run
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/suite.py", line 67 in __call__
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/suite.py", line 105 in run
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/suite.py", line 67 in __call__
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/support.py", line 1078 in run
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/support.py", line 1166 in _run_suite
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/support.py", line 1192 in run_unittest
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_socket.py", line 2052 in test_main
  File "./Lib/test/regrtest.py", line 1032 in runtest_inner
  File "./Lib/test/regrtest.py", line 826 in runtest
  File "./Lib/test/regrtest.py", line 650 in main
  File "./Lib/test/regrtest.py", line 1607 in <module>
----------
http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%207.2%203.x/builds/1644/steps/test/logs/stdio
another example:
http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%203.x/builds/1344/steps/test/logs/stdio

It looks like the test should be done in less than 5 seconds. Extract of the code (with_timeout=False for test_sendall_interrupted):
-------------------------
def check_sendall_interrupted(self, with_timeout):
        # socketpair() is not stricly required, but it makes things easier.
        if not hasattr(signal, 'alarm') or not hasattr(socket, 'socketpair'):
            self.skipTest("signal.alarm and socket.socketpair required for this test")
        # Our signal handlers clobber the C errno by calling a math function
        # with an invalid domain value.
        def ok_handler(*args):
            self.assertRaises(ValueError, math.acosh, 0)
        def raising_handler(*args):
            self.assertRaises(ValueError, math.acosh, 0)
            1 // 0
        c, s = socket.socketpair()
        old_alarm = signal.signal(signal.SIGALRM, raising_handler)
        try:
            if with_timeout:
                # Just above the one second minimum for signal.alarm
                c.settimeout(1.5)
            with self.assertRaises(ZeroDivisionError):
                signal.alarm(1)
                c.sendall(b"x" * (1024**2)) <~~~~ HERE
            if with_timeout:
                signal.signal(signal.SIGALRM, ok_handler)
                signal.alarm(1)
                self.assertRaises(socket.timeout, c.sendall, b"x" * (1024**2))
        finally:
            signal.signal(signal.SIGALRM, old_alarm)
            c.close()
            s.close()
-------------------------
I suppose that sendall() should be interrupted by SIGALRM. sendall() calls PyErr_CheckSignals() after each system call to send(), and exits with an error if PyErr_CheckSignals() returns NULL. If send() fails with EINTR error, sendall() retries calls send() again (in a loop).

I don't know if this hang is new or not, and if it is (or not) a border effect of the timeout feature of regrtest...

But test_signal don't always fail, it looks like a race condition.
msg132883 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-04-03 20:58
This test assumes that send will necessarily return if interrupted by a signal, but the kernel can automatically restart the syscall when no data has been committed (instead of returning -1 with errno set to EINTR).
And, AFAIK, that's exactly what FreeBSD does, see http://www.freebsd.org/cgi/man.cgi?query=siginterrupt&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&format=html :
"""
The siginterrupt() function is used to change the system call restart
     behavior when a system call is interrupted by the specified signal.  If
     the flag is false (0), then system calls will be restarted if they are
     interrupted by the specified signal and no data has been transferred yet.
     System call restart has been the default behavior since 4.2BSD, and is
     the default behaviour for signal(3) on FreeBSD.
"""

And http://www.gsp.com/cgi-bin/man.cgi?section=2&topic=sigaction :

"""
If a signal is caught during the system calls listed below, the call may be forced to terminate with the error EINTR, the call may return with a data transfer shorter than requested, or the call may be restarted. Restart of pending calls is requested by setting the SA_RESTART bit in sa_flags. The affected system calls include open(2), read(2), write(2), sendto(2), recvfrom(2), sendmsg(2) and recvmsg(2) on a communications channel or a slow device (such as a terminal, but not a regular file) and during a wait(2) or ioctl(2). However, calls that have already committed are not restarted, but instead return a partial success (for example, a short read count).
"""

So if the signal arrives while some data has been transferred, send will return with a partial write, but if the signal arrives before any data has been written, then you'll never see EINTR and remain stuck forever (unless SA_RESTART is unset).
Note that POSIX seems to require write to return with EINTR if interrupted before any data is written, see http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html :

"""
If write() is interrupted by a signal before it writes any data, it shall return -1 with errno set to [EINTR].
"""

But send and sendto man pages don't require this behaviour.
msg132889 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-03 21:27
As issue #11755, this issue is a regression introduced by regrtest timeout (issue #11727): test_socket doesn't fail without regrtest timeout.
msg132897 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-03 22:21
Issue #11755 (test_itimer_real() of test_signal hang on FreeBSD) has been marked as a duplicate of this issue.
msg132904 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-03 23:21
The problem is that faulthandler's thread (faulthandler_thread) receives the SIGALRM signal: the signal interrupts sem_timedwait() which returns EINTR. PyThread_acquire_lock_timed() retries sem_timedwait() and so the other thread executing sendall() is not interrupted.

The solution is to configure which signals are handled by faulthandler_thread() using pthread_sigmask():
http://hg.python.org/sandbox/haypo/rev/4257fdfa5661

I forced a build on x86 FreeBSD (custom) buildbot:
http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%20custom

I already tested sigprocmask() on FreeBSD (in my VM) and it works as expected: test_socket doesn't block anymore.
msg132921 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-04 09:05
New changeset ebc03d7e7110 by Victor Stinner in branch 'default':
Issue #11753: faulthandler thread uses pthread_sigmask()
http://hg.python.org/cpython/rev/ebc03d7e7110
msg132922 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-04 09:08
test_socket and test_signal succeed on "x86 FreeBSD custom":
http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%20custom/builds/4
(there are other issues, but there are not related)

I pushed the fix in Python 3.3 (ebc03d7e7110).
History
Date User Action Args
2011-04-04 09:08:05vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg132922
2011-04-04 09:05:53python-devsetnosy: + python-dev
messages: + msg132921
2011-04-03 23:21:59vstinnersetmessages: + msg132904
2011-04-03 22:21:01vstinnersetmessages: + msg132897
2011-04-03 21:27:24vstinnersetmessages: + msg132889
2011-04-03 20:58:50neologixsetnosy: + neologix
messages: + msg132883
2011-04-03 19:44:52vstinnercreate