Issue 11753: test_sendall_interrupted() of test_socket hangs on FreeBSD

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/55962

classification

Title:	test_sendall_interrupted() of test_socket hangs on FreeBSD
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 3.3

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	neologix, python-dev, vstinner
Priority:	normal	Keywords:

Created on 2011-04-03 19:44 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)
msg132871 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-04-03 19:44
I added a timeout of 30 minutes to regrtest. On "x86 FreeBSD 7.2 3.x" and "x86 FreeBSD 3.x" buildbot, test_sendall_interrupted() of test_socket does timeout after 30 minutes: ---------- ... [201/354] test_socket Thread 0x28401040: File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_socket.py", line 729 in check_sendall_interrupted File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_socket.py", line 740 in test_sendall_interrupted File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/case.py", line 387 in _executeTestPart File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/case.py", line 442 in run File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/case.py", line 494 in __call__ File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/suite.py", line 105 in run File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/suite.py", line 67 in __call__ File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/suite.py", line 105 in run File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/unittest/suite.py", line 67 in __call__ File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/support.py", line 1078 in run File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/support.py", line 1166 in _run_suite File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/support.py", line 1192 in run_unittest File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_socket.py", line 2052 in test_main File "./Lib/test/regrtest.py", line 1032 in runtest_inner File "./Lib/test/regrtest.py", line 826 in runtest File "./Lib/test/regrtest.py", line 650 in main File "./Lib/test/regrtest.py", line 1607 in <module> ---------- http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%207.2%203.x/builds/1644/steps/test/logs/stdio another example: http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%203.x/builds/1344/steps/test/logs/stdio It looks like the test should be done in less than 5 seconds. Extract of the code (with_timeout=False for test_sendall_interrupted): ------------------------- def check_sendall_interrupted(self, with_timeout): # socketpair() is not stricly required, but it makes things easier. if not hasattr(signal, 'alarm') or not hasattr(socket, 'socketpair'): self.skipTest("signal.alarm and socket.socketpair required for this test") # Our signal handlers clobber the C errno by calling a math function # with an invalid domain value. def ok_handler(args): self.assertRaises(ValueError, math.acosh, 0) def raising_handler(args): self.assertRaises(ValueError, math.acosh, 0) 1 // 0 c, s = socket.socketpair() old_alarm = signal.signal(signal.SIGALRM, raising_handler) try: if with_timeout: # Just above the one second minimum for signal.alarm c.settimeout(1.5) with self.assertRaises(ZeroDivisionError): signal.alarm(1) c.sendall(b"x" * (1024*2)) <~~~~ HERE if with_timeout: signal.signal(signal.SIGALRM, ok_handler) signal.alarm(1) self.assertRaises(socket.timeout, c.sendall, b"x" (1024**2)) finally: signal.signal(signal.SIGALRM, old_alarm) c.close() s.close() ------------------------- I suppose that sendall() should be interrupted by SIGALRM. sendall() calls PyErr_CheckSignals() after each system call to send(), and exits with an error if PyErr_CheckSignals() returns NULL. If send() fails with EINTR error, sendall() retries calls send() again (in a loop). I don't know if this hang is new or not, and if it is (or not) a border effect of the timeout feature of regrtest... But test_signal don't always fail, it looks like a race condition.
msg132883 - (view)	Author: Charles-François Natali (neologix) *	Date: 2011-04-03 20:58
This test assumes that send will necessarily return if interrupted by a signal, but the kernel can automatically restart the syscall when no data has been committed (instead of returning -1 with errno set to EINTR). And, AFAIK, that's exactly what FreeBSD does, see http://www.freebsd.org/cgi/man.cgi?query=siginterrupt&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&format=html : """ The siginterrupt() function is used to change the system call restart behavior when a system call is interrupted by the specified signal. If the flag is false (0), then system calls will be restarted if they are interrupted by the specified signal and no data has been transferred yet. System call restart has been the default behavior since 4.2BSD, and is the default behaviour for signal(3) on FreeBSD. """ And http://www.gsp.com/cgi-bin/man.cgi?section=2&topic=sigaction : """ If a signal is caught during the system calls listed below, the call may be forced to terminate with the error EINTR, the call may return with a data transfer shorter than requested, or the call may be restarted. Restart of pending calls is requested by setting the SA_RESTART bit in sa_flags. The affected system calls include open(2), read(2), write(2), sendto(2), recvfrom(2), sendmsg(2) and recvmsg(2) on a communications channel or a slow device (such as a terminal, but not a regular file) and during a wait(2) or ioctl(2). However, calls that have already committed are not restarted, but instead return a partial success (for example, a short read count). """ So if the signal arrives while some data has been transferred, send will return with a partial write, but if the signal arrives before any data has been written, then you'll never see EINTR and remain stuck forever (unless SA_RESTART is unset). Note that POSIX seems to require write to return with EINTR if interrupted before any data is written, see http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html : """ If write() is interrupted by a signal before it writes any data, it shall return -1 with errno set to [EINTR]. """ But send and sendto man pages don't require this behaviour.
msg132889 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-04-03 21:27
As issue #11755, this issue is a regression introduced by regrtest timeout (issue #11727): test_socket doesn't fail without regrtest timeout.
msg132897 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-04-03 22:21
Issue #11755 (test_itimer_real() of test_signal hang on FreeBSD) has been marked as a duplicate of this issue.
msg132904 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-04-03 23:21
The problem is that faulthandler's thread (faulthandler_thread) receives the SIGALRM signal: the signal interrupts sem_timedwait() which returns EINTR. PyThread_acquire_lock_timed() retries sem_timedwait() and so the other thread executing sendall() is not interrupted. The solution is to configure which signals are handled by faulthandler_thread() using pthread_sigmask(): http://hg.python.org/sandbox/haypo/rev/4257fdfa5661 I forced a build on x86 FreeBSD (custom) buildbot: http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%20custom I already tested sigprocmask() on FreeBSD (in my VM) and it works as expected: test_socket doesn't block anymore.
msg132921 - (view)	Author: Roundup Robot (python-dev)	Date: 2011-04-04 09:05
New changeset ebc03d7e7110 by Victor Stinner in branch 'default': Issue #11753: faulthandler thread uses pthread_sigmask() http://hg.python.org/cpython/rev/ebc03d7e7110
msg132922 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-04-04 09:08
test_socket and test_signal succeed on "x86 FreeBSD custom": http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%20custom/builds/4 (there are other issues, but there are not related) I pushed the fix in Python 3.3 (ebc03d7e7110).

History
Date	User	Action	Args
2022-04-11 14:57:15	admin	set	github: 55962
2011-04-04 09:08:05	vstinner	set	status: open -> closed resolution: fixed messages: + msg132922
2011-04-04 09:05:53	python-dev	set	nosy: + python-dev messages: + msg132921
2011-04-03 23:21:59	vstinner	set	messages: + msg132904
2011-04-03 22:21:01	vstinner	set	messages: + msg132897
2011-04-03 21:27:24	vstinner	set	messages: + msg132889
2011-04-03 20:58:50	neologix	set	nosy: + neologix messages: + msg132883
2011-04-03 19:44:52	vstinner	create