Issue34130
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2018-07-16 22:19 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 8326 | merged | vstinner, 2018-07-18 14:25 | |
PR 8327 | merged | vstinner, 2018-07-18 15:12 | |
PR 8329 | merged | vstinner, 2018-07-18 16:34 | |
PR 8330 | merged | miss-islington, 2018-07-18 16:48 |
Messages (21) | |||
---|---|---|---|
msg321775 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-16 22:19 | |
test_signal.test_warn_on_full_buffer() failed on AppVeyor. https://ci.appveyor.com/project/python/cpython/build/3.8build19372 ====================================================================== FAIL: test_warn_on_full_buffer (test.test_signal.WakeupSocketSignalTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\projects\cpython\lib\test\test_signal.py", line 538, in test_warn_on_full_buffer assert_python_ok('-c', code) File "C:\projects\cpython\lib\test\support\script_helper.py", line 157, in assert_python_ok return _assert_python(True, *args, **env_vars) File "C:\projects\cpython\lib\test\support\script_helper.py", line 143, in _assert_python res.fail(cmd_line) File "C:\projects\cpython\lib\test\support\script_helper.py", line 84, in fail err)) AssertionError: Process return code is 1 command line: ['C:\\projects\\cpython\\PCbuild\\win32\\python.exe', '-X', 'faulthandler', '-I', '-c', 'if 1:\n import errno\n import signal\n import socket\n import sys\n import time\n import _testcapi\n from test.support import captured_stderr\n\n signum = signal.SIGINT\n\n # This handler will be called, but we intentionally won\'t read from\n # the wakeup fd.\n def handler(signum, frame):\n pass\n\n signal.signal(signum, handler)\n\n read, write = socket.socketpair()\n read.setblocking(False)\n write.setblocking(False)\n\n # Fill the send buffer\n try:\n while True:\n write.send(b"x")\n except BlockingIOError:\n pass\n\n # By default, we get a warning when a signal arrives\n signal.set_wakeup_fd(write.fileno())\n\n with captured_stderr() as err:\n _testcapi.raise_signal(signum)\n\n err = err.getvalue()\n if (\'Exception ignored when trying to send to the signal wakeup fd\'\n not in err):\n raise AssertionError(err)\n\n # And also if warn_on_full_buffer=True\n signal.set_wakeup_fd(write.fileno(), warn_on_full_buffer=True)\n\n with captured_stderr() as err:\n _testcapi.raise_signal(signum)\n\n err = err.getvalue()\n if (\'Exception ignored when trying to send to the signal wakeup fd\'\n not in err):\n raise AssertionError(err)\n\n # But not if warn_on_full_buffer=False\n signal.set_wakeup_fd(write.fileno(), warn_on_full_buffer=False)\n\n with captured_stderr() as err:\n _testcapi.raise_signal(signum)\n\n err = err.getvalue()\n if err != "":\n raise AssertionError("got unexpected output %r" % (err,))\n\n # And then check the default again, to make sure warn_on_full_buffer\n # settings don\'t leak across calls.\n signal.set_wakeup_fd(write.fileno())\n\n with captured_stderr() as err:\n _testcapi.raise_signal(signum)\n\n err = err.getvalue()\n if (\'Exception ignored when trying to send to the signal wakeup fd\'\n not in err):\n raise AssertionError(err)\n\n '] stdout: --- --- stderr: --- Traceback (most recent call last): File "<string>", line 39, in <module> AssertionError --- ---------------------------------------------------------------------- The test passed when run again sequentially ("Re-running test 'test_signal' in verbose mode"). |
|||
msg321780 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-17 00:09 | |
The test failed at: # Fill the send buffer try: while True: write.send(b"x") except BlockingIOError: pass # By default, we get a warning when a signal arrives signal.set_wakeup_fd(write.fileno()) with captured_stderr() as err: _testcapi.raise_signal(signum) err = err.getvalue() if ('Exception ignored when trying to {action} to the signal wakeup fd' not in err): raise AssertionError(err) # <~~~~~~ HERE |
|||
msg321781 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-17 00:16 | |
Traceback (most recent call last): File "<string>", line 39, in <module> AssertionError According to the traceback, the captured stderr ('err' variable) is an empty string. The test uses test.support.capture_stderr() which replaces sys.stderr. The signal module calls PySys_WriteStderr(msg) which calls sys.stderr.write(msg). If the Python call fails, msg is supposed to be written into the C stderr stream. sys.stderr.flush() is not called, but it shouldn't be needed, since test.support.capture_stderr() replace sys.stderr with a io.StringIO object. |
|||
msg321783 - (view) | Author: Nathaniel Smith (njs) * | Date: 2018-07-17 00:42 | |
Huh, that's weird. My first thought was some kind of race condition, but... raise_signal uses raise(), which on Windows should be invoking the signal handler synchronously, so the warning should definitely be printed before raise_signal() returns. Could the warning be trapped in some buffer? That would be weird too, usually stderr and warnings should not be buffered. Of course putting a retry loop around the test is an option if we can't figure out how to fix it properly. |
|||
msg321815 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-17 11:29 | |
I succeeded to reproduce the issue on Windows, but I also got a failure of test_signal.test_socket() as if the C signal handle has not been called immediately by _testcapi.raise_signal() and so the socket didn't contain the byte of the signal number. It seems quite easy to make test_signal fail on Windows. |
|||
msg321873 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 10:09 | |
On Windows 10, when I run "python -m test test_signal -v -F" in 3 terminals in parallel, sometimes test_socket() fails. I debugged test_socket() and I validated that the C signal handler called send() and that the signal byte (signal number) has been sent properly (send() returns 1) when the bug occurs. I added an internal "static size_t written = 0;" variable which is incremented when send() succeed and I added a signal.get_written() method to get the value after read.recv(1) fails in the unit test. But recv() fails with BlockingIOError, as if the single byte has not be transfered from the write end to the read end of the TCP socket pair. On Windows, a socket pair is a pair of two TCP sockets connected on the local link (IPv4 127.0.0.1). It's not a UNIX socket as on Linux. Questions. (*) Is the socket pair properly connected when the bug occurs? socket.socketpair() sets the socket as non-blocking to call connect() and ignores when connect() fails with BlockingIOError or InterruptedError (EINTR). I modified the Windows implementation of socketpair() (in socket.py) to use a blocking call and the bug occurs even if connect() is made in blocking mode. I also checked that csock.getpeername() == lsock.getsockname(). So the socket is connected. (*) Does the socket pair use buffering somewhere? Both socekts are set to non-blocking mode by test_signal.test_socket(). By default, SO_SNDBUF is 64 KiB (65 536 bytes) for the write end of the socket pair, and SO_RCVBUF is 64 KiB (65 536 bytes) for the read end of the socket pair. It *seems* like adding the following call to test_socket() works around the bug: write.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 0) I'm not sure if it's "correct" to use SO_SNDBUF=0. I'm not sure that SO_SNDBUF=0 really *fix* the bug: it's a race condition, so it's hard to check if it's really fixed or not. |
|||
msg321874 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 10:16 | |
> write.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 0) Instead of that, I also tried: read.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1) write.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1) But I'm able to reproduce the bug with TCP_NODELAY=1 on the write end. (The option should have no impact on the read end, but I added it anyway, just in case, I still reproduce the bug.) MSDN documentation for TCP_NODELAY: "Disables the Nagle algorithm for send coalescing.This socket option is included for backward compatibility with Windows Sockets 1.1 " https://docs.microsoft.com/en-us/windows/desktop/api/winsock/nf-winsock-setsockopt |
|||
msg321875 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 10:18 | |
Note: socket.socketpair() has been implemented for Windows in pure Python in Python 3.5: bpo-18643. |
|||
msg321876 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 10:28 | |
Oh, I found a test written by Nathaniel in trio source code and he wrote that "if SNDBUF = 0 freezes, so that's useless" :-) https://github.com/python-trio/trio/blob/master/notes-to-self/socketpair-buffering.py |
|||
msg321877 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 10:38 | |
Using "write.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 0)", I ran a stress test on test_socket() in 3 different terminals and I failed to reproduce the bug, whereas the same test (in 3 terminals) usually reproduces the bug in less than 30 seconds. Using "write.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1)", I still reproduce the bug. It seems like Windows behaves differently with SO_SNDBUF=0 as if the socket has really no kernel buffer and so respects user preference for SO_SNDBUF=0 ;-) |
|||
msg321879 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 10:48 | |
I found two different articles on microsoft.com which confirms that SO_SNDBUF=0 is valid and is intended to completely disable kernel buffering: * https://support.microsoft.com/en-gb/help/181611/socket-overlapped-i-o-versus-blocking-nonblocking-mode * https://blogs.msdn.microsoft.com/larryosterman/2004/08/06/the-consequences-of-ignoring-nagling-and-delayed-acks/ "If SO_SNDBUF=0, each send() or WriteFile() must wait for an acknowledgement from the other side before returning. (That is the only way for the protocol to implement retransmissions)" |
|||
msg321880 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 12:37 | |
I ran "python -m test test_signal -v -F" 4 times in parallel (in 4 terminals), and I reproduced test_warn_on_full_buffer() failure after one hour: 2 terminals got the failures (the 2 others are still running). I ran this test using "write.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 0)" fix on test_socket(). |
|||
msg321881 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 13:14 | |
I modified test_warn_on_full_buffer() to fill the socket pair using chunks of 64 KiB instead of sending b'x' byte per byte, just to make the test faster. Using two terminals, I ran "python -m test test_signal -m test_warn_on_full_buffer -F": I reproduced the bug in 2 minutes. Ok. The problem is in that is that the code responsible to full the socketpair is not reliable. *Sometimes*, send() blocks, whereas if I retry *immediately*, the send() complete (the socketpair is not full). The issue seems to be related to the test_socket() failure where send() is "asynchronous". |
|||
msg321884 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 14:26 | |
I wrote a simple fix for test_signal.test_socket(): PR 8326. Just configure the read end of the socket pair as blocking, so read.recv(1) blocks until we get the signal handler. This fix is simple and should be more portable than my previous attempt of workaround: write.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 0) |
|||
msg321889 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 15:16 | |
I tried to modify the test_warn_on_full_buffer() code filling the socketpair buffer: * iterate on multiple chunk sizes: 2**16, 2**10, 2**5, 1 bytes * first try in non-blocking mode then retry with a timeout of 100 ms * I also tried to play with SO_SNDBUF and SO_RCVBUF socket options The problem is that I failed to fail a reliable way to fill the socketpair buffer. I will stick with Nathaniel's very simple idea: retry once if the test fails. I implemented exactly this idea in my PR 8327 (with additional minor changes to enhance the test). |
|||
msg321890 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 15:44 | |
New changeset 99bb6df66a42625367c4f38e6802c8bb527baf4a by Victor Stinner in branch 'master': bpo-34130: Fix test_signal.test_socket() (GH-8326) https://github.com/python/cpython/commit/99bb6df66a42625367c4f38e6802c8bb527baf4a |
|||
msg321892 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 16:29 | |
New changeset 686b4b5ff219ed66714f3b811815776dafadc23b by Victor Stinner in branch 'master': bpo-34130: Fix test_signal.test_warn_on_full_buffer() (GH-8327) https://github.com/python/cpython/commit/686b4b5ff219ed66714f3b811815776dafadc23b |
|||
msg321893 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 16:48 | |
Python 2.7 is not impacted: it doesn't have test_socket() nor test_warn_on_full_buffer() tests in test_signal. Python 3.6 only has test_socket() in test_signal. |
|||
msg321894 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 16:49 | |
New changeset 296e5720921f1837dfd09a16457ac5bba7cb1676 by Victor Stinner in branch '3.7': [3.7] bpo-34130: Fix 2 race conditions in test_signal (GH-8329) https://github.com/python/cpython/commit/296e5720921f1837dfd09a16457ac5bba7cb1676 |
|||
msg321897 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 17:02 | |
New changeset e13180bc4c40a9ac300590c62a6efbcaf10ae6df by Victor Stinner (Miss Islington (bot)) in branch '3.6': bpo-34130: Fix test_signal.test_socket() (GH-8326) (GH-8330) https://github.com/python/cpython/commit/e13180bc4c40a9ac300590c62a6efbcaf10ae6df |
|||
msg321898 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-18 17:03 | |
Ok, test_signal should now be more reliable on Windows in 3.6, 3.7 and master branches. If test_signal.test_warn_on_full_buffer() fails again, we should just increase the timeout of 50 ms to 100 ms or more. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:03 | admin | set | github: 78311 |
2018-07-18 17:03:02 | vstinner | set | status: open -> closed resolution: fixed messages: + msg321898 stage: patch review -> resolved |
2018-07-18 17:02:09 | vstinner | set | messages: + msg321897 |
2018-07-18 16:49:30 | vstinner | set | messages: + msg321894 |
2018-07-18 16:48:44 | vstinner | set | messages:
+ msg321893 versions: + Python 3.6, Python 3.7 |
2018-07-18 16:48:12 | miss-islington | set | pull_requests: + pull_request7867 |
2018-07-18 16:34:51 | vstinner | set | pull_requests: + pull_request7866 |
2018-07-18 16:29:57 | vstinner | set | messages: + msg321892 |
2018-07-18 15:44:07 | vstinner | set | messages: + msg321890 |
2018-07-18 15:16:02 | vstinner | set | messages: + msg321889 |
2018-07-18 15:12:51 | vstinner | set | pull_requests: + pull_request7863 |
2018-07-18 14:26:59 | vstinner | set | messages: + msg321884 |
2018-07-18 14:25:35 | vstinner | set | keywords:
+ patch stage: patch review pull_requests: + pull_request7862 |
2018-07-18 13:14:53 | vstinner | set | messages: + msg321881 |
2018-07-18 12:37:02 | vstinner | set | messages: + msg321880 |
2018-07-18 10:48:02 | vstinner | set | messages: + msg321879 |
2018-07-18 10:38:34 | vstinner | set | messages: + msg321877 |
2018-07-18 10:28:24 | vstinner | set | messages: + msg321876 |
2018-07-18 10:18:01 | vstinner | set | messages: + msg321875 |
2018-07-18 10:16:58 | vstinner | set | nosy:
+ yselivanov messages: + msg321874 |
2018-07-18 10:09:54 | vstinner | set | messages: + msg321873 |
2018-07-17 11:29:23 | vstinner | set | messages: + msg321815 |
2018-07-17 00:43:00 | njs | set | messages: + msg321783 |
2018-07-17 00:16:35 | vstinner | set | messages: + msg321781 |
2018-07-17 00:09:38 | vstinner | set | messages: + msg321780 |
2018-07-16 22:19:02 | vstinner | create |