Sporadic freeze in test_interrupted_write_retry_text #67868
Sometimes the test suite freezes in test_interrupted_write_retry_text (test.test_io.CSignalsTest). The corresponding strace is:

    write(1, "test_interrupted_write_retry_tex"..., 66) = 66

A successful run looks like this:

    write(1, "test_interrupted_write_retry_tex"..., 66) = 66
The bug only occurs in Python 3.5, right?
How do you run the test suite? Is your system heavily loaded? Is it "fast"?

This is an obvious race condition in the test if SIGALRM is sent before write() is called:

    signal.alarm(1)
    # Expected behaviour:
    # - first raw write() is partial (because of the limited pipe buffer
    #   and the first alarm)
    # - second raw write() returns EINTR (because of the second alarm)
    # - subsequent write()s are successful (either partial or complete)
    self.assertEqual(N, wio.write(item * N))

1 second should be enough :-)

Or maybe it is a regression caused by changeset 5b63010be19e of issue bpo-23285 (PEP-475).
The system is rather slow (a shared VPS instance). In the trace you can see that SIGALRM is triggered before the first write() call (or so it seems).
Yep. It reminds me of my old idea to make "sleep configurable" in tests: issue bpo-20910. Most of the time, 1 second is enough. But on such a slow setup, it's annoying to get random failures because of race conditions. It would be better to use a longer timeout (e.g. 5 seconds), without making tests longer on other buildbots.
Note that PIPE_MAX_SIZE can be large. Perhaps move the memory allocation (i.e.

Or what if alarm_interrupt is simply set up to retrigger the signal? E.g. instead of:

    def alarm_interrupt(self, sig, frame):
        1/0

write:

    def alarm_interrupt(self, sig, frame):
        signal.alarm(1)
        1/0
New changeset a18f7508649b by Victor Stinner in branch 'default':
Also, I think there is another issue in that test. It uses

There is another test that can have a race condition: check_interrupted_write().
2015-03-16 17:41 GMT+01:00 Antoine Pitrou report@bugs.python.org:
Good idea, the first strace shows that SIGALRM was received while

I made this simple change. Can you tell me if my change fixes the issue?

It may make the test more reliable and I don't see how it can fix the
PIPE_MAX_SIZE is much larger than the effective size of a pipe on

    # A constant likely larger than the underlying OS pipe buffer size, to
    # make writes blocking.
    # Windows limit seems to be around 512 B, and many Unix kernels have a
    # 64 KiB pipe buffer size or 16 * PAGE_SIZE: take a few megs to be sure.
    # (see issue python/cpython#62035 for a discussion of this number).
    PIPE_MAX_SIZE = 4 * 1024 * 1024 + 1

I don't think that PIPE_MAX_SIZE+1 makes a difference here.
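
To make the size gap concrete, here is a minimal sketch that measures how much data a fresh pipe accepts before a write would block, by filling a non-blocking pipe. The helper name `measure_pipe_capacity` and the 1 KiB chunk size are assumptions for illustration and are not part of `test.support`. The result is platform dependent and the sketch is Unix only.

```python
import os

# Same value as the constant quoted above.
PIPE_MAX_SIZE = 4 * 1024 * 1024 + 1


def measure_pipe_capacity(chunk=1024):
    """Return how many bytes a fresh pipe accepts before write() would block.

    Hypothetical helper for illustration only; Unix only.
    """
    r, w = os.pipe()
    try:
        # In non-blocking mode a full pipe raises BlockingIOError
        # instead of blocking the caller.
        os.set_blocking(w, False)
        block = b"\0" * chunk
        total = 0
        while True:
            try:
                total += os.write(w, block)
            except BlockingIOError:
                return total
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    capacity = measure_pipe_capacity()
    print(f"pipe capacity: {capacity} bytes; PIPE_MAX_SIZE: {PIPE_MAX_SIZE} bytes")
```

Any payload larger than the measured capacity blocks a writer that has no reader, which is the property the test relies on.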
New changeset 10acab2d4a88 by Victor Stinner in branch 'default':
I found a relatively recent case of this failing on a buildbot: http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.x/builds/4339/steps/test/logs/stdio

Line 3808 is the write() call, so it could be hanging if the second signal is delivered just before Python makes the write() syscall. Perhaps it might help to use setitimer() so that even if a signal is delivered at an inconvenient moment, the write() call will still be interrupted by a later signal, and the read thread will still be spawned.

Also see bpo-22331 for an idea to avoid the EBADF hack.
Here is my suggested change to use setitimer(). I also closed the pipe, which means there is no need to time out the read, speeding the test up a bit more.
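
The attached patch is not shown on this page. As an independent illustration of the idea, and not the patch itself, the sketch below arms a repeating timer with `signal.setitimer()` so that SIGALRM keeps arriving until the Python-level handler has run twice; the second run starts the reader thread. The function name `write_with_repeating_timer`, the 0.05 second interval, and the overall structure are assumptions made for this example. It is Unix only and must be called from the main thread.

```python
import os
import signal
import threading


def write_with_repeating_timer(size=4 * 1024 * 1024 + 1, interval=0.05):
    """Write `size` bytes to a pipe whose reader starts after two SIGALRMs.

    Hypothetical helper for illustration only.
    """
    r, w = os.pipe()
    read_sizes = []
    signals_seen = []

    def reader():
        # Drain the pipe until the write end is closed.
        while True:
            data = os.read(r, 65536)
            if not data:
                break
            read_sizes.append(len(data))

    thread = threading.Thread(target=reader, daemon=True)

    def on_alarm(signum, frame):
        signals_seen.append(signum)
        if len(signals_seen) == 2:
            # The Python-level handler has now run twice: stop the timer
            # and let the reader unblock the writer.
            signal.setitimer(signal.ITIMER_REAL, 0)
            thread.start()

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    try:
        # Fire after `interval` seconds, then again every `interval`
        # seconds, so a signal that lands just before the write() syscall
        # is followed by another one that interrupts it.
        signal.setitimer(signal.ITIMER_REAL, interval, interval)
        with open(w, "wb") as wio:
            written = wio.write(b"x" * size)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)
    thread.join()
    os.close(r)
    return written, sum(read_sizes), len(signals_seen)


if __name__ == "__main__":
    print(write_with_repeating_timer())
```

With a one-shot `signal.alarm()`, a signal delivered between the last bytecode check and the write() syscall leaves the call blocked with no later signal to wake it. The repeating timer removes that window, at the cost of a few extra handler invocations.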
I experience this problem when trying to build/test Python 3.6 on the JASMIN Analysis Platform, which runs Red Hat Enterprise Linux Server release 6.8 on a machine with 48 × Intel(R) Xeon(R) CPU E7-4860 v2 @ 2.60GHz, 2 TiB RAM, and a PanFS® distributed file system. I experience this problem regardless of whether I build/test Python 3.6 from within a PanFS filesystem or otherwise.

    ./python -m test -v test_io

after which there is no more output.
Closing as it is out of date now and our CI and buildbots are green.