test_asyncio: test_subprocess_send_signal hangs on Fedora builders #65446
Trying to build Python 3.4.0 for Fedora, we are seeing test_asyncio's test_subprocess_send_signal hang every time, on all architectures. Unfortunately, I cannot reproduce this locally. These builds are done inside chroots, and the host runs kernel 3.12.8-300.fc20, which is used for all build targets. We see hangs building for Fedora Rawhide and RHEL 7. We do *not* see hangs on our COPR builders, which, among other possible differences, use RHEL 6 hosts with kernel 2.6.32-358.el6. I've attached an strace of the hanging test. The calling process seems to be stuck in epoll_wait(). I tried the watchdog patch from issue bpo-19652, but it doesn't manage to kill anything. In fact, the tests are never killed except by the 1-hour timeout in the test runner.
Hmm, looking a little closer, the SIGHUP seems to arrive very early, perhaps too early?
It is also possible that something has set the SIGHUP handler to SIG_IGN by the time the test runs.
It looks like in the Fedora koji builds, the SIGHUP sigaction is set to SIG_IGN, which means the processes that the Python tests try to kill with SIGHUP do not die. Either the koji builders should not be doing that, or the Python tests should reset the SIGHUP sigaction to SIG_DFL.
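The failure mode can be reproduced with a small sketch, assuming a POSIX system and Python 3.3+ (for `Popen.wait(timeout=...)`). The parent temporarily ignores SIGHUP, as the koji builder reportedly does, and the child started through `subprocess` inherits that disposition. The helper `sighup_kills_child` and the sleeping child are illustrative only, not taken from the test suite.

```python
import signal
import subprocess
import sys

# Child that installs no SIGHUP handler of its own and just sleeps.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]

def sighup_kills_child(parent_disposition, timeout=2.0):
    """Spawn SLEEPER while this process's SIGHUP disposition is
    `parent_disposition`, send it SIGHUP, and report whether it died."""
    previous = signal.signal(signal.SIGHUP, parent_disposition)
    try:
        proc = subprocess.Popen(SLEEPER)
    finally:
        # Restore our own disposition; the child has already been exec'd.
        if previous is not None:
            signal.signal(signal.SIGHUP, previous)
    proc.send_signal(signal.SIGHUP)
    try:
        # A negative return code -N means "killed by signal N".
        return proc.wait(timeout=timeout) == -signal.SIGHUP
    except subprocess.TimeoutExpired:
        # The signal was ignored; without this timeout we would hang,
        # just like the test on the builders.
        proc.kill()
        proc.wait()
        return False
```

With `signal.SIG_DFL` the child dies immediately; with `signal.SIG_IGN` it keeps sleeping until the 2-second timeout, after which the sketch kills it with SIGKILL.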
This issue is a race condition or bug in the unit test, not in asyncio. The test doesn't check whether echo.py is running, i.e. whether the child Python process has started. Python doesn't set up a handler for SIGHUP; it uses the current (inherited) handler. On my Fedora 20, it looks to be "SIG_DFL":
Python 3.5.0a0 (default:795d90c7820d+, Apr 16 2014, 00:18:50)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import signal
>>> signal.getsignal(signal.SIGHUP)
<Handlers.SIG_DFL: 0>
Extract of the attached strace:
So the child process has SIGHUP configured to SIG_IGN on your platform.
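The same check can be run from the child's side. Under POSIX exec semantics, a signal that is ignored stays ignored in the new program, while a signal that had a handler installed is reset to its default action; since Python installs no SIGHUP handler at startup, `signal.getsignal` in the child shows what it inherited. A sketch assuming a POSIX system; `child_sighup_disposition` and the `REPORT` snippet are hypothetical.

```python
import signal
import subprocess
import sys

# Program run in the child: print how SIGHUP is configured at startup.
REPORT = (
    "import signal\n"
    "h = signal.getsignal(signal.SIGHUP)\n"
    "print('SIG_IGN' if h == signal.SIG_IGN else\n"
    "      'SIG_DFL' if h == signal.SIG_DFL else 'other')\n"
)

def child_sighup_disposition(parent_handler):
    """Install `parent_handler` for SIGHUP here, run REPORT in a child
    Python, and return the disposition the child observed."""
    previous = signal.signal(signal.SIGHUP, parent_handler)
    try:
        out = subprocess.check_output([sys.executable, "-c", REPORT])
    finally:
        if previous is not None:
            signal.signal(signal.SIGHUP, previous)
    return out.decode().strip()
```

The child reports `SIG_IGN` when the parent ignores SIGHUP, and `SIG_DFL` both when the parent uses `signal.SIG_DFL` and when it installs a Python-level handler such as `lambda signum, frame: None`, because exec discards installed handlers but keeps ignored dispositions.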
We have determined that the koji builder is indeed setting the SIGHUP sigaction to SIG_IGN, which the Python test inherits, and we are working on getting that fixed. However, it may be worth doing something like pexpect/pexpect@1fbfddf in the Python tests to ensure that they run properly in situations like this (I can imagine someone running them under "nohup").
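One way to make a test independent of an inherited SIG_IGN is to restore the default disposition in the child between `fork()` and `exec()`. The sketch below does this with `subprocess.Popen`'s `preexec_fn`; it illustrates the general idea rather than reproducing the referenced pexpect commit, and `spawn_with_default_sighup` and `demo` are hypothetical names. `Popen`'s `restore_signals=True` does not help here, since it only covers SIGPIPE, SIGXFZ and SIGXFSZ.

```python
import signal
import subprocess
import sys

def _default_sighup():
    # Runs in the child after fork() and before exec(), so the new
    # program starts with SIGHUP at its default action (terminate).
    signal.signal(signal.SIGHUP, signal.SIG_DFL)

def spawn_with_default_sighup(args, **kwargs):
    """Start `args` with SIGHUP reset to SIG_DFL, even if this process
    ignores SIGHUP (e.g. under nohup or a build daemon)."""
    return subprocess.Popen(args, preexec_fn=_default_sighup, **kwargs)

def demo(timeout=5.0):
    """Ignore SIGHUP here, as the builder does, then check that a child
    spawned via spawn_with_default_sighup still dies from SIGHUP."""
    previous = signal.signal(signal.SIGHUP, signal.SIG_IGN)
    try:
        proc = spawn_with_default_sighup(
            [sys.executable, "-c", "import time; time.sleep(30)"])
    finally:
        if previous is not None:
            signal.signal(signal.SIGHUP, previous)
    proc.send_signal(signal.SIGHUP)
    try:
        return proc.wait(timeout=timeout)  # -SIGHUP if the child was killed
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        raise
```

`preexec_fn` is documented as unsafe when the parent has threads; having the child program reset SIGHUP itself as its first statement avoids that concern.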
Here is a patch implementing basic synchronization between the parent and the child process, so that the parent waits until the child is sleeping. Can you please try this patch? If it doesn't work, we might add a short sleep of 500 ms after the readline().
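The synchronization idea can be sketched with asyncio, assuming Python 3.7+ for `asyncio.run`: the child writes a line once it is running, and the parent only sends SIGHUP after `readline()` returns. The child script is a hypothetical stand-in for the test's helper, and it also resets SIGHUP to SIG_DFL itself, because, as found later in this thread, synchronization alone does not help when SIG_IGN is inherited.

```python
import asyncio
import signal
import sys

# Hypothetical child: restore SIGHUP's default action, announce that it
# is ready, then sleep until it is killed.
CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGHUP, signal.SIG_DFL)\n"
    "print('sleeping', flush=True)\n"
    "time.sleep(30)\n"
)

async def send_sighup_when_ready(timeout=5.0):
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD, stdout=asyncio.subprocess.PIPE)
    try:
        # Wait until the child reports it is running before signalling it.
        line = await asyncio.wait_for(proc.stdout.readline(), timeout)
        if line != b"sleeping\n":
            raise RuntimeError("unexpected child output: %r" % (line,))
        proc.send_signal(signal.SIGHUP)
        # Bound the wait so a surviving child fails fast instead of hanging.
        return await asyncio.wait_for(proc.wait(), timeout)
    finally:
        if proc.returncode is None:  # still running: clean up
            proc.kill()
            await proc.wait()
```

`asyncio.run(send_sighup_when_ready())` returns `-signal.SIGHUP`. If the child ignored SIGHUP instead, `wait_for` would raise a timeout error rather than block for an hour, and the `finally` clause would kill the child.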
That appears to work. Thanks!
New changeset 651475d67225 by Victor Stinner in branch '3.4':
New changeset 45e8eb53edbc by Victor Stinner in branch 'default':
Cool, I committed my enhancement of the unit test.
I'm really sorry, I thought I had done the test build properly, but a second attempt resulted in the same hang (http://koji.fedoraproject.org/koji/taskinfo?taskID=7165208), so I don't think the patch does the trick.
The bug still reproduces. Jenkins, when run from init.d, uses /usr/bin/daemon, which means SIGHUP will be in the SIG_IGN state. Since echo.py does not set up a SIGHUP handler, the test relies on SIGHUP terminating the child the way SIGKILL would, but with the inherited SIG_IGN the child simply ignores it. So why not use, say, SIGTERM instead? After such a change, all tests passed. Otherwise, the signal handling tests should reset the signal disposition to SIG_DFL. Please reopen.
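The SIGTERM suggestion can be checked with a sketch like the following, assuming a POSIX system and an environment that ignores only SIGHUP (as reported for /usr/bin/daemon), not SIGTERM. `exit_status_after` is a hypothetical helper that returns the child's return code, or None if the child ignored the signal.

```python
import signal
import subprocess
import sys

def exit_status_after(sig, timeout=2.0):
    """Spawn a sleeping child while SIGHUP is ignored in this process,
    send it `sig`, and return its return code (None if it kept running)."""
    previous = signal.signal(signal.SIGHUP, signal.SIG_IGN)
    try:
        proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(30)"])
    finally:
        if previous is not None:
            signal.signal(signal.SIGHUP, previous)
    proc.send_signal(sig)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child ignored `sig`; kill it so the caller never hangs.
        proc.kill()
        proc.wait()
        return None
```

`exit_status_after(signal.SIGTERM)` gives `-signal.SIGTERM`, while `exit_status_after(signal.SIGHUP)` gives `None`. Switching signals only helps as long as the environment leaves SIGTERM alone; resetting the disposition to SIG_DFL covers any signal the test relies on.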