classification
Title: test_subprocess: test_child_terminated_in_stopped_state() leaks a zombie process
Type: Stage: resolved
Components: Tests Versions: Python 3.7, Python 3.6, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gregory.p.smith, vstinner
Priority: normal Keywords:

Created on 2017-08-10 09:20 by vstinner, last changed 2017-08-11 12:38 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3055 merged vstinner, 2017-08-10 10:04
PR 3070 merged vstinner, 2017-08-11 00:15
PR 3071 merged vstinner, 2017-08-11 00:45
Messages (7)
msg300066 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-10 09:20
The test_child_terminated_in_stopped_state() test creates a child process that calls ptrace(PTRACE_TRACEME, 0, 0) and then crashes with SIGSEGV. The problem is that even after the exit status is read with os.waitpid() through subprocess, the process remains alive in the "t (tracing stop)" state.

I would prefer not to use ptrace() in a unit test, since this API is very low-level and hard to use correctly.

I suggest either removing the functional test or rewriting it as a unit test that uses mocks to test bpo-29335 without ptrace().

haypo@selma$ ./python -m test -m test_child_terminated_in_stopped_state -F test_subprocess
Run tests sequentially
0:00:00 load avg: 0.95 [  1] test_subprocess
0:00:00 load avg: 0.95 [  2] test_subprocess
0:00:01 load avg: 0.96 [  3] test_subprocess
0:00:01 load avg: 0.96 [  4] test_subprocess
0:00:02 load avg: 0.96 [  5] test_subprocess
0:00:03 load avg: 0.96 [  6] test_subprocess
0:00:03 load avg: 0.96 [  7] test_subprocess
0:00:04 load avg: 0.96 [  8] test_subprocess
0:00:05 load avg: 0.96 [  9] test_subprocess
0:00:05 load avg: 0.96 [ 10] test_subprocess
^Z
[1]+  Stopped                ./python -m test -m test_child_terminated_in_stopped_state -F test_subprocess

haypo@selma$ ps
  PID TTY          TIME CMD
30359 pts/0    00:00:00 bash
31882 pts/0    00:00:00 python
31885 pts/0    00:00:00 python
31888 pts/0    00:00:00 python
31892 pts/0    00:00:00 python
31895 pts/0    00:00:00 python
31898 pts/0    00:00:00 python
31901 pts/0    00:00:00 python
31904 pts/0    00:00:00 python
31907 pts/0    00:00:00 python
31910 pts/0    00:00:00 python
31912 pts/0    00:00:00 python
31920 pts/0    00:00:00 ps

haypo@selma$ grep Stat /proc/31885/status
State:	t (tracing stop)
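
The manual check above (grep on /proc/<pid>/status) could also be scripted, for example to scan for leaked children after a test run. The following is only a sketch, assuming a Linux /proc filesystem; the helper names parse_state() and read_state() are hypothetical and not part of the test suite.

```python
import re

# Matches a line such as "State:\tt (tracing stop)" in /proc/<pid>/status.
_STATE_RE = re.compile(r"^State:\s+(\S)\s+\((.*)\)\s*$", re.MULTILINE)


def parse_state(status_text):
    """Return (code, description) from the text of a /proc/<pid>/status file.

    Returns None if no "State:" line is found.
    """
    match = _STATE_RE.search(status_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def read_state(pid):
    """Read and parse the state of a live process (Linux only, assumption)."""
    with open(f"/proc/{pid}/status", encoding="ascii", errors="replace") as f:
        return parse_state(f.read())
```

For the leaked child shown above, parse_state() would report the "t" code with the "tracing stop" description.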
msg300068 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-10 10:08
https://github.com/python/cpython/pull/3055 removes the functional test and replaces it with a unit test that mocks os.waitpid(), using a new _testcapi.W_STOPCODE() function to exercise the WIFSTOPPED() path.

The functional test used to create a core dump; that is now fixed with SuppressCrashReport. But it still leaks a zombie process in a special state: the process is traced and cannot be killed. I tried waiting for the process a second time, but that is not enough to "close" it. I guess we would have to write a small debugger in the parent process to attach to the child. IMHO that is overcomplicated just to check that subprocess calls WIFSTOPPED().
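
A rough sketch of the mock-based approach described above, for readers who do not want to open the PR. It is not the code from PR 3055: _testcapi.W_STOPCODE() is replaced by a hypothetical pure-Python w_stopcode() helper that assumes the common Linux/glibc status layout, and wait_with_fake_stop_status() is an illustrative name.

```python
import os
import subprocess
import sys
from unittest import mock


def w_stopcode(sig):
    """Build a "stopped by signal sig" wait status.

    Assumption: the common Linux/glibc layout, (sig << 8) | 0x7f.
    """
    return (sig << 8) | 0x7F


def wait_with_fake_stop_status(sig=3):
    """Return what Popen.wait() reports when waitpid() claims a stopped child."""
    proc = subprocess.Popen([sys.executable, "-c", "pass"])

    # Reap the real child first so that no zombie is left behind.
    pid, status = os.waitpid(proc.pid, 0)
    assert status == 0

    # From here on waitpid() is faked: it reports a WIFSTOPPED() status.
    fake_status = w_stopcode(sig)
    with mock.patch("subprocess.os.waitpid", return_value=(pid, fake_status)):
        return proc.wait()


if __name__ == "__main__":
    print(wait_with_fake_stop_status(3))
```

On a Python that includes the bpo-29335 fix, the returned value should be the negated stop signal, so no ptrace() call and no crashing child are needed to reach the WIFSTOPPED() branch.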
msg300070 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-10 10:37
I chose to add W_STOPCODE() only to _testcapi rather than to the os module, because I don't want to have to document this function. I don't think anyone needs such a function: usually we only need to consume process statuses, not produce them. The only use case is writing a unit test.
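
To illustrate the "consume, not produce" point: the os module already exposes everything needed to decode a wait status, and only a test needs to build one. This is a sketch; describe_status() is a hypothetical name, and the producer below assumes the Linux/glibc bit layout rather than calling the real _testcapi.W_STOPCODE().

```python
import os


def describe_status(status):
    """Decode a waitpid() status using only documented os functions."""
    if os.WIFSTOPPED(status):
        return ("stopped", os.WSTOPSIG(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    return ("unknown", status)


def w_stopcode(sig):
    """Test-only producer; assumes the layout (sig << 8) | 0x7f."""
    return (sig << 8) | 0x7F
```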

This issue is part of bpo-31160, which ensures that unit tests don't leak child processes. It is also part of my larger project to reduce the failure rate on CIs (Travis CI, AppVeyor, buildbots):
https://haypo.github.io/python-buildbots-2017q2.html

I will now merge PR 3055 to unblock my work on CIs, but I will wait for feedback from Gregory before backporting this fix to 2.7 and 3.6.
msg300071 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-10 10:37
New changeset 7b7c6dcfff6a35333988a3c74c895ed19dff2e09 by Victor Stinner in branch 'master':
bpo-31173: Rewrite WSTOPSIG test of test_subprocess (#3055)
https://github.com/python/cpython/commit/7b7c6dcfff6a35333988a3c74c895ed19dff2e09
msg300148 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-11 00:36
New changeset bc69d00288a0b1f5ef49dcfd60a91c5e9b5b81ae by Victor Stinner in branch '3.6':
bpo-31173: Rewrite WSTOPSIG test of test_subprocess (#3055) (#3070)
https://github.com/python/cpython/commit/bc69d00288a0b1f5ef49dcfd60a91c5e9b5b81ae
msg300149 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-11 00:45
Reopening: I forgot Python 2.7.
msg300161 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-11 12:38
New changeset 4dea06531ece28dffc1452de2694fb22e99b45f9 by Victor Stinner in branch '2.7':
bpo-31173: Rewrite WSTOPSIG test of test_subprocess (#3055) (#3071)
https://github.com/python/cpython/commit/4dea06531ece28dffc1452de2694fb22e99b45f9
History
Date User Action Args
2017-08-11 12:38:59 vstinner set status: open -> closed
resolution: fixed
stage: resolved
2017-08-11 12:38:42 vstinner set messages: + msg300161
2017-08-11 00:45:56 vstinner set pull_requests: + pull_request3107
2017-08-11 00:45:46 vstinner set messages: + msg300149
2017-08-11 00:36:34 vstinner set messages: + msg300148
2017-08-11 00:15:21 vstinner set pull_requests: + pull_request3106
2017-08-10 10:37:41 vstinner set messages: + msg300071
2017-08-10 10:37:01 vstinner set nosy: + gregory.p.smith
messages: + msg300070
versions: + Python 2.7, Python 3.6
2017-08-10 10:08:24 vstinner set messages: + msg300068
2017-08-10 10:04:16 vstinner set pull_requests: + pull_request3090
2017-08-10 09:21:02 vstinner set components: + Tests
versions: + Python 3.7
2017-08-10 09:20:53 vstinner create