
classification
Title: import _tkinter + TestForkInThread leaves zombie with stalled thread
Type: resource usage Stage: resolved
Components: Tests, Tkinter Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: martin.panter Nosy List: martin.panter, pitrou, python-dev, serhiy.storchaka, zach.ware
Priority: normal Keywords: patch

Created on 2016-02-29 06:29 by martin.panter, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name         Uploaded
child-exit.patch  martin.panter, 2016-03-01 05:48
Messages (8)
msg260993 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-29 06:29
After running the 2.7 test suite many times, my Linux OS’s memory slowly gets eaten up. It seems to be because of zombie Python processes that never get cleaned up unless I kill them explicitly. I never get this problem with the Python 3 test suite.

I narrowed it down to running test_tcl followed by test_thread, and then narrowed it even further to importing _tkinter and running TestForkInThread.test_forkinthread(). Now I have it minimized to the following:

$ ./python -c 'import _tkinter, thread, os; thread.start_new_thread(os.fork, ())'

A process is left behind listed with the “defunct” or Z (zombie) status. However it has a child thread; maybe this is why it does not automatically get cleaned up.
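
One way to check for such a leftover process without a process viewer is to read its state letter from /proc. This is only a sketch, assuming Linux with /proc mounted; the helper name process_state is made up for illustration. A defunct process would report "Z".

```python
import os


def process_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        data = f.read()
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")", so split on the last ")" before reading the state field.
    return data.rpartition(")")[2].split()[0]


# Usage: inspect the current process (a zombie would show "Z" instead).
print(process_state(os.getpid()))
```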

Extract from “htop”:
  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
    1 root       20   0 35412  4528  3448 S  0.0  0.2  0:01.25 /sbin/init
12615 vadmium    20   0     0     0     0 Z  0.0  0.0  0:00.00 ├─ python
12616 vadmium    20   0  142M  5952  2220 S  0.0  0.3  0:00.00 │  └─ ./python -c import _tkinter, thread, os; thread.start_new_thread(os.fork, ()) 

$ sudo strace -p 12616
Process 12616 attached - interrupt to quit
select(4, [3], [], [], NULL^C <unfinished ...>
Process 12616 detached
$ ls -l /proc/12616/fd
total 0
lrwx------ 1 vadmium users 64 Feb 29 05:57 0 -> /dev/pts/1
lrwx------ 1 vadmium users 64 Feb 29 05:57 1 -> /dev/pts/1
lrwx------ 1 vadmium users 64 Feb 29 05:57 2 -> /dev/pts/1
lr-x------ 1 vadmium users 64 Feb 29 05:57 3 -> pipe:[946176]
lr-x------ 1 vadmium users 64 Feb 29 05:57 4 -> pipe:[946321]
l-wx------ 1 vadmium users 64 Feb 29 05:57 5 -> pipe:[946176]
$ pacman -Q systemd glibc
systemd 222-1
glibc 2.22-4
msg261003 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-29 11:23
I should point out that I think this problem did not use to happen. As far as I know, it could be a bug in a recently upgraded glibc or something. On another Linux computer I cannot reproduce the problem. When I get a chance I will try upgrading packages to see if that triggers the problem.
msg261020 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2016-02-29 20:32
I suspect this may be what causes several of the 2.7 builders to fail.  The ones that fail look like they complete successfully, but then sit with no output until buildbot kills them.

For example:

http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%202.7/builds/506/steps/compile/logs/stdio
http://buildbot.python.org/all/builders/s390x%20Debian%202.7/builds/258/steps/test/logs/stdio
http://buildbot.python.org/all/builders/s390x%20RHEL%202.7/builds/261/steps/test/logs/stdio
http://buildbot.python.org/all/builders/s390x%20SLES%202.7/builds/264/steps/test/logs/stdio
http://buildbot.python.org/all/builders/x86-64%20Ubuntu%2015.10%20Skylake%20CPU%202.7/builds/159/steps/test/logs/stdio

In each of those cases, test_tcl runs before test_thread (except on the SLES builder; but test_tk also imports _tkinter and does come before test_thread).

I can reproduce on Ubuntu 14.04.3, but not on a freshly updated Gentoo.
msg261038 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-03-01 05:48
Yes, it looks like you might be right about those hanging buildbots. The occasional successes (e.g. <http://buildbot.python.org/all/builders/s390x%20Debian%202.7/builds/205/steps/test/logs/stdio>) seem to happen when test_thread runs before any of the Tk, Tcl, and IDLE tests.

Python 3 is probably unaffected because only the Python 2 version of the test calls sys.exit(); this code was added in r78527. In Python 3, the code was merged in revision 58c35495a934, but it was apparently changed to call os._exit() at the same time. So one potential fix or workaround could be to change to os._exit() as in child-exit.patch.
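
To illustrate why sys.exit() may not be enough here: it raises SystemExit, and in a non-main thread that only ends the thread that raised it. A minimal sketch, assuming Python 3 and the threading module; the names worker and log are made up for illustration and are not taken from the test.

```python
import sys
import threading


def worker(log):
    log.append("worker started")
    sys.exit(0)  # raises SystemExit, which ends this thread only
    log.append("unreachable")


log = []
t = threading.Thread(target=worker, args=(log,))
t.start()
t.join()
# The process is still alive: sys.exit() in the worker did not end it.
log.append("main thread still running")
```

By contrast, os._exit() ends the whole process at once, regardless of which thread calls it.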

It seems Tcl_FindExecutable() creates a thread, and this thread survives fork(). (Perhaps it is re-created?) Python exiting does not cause this thread to be stopped. Playing with “strace”, it seems the threads that return from fork() in the parent and child both finish with _exit(0). However, the “main” thread in the parent finishes with exit_group(0), which is documented as terminating all threads. Calling os._exit() also seems to call exit_group(), which explains why that fixes the problem in the child.

I can produce the problem in all versions of Python without using _tkinter, using the following code instead:

import _thread, os, time

def thread1():
    pid = os.fork()
    if not pid:
        # In the child, the original main thread no longer exists. Start a
        # new thread that will stall for 60 s.
        _thread.start_new_thread(time.sleep, (60,))

_thread.start_new_thread(thread1, ())
time.sleep(2)  # Give fork() a chance to run

I’m not really sure, but maybe Python could improve its handling of this case, when fork() is called on a non-“main” thread and another thread is also running in the child process.
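
For comparison, here is a variant of the reproducer above in which the child ends with os._exit(), in the spirit of child-exit.patch. It is a sketch under assumptions, not the actual patch: it targets Python 3 on a POSIX system, and the function name run_child_with_stalled_thread is made up for illustration. The parent also reaps the child with os.waitpid() and times how long that takes.

```python
import _thread
import os
import threading
import time


def run_child_with_stalled_thread():
    """Fork in a non-main thread; return (exit status, seconds waited)."""
    outcome = {}

    def thread1():
        pid = os.fork()
        if pid == 0:
            # Child: start a thread that would stall for 60 s, as in the
            # reproducer, then end the whole process immediately.
            try:
                _thread.start_new_thread(time.sleep, (60,))
            finally:
                os._exit(0)
        # Parent: reap the child and time how long that takes.
        start = time.monotonic()
        _, status = os.waitpid(pid, 0)
        outcome["result"] = (status, time.monotonic() - start)

    worker = threading.Thread(target=thread1)
    worker.start()
    worker.join()
    return outcome["result"]


status, elapsed = run_child_with_stalled_thread()
```

If the child exited cleanly, status should be 0 and the wait should be far shorter than the 60 s sleep, since os._exit() does not wait for the stalled thread.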
msg261178 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2016-03-03 21:30
I can confirm that child-exit.patch fixes the immediate issue, so I'm +1 on just committing it since it will make several buildbots useful again.  Improving general handling of the situation can be done in a new issue.

For the record, I agree that this seems to be a relatively recent phenomenon.  I tried bisecting cpython to find a source for it (using `./python -m test.regrtest test_tcl test_thread`), but the bisect just came up with the first changeset that allows _tkinter to actually build.  Perhaps Tcl_FindExecutable starting a thread is a new thing?
msg261184 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-04 08:20
But please add a reference to this issue.
msg261331 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-03-08 07:45
New changeset 613196986c09 by Martin Panter in branch '2.7':
Issue #26456: Force all child threads to terminate in TestForkInThread
https://hg.python.org/cpython/rev/613196986c09
msg261377 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-03-08 20:36
The change to the test seems to have the desired effect. The buildbots are no longer timing out (tests are failing for other reasons).
History
Date User Action Args
2022-04-11 14:58:28 admin set github: 70643
2016-03-08 20:36:35 martin.panter set status: open -> closed; resolution: fixed; messages: + msg261377; stage: commit review -> resolved
2016-03-08 07:45:16 python-dev set nosy: + python-dev; messages: + msg261331
2016-03-04 08:20:01 serhiy.storchaka set messages: + msg261184
2016-03-03 21:30:02 zach.ware set assignee: martin.panter; messages: + msg261178; stage: commit review
2016-03-01 05:48:57 martin.panter set files: + child-exit.patch; keywords: + patch; messages: + msg261038
2016-02-29 20:32:19 zach.ware set nosy: + zach.ware; messages: + msg261020
2016-02-29 11:23:50 martin.panter set messages: + msg261003
2016-02-29 07:19:34 serhiy.storchaka set nosy: + pitrou, serhiy.storchaka
2016-02-29 06:29:11 martin.panter create