The multiprocessing.resource_tracker instance is never reaped, leaving zombie processes.
There is a waitpid() call for the ResourceTracker's pid, but it is in the private method _stop(), which seems to be called only from some test modules.
Usually the environment has some process that reaps zombies, but if Python is the "main" process (PID 1) in a container, for example, and runs another Python instance that leaks a ResourceTracker process, zombies start to accumulate.
This is easily reproducible with a couple of small Python programs, as long as they are not run from a shell or another parent process that takes care of forgotten children.
It was originally discovered in a Docker container that has a Python program as its entry point (a celery worker in an airflow container) running other Python programs (dbt).
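For illustration, the reaping mechanics behind this can be sketched in a few lines. This is a minimal, POSIX-only sketch and not code from multiprocessing; the helper names spawn_unreaped_child() and reap() are made up for this example.

```python
import os
import time


def spawn_unreaped_child():
    """Fork a child that exits at once; the parent does not wait for it yet."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately, skipping interpreter cleanup handlers.
        os._exit(0)
    return pid


def reap(pid):
    """Collect the child's exit status, which removes its process-table entry."""
    reaped_pid, status = os.waitpid(pid, 0)
    return reaped_pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child = spawn_unreaped_child()
    # Give the child time to exit; until reap() is called it is a zombie.
    time.sleep(0.1)
    print(reap(child))
```

Until reap() runs, the exited child keeps its entry in the process table. The leaked ResourceTracker is left in that state when its new parent never calls wait().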
The minimal reproduction code is available on GitHub: https://github.com/viktorvia/python-multi-issue
The attached multi.py is leaking resource tracker processes, but just running it from a full-fledged development environment will not show the issue.
Instead, run it via another python program from a Docker container:
Dockerfile:
---
FROM python:3.9
WORKDIR /usr/src/multi
COPY . ./
CMD ["python", "main.py"]
---
main.py:
---
from subprocess import run
from time import sleep
while True:
    # Run the program that leaks a resource tracker process
    result = run(["python", "multi.py"], capture_output=True)
    print(result.stdout.decode('utf-8'))
    # Show the process tree, including any zombies
    result = run(["ps", "-ef", "--forest"], capture_output=True)
    print(result.stdout.decode('utf-8'), flush=True)
    sleep(1)
---
When the program is run, it accumulates one zombie per iteration:
---
$ docker run -it multi python main.py
[1, 4, 9]
UID PID PPID C STIME TTY TIME CMD
root 1 0 11 11:33 pts/0 00:00:00 python main.py
root 8 1 0 11:33 pts/0 00:00:00 [python] <defunct>
root 17 1 0 11:33 pts/0 00:00:00 ps -ef --forest
[1, 4, 9]
UID PID PPID C STIME TTY TIME CMD
root 1 0 6 11:33 pts/0 00:00:00 python main.py
root 8 1 3 11:33 pts/0 00:00:00 [python] <defunct>
root 19 1 0 11:33 pts/0 00:00:00 [python] <defunct>
root 28 1 0 11:33 pts/0 00:00:00 ps -ef --forest
[1, 4, 9]
UID PID PPID C STIME TTY TIME CMD
root 1 0 4 11:33 pts/0 00:00:00 python main.py
root 8 1 1 11:33 pts/0 00:00:00 [python] <defunct>
root 19 1 3 11:33 pts/0 00:00:00 [python] <defunct>
root 30 1 0 11:33 pts/0 00:00:00 [python] <defunct>
root 39 1 0 11:33 pts/0 00:00:00 ps -ef --forest
[1, 4, 9]
UID PID PPID C STIME TTY TIME CMD
root 1 0 3 11:33 pts/0 00:00:00 python main.py
root 8 1 1 11:33 pts/0 00:00:00 [python] <defunct>
root 19 1 1 11:33 pts/0 00:00:00 [python] <defunct>
root 30 1 4 11:33 pts/0 00:00:00 [python] <defunct>
root 41 1 0 11:33 pts/0 00:00:00 [python] <defunct>
root 50 1 0 11:33 pts/0 00:00:00 ps -ef --forest
---
Running it from a shell script, or from another Python program that handles SIGCHLD by calling wait(), takes care of the zombies.
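A parent that handles SIGCHLD could look roughly like the sketch below. It is only an assumption about how such a workaround might be written, not the code used in the original setup; reap_children() is a hypothetical helper name, and the loop simply mirrors main.py above.

```python
import os
import signal
import subprocess
import sys
import time


def reap_children(signum=None, frame=None):
    """Reap every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    # Orphaned processes are reparented to PID 1; reap them as they exit.
    signal.signal(signal.SIGCHLD, reap_children)
    while True:
        subprocess.run([sys.executable, "multi.py"])
        time.sleep(1)
```

One caveat, as far as I can tell: a handler that waits on any child can collect a child's status before subprocess.run() does, in which case subprocess reports a return code of 0 for that child. If the exit code matters, calling reap_children() from the main loop after each run, instead of from a signal handler, may be the safer variant.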