
classification
Title: Mixing multiprocessing pool and subprocess may create zombie process, and cause program to hang.
Type: behavior
Stage:
Components: Extension Modules
Versions: Python 3.4, Python 3.5

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: amikoren@yahoo.com, davin, gregory.p.smith, r.david.murray, rpcope1, sbt
Priority: normal
Keywords:

Created on 2015-12-09 10:04 by amikoren@yahoo.com, last changed 2022-04-11 14:58 by admin.

Files
File name: hang_multiprocess_subprocess.py
Uploaded: amikoren@yahoo.com, 2015-12-09 10:04
Description: reduced script to demonstrate the hang
Messages (6)
msg256150 - (view) Author: Ami Koren (amikoren@yahoo.com) Date: 2015-12-09 10:04
Happens on Linux (Debian), Linux version 3.16.0-4-amd64.
Seems like a multiprocessing issue.

When I use both a multiprocessing pool and subprocess somewhere in the same Python program, the subprocess sometimes becomes a 'zombie', and the parent waits for it forever.

Reproduce:
Run the attached script (I ran it on both Python 3.4 and 3.5) and wait (up to a minute on my computer). Eventually, the script will hang (wait forever).
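(The attached script isn't reproduced here; the following is only a hypothetical sketch of the same pattern - the worker function, pool size, and maxtasksperchild value are illustrative, not taken from the attachment.)

    # Hypothetical sketch, NOT the attached script: mix a fork-based Pool
    # with subprocess calls so the pool's worker-handler thread can fork a
    # replacement worker while Popen's error pipe is still open.
    import multiprocessing
    import subprocess

    def work(x):
        return x * x

    if __name__ == '__main__':
        # maxtasksperchild=1 makes workers exit after each task, so the
        # pool's _handle_workers thread keeps forking replacements in the
        # background while the main thread is inside subprocess.Popen
        with multiprocessing.Pool(4, maxtasksperchild=1) as pool:
            while True:
                pool.map_async(work, range(16))
                subprocess.call(['ls'], stdout=subprocess.DEVNULL)  # may eventually hang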

After it hangs:
ps -ef | grep "hang_multiprocess\|ls"

You should now see the "[ls] <defunct>" process - a zombie.


Analysis:
Players:
- Parent process
- Subprocess Child - forked by the parent using subprocess.Popen()
- Handle_workers thread - multiprocessing thread responsible for verifying that all workers are OK, and creating them if not.
- Multiprocessing Worker - forked by multiprocessing, either in the handle_workers thread's context or in the main thread's context.

The problem, in a nutshell, is that the Handle_workers thread forks a Worker while the Subprocess Child is being created.
This causes one of the Child's pipes to be 'copied' into the Worker. When the Subprocess Child finishes, the
pipe is still alive (in the Worker), so the Parent process waits forever for the pipe to close. The Child turns into a zombie because the Parent never reaches the communicate/wait line.

In more detail:
- The problematic spot in subprocess is in subprocess.py->_execute_child(), just before the 'while True:' loop where the errpipe_read pipe is read.
- The entry point in multiprocessing is multiprocessing/pool.py->_handle_workers(). There the thread sleeps for 0.1 seconds,
  and then tries to create (=fork) new workers.

The Handle_workers thread 'copies' errpipe_read into the forked Worker, hence the pipe never gets closed.
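A minimal sketch of the underlying mechanism (illustrative, not CPython's actual _execute_child code): a pipe's read end only reaches EOF once every copy of its write end is closed, and fork() silently duplicates open fds into the child:

    # POSIX-only demo of the general mechanism, not CPython's actual code:
    # a fork() that happens while a pipe is open duplicates the fds into
    # the child, so the parent's read never sees EOF until the child's
    # copy is closed too.
    import os
    import select
    import time

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # the forked "worker" silently inherits both pipe fds
        time.sleep(2)            # keeps its copy of the write end open
        os._exit(0)
    os.close(w)                  # parent closes its own write end...
    ready, _, _ = select.select([r], [], [], 1.0)
    print('EOF' if ready else 'no EOF: the forked child still holds the write end')
    os.waitpid(pid, 0)           # once the child exits, its copy is released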

To me, it seems like a multiprocessing issue: forking from a thread in the multiprocessing module is the cause of this mess.

I'm a newbie at Python (first bug filed), so please be patient if I missed anything or jumped to unfounded conclusions.
msg256156 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-09 15:34
Well, it sounds more like a problem of POSIX fork semantics. What you need is to prevent workers from being spawned while the subprocess is running, which the multiprocessing API may not support (I'm not that familiar with it), and which might or might not work for your application in any case, depending on what you are using each one for.

I'm not sure there's much Python can do to mitigate this problem, but I'll leave answering that to the experts :)
msg256157 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-09 15:35
Oh, I think there is a solution, though: don't use fork, use spawn.
msg256168 - (view) Author: Ami Koren (amikoren@yahoo.com) Date: 2015-12-10 08:14
Thanks David. Using spawn - multiprocessing.get_context('spawn').Pool(... - does the job. It does have its flaws: fork lets me share data between workers (especially a large read-only in-memory database, which I don't want to duplicate for each worker), which spawn (which fork-execs a fresh Python interpreter) doesn't. So I'll have to see about that.
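A minimal sketch of that workaround (pool size and the worker function are placeholders, not from the attached script):

    # Sketch of the 'spawn' workaround; names and sizes are illustrative.
    # spawn starts each worker by exec'ing a fresh interpreter, so workers
    # cannot silently inherit pipe fds from a concurrent subprocess.Popen.
    import multiprocessing

    def work(x):
        return x * x

    if __name__ == '__main__':
        ctx = multiprocessing.get_context('spawn')
        with ctx.Pool(4) as pool:
            print(pool.map(work, range(8)))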

I still don't understand why forking has to be done in the worker-handler thread's context. It doesn't seem like a good design - when forking from a thread you can never be sure what is being forked. A better approach seems to be to fork missing workers on demand, synchronously with the main thread. But I probably lack the historical context of the multiprocessing module.
msg256175 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-10 15:04
It probably has to do with the process management, but as I said I'm not that familiar with it, so we'll have to wait for the experts to chime in.
msg256264 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-12-12 05:46
I wouldn't _assume_ that there was a good design reason for that in multiprocessing... it already mixed threads and fork() without realizing that you cannot safely do that.
History
Date                 User                Action  Args
2022-04-11 14:58:24  admin               set     github: 70015
2016-03-29 06:30:49  rpcope1             set     nosy: + rpcope1
2016-02-13 04:51:33  ned.deily           set     nosy: + davin
2015-12-12 05:46:23  gregory.p.smith     set     nosy: + gregory.p.smith, - gps; messages: + msg256264
2015-12-10 15:04:10  r.david.murray      set     messages: + msg256175
2015-12-10 08:14:42  amikoren@yahoo.com  set     messages: + msg256168
2015-12-09 15:35:56  r.david.murray      set     messages: + msg256157
2015-12-09 15:34:35  r.david.murray      set     nosy: + gps, r.david.murray, sbt; messages: + msg256156
2015-12-09 10:04:34  amikoren@yahoo.com  create