
Classification
Title: multiprocessing.Process.join() ignores timeout if child process uses os.exec*()
Type: behavior               Stage: needs patch
Components: Library (Lib)    Versions: Python 3.8, Python 3.7

Process
Status: open                 Resolution:
Dependencies:                Superseder:
Assigned To:                 Nosy List: Huazuo Gao, davin, josh.r, larry, pitrou, vstinner
Priority: normal             Keywords:

Created on 2019-01-04 11:52 by Huazuo Gao, last changed 2022-04-11 14:59 by admin.

Messages (5)
msg332970 - (view) Author: Huazuo Gao (Huazuo Gao) Date: 2019-01-04 11:52
import os
import time
from multiprocessing import Process

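# the child immediately replaces itself with bash via exec(); join(0.1)
# below should return after ~0.1 sec, but on 3.x it blocks until bash exits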
p = Process(target=lambda: os.execlp('bash', 'bash', '-c', 'sleep 1.5'))
t0 = time.time()
p.start()
p.join(0.1)
print(time.time() - t0)

---

Python 3.5 - 3.8 take 1.5 sec to finish
Python 2.7 takes 0.1 sec to finish
msg332982 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-01-04 14:48
I don't know what triggered the change, but I strongly suspect this is not a supported use of the multiprocessing module; Process is for worker processes (still running Python), and it has a lot of coordination machinery set up between parent and child (used by, among other things, join()) that exec severs rather abruptly.

Launching unrelated child processes is what the subprocess module is for.
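
Something like this sketch (my illustration, not from the report; assumes bash is on PATH, and wait()'s timeout argument needs Python 3.3+) behaves as expected:

import subprocess
import time

t0 = time.time()
p = subprocess.Popen(['bash', '-c', 'sleep 1.5'])
try:
    p.wait(timeout=0.1)  # the timeout is honored
except subprocess.TimeoutExpired:
    pass  # child still running, as expected
print(time.time() - t0)  # ~0.1 sec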
msg332983 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-01-04 15:20
Looks like the cause of the change is that os.pipe was changed to create non-inheritable pipes by default (PEP 446, in 3.4); if I monkey-patch multiprocessing.popen_fork.Popen._launch to use os.pipe2(0) instead of os.pipe() to get inheritable descriptors, or just clear FD_CLOEXEC in the child with fcntl.fcntl(child_w, fcntl.F_SETFD, 0), the Python 2 behavior returns.

The problem is caused by the mismatch in lifetimes between the pipe fd and the child process itself; normally the pipe lives as long as the child process (it's never actually touched in the child process at all, so it just dies with the child), but when exec gets involved, the pipe is closed long before the child ends.
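
The mismatch can be shown without multiprocessing at all. A minimal sketch (my illustration; POSIX only, 'sleep' assumed on PATH): the parent's poll on the pipe returns at exec time, while the process itself lives on.

import os
import select
import time

r, w = os.pipe()  # Python 3.4+ marks both ends CLOEXEC
pid = os.fork()
if pid == 0:
    # child: exec immediately; exec() closes the CLOEXEC write end
    os.execlp('sleep', 'sleep', '1.5')

os.close(w)  # parent keeps only the read end
t0 = time.time()
select.select([r], [], [])  # EOF as soon as the child execs (~0 sec)
print('pipe EOF after', time.time() - t0)
os.waitpid(pid, 0)  # but the process itself runs for ~1.5 sec
print('child exited after', time.time() - t0)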

The code in Popen.wait that is commented with "This shouldn't block if wait() returned successfully" is probably the issue; wait() first waits on the parent side of the pipe fd, which returns immediately when the child execs and the pipe is closed. The code then assumes the poll on the process itself can run in blocking mode (since the process should have ended already), but of course that assumption is wrong here.
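
For reference, the logic in Lib/multiprocessing/popen_fork.py looks roughly like this (paraphrased from memory, not an exact quote; details vary across 3.x versions):

def wait(self, timeout=None):
    if self.returncode is None:
        if timeout is not None:
            from multiprocessing.connection import wait
            # waits on the sentinel pipe fd, not the process; returns
            # as soon as the pipe hits EOF, e.g. at exec() time
            if not wait([self.sentinel], timeout):
                return None
        # "This shouldn't block if wait() returned successfully" --
        # but after an exec the pipe is closed while the process lives
        # on, so this blocking waitpid() ignores the caller's timeout.
        return self.poll(os.WNOHANG if timeout == 0.0 else 0)
    return self.returncode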

Possible solutions:

1. No code changes; document that exec in worker processes is unsupported (use subprocess, possibly with a preexec_fn, for this use case).

2. Precede the call to process_obj._bootstrap() in the child with fcntl.fcntl(child_w, fcntl.F_SETFD, 0) to clear the CLOEXEC flag on the child's descriptor, so the file descriptor remains open in the child post-exec (see the sketch after this list). Using os.pipe2(0) instead of os.pipe() in _launch would also work and restore the precise 3.3-and-earlier behavior, but it would reintroduce race conditions with parent threads, so it's better to limit the scope to the child process alone, for the child's copy of the fd alone.

3. Change multiprocessing.popen_fork.Popen.wait to use os.WNOHANG for all calls with a non-None timeout (not just timeout=0.0), rather than trusting multiprocessing.connection.wait's return value (which only says whether the pipe is closed, not whether the process has exited). Problem is, this would just change the behavior from waiting for the lifetime of the child no matter what to waiting until the exec and then returning immediately, even well before the timeout; it might also introduce race conditions if the fd registers as closed before the process has fully exited. Point is, this approach would likely require a lot of subtle tweaks to make it work.
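
A sketch of what #2 could look like, paraphrasing the child branch of _launch from memory (exact code varies by version; finalizer registration elided):

import fcntl
import os

def _launch(self, process_obj):
    code = 1
    parent_r, child_w = os.pipe()  # both ends CLOEXEC by default
    self.pid = os.fork()
    if self.pid == 0:  # child
        try:
            os.close(parent_r)
            # proposed fix: clear FD_CLOEXEC on the child's write end so
            # it survives a later exec(); the parent's sentinel then sees
            # EOF only when the (possibly exec-ed) process really exits
            fcntl.fcntl(child_w, fcntl.F_SETFD, 0)
            code = process_obj._bootstrap()
        finally:
            os._exit(code)
    else:  # parent
        os.close(child_w)
        self.sentinel = parent_r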

I'm in favor of either #1 or #2. #2 feels like intentionally opening a resource leak on the surface, but I think it's actually fine, since we already signed up for a file descriptor that lives for the life of the process; the fact that the process exec-ed seems sort of irrelevant.
msg333004 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-01-04 22:02
I'm in favor of #1 *and* not documenting it either.  I don't think it's reasonable for the documentation to enumerate all the kinds of situations where executing arbitrary code in a child process might lead to dysfunction.

Realistically, if you want to spawn a subprocess, you should just use subprocess, not multiprocessing + exec().

In other words, I'd like to close this issue as "won't fix" if nobody objects.
msg341886 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2019-05-08 15:23
3.4 is now EOL, so the 3.4regression tag goes away too.
History
Date                 User         Action  Args
2022-04-11 14:59:09  admin        set     github: 79838
2019-05-08 15:23:09  larry        set     keywords: - 3.4regression
                                          nosy: + larry
                                          messages: + msg341886
2019-01-04 22:06:41  vstinner     set     nosy: + vstinner
2019-01-04 22:02:06  pitrou       set     messages: + msg333004
2019-01-04 21:40:00  terry.reedy  set     nosy: + pitrou, davin
                                          stage: needs patch
                                          versions: - Python 3.5, Python 3.6
2019-01-04 15:20:15  josh.r       set     keywords: + 3.4regression
                                          messages: + msg332983
2019-01-04 14:48:58  josh.r       set     nosy: + josh.r
                                          messages: + msg332982
2019-01-04 11:52:47  Huazuo Gao   create