Issue 42110: race condition in ThreadChildWatcher (default) and MultiLoopChildWatcher

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/86276

classification

Title:	race condition in ThreadChildWatcher (default) and MultiLoopChildWatcher
Type:		Stage:
Components:	asyncio	Versions:	Python 3.10

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	asvetlov, sourcejedi, yselivanov
Priority:	normal	Keywords:

Created on 2020-10-21 18:44 by sourcejedi, last changed 2022-04-11 14:59 by admin.

Messages (3)
msg379229 - (view)	Author: Alan Jenkins (sourcejedi) *	Date: 2020-10-21 18:44
## Test program ## import asyncio import time import os import signal import sys # This bug happens with the default, ThreadedChildWatcher # It also happens with MultiLoopChildWatcher, # but not the other three watcher types. #asyncio.set_child_watcher(asyncio.MultiLoopChildWatcher()) # Patch os.kill to call sleep(1) first, # opening up the window for a race condition os_kill = os.kill def kill(p, n): time.sleep(1) os_kill(p, n) os.kill = kill async def main(): p = await asyncio.create_subprocess_exec(sys.executable, '-c', 'import sys; sys.exit(0)') p.send_signal(signal.SIGTERM) # cleanup await p.wait() asyncio.run(main()) ## Test output ## Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/alan-sysop/src/cpython/Lib/asyncio/runners.py", line 44, in run return loop.run_until_complete(main) File "/home/alan-sysop/src/cpython/Lib/asyncio/base_events.py", line 642, in run_until_complete return future.result() File "<stdin>", line 3, in main File "/home/alan-sysop/src/cpython/Lib/asyncio/subprocess.py", line 138, in send_signal self._transport.send_signal(signal) File "/home/alan-sysop/src/cpython/Lib/asyncio/base_subprocess.py", line 146, in send_signal self._proc.send_signal(signal) File "/home/alan-sysop/src/cpython/Lib/subprocess.py", line 2081, in send_signal os.kill(self.pid, sig) File "<stdin>", line 3, in kill ProcessLookupError: [Errno 3] No such process ## Tested versions ## * v3.10.0a1-121-gc60394c7fc * python39-3.9.0-1.fc32.x86_64 * python3-3.8.6-1.fc32.x86_64 ## Race condition ## main thread vs ThreadedChildWatcher._do_waitpid() thread p=create_subprocess_exec(...) waitpid(...) # wait for process exit <Process p exits> <waitpid() returns. Process p is reaped, and no longer exists> p.send_signal(9) ## Result ## A signal is sent to p.pid, after p.pid has been reaped by waitpid(). It might raise an error because p.pid no longer exists. In the worst case the signal will be sent successfully - because an unrelated process has started with the same PID. ## How easy is it to reproduce? ## It turns out the window for this race condition has been kept short, due to mitigations in the subprocess module. IIUC, the mitigation protects against incorrect parallel use of a subprocess object by multiple threads. def send_signal(self, sig): # bpo-38630: Polling reduces the risk of sending a signal to the # wrong process if the process completed, the Popen.returncode # attribute is still None, and the pid has been reassigned # (recycled) to a new different process. This race condition can # happens in two cases [...] self.poll() if self.returncode is not None: # Skip signalling a process that we know has already died. return os.kill(self.pid, sig) ## Possible workarounds ## * SafeChildWatcher and FastChildWatcher should not have this defect. However we use ThreadedChildWatcher and MultiLoopChildWatcher to support running event loops in different threads. * PidfdChildWatcher should not have this defect. It is only available on Linux, kernel version 5.3 or above. It would be possible to avoid the ThreadedChildWatcher race by using native code and pthread_cancel(), so that the corresponding waitpid() call is canceled before sending a signal. Except the current implementation of pthread_cancel() is also unsound, because of race conditions. * https://lwn.net/Articles/683118/ "This is why we can't have safe cancellation points" * https://sourceware.org/bugzilla/show_bug.cgi?id=12683 "Race conditions in pthread cancellation"
msg379296 - (view)	Author: Alan Jenkins (sourcejedi) *	Date: 2020-10-22 14:29
There's one way to fix this in MultiLoopChildWatcher (but not ThreadedChildWatcher). Make sure the waitpid() runs on the same thread that created the subprocess. Prototype: https://github.com/sourcejedi/cpython/commit/92f979bce4582e807facb1c274a962b3caf0d2eb The other approach would be to copy subprocess.wait(timeout) - keep waking up regularly and polling to see if the process has exited yet. I'm not convinced ThreadedChildWatcher is fixable. You can't call waitpid() in one thread, and let someone call kill() in a different thread. You could try avoiding calling waitpid() until someone does `await asyncio.subprocess.Process.wait()`. I think I didn't like it because - what happens if you cancel a wait() task? I think you want to cancel the waitpid() so you could safely kill() again... And we don't have any way to cancel the blocking waitpid() call, at least not from python, definitely not since PEP-475.
msg379297 - (view)	Author: Alan Jenkins (sourcejedi) *	Date: 2020-10-22 15:09
Put the other way, if you wanted to fix this bug in ThreadedChildWatcher, and go as far as allowing cancelling Process.wait(), followed by kill() / send_signal(), then I think you need - * siginterrupt(SIGCHLD, 1) * not to mind about any random C code that doesn't really handle being interrupted. Like printf(), ho hum. https://groups.google.com/g/comp.unix.programmer/c/QZmFw1VytYs/m/BSBXBHTI1REJ * add & use a new call like os.waitpid_interruptible(), which doesn't restart on EINTR, as a workaround for PEP-475. * set a "stop" flag for the watcher thread * use threading.pthread_kill() (available on most python platforms) to interrupt the watcher thread. Spoof SIGCHLD, this will avoid conflicts with a python handler for any other signal. * wait for the watcher thread to finish using Thread.join() * now you can safely find out whether the child process has been reaped, or whether it's safe to kill it.

History
Date	User	Action	Args
2022-04-11 14:59:37	admin	set	github: 86276
2020-10-22 15:09:14	sourcejedi	set	messages: + msg379297
2020-10-22 14:29:06	sourcejedi	set	messages: + msg379296
2020-10-21 18:44:25	sourcejedi	create