Author oconnor663
Recipients gregory.p.smith, oconnor663, vstinner
Date 2020-12-04.16:11:52
Message-id <1607098313.18.0.925016762901.issue42558@roundup.psfhosted.org>
Content
Right, the example above is contrived to demonstrate the race and the crash.

In real-life code, the one good reason I know of to write code like this is to use os.waitid() with WNOWAIT to solve the wait/kill race properly. This is what Duct has been doing, and Nathaniel Smith also described this strategy in https://bugs.python.org/issue38630. The idea is that a waiting thread follows these steps:

1. call waitid() with WNOWAIT set, without locking the child
2. after waitid() returns, indicating that the child has exited, lock the child
3. call waitid() without WNOWAIT, or just waitpid(), to reap the zombie child
4. stash the exit status and unlock

Meanwhile a killing thread follows these steps:

1. lock the child
2. check the stashed exit status; if it's set, unlock and return early
3. otherwise, signal the child and unlock
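The two sets of steps above can be sketched in Python roughly as follows. This is a minimal illustration, not Duct's actual code: `ChildHandle`, its `wait()`/`kill()` methods and its two locks are hypothetical names. It assumes a Unix platform (Linux in particular) where `os.waitid` is available, Python 3.9+ for `os.waitstatus_to_exitcode`, and a child spawned with `os.posix_spawn` rather than `subprocess.Popen`, so that no other code tries to reap it. The extra `_wait_lock` makes only one thread at a time call `waitid(WNOWAIT)`, with other blocking waiters queuing on that lock.

```python
import os
import signal
import sys
import threading


class ChildHandle:
    """Hypothetical wrapper around a raw child PID (illustration only)."""

    def __init__(self, pid):
        self.pid = pid
        self._lock = threading.Lock()       # the "child lock" from the steps above
        self._wait_lock = threading.Lock()  # only one thread blocks in waitid() at a time
        self._exit_code = None              # stashed exit status; None until reaped

    def wait(self):
        with self._wait_lock:
            if self._exit_code is None:
                # Step 1: block until the child exits, leaving it a zombie.
                os.waitid(os.P_PID, self.pid, os.WEXITED | os.WNOWAIT)
                # Step 2: lock the child.
                with self._lock:
                    # Step 3: reap the zombie. Until now the PID could not be reused.
                    _, status = os.waitpid(self.pid, 0)
                    # Step 4: stash the exit status; leaving the block unlocks.
                    self._exit_code = os.waitstatus_to_exitcode(status)
            return self._exit_code

    def kill(self, sig=signal.SIGKILL):
        # Step 1: lock the child.
        with self._lock:
            # Step 2: already reaped, so the PID may now belong to another process.
            if self._exit_code is not None:
                return
            # Step 3: not reaped yet, so self.pid is still our child (at worst a
            # zombie, in which case the signal has no effect).
            os.kill(self.pid, sig)


# Usage: a long-running child, one waiting thread, and a kill from this thread.
pid = os.posix_spawn(
    sys.executable, [sys.executable, "-c", "import time; time.sleep(30)"], os.environ
)
child = ChildHandle(pid)
waiter = threading.Thread(target=child.wait)
waiter.start()
child.kill()         # safe whether or not the waiter has already reaped the child
waiter.join()
print(child.wait())  # negative signal number, e.g. -9 for SIGKILL on Linux
child.kill()         # no-op: the exit status is stashed, so no signal is sent
```

Because `kill()` checks the stash under the same lock that `wait()` holds while reaping, it can never signal a PID that has already been reaped and possibly reused.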

This strategy solves the race. The killing thread is free to signal while the waiting thread is blocked in step 1. If the killing thread happens to run between the moment waitid() returns and the moment the waiting thread acquires the child lock, the child is already a zombie and the kill signal has no effect. This is safe even if other threads (or e.g. the OOM killer) can randomly kill our child: *they* might have to worry about PID reuse, but their signals can never cause *us* to kill an unrelated process. What breaks this scheme is if some thread calls waitpid() and reaps the child outside of the lock, but normally that'd be a pretty unreasonable thing to do, especially since only other threads in the parent process can do it. (There's also some complexity around whether multiple threads are allowed to call waitid(WNOWAIT) on the same PID at the same time. I've just had one thread call it and had other blocking waiters block on a second lock, but maybe that's overcautious.)
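The failure mode named above, a thread reaping the child outside the lock, can be reproduced deterministically. In this minimal sketch a plain `os.waitpid()` call stands in for any such stray reap; it assumes a Unix platform with `os.waitid`, and the child is spawned with `os.posix_spawn` purely for illustration.

```python
import os
import sys

# Spawn a child that exits immediately (an illustrative stand-in for any child).
pid = os.posix_spawn(sys.executable, [sys.executable, "-c", "pass"], os.environ)

# Waiting thread, step 1: wait for the exit without reaping.
os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

# A stray reap outside the child lock. This stands in for any code path that
# calls waitpid() on our child behind the waiting thread's back.
os.waitpid(pid, 0)

# Waiting thread, step 3: the zombie is already gone (and the PID is free to be
# reused by an unrelated process), so the reap fails with ECHILD.
try:
    os.waitpid(pid, 0)
    step3_failed = False
except ChildProcessError:
    step3_failed = True
print(step3_failed)  # True
```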

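So anyway, if you use the strategy above -- precisely because you care about the PID reuse race and want to solve it properly -- and you also happen to use Popen.kill(), then changing Popen.send_signal() to reap the child can break you: if the child has already exited, the killing thread's Popen.kill() now reaps it, and the waiting thread's waitpid() in step 3 then fails because the child is gone.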

I don't think this is a bug per se, but it's a behavior change, which matters to a small set of (abnormally) correct programs. But then again, if Duct is the only project that hits this in practice, maybe I'm making a mountain out of a molehill :)
History
Date                 User        Action  Args
2020-12-04 16:11:53  oconnor663  set     recipients: + oconnor663, gregory.p.smith, vstinner
2020-12-04 16:11:53  oconnor663  set     messageid: <1607098313.18.0.925016762901.issue42558@roundup.psfhosted.org>
2020-12-04 16:11:52  oconnor663  link    issue42558 messages
2020-12-04 16:11:52  oconnor663  create