This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author oconnor663
Recipients oconnor663
Date 2020-12-03.17:05:29
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1607015129.26.0.150065141179.issue42558@roundup.psfhosted.org>
In-reply-to
Content
In Python 3.9, Popen.send_signal() was changed to call Popen.poll() internally before signaling. (Tracking bug: https://bugs.python.org/issue38630.) This is a best-effort check for the famous kill/wait race condition. However, because this can now reap an already-exited child process as a side effect, it can cause previously working programs to crash. Here's a simple example:

```
import os
import subprocess
import time

child = subprocess.Popen(["true"])
time.sleep(1)
child.kill()
os.waitpid(child.pid, 0)
```

The program above exits cleanly in Python 3.8 but crashes with ChildProcessError in Python 3.9. It's a race against child process exit, so in practice (without the sleep) it's a heisenbug.

There's a deeper race here that's harder to demonstrate in an example, but parallel to the original wait/kill issue: If the child PID happens to be reused by another parent thread in this same process, the call to waitpid might *succeed* and reap the unrelated child process. That would export the crash to that thread, and possibly create more kill/wait races.

In short, the question of when exactly a child process is reaped is important for correct signaling on Unix, and changing that behavior can break programs in confusing ways. This change affected the Duct library, and I might not've caught it if not for a lucky failing doctest: https://github.com/oconnor663/duct.py/commit/5dfae70cc9481051c5e53da0c48d9efa8ff71507

I haven't searched for more instances of this bug in the wild, but one way to find them would be to look for code that calls both os.waitpid/waitid and also Popen.send_signal/kill/terminate. Duct found itself in this position because it was using waitid(WNOWAIT) on Unix only, to solve this same race condition, and also using Popen.kill on both Unix and Windows.
History
Date User Action Args
2020-12-03 17:05:29oconnor663setrecipients: + oconnor663
2020-12-03 17:05:29oconnor663setmessageid: <1607015129.26.0.150065141179.issue42558@roundup.psfhosted.org>
2020-12-03 17:05:29oconnor663linkissue42558 messages
2020-12-03 17:05:29oconnor663create