Author vstinner
Recipients ammar2, gregory.p.smith, vstinner
Date 2017-10-05.13:31:34
> In a REPL on my Fedora 26, os.waitpid(0, 0) raises "ChildProcessError: [Errno 10] No child processes". I'm not sure that waitpid() is the cause of the hang, (...)

Oh wait, now I understand the full picture:


* Two new tests were added to test_subprocess, and these tests call waitpid(0, 0) by mistake.
* In general, waitpid(0, 0) returns immediately (typically raising ChildProcessError because there is no child process), and the code handles that properly.
* Sometimes a previous test leaks a child process, so waitpid(0, 0) waits for it: the call takes a few seconds or can even block.
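
Since the tests presumably meant to wait for their own child, here is a minimal sketch of the difference, assuming the test holds a subprocess.Popen object. `wait_for_own_child()` and the simulated `leaked` process are hypothetical illustrations, not the actual test_subprocess code. Passing proc.pid instead of 0 means a child leaked by an earlier test cannot make the call block:

```python
import os
import subprocess
import sys


def wait_for_own_child():
    # Hypothetical stand-in for a child process leaked by a previous test
    leaked = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(1)'])
    # The child this "test" actually owns: it exits immediately
    proc = subprocess.Popen([sys.executable, '-c', 'pass'])
    # Wait for this specific PID rather than pid=0 (any child in the
    # caller's process group), so the leaked child cannot delay the call
    pid, status = os.waitpid(proc.pid, 0)
    # Clean up the simulated leak so this sketch does not leak a child itself
    leaked.wait()
    return pid == proc.pid, os.WEXITSTATUS(status)
```

In real code, proc.wait() is usually simpler; the explicit os.waitpid(proc.pid, 0) call is only there to contrast with pid=0.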


Running the tests randomly leaks child processes. See for example my recent test_socketserver fix in Python 3.6: commit fdcf3e9629201ef725af629d99e02215a2d657c8. This commit is *not* part of the recent Python 3.6.3 release, which is the version my colleague tested.

This commit fixes bpo-31593: test_socketserver *randomly* leaked child processes. Depending on the system load, waitpid() was or was not called to read the exit status of the child process.
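
Leaks like this can be detected by reaping, after each test, any child that has already exited. A minimal sketch using os.waitpid(-1, os.WNOHANG), which never blocks; the helper name `reap_finished_children()` is hypothetical, although CPython's test.support module provides a reap_children() function built on a similar idea:

```python
import os


def reap_finished_children():
    """Reap every child process that has already exited, without blocking.

    Returns a list of (pid, status) pairs: a non-empty list after a test
    means the test did not read the exit status of some of its children.
    """
    reaped = []
    while True:
        try:
            # pid=-1: any child; WNOHANG: return (0, 0) instead of blocking
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # No child processes at all
            break
        if pid == 0:
            # Children exist, but none of them has exited yet
            break
        reaped.append((pid, status))
    return reaped
```

This only catches children that have already exited; a leaked child that is still running is not reported (the pid == 0 case).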

If you run "./python -m test test_socketserver test_subprocess" and test_socketserver doesn't call waitpid() on even a single child process, test_subprocess can hang on waitpid(0, 0), waiting for the process spawned by test_socketserver.

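test_socketserver is just one example; I fixed many other similar bugs in the Python test suite. Running the tests in subprocesses using "./python -m test -jN ..." (at least -j1) reduces the effect of the bug, since each test then runs in its own worker process, and a child leaked by a worker is no longer a child of the main process once that worker exits.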
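
Why worker processes help can be sketched as follows, assuming each test runs in a worker process that exits when the test completes. When the worker exits, its leaked child is reparented (to init or to the nearest subreaper), so it is no longer a child of the main process and waitpid(0, 0) in the main process does not wait for it. `WORKER_CODE` and `run_leaky_worker()` are hypothetical names for this illustration:

```python
import os
import subprocess
import sys

# Code run by the hypothetical worker: it spawns a child and exits
# without waiting for it, i.e. it leaks a grandchild of the main process
WORKER_CODE = (
    "import subprocess, sys; "
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(1)'])"
)


def run_leaky_worker():
    # subprocess.run() waits for the worker and reads its exit status
    subprocess.run([sys.executable, '-c', WORKER_CODE], check=True)
    try:
        # The leaked grandchild was reparented when the worker exited,
        # so the main process has no child left to wait for
        os.waitpid(0, 0)
    except ChildProcessError:
        return "no child to wait for"
    return "waited for a child"
```

Run in a process with no other children, this should return immediately instead of blocking for the grandchild's one second.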

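Short script to reproduce the bug:

```python
import subprocess, sys, os, time

# This child stands in for a process leaked by a previous test
args = [sys.executable, '-c', 'import time; time.sleep(2)']
proc = subprocess.Popen(args)
t0 = time.monotonic()
print("waitpid(0, 0)...")
# pid=0 waits for any child in the caller's process group,
# so this call blocks until the "leaked" child exits
pid, status = os.waitpid(0, 0)
dt = time.monotonic() - t0
print("%.1f sec later: waitpid(0, 0) -> %s" % (dt, (pid, status)))
```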

This script takes about 2 seconds: the child process spawned by Popen stands in for a child process leaked by a previous test, and waitpid(0, 0) blocks until that child completes.
