
Author vstinner
Recipients ammar2, gregory.p.smith, vstinner
Date 2017-10-05.13:31:34
Message-id <1507210294.25.0.213398074469.issue31178@psf.upfronthosting.co.za>
Content
> In a REPL on my Fedora 26, os.waitpid(0, 0) raises "ChildProcessError: [Errno 10] No child processes". I'm not sure that waitpid() is the cause of the hang, (...)

Oh wait, now I understand the full picture.

Summary:

* 2 new tests were added to test_subprocess and these tests call waitpid(0, 0) by mistake
* In general, waitpid(0, 0) returns immediately and the code handles it properly
* Sometimes, a previous test leaks a child process and so waitpid(0, 0) takes a few seconds or can even block
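
The "returns immediately" case can be checked in isolation. The sketch below is only an illustration, not code from the test suite: it runs waitpid(0, 0) in a fresh interpreter that has no children, so the call fails at once with ChildProcessError, as in the quoted REPL session. The names CHECK and waitpid_without_children are hypothetical, and a POSIX system is assumed.

```python
import subprocess
import sys

# Code executed in a fresh interpreter, which has no child processes.
CHECK = """
import os
try:
    os.waitpid(0, 0)
except ChildProcessError:
    print("no child processes")
else:
    print("reaped a child")
"""


def waitpid_without_children():
    """Run waitpid(0, 0) in a child-free interpreter and report the outcome."""
    result = subprocess.run(
        [sys.executable, "-c", CHECK],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```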

--

Running the tests randomly leaks child processes. See, for example, my recent test_socketserver fix in Python 3.6: commit fdcf3e9629201ef725af629d99e02215a2d657c8. This commit is *not* part of the recent Python 3.6.3 release, which is the version my colleague tested.

This fix is for bpo-31593: test_socketserver *randomly* leaked child processes. Depending on the system load, waitpid() was or was not called to read the child process exit status.
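
For illustration, a test could collect the exit status of any child it left behind before the next test starts. This is a minimal sketch assuming a POSIX system; the helper name reap_exited_children and its timeout parameter are hypothetical and are not taken from the actual fix. It polls with os.WNOHANG so it never blocks indefinitely on a child that is still running.

```python
import os
import time


def reap_exited_children(timeout=5.0):
    """Reap children of this process as they exit, giving up after `timeout` seconds.

    Returns the list of reaped pids.
    """
    reaped = []
    deadline = time.monotonic() + timeout
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call non-blocking.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            # Children exist, but none has exited yet.
            if time.monotonic() >= deadline:
                break
            time.sleep(0.05)
            continue
        reaped.append(pid)
    return reaped
```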

If you run "./python -m test test_socketserver test_subprocess" and test_socketserver fails to call waitpid() on even one of its child processes, test_subprocess can hang in waitpid(0, 0), waiting for the process spawned by test_socketserver.
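
By contrast, waiting on a specific pid is not affected by a leaked sibling process. The sketch below assumes a POSIX system and uses a hypothetical function name, wait_for_own_child; it simulates the leaked process with a long sleep and shows that os.waitpid(fast.pid, 0) returns for the intended child only.

```python
import os
import subprocess
import sys


def wait_for_own_child():
    """Wait for one specific child while another, long-running child exists."""
    # Stand-in for a child process leaked by an earlier test.
    leaked = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    # The child this code actually cares about; it exits right away.
    fast = subprocess.Popen([sys.executable, "-c", "pass"])
    try:
        # Waiting on fast.pid ignores the leaked child entirely.
        pid, status = os.waitpid(fast.pid, 0)
        # Already reaped above; Popen.wait() tolerates that.
        fast.wait()
    finally:
        leaked.kill()
        leaked.wait()
    return pid == fast.pid, status
```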

test_socketserver is just one example; I fixed many other bugs of this kind in the Python test suite. Running the Python tests in subprocesses using "./python -m test -jN ...", with at least -j1, reduces the effect of the bug.

Short script to explain the bug:
---
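import subprocess, sys, os, time

# Simulate a child process leaked by a previous test: it runs for 2 seconds.
args = [sys.executable, '-c', 'import time; time.sleep(2)']
proc = subprocess.Popen(args)
t0 = time.monotonic()
print("waitpid(0, 0)...")
# pid 0 means "any child in the current process group", so this call
# blocks until the leaked child exits.
pid, status = os.waitpid(0, 0)
dt = time.monotonic() - t0
print("%.1f sec later: waitpid(0, 0) -> %s" % (dt, (pid, status)))
# The child was already reaped above; Popen.wait() tolerates this.
proc.wait()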
---

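This script takes about 2 seconds (the duration of the child's sleep): waitpid(0, 0) blocks until the child exits, just as it does when a previous test has leaked a child process that takes time to complete.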
History
Date User Action Args
2017-10-05 13:31:34 vstinner set recipients: + vstinner, gregory.p.smith, ammar2
2017-10-05 13:31:34 vstinner set messageid: <1507210294.25.0.213398074469.issue31178@psf.upfronthosting.co.za>
2017-10-05 13:31:34 vstinner link issue31178 messages
2017-10-05 13:31:34 vstinner create