Author bpcreech
Recipients DragonSA, Pankrat, Rhamphoryncus, benjamin.peterson, bpcreech, gregory.p.smith, jyasskin
Date 2008-10-24.22:49:15
SpamBayes Score 8.03246e-14
Marked as misclassified No
Message-id <1224888558.39.0.466034129643.issue2320@psf.upfronthosting.co.za>
In-reply-to
Content
You can trigger this bug without use of stdin=subprocess.PIPE, or for
that matter any subprocess.PIPE's, although that makes it worse.  (So
the name for this issue may be overly specific.)

I have seen this happening intermittently in our parallel job manager,
and verified the lockup via gdb.

I've attached a regression test that should reliably trigger it without
use of any pipes for stdin, stderr, or stdout.  By kludging
subprocess.py with an extra "sleep", we can reliably trigger the race
condition.

The test launches two children, each from a thread.  Since there are no
calls to Popen.wait(), the main process should exit quickly, but it
never does.

Indeed, setting close_fds=True will correct the problem, although there
may be rare instances wherein the user wants the child to inherit some
file descriptors (for instance, this is how you usually use UNIX domain
sockets).

The problem is, as hinted at by Adam Olsen, as follows.

Say we have two threads, A and B, concurrently executing
subprocess.Popen.__init__.

In subprocess.Popen._execute_child, thread B is sitting between the
following lines:
errpipe_read, errpipe_write = os.pipe()
self._set_cloexec_flag(errpipe_write)

... when thread A calls fork().

Thus, thread A's child inherits both pipes, and never closes either one
when it calls execv.

Until thread A's child exits, that errpipe_write handle created in
thread B will remain open.

So, thread B will not return from the "os.read(errpipe_read...)" line,
and thus Popen.__init__ until thread A's child exits.  That is surely a bug.

On the other hand, it is hard to fix.  Ludwig Hähne suggested wrapping
the pipe creation and cloexec in a mutex.  As indicated by Gregory P.
Smith, that will not work reliably, as fork() could be called in one
thread while another thread is between those operations.  The mutex
would have to block the fork from happening during the atomic
create/cloexec operations.

Additionally, a parent could be creating a stdin output pipe end and
stdout/stderr input pipe ends for a particular subprocess.  Due to the
pooled nature of POSIX file descriptors, if another subprocess is
launched in a separate thread, it will inherit those pipe ends!  That
could cause unforeseen artifacts such as Popen.stdout hanging after
termination of the process that stdout was intended (because ANOTHER
process also inherited those pipe ends, and is still running).

Anyways, like Mr. Olson said, this is a mess.  close_fds=True is
probably the best option.  Perhaps it would be best to add an argument
allowing for a list of FD's to NOT close, in case the user is
intentionally trying to pass on additional file descriptors (eg UNIX
domain sockets).

I would vote for a change to set close_fds=True by default, and adding
an explicit list of inherited_fds.  This is based on the logic that
setting close_fds=False is unsafe and should not be default.

If that is too extreme, foolish users like myself at least need a huge
warning in the documentation, and an inherited_fds parameter would still
be useful.
History
Date User Action Args
2008-10-24 22:49:18bpcreechsetrecipients: + bpcreech, gregory.p.smith, Rhamphoryncus, jyasskin, benjamin.peterson, Pankrat, DragonSA
2008-10-24 22:49:18bpcreechsetmessageid: <1224888558.39.0.466034129643.issue2320@psf.upfronthosting.co.za>
2008-10-24 22:49:17bpcreechlinkissue2320 messages
2008-10-24 22:49:16bpcreechcreate