classification
Title: subprocess.Popen hangs at communicate() when child exits
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: amy20_z, belopolsky, dmalcolm, elmysnail, ggenellina, lemur, rosslagerwall, terry.reedy
Priority: normal Keywords:

Created on 2008-10-27 23:57 by amy20_z, last changed 2011-03-09 04:09 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description
subprocess_test03.py amy20_z, 2008-10-27 23:57 test program to call a shell script
Messages (9)
msg75270 - Author: Amy Zhu (amy20_z) Date: 2008-10-27 23:57
I have a simple program to call a shell command "service cpboot start"
to start Check Point firewall on RHEL5.1.

=================
#!/usr/bin/env python
# vim: softtabstop=4 shiftwidth=4 expandtab

from subprocess import Popen, PIPE

p = Popen('service cpboot start', shell=True, stdout=PIPE)
output = p.communicate()  # returns (stdout_data, stderr_data)
print 'STDOUT: %s' % output[0]
print 'STDERR: %s' % output[1]

===============

The Python process (pid 13343) spawned child 13375 to run "service cpboot
start". However, after child process 13375 finished and SIGCHLD was
delivered to the Python script, the parent hung in Popen.communicate() at
line 1041 of subprocess.py and child process 13375 became a defunct
process.


Traceback (most recent call last):
  File "./subprocess_test03.py", line 7, in ?
    output = p.communicate()
  File "/usr/lib/python2.4/subprocess.py", line 1041, in communicate
    rlist, wlist, xlist = select.select(read_set, write_set, [])
KeyboardInterrupt


Here is part of the strace:

Process 13375 detached
[pid 19195] close(878)                  = -1 EBADF (Bad file descriptor)
[pid 19195] close(879)                  = -1 EBADF (Bad file descriptor)
[pid 19195] close(880)                  = -1 EBADF (Bad file descriptor)
[pid 13343] <... select resumed> )      = ? ERESTARTNOHAND (To be restarted)
[pid 19195] close(881 <unfinished ...>
[pid 13343] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 19195] <... close resumed> )       = -1 EBADF (Bad file descriptor)
[pid 13343] select(7, [4 6], [], [], NULL <unfinished ...>


It seems that the select system call was interrupted and the error code
"ERESTARTNOHAND" was returned. Since the child process has already
finished and exited, EOF won't be read from the pipes, so communicate()
never finishes.

When the command is run directly from a shell prompt, there is no
problem at all. It seems there might be a race condition somewhere in
the Python library.

Any idea what may cause the problem? Many thanks in advance.
msg75400 - Author: STINNER Victor (haypo) * (Python committer) Date: 2008-10-31 00:48
I'm unable to reproduce your problem with either Python 2.4.4 or Python
2.5.1. I'm using Ubuntu, where the program "service" doesn't exist, so I
used the commands "pwd", "sh -c pwd" and "sh -c 'echo $$'". All commands
terminate correctly.

Is your problem specific to the command "service cpboot start"? Can you
reproduce the problem with another command?

Can you attach the full trace? I mean something like "strace -f -o 
trace python test.py".
msg75418 - Author: Amy Zhu (amy20_z) Date: 2008-10-31 15:13
Yes, I can only replicate this issue with the command that starts the
firewall. Stopping the firewall or running other service commands
doesn't trigger the problem. Check Point firewall is a kernel
application.

Somehow the select system call is interrupted and the pipes never reach
EOF. I don't know what the interrupt is, though.

I'll collect the full log and attach it here.
msg75419 - Author: STINNER Victor (haypo) * (Python committer) Date: 2008-10-31 15:35
The bug should be fixed in Python 2.5 since it uses:

    while read_set or write_set:
        try:
            rlist, wlist, xlist = select.select(read_set, write_set, [])
        except select.error, e:
            if e[0] == errno.EINTR:
                continue
            else:
                raise

EINTR is handled in subprocess for select(), read(), write() and
waitpid().

Can't you migrate to Python 2.5 or 2.6? You can try to copy 
subprocess.py from Python 2.5 to Python 2.4.
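The retry-on-EINTR loop quoted above can be sketched as a generic wrapper. This is a minimal sketch in Python 3 syntax; the helper name `retry_on_eintr` is hypothetical. In Python 3, `select.error` is an alias of `OSError`, and since Python 3.5 (PEP 475) the standard library retries interrupted system calls automatically, so a wrapper like this only matters on older interpreters.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), retrying while it fails with EINTR.

    Hypothetical helper mirroring the loop quoted above: a system call
    interrupted by a signal (e.g. SIGCHLD) is simply retried; any other
    error is re-raised unchanged.
    """
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as e:  # select.error is an alias of OSError in Python 3
            if e.errno == errno.EINTR:
                continue
            raise


# Usage, mirroring the excerpt:
#   rlist, wlist, xlist = retry_on_eintr(select.select, read_set, write_set, [])
```

Only EINTR is swallowed; a genuine error such as EBADF still propagates, which is the same distinction the `if e[0] == errno.EINTR` test makes in the excerpt.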
msg77582 - Author: Louis-Dominique Dubeau (lemur) Date: 2008-12-11 02:05
I'm running Python 2.5.2 on Ubuntu 8.10.

I believe I've also encountered the problem reported here.  The scenario
in my case was the following:

1. Python process A uses subprocess.Popen to create another python
process (B).  Process B is created with stdout=PIPE and stderr=PIPE.
Process A communicates with process B using communicate().

2. Python process B starts an ssh process (process C), which is invoked
to open a new control socket in master mode.  Process C is started
without pipes, so it gets its std{in,out,err} from process B.  Process C
is going to run for a long time.  That is, it will run until a command
is sent to the control socket to close the ssh connection.

3. Process B does not wait for process C to end, so it ends right away.

4. Python process A remains stuck in communicate() until process C (ssh)
dies even though process B has ended already.

Analysis:

The reason for this is that process C (ssh) gets its stdout and stderr
from process B.  But process C keeps both stdout and stderr open until
it is terminated.  So process A does not get an EOF on the pipes it
opened for communicating with process B until process C ends.

The set of conditions which will trigger the effect is not outlandish.
However, it is specific enough that testing by executing "pwd" or "ls
-l", or "echo blah" or any other simple command won't trigger it.

In my case, I fixed the problem by changing the code of process B to
invoke process C with stdout and stderr set to PIPE and close those
pipes as soon as process B is satisfied that process C is started
properly.  In this way, process A does not block.

(FYI, process A in my case is the python testing tool nosetests.  I use
nosetests to test a backup script written in python and that script
invokes ssh.)
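The scenario and workaround described above can be reproduced with a small sketch (Python 3 syntax, POSIX assumed). The child programs are hypothetical stand-ins: "process B" is a short Python script and "process C" is a second Python interpreter that just sleeps for 2 seconds in place of ssh. In the leaky variant C inherits B's stdout/stderr, so A's communicate() waits for C; in the fixed variant B gives C its own pipes and closes them right away, as lemur describes, so A returns as soon as B exits.

```python
import subprocess
import sys
import time

# "Process C": a long-running grandchild (stands in for the ssh master).
SLEEPER = "import time; time.sleep(2)"

# "Process B", leaky: C inherits B's stdout/stderr, i.e. A's pipes.
LEAKY_B = (
    "import subprocess, sys\n"
    f"subprocess.Popen([sys.executable, '-c', {SLEEPER!r}])\n"
    "print('B done')\n"
)

# "Process B", fixed: C gets its own pipes, which B closes once C has started.
FIXED_B = (
    "import subprocess, sys\n"
    f"c = subprocess.Popen([sys.executable, '-c', {SLEEPER!r}],\n"
    "                      stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n"
    "c.stdout.close()\n"
    "c.stderr.close()\n"
    "print('B done')\n"
)


def run_as_process_a(b_source):
    """Start B with piped stdout/stderr, call communicate(), and time it."""
    start = time.monotonic()
    b = subprocess.Popen([sys.executable, "-c", b_source],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _err = b.communicate()
    return out, time.monotonic() - start


if __name__ == "__main__":
    for name, src in (("leaky", LEAKY_B), ("fixed", FIXED_B)):
        out, elapsed = run_as_process_a(src)
        print(f"{name}: {out!r} after {elapsed:.1f}s")
```

The leaky run should take roughly as long as C sleeps, while the fixed run returns almost immediately, even though B exits at the same point in both.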

It seems that in general subprocess creators might have two needs:

1. Create a subprocess and communicate with it until there is no more
data to be passed to its stdin or data to be read from its std{out,err}.

2. Create a subprocess and communicate with it *only* until *this*
process dies.  After it is dead, neither stdout nor stderr are of any
interest.

Currently, need 1 is addressed by communicate() but not need 2.  In my
scenario above, I was able to work around the problem by modifying
process B but there are going to be cases where process B is not
modifiable (or at least not easily modifiable).  In those cases, process
A has to be able to handle it.
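Need 2 can be approximated from process A's side without modifying process B. The following is a minimal sketch, not part of the subprocess API: the helper name `communicate_until_exit` and its `grace` parameter are hypothetical. It drains the pipes in background threads, returns once the direct child has exited (plus a short grace period for trailing output), and deliberately gives up on anything a surviving grandchild writes later. Python 3 syntax, POSIX assumed.

```python
import os
import subprocess
import threading


def communicate_until_exit(proc, grace=0.2):
    """Collect proc's piped output only until proc itself exits.

    Hypothetical helper: unlike Popen.communicate(), it does not wait for
    EOF, so a grandchild that inherited the pipes cannot make it hang.
    """
    chunks = {"stdout": [], "stderr": []}

    def drain(name, pipe):
        fd = pipe.fileno()
        while True:
            data = os.read(fd, 4096)
            if not data:  # EOF: every process holding the write end closed it
                break
            chunks[name].append(data)

    readers = []
    for name in ("stdout", "stderr"):
        pipe = getattr(proc, name)
        if pipe is not None:
            # Daemon threads: a reader still blocked on a pipe held open by a
            # grandchild will not keep the interpreter alive at exit.
            t = threading.Thread(target=drain, args=(name, pipe), daemon=True)
            t.start()
            readers.append(t)

    proc.wait()  # returns as soon as the direct child has exited
    for t in readers:
        t.join(grace)  # give readers a moment to pick up trailing output
    return b"".join(chunks["stdout"]), b"".join(chunks["stderr"])
```

For example, with `proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)`, `out, err = communicate_until_exit(proc)` returns shortly after `cmd` exits even if it left a background process holding the pipes. The trade-off is that later output is lost and the pipe file objects stay open, so this only suits callers for whom stdout and stderr are of no interest once the child is dead.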
msg77891 - Author: Gabriel Genellina (ggenellina) Date: 2008-12-15 23:24
I think communicate() now works as documented: it reads stdout/stderr
until EOF *and* waits for the subprocess to terminate.

You're asking for a different method, or perhaps an optional 
parameter "return_when_died" to communicate, so it returns as soon as 
the child process terminates (I don't like the parameter name...)

I think this is more a feature request than a crash, and it should be
targeted at 2.7/3.1; 2.4 only gets security fixes anyway.
msg126817 - Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-01-22 04:42
Yes, I think subprocess is working correctly.

Since this feature request has attracted no interest in 2 years, I think it should be closed. If the functionality is needed, users can always implement it themselves.
msg128233 - Author: Arty (elmysnail) Date: 2011-02-09 19:33
You can try subprocess.call() with close_fds=True.
It helped me, at least.
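A minimal sketch of Arty's suggestion, in Python 3 syntax with a hypothetical child command. Note that close_fds=True only closes descriptors above 2 in the child (it has been the default on POSIX since Python 3.2); the more likely reason call() avoids the hang is that it creates no pipes by default, so the parent only waits for the direct child's exit status and never for EOF.

```python
import subprocess
import sys
import time

# Hypothetical child: starts a grandchild that sleeps for 2 seconds (and
# inherits the child's stdout/stderr), then exits immediately.
CHILD = (
    "import subprocess, sys\n"
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(2)'])\n"
)

start = time.monotonic()
# No stdout=PIPE / stderr=PIPE here: call() waits for the command to complete
# and returns its exit status, so the still-running grandchild is irrelevant.
status = subprocess.call([sys.executable, "-c", CHILD], close_fds=True)
elapsed = time.monotonic() - start
print(f"exit status {status} after {elapsed:.1f}s")
```

The catch is that no output is captured; if you need the output, one of the approaches discussed above is still required.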
msg130419 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-03-09 04:09
Closing as suggested by Ross.
History
Date                 User           Action  Args
2011-03-09 04:09:01  terry.reedy    set     status: open -> closed; versions: + Python 3.3, - Python 3.1, Python 2.7; nosy: + terry.reedy; messages: + msg130419; resolution: rejected
2011-02-09 19:33:43  elmysnail      set     nosy: + elmysnail; messages: + msg128233
2011-01-22 04:42:05  rosslagerwall  set     nosy: + rosslagerwall; messages: + msg126817
2009-10-16 21:11:34  dmalcolm       set     nosy: + dmalcolm
2009-03-24 23:33:25  haypo          set     nosy: - haypo
2008-12-15 23:24:01  ggenellina     set     type: crash -> enhancement; messages: + msg77891; nosy: + ggenellina; versions: + Python 3.1, Python 2.7, - Python 2.4
2008-12-11 02:05:20  lemur          set     nosy: + lemur; messages: + msg77582
2008-10-31 15:35:29  haypo          set     messages: + msg75419
2008-10-31 15:13:15  amy20_z        set     messages: + msg75418
2008-10-31 00:48:50  haypo          set     nosy: + haypo; messages: + msg75400
2008-10-29 02:45:39  belopolsky     set     nosy: + belopolsky
2008-10-27 23:57:46  amy20_z        create