Author dsagal
Recipients
Date 2007-06-05.22:19:54
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to
Content
Python's subprocess module has a race condition: Popen() constructor has a call to global "_cleanup()" function on whenever a Popen object gets created, and that call causes a check for all pending Popen objects whether their subprocess has exited - i.e. the poll() method is called for every active Popen object.

Now, if I create Popen object "foo" in one thread, and then a.wait(), and meanwhile I create another Popen object "bar" in another thread, then a.poll() might get called by _clean() right at the time when my first thread is running a.wait(). But those are not synchronized - each calls os.waitpid() if returncode is None, but the section which checks returncode and calls os.waitpid() is not protected as a critical section should be.

The following code illustrates the problem, and is known to break reasonably consistenly with Python2.4. Changes to subprocess in Python2.5 seems to address a somewhat related problem, but not this one.

import sys, os, threading, subprocess, time

class X(threading.Thread):
  def __init__(self, *args, **kwargs):
    super(X, self).__init__(*args, **kwargs)
    self.start()

def tt():
  s = subprocess.Popen(("/bin/ls", "-a", "/tmp"), stdout=subprocess.PIPE,
      universal_newlines=True)
  # time.sleep(1)
  s.communicate()[0]

for i in xrange(1000):
  X(target = tt)

This typically gives a few dozen errors like these:
Exception in thread Thread-795:
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "z.py", line 21, in tt
    s.communicate()[0]
  File "/usr/lib/python2.4/subprocess.py", line 1083, in communicate
    self.wait()
  File "/usr/lib/python2.4/subprocess.py", line 1007, in wait
    pid, sts = os.waitpid(self.pid, 0)
OSError: [Errno 10] No child processes

Note that uncommenting time.sleep(1) fixes the problem. So does wrapping subprocess.poll() and wait() with a lock. So does removing the call to global _cleanup() in Popen constructor.
History
Date User Action Args
2007-08-23 14:54:35adminlinkissue1731717 messages
2007-08-23 14:54:35admincreate