classification
Title: subprocess.Popen the os.close calls in _execute_child can raise an EBADF exception
Type: behavior Stage: resolved
Components: Versions: Python 3.4, Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: asvetlov, bennoleslie, gregory.p.smith, mark.dickinson, python-dev, rosslagerwall
Priority: high Keywords:

Created on 2012-10-05 06:59 by gregory.p.smith, last changed 2012-11-11 17:42 by gregory.p.smith. This issue is now closed.

Messages (9)
msg172053 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-10-05 06:59
Ben Leslie writes this on python-dev:

Hi all,

I have a Python program where I have many threads each calling Popen, and I was hitting some trouble.

I've been seeing this on 3.2.3, however I believe the same issue is still potentially a problem on head.

The error manifests itself when a call to os.close(errpipe_read) fails with EBADF (http://hg.python.org/releasing/3.2.3/file/86d1421a552c/Lib/subprocess.py#l1314)

I believe the root cause of this problem is due to a double close() on a different file descriptor (which is then reused as errpipe_read).

The file descriptors: p2cwrite, c2pread and errread are all closed at the end of the _execute_child method:

http://hg.python.org/releasing/3.2.3/file/86d1421a552c/Lib/subprocess.py#l1351

However, these filedescriptors are wrapped up into file objects during __init__ (see: http://hg.python.org/releasing/3.2.3/file/86d1421a552c/Lib/subprocess.py#l725)

As far as I can tell at the point where the garbage collector kicks in  Popen.{stdin,stdout,stderr} all go out of scope, and will be deallocated, and the underlying filedescriptor closed.

However because the filedescriptor is already closed, and by this time is actually reused, this deallocation closes what ends up being an incorrect file-descriptor.

Since closing a file object where the underlying fd is already closed is silenced (http://hg.python.org/releasing/3.2.3/file/86d1421a552c/Modules/_io/iobase.c#l235) this would not normally be very apparent.

This race between a new filedescriptor being allocated and the garbage collector deallocating the file descriptors certainly hits when using a few threads, but I guess depending on the exact behaviour of the garbage collector it could potentially also occur in a single threaded case as well.

I think a fix would be to remove the explicit close of these file descriptors at the end of _execute_child, and let the garbage collector close them. Of course that may leak file descriptors, if the GC doesn't kick in for a while, so the other option would be to close the file object, rather than just the file descriptor.

Hopefully someone more intimately familiar with the module can point me in the right direction to verify this, and provide a fix.

Thanks,

Benno
msg172056 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-10-05 07:17
What you've described makes sense.

The file descriptors are indeed conditionally wrapped in file objects using io.open which could lead to a double close in the error case in a threaded application.  yuck.

1) The code needs to check if the fd was wrapped before calling os.close() on it. It could do this by checking the sys.stdin, sys.stdout and sys.stderr attributes respectively for each fd.

2) another option that is a little more clear code wise as it doesn't try to use an implied connection between the stdin/stdout/strerr attributes would be to _always_ wrap the fd's that this can happen to in an io object (regardless of if they've been assigned to the stdin/stdout/stderr attributes) and pass those to _execute_child.  Their .fileno() method can be used for the value to pass to _posixsubprocess.fork_exec().  And the os.close on those would turn into a close method call which won't allow double close EBADF errors.
msg172076 - (view) Author: Benno Leslie (bennoleslie) * Date: 2012-10-05 12:28
Regarding #2 my understanding is that the FDs are already always wrapped.

E.g: at line http://hg.python.org/cpython/file/b9ac3c44a4eb/Lib/subprocess.py#l798 it shows these always being wrapped (assuming the file descriptor is not -1).

For my test case on 3.2.3, replacing the os.close loop with:

                if p2cwrite != -1:
                    self.stdin.close()
                if c2pread != -1:
                    self.stdout.close()
                if errread != -1:
                    self.stderr.close()

This fixed all my stability problems and races, and can't (as far as I can tell) cause any other problems; however this is a very subtle module, so I'm eager to understand if this would cause any undesirable side-effects.
msg175331 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-11-11 08:53
To remove the double close fd-reuse-window race condition you describe in 3.2.3 and later I don't think you even need to add that code snippet.  Just get rid of the for loop calling os.close on those three fd's all together.  self.stdin, self.stdout and self.stderr are already closed in the error case here:

http://hg.python.org/releasing/3.2.3/file/86d1421a552c/Lib/subprocess.py#l746
msg175335 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-11 09:41
New changeset 6ebfdfe74c04 by Gregory P. Smith in branch '3.2':
Fixes issue #16140: The subprocess module no longer double closes its
http://hg.python.org/cpython/rev/6ebfdfe74c04

New changeset 9f8b0444c8a6 by Gregory P. Smith in branch '3.3':
Fixes issue #16140: The subprocess module no longer double closes its
http://hg.python.org/cpython/rev/9f8b0444c8a6

New changeset d51df72dadb7 by Gregory P. Smith in branch 'default':
Fixes issue #16140: The subprocess module no longer double closes its
http://hg.python.org/cpython/rev/d51df72dadb7
msg175340 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-11 10:02
New changeset 2bdd984a55ac by Gregory P. Smith in branch '2.7':
Fix issue #16140 bug that the fix to issue #16327 added - don't double
http://hg.python.org/cpython/rev/2bdd984a55ac
msg175346 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-11 11:24
test_gc and test_csv seem to be failing on some of the buildbots as a result of these checkins.  It looks as though the new test_subprocess test creates some uncollectable garbage.
msg175347 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-11 11:28
Adding 'del p' at the end of the test method fixes this for me.
msg175376 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-11-11 17:42
That test is gross... it creates a nasty cycle of an instance back to itself via a stubbed out nested function's enclosing scope.

I'm sanitizing it now to simplify the code and not have any cycles.
History
Date User Action Args
2012-11-11 17:42:38gregory.p.smithsetmessages: + msg175376
2012-11-11 11:28:57mark.dickinsonsetmessages: + msg175347
2012-11-11 11:24:06mark.dickinsonsetnosy: + mark.dickinson
messages: + msg175346
2012-11-11 10:03:40gregory.p.smithsetstage: test needed -> resolved
2012-11-11 10:03:28gregory.p.smithsetstatus: open -> closed
resolution: fixed
versions: + Python 2.7
2012-11-11 10:02:10python-devsetmessages: + msg175340
2012-11-11 09:41:58python-devsetnosy: + python-dev
messages: + msg175335
2012-11-11 08:53:43gregory.p.smithsetmessages: + msg175331
2012-10-05 13:20:21asvetlovsetnosy: + asvetlov
2012-10-05 12:28:15bennolesliesetmessages: + msg172076
2012-10-05 12:17:30bennolesliesetnosy: + bennoleslie
2012-10-05 07:17:19gregory.p.smithsetmessages: + msg172056
2012-10-05 07:00:25rosslagerwallsetnosy: + rosslagerwall
2012-10-05 06:59:21gregory.p.smithcreate