classification
Title: sporadic failures of test_close_fds and test_pass_fds in test_subprocess
Type: behavior Stage: test needed
Components: Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: donmez, ezio.melotti, gregory.p.smith, matejcik, neologix, pitrou, vstinner
Priority: normal Keywords:

Created on 2011-07-01 12:56 by vstinner, last changed 2013-03-29 06:52 by neologix. This issue is now closed.

Messages (17)
msg139575 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-01 12:56
======================================================================
FAIL: test_close_fds (test.test_subprocess.POSIXProcessTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_subprocess.py", line 1302, in test_close_fds
    "Some fds were left open")
AssertionError: {9} is not false : Some fds were left open

http://www.python.org/dev/buildbot/all/builders/x86%20Tiger%203.x/builds/2825/steps/test/logs/stdio

test_pass_fds() is already skipped on Mac OS X Tiger because of an OS bug: see issue #12230.
msg178889 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-01-03 01:47
Sporadic failure, I don't know if it was reproduced recently, and the issue has no activity since more than 6 months => I'm closing the issue.
msg183756 - (view) Author: Ismail Donmez (donmez) * Date: 2013-03-08 19:29
We are now hitting this on openSUSE 12.3, here is the relevant log (ignore the timestamps) :

[ 1041s] ======================================================================
[ 1041s] FAIL: test_close_fds (test.test_subprocess.POSIXProcessTestCase)
[ 1041s] ----------------------------------------------------------------------
[ 1041s] Traceback (most recent call last):
[ 1041s]   File "/home/abuild/rpmbuild/BUILD/Python-3.Lib/test/test_subprocess.py", line 1533, in test_close_fds
[ 1041s]     "Some fds were left open")
[ 1041s] AssertionError: {3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13} is not false : Some fds were left open
[ 1041s] 
[ 1041s] ======================================================================
[ 1041s] FAIL: test_pass_fds (test.test_subprocess.POSIXProcessTestCase)
[ 1041s] ----------------------------------------------------------------------
[ 1041s] Traceback (most recent call last):
[ 1041s]   File "/home/abuild/rpmbuild/BUILD/Python-3.Lib/test/support.py", line 457, in wrapper
[ 1041s]     return func(*args, **kw)
[ 1041s]   File "/home/abuild/rpmbuild/BUILD/Python-3.Lib/test/test_subprocess.py", line 1576, in test_pass_fds
[ 1041s]     "fd to be closed passed")
[ 1041s] AssertionError: {4, 5, 6, 7, 8, 9, 10, 11, 12} is not false : fd to be closed passed

This is Python 3.3.0 and Linux 3.7 on 64bit.
msg184611 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-03-19 07:14
Ismail, does it always fail or is it a sporadic failure?
msg184625 - (view) Author: Ismail Donmez (donmez) * Date: 2013-03-19 10:05
I can reproduce the error locally. But I see something in the log might be related to this (possibly not but anyway):

[352/368/1] test_multiprocessing
test_multiprocessing skipped -- This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.

This is weird because pyconfig.h has HAVE_SEM_OPEN set to 1.
msg184735 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-03-20 04:24
fyi - sem_open is unrelated, nothing in subprocess needs that.
msg184741 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-03-20 06:12
Some way for me to reproduce this is needed.
msg184989 - (view) Author: jan matejek (matejcik) * Date: 2013-03-22 17:27
Ismail, the sem_open thing is because in autobuild you don't get /proc or somesuch - it's a problem of our buildsystem, otherwise the test passes.

I can reproduce the failure in autobuild, but not separately yet. Will keep you posted, signs point to yet another misfeature of our buildsystem, but maybe it's something with test ordering...
msg184991 - (view) Author: Ismail Donmez (donmez) * Date: 2013-03-22 17:37
Jan, thanks for the excellent explanation. We can close this bug I guess then?
msg184994 - (view) Author: jan matejek (matejcik) * Date: 2013-03-22 17:50
Not just yet, please. I can now reproduce the test_subprocess failures outside autobuild too. Interestingly, it's the reverse here: test_subprocess passes when /proc is not present, but fails when it is.

I'm pretty sure that we hijacked a completely different bug with a similar symptom :e) but whatever. I'll just go ahead and debug this.
msg184999 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-03-22 19:29
fyi - subprocess uses /proc to get a list of open fd's in the child process before exec when possible.  search for FD_DIR in http://hg.python.org/cpython/file/d674bbbed333/Modules/_posixsubprocess.c
msg185386 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-03-27 22:16
Hummm...
For those experiencing failures when /proc is mounted: do you have a grsecurity-patched kernel?
If RBAC is enabled, /proc/self/fd is empty, so this approach won't work...
msg185455 - (view) Author: jan matejek (matejcik) * Date: 2013-03-28 16:28
i don't think that's it, look at the result output in comment #183756: the list of fds is correct, except that they aren't closed as they should be.

i can't reproduce the result when running code by hand, so it's either a race or an issue with test setup.

so far i figured out that this goes away when patch from issue #16962 is reverted
msg185457 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-03-28 16:52
> i don't think that's it, look at the result output in comment #183756: the list of fds is correct, except that they aren't closed as they should be.

Of course the list is correct: the message is printed by the parent
process, which opened the FDs ;-)

> i can't reproduce the result when running code by hand, so it's either a race or an issue with test setup.

If it's random, yes. One way to debug this would be to run the test
with "strace -f".

> so far i figured out that this goes away when patch from issue #16962 is reverted

Note that the patch doesn't retry getdents64() on EINTR, and retries
close() on EINTR while it's not advised, see this post by Linus:
http://article.gmane.org/gmane.linux.kernel/330619
msg185460 - (view) Author: jan matejek (matejcik) * Date: 2013-03-28 18:53
so apparently the problem was that we're using the borked original patch for issue #16962, instead of the fixed version that's actually in repos.
(The d_type field was missing, so every filename had an extra character prepended to it.
AFAICT, all mysterious circumstances that I reported would trace back to workplace mess, like running the test in a terminal with a different build than i thought...)

sorry for taking your time, folks, the bug can be closed now.
msg185462 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-03-28 19:22
phew, yay.  because i didn't see anything obviously wrong with the #16962 commit.

also, regarding checking for EINTR on some of the close() calls.  It isn't a big deal.  This _posixsubprocess.c code that contains those checks is always executing single-threaded post-fork so there is no chance of a race condition even though the EINTR check and retry are pointless as Linus mentions.  They will never cause errant behavior but I'll "fix" them in a future commit for consistency.
msg185492 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-03-29 06:52
Greg, the original issue was about an OS X failure, and it's still a
problem AFAICT.
However it's definitely not a problem with your patch, but an OS X
kernel bug (we've had another similar issue some time ago I think), so
we might as well skip the offending tests on this OS X version.

As for close() and EINTR, if you plan to do the change, it would
probably be interesting to check if there are other such patterns in
the stdlib.
History
Date User Action Args
2013-03-29 06:52:41neologixsetmessages: + msg185492
2013-03-28 19:22:46gregory.p.smithsetstatus: open -> closed
versions: + Python 3.4
type: behavior
messages: + msg185462

resolution: not a bug
2013-03-28 18:53:12matejciksetmessages: + msg185460
2013-03-28 16:52:18neologixsetmessages: + msg185457
2013-03-28 16:28:18matejciksetmessages: + msg185455
2013-03-27 22:16:38neologixsetnosy: + neologix
messages: + msg185386
2013-03-22 19:29:20gregory.p.smithsetmessages: + msg184999
2013-03-22 17:50:42matejciksetmessages: + msg184994
2013-03-22 17:37:17donmezsetmessages: + msg184991
2013-03-22 17:27:07matejciksetnosy: + matejcik
messages: + msg184989
2013-03-20 06:12:19gregory.p.smithsetmessages: + msg184741
stage: test needed
2013-03-20 04:24:44gregory.p.smithsetmessages: + msg184735
2013-03-19 10:05:03donmezsetmessages: + msg184625
2013-03-19 08:09:24pitrousetnosy: + gregory.p.smith
2013-03-19 07:14:43ezio.melottisetnosy: + ezio.melotti
messages: + msg184611
2013-03-09 04:21:59ned.deilysettitle: sporadic failures of test_close_fds() and test_pass_fds in test_subprocess -> sporadic failures of test_close_fds and test_pass_fds in test_subprocess
2013-03-09 04:21:20ned.deilysetstatus: closed -> open
title: test_subprocess.test_close_fds() sporadic failures on Mac OS X Tiger -> sporadic failures of test_close_fds() and test_pass_fds in test_subprocess
resolution: out of date -> (no value)
versions: + Python 3.3
2013-03-08 19:29:34donmezsetnosy: + donmez
messages: + msg183756
2013-01-03 01:47:23vstinnersetstatus: open -> closed
resolution: out of date
messages: + msg178889
2011-07-01 12:56:21vstinnercreate