This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: test_asyncio fails intermittently on OS X 10.4
Type: Stage: resolved
Components: Tests Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, ned.deily
Priority: normal Keywords: buildbot

Created on 2013-10-19 08:46 by ned.deily, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg200389 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-10-19 08:46
Since asyncio was checked in, there have been a couple of similar buildbot failures on the OS X 10.4 (Tiger) buildbot sandwiched around a successful run.  I also saw a similar failure on a 10.4 system I have.  So far, the OS X 10.6 (Snow Leopard) buildbot has not had any failures nor have I seen any so far in limited testing on other OS X version (newer than 10.4).

http://buildbot.python.org/all/builders/x86%20Tiger%203.x/builds/7093

http://buildbot.python.org/all/builders/x86%20Tiger%203.x/builds/7095

From the 

======================================================================
ERROR: test_subprocess_shell (test.test_asyncio.test_events.KqueueEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_asyncio/test_events.py", line 1034, in test_subprocess_shell
    self.loop.run_until_complete(connect())
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 177, in run_until_complete
    return future.result()
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/futures.py", line 221, in result
    raise self._exception
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/tasks.py", line 261, in _step
    result = next(coro)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_asyncio/test_events.py", line 1031, in connect
    'echo "Python"')
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 521, in subprocess_shell
    protocol, cmd, True, stdin, stdout, stderr, bufsize, **kwargs)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 161, in _make_subprocess_transport
    yield from transp._post_init()
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 483, in _post_init
    proc.stdin)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 507, in connect_write_pipe
    transport = self._make_write_pipe_transport(pipe, protocol, waiter)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 150, in _make_write_pipe_transport
    return _UnixWritePipeTransport(self, pipe, protocol, waiter, extra)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 273, in __init__
    if not stat.S_ISFIFO(os.fstat(self._fileno).st_mode):
OSError: [Errno 9] Bad file descriptor

======================================================================
ERROR: test_subprocess_shell (test.test_asyncio.test_events.SelectEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_asyncio/test_events.py", line 1034, in test_subprocess_shell
    self.loop.run_until_complete(connect())
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 177, in run_until_complete
    return future.result()
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/futures.py", line 221, in result
    raise self._exception
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/tasks.py", line 261, in _step
    result = next(coro)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_asyncio/test_events.py", line 1031, in connect
    'echo "Python"')
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 521, in subprocess_shell
    protocol, cmd, True, stdin, stdout, stderr, bufsize, **kwargs)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 161, in _make_subprocess_transport
    yield from transp._post_init()
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 483, in _post_init
    proc.stdin)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 507, in connect_write_pipe
    transport = self._make_write_pipe_transport(pipe, protocol, waiter)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 150, in _make_write_pipe_transport
    return _UnixWritePipeTransport(self, pipe, protocol, waiter, extra)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 273, in __init__
    if not stat.S_ISFIFO(os.fstat(self._fileno).st_mode):
OSError: [Errno 9] Bad file descriptor
msg200427 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-10-19 15:44
Thanks for reporting these. These tests sometimes pass and sometimes fail,
right? I suspect that there's a race condition in the test and slower
machines provoke it more often. I'm not sure what to do -- I won't have
time to properly debug this until after the release; perhaps it's best to
temporarily disable the failing tests?
msg200851 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-10-21 21:45
I find it pretty suspicious that when this fails it is always connecting the stdin pipe. That code is also suspect in the AIX failure (issue 19293), although for a different reason.

The only theory I have at this point is that perhaps there's a platform bug on OS X 10.4 and before where the pipe is considered invalid if the process already exits? But the subprocess shouldn't be exiting, so this theory is no good, unless there's also a bug that causes the fork or exec to fail. :-(

Maybe this will go away if we replace the stdin pipe with a socket pair, as we are proposing for issue 19293, without ever understanding why.
msg200870 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-10-22 03:40
I committed a patch that should hopefully fix issue 19293 -- maybe it will also reduce the flakiness of test_subprocess_shell() on OS X?
msg200883 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-10-22 04:46
Success! (I think.) On my elderly 10.4 machine, as of f33cc4a175a4:

tag:         tip
user:        Guido van Rossum <guido@dropbox.com>
date:        Mon Oct 21 20:57:25 2013 -0700
files:       Lib/test/test_asyncio/test_events.py
description:
Unsilence several asyncio AIX tests that no longer hang, and silence a new hang.

test_asyncio now runs successfully repeatedly.  If I go back 4 mods on the default branch (to 4f1121ae1cb5), before your last three pushes this evening, test_asyncio fails with the "Bad file descriptor" every time.
msg200972 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-10-22 16:57
If the buildbot is still green, can you close this?  I don't think it's worth trying to formulate and test a more precise theory about the root cause, given that it's most likely a platform bug anyway.
msg200975 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-10-22 17:47
I agree. And there appear to have been no test_asyncio failures on the buildbot since the checkins -> closing.
msg201142 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-10-24 14:51
I revived my ancient PowerBook G4 just to get to the bottom of this.

The long and short of the issue seems to be that when subprocess.Popen() closes the *read* end of the stdin pipe, fstat() on the *write* end starts failing with EBADF. The exact line where the transition happens is this (aroun subprocess.py#1375):

if p2cread != -1 and p2cwrite != -1 and p2cread != devnull_fd:
    os.close(p2cread)

This is clearly a platform bug. I also verified that the child process is still running.

Now I can't reproduce this *without* a fork+exec in the middle, so it may be that there's something magical that's done by the _posixsubprocess.fork_exec() call affecting the pipe, but before the above os.close() call the fstat() call definitely reports success, so the "closing" must be happening as a side effect of closing the other end. I'm sure there's someone at Apple with the kernel source code who can explain this, but I think we've done our due diligence.

That switching to a socketpair fixed the issue suggests that the kernel bug is related to pipes specifically.
History
Date User Action Args
2022-04-11 14:57:52adminsetgithub: 63493
2013-10-24 14:51:32gvanrossumsetmessages: + msg201142
2013-10-22 17:47:52ned.deilysetstatus: open -> closed
resolution: fixed
messages: + msg200975

stage: resolved
2013-10-22 16:57:33gvanrossumsetmessages: + msg200972
2013-10-22 04:46:27ned.deilysetmessages: + msg200883
2013-10-22 03:40:15gvanrossumsetmessages: + msg200870
2013-10-21 21:45:00gvanrossumsetmessages: + msg200851
2013-10-19 15:44:34gvanrossumsetmessages: + msg200427
2013-10-19 08:46:44ned.deilycreate