Issue19294
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2013-10-19 08:46 by ned.deily, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (8) | |||
---|---|---|---|
msg200389 - (view) | Author: Ned Deily (ned.deily) * | Date: 2013-10-19 08:46 | |
Since asyncio was checked in, there have been a couple of similar buildbot failures on the OS X 10.4 (Tiger) buildbot sandwiched around a successful run. I also saw a similar failure on a 10.4 system I have. So far, the OS X 10.6 (Snow Leopard) buildbot has not had any failures nor have I seen any so far in limited testing on other OS X version (newer than 10.4). http://buildbot.python.org/all/builders/x86%20Tiger%203.x/builds/7093 http://buildbot.python.org/all/builders/x86%20Tiger%203.x/builds/7095 From the ====================================================================== ERROR: test_subprocess_shell (test.test_asyncio.test_events.KqueueEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_asyncio/test_events.py", line 1034, in test_subprocess_shell self.loop.run_until_complete(connect()) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 177, in run_until_complete return future.result() File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/futures.py", line 221, in result raise self._exception File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/tasks.py", line 261, in _step result = next(coro) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_asyncio/test_events.py", line 1031, in connect 'echo "Python"') File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 521, in subprocess_shell protocol, cmd, True, stdin, stdout, stderr, bufsize, **kwargs) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 161, in _make_subprocess_transport yield from transp._post_init() File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 483, in _post_init proc.stdin) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 507, in connect_write_pipe transport = self._make_write_pipe_transport(pipe, protocol, waiter) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 150, in _make_write_pipe_transport return _UnixWritePipeTransport(self, pipe, protocol, waiter, extra) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 273, in __init__ if not stat.S_ISFIFO(os.fstat(self._fileno).st_mode): OSError: [Errno 9] Bad file descriptor ====================================================================== ERROR: test_subprocess_shell (test.test_asyncio.test_events.SelectEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_asyncio/test_events.py", line 1034, in test_subprocess_shell self.loop.run_until_complete(connect()) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 177, in run_until_complete return future.result() File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/futures.py", line 221, in result raise self._exception File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/tasks.py", line 261, in _step result = next(coro) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_asyncio/test_events.py", line 1031, in connect 'echo "Python"') File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 521, in subprocess_shell protocol, cmd, True, stdin, stdout, stderr, bufsize, **kwargs) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 161, in _make_subprocess_transport yield from transp._post_init() File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 483, in _post_init proc.stdin) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/base_events.py", line 507, in connect_write_pipe transport = self._make_write_pipe_transport(pipe, protocol, waiter) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 150, in _make_write_pipe_transport return _UnixWritePipeTransport(self, pipe, protocol, waiter, extra) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/asyncio/unix_events.py", line 273, in __init__ if not stat.S_ISFIFO(os.fstat(self._fileno).st_mode): OSError: [Errno 9] Bad file descriptor |
|||
msg200427 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-10-19 15:44 | |
Thanks for reporting these. These tests sometimes pass and sometimes fail, right? I suspect that there's a race condition in the test and slower machines provoke it more often. I'm not sure what to do -- I won't have time to properly debug this until after the release; perhaps it's best to temporarily disable the failing tests? |
|||
msg200851 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-10-21 21:45 | |
I find it pretty suspicious that when this fails it is always connecting the stdin pipe. That code is also suspect in the AIX failure (issue 19293), although for a different reason. The only theory I have at this point is that perhaps there's a platform bug on OS X 10.4 and before where the pipe is considered invalid if the process already exits? But the subprocess shouldn't be exiting, so this theory is no good, unless there's also a bug that causes the fork or exec to fail. :-( Maybe this will go away if we replace the stdin pipe with a socket pair, as we are proposing for issue 19293, without ever understanding why. |
|||
msg200870 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-10-22 03:40 | |
I committed a patch that should hopefully fix issue 19293 -- maybe it will also reduce the flakiness of test_subprocess_shell() on OS X? |
|||
msg200883 - (view) | Author: Ned Deily (ned.deily) * | Date: 2013-10-22 04:46 | |
Success! (I think.) On my elderly 10.4 machine, as of f33cc4a175a4: tag: tip user: Guido van Rossum <guido@dropbox.com> date: Mon Oct 21 20:57:25 2013 -0700 files: Lib/test/test_asyncio/test_events.py description: Unsilence several asyncio AIX tests that no longer hang, and silence a new hang. test_asyncio now runs successfully repeatedly. If I go back 4 mods on the default branch (to 4f1121ae1cb5), before your last three pushes this evening, test_asyncio fails with the "Bad file descriptor" every time. |
|||
msg200972 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-10-22 16:57 | |
If the buildbot is still green, can you close this? I don't think it's worth trying to formulate and test a more precise theory about the root cause, given that it's most likely a platform bug anyway. |
|||
msg200975 - (view) | Author: Ned Deily (ned.deily) * | Date: 2013-10-22 17:47 | |
I agree. And there appear to have been no test_asyncio failures on the buildbot since the checkins -> closing. |
|||
msg201142 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-10-24 14:51 | |
I revived my ancient PowerBook G4 just to get to the bottom of this. The long and short of the issue seems to be that when subprocess.Popen() closes the *read* end of the stdin pipe, fstat() on the *write* end starts failing with EBADF. The exact line where the transition happens is this (aroun subprocess.py#1375): if p2cread != -1 and p2cwrite != -1 and p2cread != devnull_fd: os.close(p2cread) This is clearly a platform bug. I also verified that the child process is still running. Now I can't reproduce this *without* a fork+exec in the middle, so it may be that there's something magical that's done by the _posixsubprocess.fork_exec() call affecting the pipe, but before the above os.close() call the fstat() call definitely reports success, so the "closing" must be happening as a side effect of closing the other end. I'm sure there's someone at Apple with the kernel source code who can explain this, but I think we've done our due diligence. That switching to a socketpair fixed the issue suggests that the kernel bug is related to pipes specifically. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:52 | admin | set | github: 63493 |
2013-10-24 14:51:32 | gvanrossum | set | messages: + msg201142 |
2013-10-22 17:47:52 | ned.deily | set | status: open -> closed resolution: fixed messages: + msg200975 stage: resolved |
2013-10-22 16:57:33 | gvanrossum | set | messages: + msg200972 |
2013-10-22 04:46:27 | ned.deily | set | messages: + msg200883 |
2013-10-22 03:40:15 | gvanrossum | set | messages: + msg200870 |
2013-10-21 21:45:00 | gvanrossum | set | messages: + msg200851 |
2013-10-19 15:44:34 | gvanrossum | set | messages: + msg200427 |
2013-10-19 08:46:44 | ned.deily | create |