classification
Title: AIX: test_builtin.test_input_no_stdout_fileno() hangs
Type: behavior Stage: resolved
Components: Tests Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, Michael.Felt, vstinner
Priority: normal Keywords: patch

Created on 2020-04-02 12:34 by Michael.Felt, last changed 2020-04-03 09:38 by Michael.Felt. This issue is now closed.

Files
File name Uploaded Description Edit
test_builtin.txt Michael.Felt, 2020-04-02 14:30
Pull Requests
URL Status Linked Edit
PR 19308 closed Michael.Felt, 2020-04-02 17:44
PR 19312 closed vstinner, 2020-04-02 20:08
Messages (13)
msg365583 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-04-02 12:34
related to 

Two AIX bots - different environments - continue to fail the test: `test_builtin` since

During the first run - the test fails with something such as:

0:31:47 [215/420] test_abc passed -- running: test_builtin (14 min 10 sec)
0:32:17 running: test_builtin (14 min 40 sec), test_decimal (30.0 sec)
0:32:37 [216/420] test_decimal passed (49.4 sec) -- running: test_builtin (15 min)
0:32:37 [217/420/1] test_builtin crashed (Exit code 1)
Timeout (0:15:00)!
Thread 0x00000001 (most recent call first):

And in the second pass (re-run of failed tests) the failure goes:

test_zip (test.test_builtin.BuiltinTest) ... ok
test_zip_bad_iterable (test.test_builtin.BuiltinTest) ... ok
test_zip_pickle (test.test_builtin.BuiltinTest) ... ok
Timeout (0:15:00)!
Thread 0x00000001 (most recent call first):
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/support/__init__.py", line 3435 in wait_process
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/test_builtin.py", line 1898 in run_child
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/test_builtin.py", line 1952 in test_input_no_stdout_fileno
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/unittest/case.py", line 616 in _callTestMethod
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/unittest/case.py", line 659 in run
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/unittest/case.py", line 719 in __call__
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/unittest/runner.py", line 176 in run
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/support/__init__.py", line 2079 in _run_suite
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/support/__init__.py", line 2201 in run_unittest
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/libregrtest/runtest.py", line 209 in _test_module
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/libregrtest/runtest.py", line 234 in _runtest_inner2
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/libregrtest/runtest.py", line 270 in _runtest_inner
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/libregrtest/runtest.py", line 153 in _runtest
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/libregrtest/runtest.py", line 193 in runtest
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/libregrtest/main.py", line 318 in rerun_failed_tests
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/libregrtest/main.py", line 691 in _main
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/libregrtest/main.py", line 634 in main
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/libregrtest/main.py", line 712 in main
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/__main__.py", line 2 in <module>
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/runpy.py", line 87 in _run_code
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/runpy.py", line 197 in _run_module_as_main
make: 1254-004 The error code from the last command is 1.
Stop.
program finished with exit code 2
elapsedTime=3610.362052
test_input_no_stdout_fileno (test.test_builtin.PtyTests) ... 

NOTE: the run has stopped with the line

test_input_no_stdout_fileno (test.test_builtin.PtyTests) ... 

and, so the results also fail to get uploaded.
msg365585 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-04-02 12:43
did not get issue numbers in above: issue31160 and issue40094.

I waited a day, before posting - in the hope it would go away.

Also, I have been testing manually (no -j arguments) - and test_builtin passes when run manually. So, becoming hard to dissect and propose a "cure".

I have started with:

./python  ./Tools/scripts/run_tests.py -j 1 -u all -W --slowest --fail-env-changed --timeout=1200 -j2

and now see the test stalled:

0:39:41 running: test_builtin (5 min 14 sec)
0:40:11 running: test_builtin (5 min 44 sec)
0:40:41 running: test_builtin (6 min 14 sec)
0:41:11 running: test_builtin (6 min 44 sec)
0:41:41 running: test_builtin (7 min 14 sec)
0:42:11 running: test_builtin (7 min 44 sec)
0:42:41 running: test_builtin (8 min 14 sec)
0:43:11 running: test_builtin (8 min 44 sec)
msg365586 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 12:46
See also bpo-40140: test_builtin crashes when runned in parallel mode on solaris.
msg365597 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-04-02 14:13
Now consistently - stalled.

aixtools@x064:[/home/aixtools/py39-3.9]git diff
diff --git a/Lib/test/test_builtin.py b/Lib/test/test_builtin.py
index eaada1b504..89c4ebc2bd 100644
--- a/Lib/test/test_builtin.py
+++ b/Lib/test/test_builtin.py
@@ -1849,26 +1849,40 @@ class PtyTests(unittest.TestCase):

         if pid == 0:
             # Child
+            pid = os.getpid() # get my PID
+            tty = open("/dev/pts/3", "a")
+            tty.write("\nI am child - this is my PID:{}".format(pid))
+            tty.close()
             try:
                 # Make sure we don't get stuck if there's a problem
-                signal.alarm(2)
+                # signal.alarm(2)
                 os.close(r)
                 with open(w, "w") as wpipe:
                     child(wpipe)
             except:
                 traceback.print_exc()
             finally:
+                tty = open("/dev/pts/3", "a")
+                tty.write("\nI am child - exiting PID:{}".format(pid))
+                tty.close()
                 # We don't want to return to unittest...
                 os._exit(0)

         # Parent
         os.close(w)
+        tty = open("/dev/pts/3", "a")
+        tty.write("\nI am parent:{} with child:{}".format(os.getpid(), pid))
+        tty.write("fd:{} input:{}".format(fd, terminal_input))
+        tty.close()
         os.write(fd, terminal_input)

         # Get results from the pipe
         with open(r, "r") as rpipe:
             lines = []
             while True:
+                tty = open("/dev/pts/3", "a")
+                tty.write("\nI am parent:{} with lines:{}".format(os.getpid(), lines))
+                tty.close()
                 line = rpipe.readline().strip()
                 if line == "":
                     # The other end was closed => the child exited
@@ -1895,6 +1909,9 @@ class PtyTests(unittest.TestCase):

         # Wait until the child process completes before closing the PTY to
         # prevent sending SIGHUP to the child process.
+        tty = open("/dev/pts/3", "a")
+        tty.write("\nI am parent:{} starting wait_process({}, exitcode=0)".format(os.getpid(), pid))
+        tty.close()
         support.wait_process(pid, exitcode=0)

         # Close the PTY

======== Debug ========
aixtools@x064:[/home/aixtools/py39-3.9]./python -m test test_builtin
0:00:00 Run tests sequentially
0:00:00 [1/1] test_builtin

I am child - this is my PID:22544440
I am parent:21954660 with child:22544440fd:6 input:b'quux\r'
I am parent:21954660 with lines:[]
I am child - exiting PID:22544440
I am parent:21954660 with lines:['stdin.isatty(): True']
I am parent:21954660 with lines:['stdin.isatty(): True', "captured: 'prompt'"]
I am parent:21954660 starting wait_process(22544440, exitcode=0)
msg365598 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-04-02 14:20
If I move the close to before the support.waitprocess() call I get:

aixtools@x064:[/home/aixtools/py39-3.9]./python -m test test_builtin
0:00:00 Run tests sequentially
0:00:00 [1/1] test_builtin

I am child - this is my PID:22544444
I am parent:21954666 with child:22544444fd:6 input:b'quux\r'
I am parent:21954666 with lines:[]
I am child - exiting PID:22544444
I am parent:21954666 with lines:['stdin.isatty(): True']
I am parent:21954666 with lines:['stdin.isatty(): True', "captured: 'prompt'"]
I am parent:21954666 starting wait_process(22544444, exitcode=0)
I am parent:21954666 with child:22544446fd:6 input:b'quux\r\n'
I am parent:21954666 with lines:[]
I am child - this is my PID:22544446
I am child - exiting PID:22544446
I am parent:21954666 with lines:['tty = True']
I am parent:21954666 with lines:['tty = True', "'quux'"]
I am parent:21954666 starting wait_process(22544446, exitcode=0)
I am parent:21954666 with child:22544448fd:6 input:b'quux\xe9\r\n'
I am parent:21954666 with lines:[]
I am child - this is my PID:22544448
I am child - exiting PID:22544448
I am parent:21954666 with lines:['tty = True']
I am parent:21954666 with lines:['tty = True', "'quux\\udce9'"]
I am parent:21954666 starting wait_process(22544448, exitcode=0)
I am parent:21954666 with child:22544450fd:6 input:b'quux\xe9\r\n'
I am parent:21954666 with lines:[]
I am child - this is my PID:22544450
I am child - exiting PID:22544450
I am parent:21954666 with lines:['tty = True']
I am parent:21954666 with lines:['tty = True', "'quux\\udce9'"]
I am parent:21954666 starting wait_process(22544450, exitcode=0)
== Tests result: SUCCESS ==

1 test OK.
msg365599 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-04-02 14:30
I tried moving the child/parent logic blocks and get this as debug output:

aixtools@x064:[/home/aixtools/py39-3.9]./python -m test test_builtin
0:00:00 Run tests sequentially
0:00:00 [1/1] test_builtin

I am child - this is my PID:8519822
I am parent:6422696 with child:8519822fd:6 input:b'quux\r'
I am child - exiting PID:8519822
I am parent:6422696 with lines:[]
I am parent:6422696 with lines:['stdin.isatty(): True']
I am parent:6422696 with lines:['stdin.isatty(): True', "captured: 'prompt'"]
I am parent:6422696 starting wait_process(8519822, exitcode=0)

+++++++
Diff as attachment...
msg365600 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-04-02 14:35
Next debug info:

I am child - this is my PID:8519830
I am parent:18284612 with child:8519830fd:6 input:b'quux\r'
I am parent:18284612 with lines:[]
I am child - exiting PID:8519830
I am parent:18284612 with lines:['stdin.isatty(): True']
I am parent:18284612 with lines:['stdin.isatty(): True', "captured: 'prompt'"]
I am parent:18284612 starting wait_process(8519830, exitcode=0)[2] + Stopped (SIGTSTP)        ./python -m test test_builtin
aixtools@x064:[/home/aixtools/py39-3.9]ps -p 8519830
      PID    TTY  TIME CMD
  8519830            - <exiting>
aixtools@x064:[/home/aixtools/py39-3.9]kill 8519830
aixtools@x064:[/home/aixtools/py39-3.9]ps -p 8519830
      PID    TTY  TIME CMD
  8519830            - <exiting>
aixtools@x064:[/home/aixtools/py39-3.9]kill -9 8519830
aixtools@x064:[/home/aixtools/py39-3.9]ps -p 8519830
      PID    TTY  TIME CMD
  8519830            - <exiting>
aixtools@x064:[/home/aixtools/py39-3.9]fg
./python -m test test_builtin
[2] + Stopped (SIGTSTP)        ./python -m test test_builtin
aixtools@x064:[/home/aixtools/py39-3.9]ps -p 8519830
      PID    TTY  TIME CMD
  8519830            - <exiting>
msg365601 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-04-02 14:37
FYI: in child block:

calling os.exit(0), rather than os._exit() gives same result.
msg365602 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-04-02 15:30
To get it to move forward: as it is not solely and AIX thing (see bpo-40-140) 

This works: but is it what is wanted?

Tests result: SUCCESS
aixtools@x064:[/home/aixtools/py39-3.9]git diff
diff --git a/Lib/test/test_builtin.py b/Lib/test/test_builtin.py
index eaada1b504..4a5addc6fe 100644
--- a/Lib/test/test_builtin.py
+++ b/Lib/test/test_builtin.py
@@ -1893,12 +1893,18 @@ class PtyTests(unittest.TestCase):
             self.fail("got %d lines in pipe but expected 2, child output was:\n%s"
                       % (len(lines), child_output))

-        # Wait until the child process completes before closing the PTY to
-        # prevent sending SIGHUP to the child process.
-        support.wait_process(pid, exitcode=0)
+        if sys.platform == "linux" or not os.name == "posix":
+            # Wait until the child process completes before closing the PTY to
+            # prevent sending SIGHUP to the child process.
+            support.wait_process(pid, exitcode=0)

-        # Close the PTY
-        os.close(fd)
+            # Close the PTY
+            os.close(fd)
+        else:
+            # Other posix need to close the pty for the child to exit normally
+            # Close the PTY
+            os.close(fd)
+            support.wait_process(pid, exitcode=0)
msg365619 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 20:10
Michael Felt: Can you please test if PR 19312 fix the issue for you on AIX?
msg365642 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 22:41
I pushed this change which should fix the issue on AIX:

New changeset 7a51a7e19f0143f75f8fc9ff68f93ed40937aec6 by Victor Stinner in branch 'master':
bpo-40140: test_builtin.PtyTests registers SIGHUP handler (GH-19314)
https://github.com/python/cpython/commit/7a51a7e19f0143f75f8fc9ff68f93ed40937aec6

I'm waiting for buildbots before closing this issue.
msg365649 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-03 00:12
> bpo-40140: test_builtin.PtyTests registers SIGHUP handler (GH-19314)

With this change, PPC64 AIX 3.x is back to green. I close this issue. Thanks for the bug report Michael Felt.

FYI I also backported my fix to 3.8: PR 19316.
msg365687 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-04-03 09:38
Thanks for the fix!
History
Date User Action Args
2020-04-03 09:38:01Michael.Feltsetmessages: + msg365687
2020-04-03 00:12:43vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg365649

stage: patch review -> resolved
2020-04-02 22:41:11vstinnersetmessages: + msg365642
2020-04-02 20:10:53vstinnersetmessages: + msg365619
2020-04-02 20:08:03vstinnersetpull_requests: + pull_request18677
2020-04-02 17:44:14Michael.Feltsetkeywords: + patch
stage: patch review
pull_requests: + pull_request18671
2020-04-02 15:30:27Michael.Feltsettype: behavior
messages: + msg365602
2020-04-02 14:37:53Michael.Feltsetmessages: + msg365601
2020-04-02 14:35:30Michael.Feltsetmessages: + msg365600
2020-04-02 14:30:28Michael.Feltsetfiles: + test_builtin.txt

messages: + msg365599
2020-04-02 14:20:27Michael.Feltsetmessages: + msg365598
2020-04-02 14:13:16Michael.Feltsetmessages: + msg365597
2020-04-02 13:01:24BTaskayasetnosy: + BTaskaya
2020-04-02 12:46:28vstinnersetnosy: + vstinner

messages: + msg365586
title: AIX: bot status: stuck at: failed test (failure) uploading test-results.xml (failure) -> AIX: test_builtin.test_input_no_stdout_fileno() hangs
2020-04-02 12:43:10Michael.Feltsetmessages: + msg365585
2020-04-02 12:34:23Michael.Feltcreate