Title: test_input_tty hangs when run multiple times in the same process on macOS 10.15
Components: Versions: Python 3.11, Python 3.10, Python 3.9, Python 3.8
Assigned To: Nosy List: andrei.avk, kj, lukasz.langa, vstinner
Created on 2021-08-11 10:04 by lukasz.langa, last changed 2022-04-11 14:59 by admin.

msg399380 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-11 10:04
(I'm still investigating at the moment whether something changed in my environment.)

Running the following right now hangs on test_input_tty for me:

./python.exe -m test test_builtin test_builtin -v

This fails on every branch back to and including 3.7, so I assume it's environment-specific. The only alternative would be a regression caused by a change backported all the way to 3.7, and that is out of the question: the last functional commit on 3.7 landed back in June.

Things I tried so far:
- rebooting;
- using another terminal app (I use iTerm2 by default, tried too);
- another shell (I use fish by default, tried bash 5.0 as well);
- a non-pydebug build (I use pydebug builds by default to run -R:)

The test in question uses readline if it's available and `sysconfig.get_config_vars()['HAVE_LIBREADLINE']` returns 1. I'll try to check whether that works for me next.
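Both readline-related conditions can be inspected quickly, as a minimal sketch: `sysconfig.get_config_var()` reads the build-time `HAVE_LIBREADLINE` flag, and `importlib.util.find_spec()` reports whether a `readline` module can be found at all, without importing it. The helper name `readline_status` is hypothetical.

```python
import importlib.util
import sysconfig


def readline_status() -> dict:
    """Hypothetical helper: report readline availability without importing it."""
    return {
        # Build-time flag from pyconfig.h; None if this build didn't define it.
        "HAVE_LIBREADLINE": sysconfig.get_config_var("HAVE_LIBREADLINE"),
        # True if a `readline` module can be located on the import path.
        "readline_module_found": importlib.util.find_spec("readline") is not None,
    }


if __name__ == "__main__":
    print(readline_status())
```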
msg399382 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-11 10:44
Hynek confirmed on Big Sur, with Python 3.9.5 from asdf, that test_input_tty also hangs when run a second time in the same process.

Moreover, readline doesn't seem to be the culprit. First of all, on macOS the system readline is actually libedit:

❯ ll /usr/lib/libreadline.dylib
lrwxr-xr-x  1 root  wheel    15B Feb  2  2020 /usr/lib/libreadline.dylib -> libedit.3.dylib

So Python uses that by default:
>>> import readline
'EditLine wrapper'
>>> readline._READLINE_VERSION

Unless you instruct it to use GNU readline instead (for example, by passing "-I$(brew --prefix readline)/include" in CFLAGS and "-L$(brew --prefix readline)/lib" in LDFLAGS before running ./configure):
>>> import readline
>>> readline._READLINE_VERSION

The hang is the same in both cases. 
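To tell which library a given build actually uses at runtime, a sketch like the following can help. It rests on two assumptions: `readline.backend` exists only on Python 3.13 and later (returning "readline" or "editline"), and on older versions the module docstring mentions "libedit" when the libedit emulation is active, which is the kind of check older `site.py` versions rely on. The function name is hypothetical.

```python
def readline_backend():
    """Hypothetical helper: return "readline", "editline", or None."""
    try:
        import readline
    except ImportError:
        return None  # this build has no readline module at all
    backend = getattr(readline, "backend", None)  # Python 3.13+ only
    if backend is not None:
        return backend
    # Older versions: the libedit emulation uses a docstring mentioning libedit.
    doc = readline.__doc__ or ""
    return "editline" if "libedit" in doc else "readline"


if __name__ == "__main__":
    print(readline_backend())
```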

Next course of action: checking whether it's due to fork shenanigans in _run_child():
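For reference, the general shape of a fork-plus-pty check like this one can be sketched as follows. This is a simplified, hypothetical reconstruction, not the actual test code: the helper name `run_in_pty`, the 2-second alarm and the line-based protocol are assumptions. `pty.fork()` gives the child a pseudo-terminal as its stdin/stdout/stderr, the child reports results through an `os.pipe()`, and the parent reads that pipe until EOF, so if the child neither writes nor exits, the parent's read simply blocks.

```python
import os
import pty
import signal


def run_in_pty(child, terminal_input: bytes) -> list:
    """Hypothetical sketch: run child(wpipe) in a forked process whose stdio
    is a pseudo-terminal, and return the lines it writes to wpipe."""
    r, w = os.pipe()  # results travel from child to parent over this pipe
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: fds 0, 1 and 2 are now the slave side of the pty.
        try:
            signal.alarm(2)  # safety net: by default SIGALRM terminates the child
            os.close(r)
            with open(w, "w") as wpipe:
                child(wpipe)
        finally:
            os._exit(0)  # never return into the caller's code in the child
    # Parent: feed the "keyboard" input, then read results until EOF.
    os.close(w)
    os.write(master_fd, terminal_input)
    with open(r, "r") as rpipe:
        # Blocks until the child writes or closes its end (e.g. by exiting).
        lines = [line.strip() for line in rpipe]
    os.waitpid(pid, 0)
    os.close(master_fd)
    return lines


if __name__ == "__main__":
    def report_tty(wpipe):
        print("tty =", os.isatty(0) and os.isatty(1), file=wpipe)

    print(run_in_pty(report_tty, b""))
```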
msg399384 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-11 11:29
Parent process hangs on:
* thread #1, queue = '', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6741181e libsystem_kernel.dylib`read + 10
    frame #1: 0x000000010226a117 python.exe`_Py_read(fd=3, buf=0x00007f8d24009840, count=8192) at fileutils.c:1744:13
    frame #2: 0x00000001022f1335 python.exe`_io_FileIO_readinto_impl(self=0x0000000103b284d0, buffer=0x00007ffeedcbe928) at fileio.c:645:9
    frame #3: 0x00000001022f063e python.exe`_io_FileIO_readinto(self=0x0000000103b284d0, arg=0x0000000102f5d090) at fileio.c.h:205:20
    frame #4: 0x00000001020050e9 python.exe`method_vectorcall_O(func=0x00000001026fd970, args=0x00007ffeedcbeaf0, nargsf=2, kwnames=0x0000000000000000) at descrobject.c:462:24
    frame #5: 0x0000000101ff323d python.exe`_PyObject_VectorcallTstate(tstate=0x00007f8d20d04f00, callable=0x00000001026fd970, args=0x00007ffeedcbeaf0, nargsf=2, kwnames=0x0000000000000000) at abstract.h:114:11
    frame #6: 0x0000000101ff30c9 python.exe`PyObject_VectorcallMethod(name=0x00000001026fcbe0, args=0x00007ffeedcbeaf0, nargsf=2, kwnames=0x0000000000000000) at call.c:770:24
    frame #7: 0x00000001022f92a0 python.exe`PyObject_CallMethodOneArg(self=0x0000000103b284d0, name=0x00000001026fcbe0, arg=0x0000000102f5d090) at abstract.h:204:12

where "name" in frame #7 is the "readinto" method of <_io.FileIO name=3 mode='rb' closefd=True> and "arg" is <memory at 0x102f5d090>.

Child process hangs on:

* thread #1, queue = '', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff67413bf6 libsystem_kernel.dylib`write + 10
    frame #1: 0x000000010226a3e0 python.exe`_Py_write_impl(fd=2, buf=0x00007ffeedcbcbcb, count=1, gil_held=0) at fileutils.c:1813:17
    frame #2: 0x000000010226a535 python.exe`_Py_write_noraise(fd=2, buf=0x00007ffeedcbcbcb, count=1) at fileutils.c:1871:12
    frame #3: 0x0000000102257834 python.exe`_Py_DumpASCII(fd=2, text=0x0000000102a1bed0) at traceback.c:1002:13
    frame #4: 0x0000000102258ba5 python.exe`dump_frame(fd=2, frame=0x00000001025dbba8) at traceback.c:1035:9
    frame #5: 0x00000001022579fa python.exe`dump_traceback(fd=2, tstate=0x00007f8d20d04f00, write_header=0) at traceback.c:1084:9
    frame #6: 0x0000000102257bc6 python.exe`_Py_DumpTracebackThreads(fd=2, interp=0x00007f8d2281b010, current_tstate=0x00007f8d20d04f00) at traceback.c:1186:9
    frame #7: 0x0000000102311dc3 python.exe`faulthandler_dump_traceback(fd=2, all_threads=1, interp=0x00007f8d2281b010) at faulthandler.c:245:15
    frame #8: 0x000000010231224b python.exe`faulthandler_user(signum=14) at faulthandler.c:843:5
    frame #9: 0x00007fff674c85fd libsystem_platform.dylib`_sigtramp + 29
    frame #10: 0x00007fff6741435f libsystem_kernel.dylib`__ioctl + 11
    frame #11: 0x00007fff6741434b libsystem_kernel.dylib`ioctl + 150
    frame #12: 0x00007fff6734ad63 libsystem_c.dylib`tcsetattr + 111
    frame #13: 0x0000000103c772ee libreadline.8.dylib`_set_tty_settings + 28
    frame #14: 0x0000000103c76d87 libreadline.8.dylib`rl_prep_terminal + 683
    frame #15: 0x0000000103c88ce9 libreadline.8.dylib`_rl_callback_newline + 51
    frame #16: 0x0000000103c5ae75`readline_until_enter_or_signal(prompt="prompt", signal=0x00007ffeedcbd63c) at readline.c:1318:5
    frame #17: 0x0000000103c58637`call_readline(sys_stdin=0x00007fff8d9c8d90, sys_stdout=0x00007fff8d9c8e28, prompt="prompt") at readline.c:1396:9
    frame #18: 0x0000000101fad9b6 python.exe`PyOS_Readline(sys_stdin=0x00007fff8d9c8d90, sys_stdout=0x00007fff8d9c8e28, prompt="prompt") at myreadline.c:391:14
    frame #19: 0x00000001021a071d python.exe`builtin_input_impl(module=0x000000010268c0b0, prompt=0x00000001027a0400) at bltinmodule.c:2188:13
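Frame #8 above is faulthandler's user-signal handler running for signum 14, which is SIGALRM on macOS, and frames #0 to #7 show that handler blocked in write() on fd 2 while the interrupted tcsetattr() call sits underneath it. That implies a faulthandler.register() call for SIGALRM was active in the child; who installed it (presumably the test runner) is an assumption. The hypothetical helper below isolates the mechanism: it registers faulthandler for SIGALRM with a temporary file as the target, raises the signal in-process, and returns what was written.

```python
import faulthandler
import signal
import tempfile


def dump_traceback_on_sigalrm() -> str:
    """Hypothetical demo: make SIGALRM dump a traceback, trigger it, and
    return the dump."""
    with tempfile.TemporaryFile(mode="w+") as log:
        # chain=False: only dump; don't fall through to the previous handler,
        # since SIGALRM's default action would terminate the process.
        faulthandler.register(signal.SIGALRM, file=log, all_threads=True, chain=False)
        try:
            signal.raise_signal(signal.SIGALRM)  # Python 3.8+
        finally:
            faulthandler.unregister(signal.SIGALRM)  # restore the previous handler
        log.seek(0)  # faulthandler wrote straight to the fd; rewind and read
        return log.read()


if __name__ == "__main__":
    print(dump_traceback_on_sigalrm())
```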
msg399385 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-11 11:31
This might be a long-standing problem. I hadn't encountered it before because I always ran -R: with -j, and in that case the test is skipped:

test_input_tty (test.test_builtin.PtyTests) ... skipped 'stdin and stdout must be ttys'
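The skip is consistent with how -j works, assuming each worker runs in a subprocess whose stdout is redirected away from the terminal (an assumption about libregrtest internals). A minimal sketch of the effect: a child started with its stdout connected to a pipe sees no tty, so a test guarded by a "stdin and stdout must be ttys" check would presumably skip. The names `CHECK` and `child_sees_ttys` are hypothetical.

```python
import subprocess
import sys

# Mirrors the condition implied by the skip message, evaluated in a child.
CHECK = "import sys; print(sys.stdin.isatty() and sys.stdout.isatty())"


def child_sees_ttys() -> bool:
    """Hypothetical helper: run CHECK with stdio redirected, like a worker."""
    result = subprocess.run(
        [sys.executable, "-c", CHECK],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,  # a pipe, not a terminal, so isatty() is False
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print(child_sees_ttys())
```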
msg399396 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-11 14:53
Amazingly, excluding every other test function with a bunch of `-i` patterns still makes it hang when run twice. On the other hand, including only this test function with `-m` works fine.

This is very weird. Looking further.

Semi-relatedly, I found BPO-26228, could reproduce it, and finished an open PR on it. While those are separate issues, I'm hoping to solve them both.
msg399398 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-11 15:10
I found the high-level reason why test_builtin hangs: it runs doctests as well. What's the root cause? I don't know yet.

But to confirm, I can also hang the tests by running:

$ python3.9 -m test test_doctest test_builtin -v

Now to discover what it is that doctest does...
msg399412 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-11 18:50
The doctest runner installs an output-redirecting debugger (a Pdb subclass) around the actual doctest run, and that is what causes the hang. New finding: we can hang the test with test_pdb too:

$ python3.9 -m test test_pdb test_builtin -v
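The doctest side of this can be observed without any tty: running even a trivial example through `doctest.DocTestRunner` leaves an instance of its private Pdb subclass on the runner's `debugger` attribute (an implementation detail of CPython's doctest module, not a public API). A minimal sketch, with a hypothetical helper name:

```python
import doctest
import pdb


def run_trivial_doctest():
    """Hypothetical helper: run one doctest example and return the runner."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(">>> 1 + 1\n2\n", {}, "trivial", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return runner, results


if __name__ == "__main__":
    runner, results = run_trivial_doctest()
    # The output-redirecting debugger is a pdb.Pdb subclass.
    print(results, isinstance(runner.debugger, pdb.Pdb))
```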
msg399413 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-11 19:41
It *is* readline-related after all O_O

Commenting out this section in Pdb.__init__ makes the issue go away:

time ./python.exe -E -Wd -m test test_builtin test_builtin
0:00:00 load avg: 2.12 Run tests sequentially
0:00:00 load avg: 2.12 [1/2] test_builtin
0:00:00 load avg: 2.12 [2/2] test_builtin

== Tests result: SUCCESS ==

All 2 tests OK.

Total duration: 1.3 sec
Tests result: SUCCESS
        1.56 real         1.42 user         0.10 sys

I'll be continuing on this tomorrow to find the root cause.
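The section in Pdb.__init__ isn't quoted here. Assuming it is the block that imports `readline` (to adjust its completer delimiters), one hedged reading of why it could matter: `input()` only goes through `call_readline()` (frame #17 in the child trace) once the `readline` module has been imported, and under that assumption merely constructing a Pdb performs that import. The sketch below checks for that side effect; the helper name is hypothetical, and the result depends on the Python version and on readline being available.

```python
import sys
import pdb


def pdb_imports_readline():
    """Hypothetical helper: does constructing pdb.Pdb() import readline?

    Returns (loaded_before, loaded_after). If readline was already imported
    by something else, the answer is inconclusive (both are True).
    """
    loaded_before = "readline" in sys.modules
    pdb.Pdb(readrc=False)  # construct only; no debugging session starts
    loaded_after = "readline" in sys.modules
    return loaded_before, loaded_after


if __name__ == "__main__":
    print(pdb_imports_readline())
```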
msg400631 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-30 16:14
Is it related to ?
msg406262 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-11-13 00:49
I've looked into this and the hang happens on this line:

So the issue is that on the second run, there's nothing to read on that fd. I tried using os.stat to check whether there's data on the fd, but it reported 0 bytes on both the first and the second run.

However, if a small sleep is added before calling os.stat, it does report the size of the pending data on the first run and 0 on the second, which means it's possible to avoid the hang and error out instead (is that an improvement?).
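One way to error out instead of hanging, sketched under the assumption that the parent reads the child's result pipe by raw fd: wait for readability with `select.select()` and give up after a timeout. This swaps in `select` for the os.stat size check described above; on a pipe, `select` reports readability both when data is available and at EOF, so no sleep is needed. The helper name and the per-read timeout are hypothetical choices.

```python
import os
import select


def read_until_eof(fd: int, timeout: float) -> bytes:
    """Hypothetical helper: read fd to EOF, raising TimeoutError if any
    single wait for data exceeds `timeout` seconds."""
    chunks = []
    while True:
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            raise TimeoutError(f"no data on fd {fd} after {timeout} s")
        chunk = os.read(fd, 8192)
        if not chunk:  # EOF: every writer has closed its end
            return b"".join(chunks)
        chunks.append(chunk)


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"tty = True\n")
    os.close(w)
    print(read_until_eof(r, timeout=1.0))
    os.close(r)
```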

This is on MacOS 11.4 Big Sur by the way.

This is my test debug branch:
