classification
Title: Python interactive console doesn't use sys.stdin for input
Type: enhancement Stage: needs patch
Components: Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Drekin, benjamin.peterson, brett.cannon, eric.araujo, georg.brandl, gvanrossum, ncoghlan, paul.moore, pitrou, steve.dower, tshepang
Priority: normal Keywords:

Created on 2013-04-02 16:59 by Drekin, last changed 2015-11-28 20:06 by Drekin.

Messages (26)
msg185848 - (view) Author: Adam Bartoš (Drekin) * Date: 2013-04-02 16:59
The Python interactive console doesn't actually use sys.stdin for input; it uses the standard C stdin. Is there any reason for this? Why does it then use the encoding attribute of sys.stdin? (Assigning to sys.stdin an object that has no encoding attribute freezes the interpreter.) If anything, wouldn't it make more sense to use sys.__stdin__.encoding instead of sys.stdin.encoding? sys.stdin is intended to be set by the user (it affects input() and code.interact(), which tries to mimic the standard interactive console).
msg186121 - (view) Author: Adam Bartoš (Drekin) * Date: 2013-04-06 09:41
Sorry for typos.
• interactive console doesn't use sys.stdin for input, why?
• it uses sys.stdin.encoding, shouldn't it rather use sys.__stdin__.encoding if anything?
• input(), and hence code.interact(), uses sys.stdin
msg186553 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-11 09:56
> • interactive console doesn't use sys.stdin for input, why?

Modules/main.c calls PyRun_AnyFileFlags(stdin, "<stdin>", ...). At this point, sys.stdin *is* the same as C stdin by construction, so I'm not sure how you came to encounter the issue.

However, it's also true that if you later redirect sys.stdin, it will be ignored and the original C stdin (as passed to PyRun_InteractiveLoopFlags) will continue to be used. On the other hand, the input() implementation has dedicated logic to find out whether sys.stdin is the same as C stdin.

(by the way, the issue should also apply to 2.7)

> • it uses sys.stdin.encoding, shouldn't it rather use sys.__stdin__.encoding if anything?

Assuming the previous bug gets fixed, then no :-)
msg186576 - (view) Author: Adam Bartoš (Drekin) * Date: 2013-04-11 18:40
I encountered it when I changed sys.stdin at runtime (I thought it was a supported feature) to affect the interactive console, see http://bugs.python.org/issue1602 .
msg186580 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-11 18:56
Ok, I guess it would need a new API (PyRun_Stdio()?) to run the interactive loop from sys.stdin, rather than from a fixed FILE*.
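
For comparison, the pure-Python console in the code module already reads through sys.stdin, because it obtains its lines from input(). The sketch below only illustrates that behaviour and is not the proposed C API; run_console_from is a hypothetical helper name, and replacing sys.stdin, sys.stdout and sys.stderr with in-memory streams is an assumption made to keep the example self-contained.

```python
import code
import io
import sys


def run_console_from(text):
    """Feed `text` to code.InteractiveConsole through a replaced sys.stdin.

    Returns the namespace in which the console executed the statements.
    """
    namespace = {}
    saved = sys.stdin, sys.stdout, sys.stderr
    # StringIO has no usable fileno(), so input() falls back to calling
    # sys.stdin.readline() on the replacement stream.
    sys.stdin = io.StringIO(text)
    sys.stdout = io.StringIO()
    sys.stderr = io.StringIO()
    try:
        code.InteractiveConsole(namespace).interact(banner="")
    finally:
        sys.stdin, sys.stdout, sys.stderr = saved
    return namespace


result = run_console_from("x = 1 + 1\n")
```

The console stops when the replacement stream reaches end of file, which is why no explicit exit command is needed here.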
msg193815 - (view) Author: Adam Bartoš (Drekin) * Date: 2013-07-28 10:40
Is there any chance the API will be added and used by python.exe?
msg221176 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-06-21 12:29
Steve, another one to look at in the context of improving the Unicode handling situation at the Windows command prompt.
msg221179 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-06-21 15:15
Thanks Nick, but this has a pretty clear scope that may help the Unicode situation in cmd but doesn't directly relate to it.
msg223414 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-07-18 15:09
There is still the serious inconsistency that `sys.stdin` is not used for input by the interactive loop but its encoding is. So if I replace `sys.stdin` with a custom object that has its own `encoding` attribute, the standard interactive loop tries to use this encoding, which may result in an exception on any input.
msg224312 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-07-30 15:16
Is this at all related to the use of GNU readline?
msg224313 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-07-30 15:29
Yes, it is. GNU readline will use a FILE*. Apparently, one can customize this behaviour, see http://cnswww.cns.cwru.edu/php/chet/readline/readline.html#SEC25

"""Variable: rl_getc_func_t * rl_getc_function
    If non-zero, Readline will call indirectly through this pointer to get a character from the input stream. By default, it is set to rl_getc, the default Readline character input function (see section 2.4.8 Character Input). In general, an application that sets rl_getc_function should consider setting rl_input_available_hook as well. """

It is not obvious how that interacts with special keys, e.g. arrows.
msg224330 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-07-30 17:49
I propose not to mess with GNU readline.  But that doesn't mean we can't try to fix this issue by detecting that sys.stdin has changed and use it if it isn't referring to the original process stdin.  It will be tricky however to make sure nothing breaks.

(The passage quoted from the GNU readline docs seems to imply that it's in non-blocking mode, and that the FD is a raw tty device, probably with echo off.  It will give escape sequences for e.g. arrow keys.)
msg224334 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-07-30 18:04
My naive picture of the ideal situation looks like this: when the interactive loop wants input, it just calls sys.stdin.readline, which delegates to sys.stdin.buffer.raw.readinto or .read; these can use GNU readline, if available, to get the data. May I ask what's wrong with my picture?
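
The layering in this picture can be sketched in pure Python under some assumptions: LineSourceRaw and make_text_stdin are hypothetical names, and get_line stands in for whatever low-level reader (GNU readline, a console API, and so on) would supply the bytes. It only shows how a text stream's readline() can be fed by a custom raw layer; it does not change how the REPL itself obtains input.

```python
import io


class LineSourceRaw(io.RawIOBase):
    """Raw stream whose bytes come from a line-producing callable."""

    def __init__(self, get_line):
        # get_line() must return one line as bytes, or b"" at end of input.
        self._get_line = get_line
        self._pending = b""

    def readable(self):
        return True

    def readinto(self, buffer):
        # Ask the line source for more data only when nothing is left over.
        if not self._pending:
            self._pending = self._get_line()
        count = min(len(buffer), len(self._pending))
        buffer[:count] = self._pending[:count]
        self._pending = self._pending[count:]
        return count  # 0 signals end of file


def make_text_stdin(get_line, encoding="utf-8"):
    """Stack the usual buffered and text layers on top of the raw stream."""
    raw = LineSourceRaw(get_line)
    return io.TextIOWrapper(io.BufferedReader(raw), encoding=encoding)


# Usage: a fixed list of lines plays the role of the terminal.
source = iter([b"a = 1\n", b"b = 2\n"])
stream = make_text_stdin(lambda: next(source, b""))
first_line = stream.readline()
```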
msg224338 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-07-30 18:31
sys.stdin.readline() never delegates to GNU readline.  The REPL calls GNU readline directly.  There's clearly some condition that determines whether to call GNU readline or sys.stdin.readline, but it may not correspond to what you want (e.g. it may just test whether FD 0 is a tty).  Can you find in the CPython source code where this determination is made?
msg224396 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-07-31 11:38
I looked into the source code and found the following.

First, the code path by which the interactive loop gets its input is as follows:
Python/pythonrun.c:PyRun_InteractiveLoopFlags
Python/pythonrun.c:PyRun_InteractiveOneObject
Python/pythonrun.c:PyParser_ASTFromFileObject
Parser/parsetok.c:PyParser_ParseFileObject
Parser/parsetok.c:parsetok
Parser/tokenizer.c:PyTokenizer_Get
Parser/tokenizer.c:tok_get
Parser/tokenizer.c:tok_nextc
Parser/myreadline.c:PyOS_Readline OR Parser/tokenizer.c:decoding_fgets

PyRun_InteractiveOneObject tries to get the input encoding via sys.stdin.encoding. The encoding is then passed along and finally stored in a tokenizer object. It is the tok_nextc function that gets the input. If the prompt is not NULL, it gets the data via PyOS_Readline and uses the encoding to recode it to UTF-8. This is unfortunate, since the encoding, which originates in sys.stdin.encoding, may have nothing to do with the data returned by PyOS_Readline. Also note that there is a hardcoded stdin argument to PyOS_Readline, but tok->fp == stdin probably holds, so it doesn't matter.

If the prompt in tok_nextc is NULL, then the data is obtained by the decoding_fgets function, which uses either fp_readl > tok->decoding_readline or Objects/fileobject.c:Py_UniversalNewlineFgets, depending on the tokenizer state. The tok->decoding_readline handler may be set to io.open("isisOOO", fileno(tok->fp), …) (I have no idea what "isisOOO" might be).

The PyOS_Readline function calls either PyOS_StdioReadline or the function pointed to by PyOS_ReadlineFunctionPointer, which by default is again PyOS_StdioReadline but is usually set by the code in Modules/readline.c to support GNU readline. The PyOS_StdioReadline function uses my_fgets, which calls fgets.

Now for what the input() function does. input is implemented as Python/bltinmodule.c:builtin_input. It tests whether we are on a tty by comparing sys.stdin.fileno() to fileno(stdin) and testing isatty. Note that this may not be enough: if I install a custom sys.stdin but let it have the standard fileno, then the test may succeed. If we are on a tty, then PyOS_Readline is used (again together with sys.std*.encoding); if we aren't, then Objects/fileobject.c:PyFile_WriteObject > sys.stdout.write (for the prompt) and :PyFile_GetLine > sys.stdin.readline are used.

As we can see, the API is rather FILE* based. The only places where sys.std* objects are used are one branch of builtin_input and the lookup of the encoding used in the tokenizer. Would it be possible to configure the tokenizer so that it uses sys.stdin.readline for input, and also to rewrite builtin_input to always use sys.std*? Then it would be the responsibility of the sys.stdin.buffer.raw.read* methods to decide whether to use GNU readline, or whatever PyOS_Readline uses, or something else (e.g. ReadConsoleW on a Windows tty), and also to check for Ctrl-C afterwards.
msg224397 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-07-31 11:40
Sorry for the formatting in the previous message. Repeating…

I looked into the source code and found the following.

First, the code path by which the interactive loop gets its input is as follows:
Python/pythonrun.c:PyRun_InteractiveLoopFlags
Python/pythonrun.c:PyRun_InteractiveOneObject
Python/pythonrun.c:PyParser_ASTFromFileObject
Parser/parsetok.c:PyParser_ParseFileObject
Parser/parsetok.c:parsetok
Parser/tokenizer.c:PyTokenizer_Get
Parser/tokenizer.c:tok_get
Parser/tokenizer.c:tok_nextc
Parser/myreadline.c:PyOS_Readline OR Parser/tokenizer.c:decoding_fgets

PyRun_InteractiveOneObject tries to get the input encoding via sys.stdin.encoding. The encoding is then passed along and finally stored in a tokenizer object. It is the tok_nextc function that gets the input. If the prompt is not NULL, it gets the data via PyOS_Readline and uses the encoding to recode it to UTF-8. This is unfortunate, since the encoding, which originates in sys.stdin.encoding, may have nothing to do with the data returned by PyOS_Readline. Also note that there is a hardcoded stdin argument to PyOS_Readline, but tok->fp == stdin probably holds, so it doesn't matter.

If the prompt in tok_nextc is NULL, then the data is obtained by the decoding_fgets function, which uses either fp_readl > tok->decoding_readline or Objects/fileobject.c:Py_UniversalNewlineFgets, depending on the tokenizer state. The tok->decoding_readline handler may be set to io.open("isisOOO", fileno(tok->fp), …) (I have no idea what "isisOOO" might be).

The PyOS_Readline function calls either PyOS_StdioReadline or the function pointed to by PyOS_ReadlineFunctionPointer, which by default is again PyOS_StdioReadline but is usually set by the code in Modules/readline.c to support GNU readline. The PyOS_StdioReadline function uses my_fgets, which calls fgets.

Now for what the input() function does. input is implemented as Python/bltinmodule.c:builtin_input. It tests whether we are on a tty by comparing sys.stdin.fileno() to fileno(stdin) and testing isatty. Note that this may not be enough: if I install a custom sys.stdin but let it have the standard fileno, then the test may succeed. If we are on a tty, then PyOS_Readline is used (again together with sys.std*.encoding); if we aren't, then Objects/fileobject.c:PyFile_WriteObject > sys.stdout.write (for the prompt) and :PyFile_GetLine > sys.stdin.readline are used.

As we can see, the API is rather FILE* based. The only places where sys.std* objects are used are one branch of builtin_input and the lookup of the encoding used in the tokenizer. Would it be possible to configure the tokenizer so that it uses sys.stdin.readline for input, and also to rewrite builtin_input to always use sys.std*? Then it would be the responsibility of the sys.stdin.buffer.raw.read* methods to decide whether to use GNU readline, or whatever PyOS_Readline uses, or something else (e.g. ReadConsoleW on a Windows tty), and also to check for Ctrl-C afterwards.
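
The tty test described for builtin_input can be modelled roughly in Python. This is a simplified sketch, not the C implementation: takes_readline_path and FakeStdin are hypothetical names, only the stdin side described in this message is modelled, and the real code may check more than this.

```python
import io
import os


def takes_readline_path(stream, c_stdin_fd=0):
    """Model of the check: same descriptor as C stdin, and that descriptor is a tty."""
    try:
        fd = stream.fileno()
    except (AttributeError, OSError, ValueError):
        # No usable descriptor: fall back to stream.readline().
        return False
    return fd == c_stdin_fd and os.isatty(fd)


class FakeStdin(io.StringIO):
    """Custom stream that claims the standard descriptor, as in the caveat above."""

    def fileno(self):
        return 0


in_memory = takes_readline_path(io.StringIO("ignored\n"))
impostor = takes_readline_path(FakeStdin("ignored\n"))
```

For the impostor, the outcome depends only on whether descriptor 0 is a terminal, not on the object installed as sys.stdin, which is the weakness pointed out above.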
msg226021 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-08-28 12:45
I have found another example of where the current interaction between readline and the Python core leads to confusion. It started with the following report on my package: https://github.com/Drekin/win-unicode-console/issues/2 .

Basically, the IPython interactive console on Windows uses the pyreadline package, which provides GNU readline functionality. To get input from the user, it just calls input(prompt). input() calls readline both for writing the prompt and for reading the input. It interprets ANSI control sequences, so a colored prompt is displayed rather than garbage. And when the user types, things like auto-completion work. sys.stdin is not used at all and points to the standard object.

One easily gets the impression that since sys.stdin is bypassed, changing it doesn't matter, but it actually does. With a changed sys.stdin, input() now uses it rather than readline, and ANSI control sequences result in a mess. See https://github.com/ipython/ipython/issues/17#issuecomment-53696541 .

I just think it would be better if input() always delegated to sys.stdin and print() to sys.stdout, and this was the standard way to interact with the terminal. It would then be the responsibility of the sys.std* objects to do the right thing: to read from a file, to delegate to readline, to interact directly with the console in some way, to interpret the ANSI control sequences or not.

Solving issues like #1602 or #18597, or adding readline support on Windows, would then be just a matter of providing the right sys.std* implementation.
msg226098 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-08-29 22:57
I realized that the behavior I want can be achieved by setting PyOS_ReadlineFunctionPointer to a function calling sys.stdin.readline(). However, I found another problem: the Python REPL just doesn't work when sys.stdin.encoding is UTF-16-LE. The tokenizer (Parser/tokenizer.c:tok_nextc) reads a line using PyOS_Readline and then tries to recode it to UTF-8. The problem is that PyOS_Readline returns just a plain char *, and strlen() is used to determine its length when decoding, which makes no sense for a UTF-16-LE encoded line, since it's full of null bytes.

Why does PyOS_Readline return char * rather than a Python string object? When PyOS_ReadlineFunctionPointer points to something producing a Unicode string (e.g. my new approach to solving #1602, or the pyreadline package), the string must be encoded and cast to char * to be returned from PyOS_Readline; then it is decoded by the tokenizer and encoded again to UTF-8.
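
The strlen() problem can be shown with a small Python model. c_strlen is a hypothetical helper that imitates how C measures a null-terminated buffer; it is not a CPython function.

```python
def c_strlen(data: bytes) -> int:
    """Length that C's strlen() would report: bytes before the first null."""
    end = data.find(b"\x00")
    return len(data) if end < 0 else end


line = "x = 1\n"
as_utf8 = line.encode("utf-8")
as_utf16 = line.encode("utf-16-le")

utf8_seen = c_strlen(as_utf8)    # every byte of the line is counted
utf16_seen = c_strlen(as_utf16)  # counting stops at the high byte of "x"
```

Every ASCII character encodes to two bytes in UTF-16-LE, the second of which is zero, so a strlen()-based consumer sees only the first byte of the line.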
msg226099 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-29 23:00
> Why does PyOS_Readline return char * rather than a Python string object?

For historical reasons and now for compatibility: we can't change the hook's signature without breaking obvious applications, obviously.
If necessary, we could add a new hook that would take precedence over the old one if defined. Feel free to post a patch for that.
msg226100 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-29 23:01
> without breaking obvious applications

without breaking *existing* applications ;-)
msg226126 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-08-30 07:59
The Python parser works well with UTF8. If you know the encoding, decode
from your encoding and encode to UTF8. You should pass the UTF8 flag to the
parser.
msg226140 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-08-30 15:30
Antoine Pitrou: I understand. It would be nice to have that new Python string based readline hook. Its default implementation could be to call PyOS_Readline and decode the bytes using sys.stdin.encoding (as the tokenizer currently does). The tokenizer then wouldn't need to decode if it called the new hook.

Victor Stinner: I'm going to try the approach of re-encoding my stream to UTF-8. So my UTF-16-LE encoded stream is decoded, then encoded to UTF-8 and interpreted as a null-terminated char *, which is returned to the tokenizer, which again decodes it and encodes it to UTF-8. I wonder if the last step could be short-circuited. What is this UTF8 flag to the Python parser? I couldn't find any information.
msg226933 - (view) Author: Adam Bartoš (Drekin) * Date: 2014-09-15 19:04
I have found another problem. PyOS_Readline can be called from two different places: from Parser/tokenizer.c:tok_nextc (by the REPL), which uses sys.stdin.encoding to encode the prompt argument, and from Python/bltinmodule.c:builtin_input_impl (by the input() function), which uses sys.stdout.encoding. So a readline hook cannot be implemented correctly if sys.stdin and sys.stdout don't have the same encoding.

Either the tokenizer should have two encodings, one for input and one for output, or better, it should have no encoding at all and use a Python string based alternative to PyOS_Readline, which could be added.
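
A small model of why the hook cannot decode its prompt reliably. The encodings below are arbitrary stand-ins for sys.stdin.encoding and sys.stdout.encoding, and prompt_bytes_seen_by_hook is a hypothetical name; the real hook only ever receives the resulting bytes.

```python
def prompt_bytes_seen_by_hook(prompt, caller_encoding):
    """Bytes handed to the readline hook; the encoding itself is not passed along."""
    return prompt.encode(caller_encoding)


prompt = "λ> "
from_repl = prompt_bytes_seen_by_hook(prompt, "utf-8")    # stands in for sys.stdin.encoding
from_input = prompt_bytes_seen_by_hook(prompt, "cp1253")  # stands in for sys.stdout.encoding

# A hook that always assumes UTF-8 handles the REPL prompt but garbles the other.
right_guess = from_repl.decode("utf-8")
wrong_guess = from_input.decode("utf-8", errors="replace")
```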
msg234439 - (view) Author: Adam Bartoš (Drekin) * Date: 2015-01-21 15:45
Unfortunately, I have little or no experience with Python C code, and I don't even have a C compiler installed, so I cannot experiment. I'll just put my ideas on how to solve this here.

• Add a sys.__readlinehook__ attribute, which can be set to a function taking a prompt string and returning a line.
• Add a C function PyOS_UnicodeReadline (possibly with a better name) which has the same signature as sys.__readlinehook__ (in contrast with the signature of PyOS_Readline). If sys.__readlinehook__ is set, call it; otherwise encode the prompt string using the stdout encoding, delegate to PyOS_Readline, and decode the returned string using the stdin encoding.
• Change the tokenizer and the implementation of input() so that they use PyOS_UnicodeReadline rather than PyOS_Readline.

This would solve the problem that a UTF-16 encoded string cannot be given to the tokenizer, and it would also bypass the silent assumption that the stdin and stdout encodings are the same. Also, a readline hook could easily be set from Python code, with no need for ctypes. The pyreadline package could use this. Also, issue #1602 could then be solved just by changing the sys.std* streams and providing a trivial sys.__readlinehook__ delegating to sys.stdout.write and sys.stdin.readline.
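
The dispatch proposed above can be sketched in Python. Everything here is hypothetical: sys.__readlinehook__ and PyOS_UnicodeReadline are only proposals in this issue, so the sketch passes the hook as an ordinary argument, and bytes_readline is an assumed stand-in for the existing bytes-based PyOS_Readline.

```python
import sys


def stream_readline_hook(prompt):
    """The trivial hook mentioned above: write the prompt, read one line."""
    sys.stdout.write(prompt)
    sys.stdout.flush()
    return sys.stdin.readline()


def unicode_readline(prompt, hook=None, bytes_readline=None,
                     stdin_encoding="utf-8", stdout_encoding="utf-8"):
    """Model of the proposed str-based entry point.

    If a hook is installed, it gets the prompt as str and returns str.
    Otherwise the prompt is encoded with the stdout encoding, the
    bytes-based reader is called, and its result is decoded with the
    stdin encoding.
    """
    if hook is not None:
        return hook(prompt)
    raw = bytes_readline(prompt.encode(stdout_encoding))
    return raw.decode(stdin_encoding)


# Fallback path with a fake bytes-based reader standing in for PyOS_Readline.
fallback_line = unicode_readline(">>> ", bytes_readline=lambda _prompt: b"pass\n")
```

With this shape, the caller never sees bytes, so the tokenizer and input() would no longer need to agree on a single encoding for both the prompt and the returned line.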
msg242173 - (view) Author: Adam Bartoš (Drekin) * Date: 2015-04-28 10:06
Note that under the status quo PyOS_Readline is called from two places: the tokenizer during an interactive session and the builtin function input. The tokenizer passes the prompt string encoded in sys.stdin.encoding, while input() passes the prompt string encoded in sys.stdout.encoding, so it is not possible to implement a readline hook correctly when the encodings are different. This might be considered a bug.
msg255549 - (view) Author: Adam Bartoš (Drekin) * Date: 2015-11-28 20:06
I've formulated a proposal regarding this issue: https://mail.python.org/pipermail/python-dev/2015-November/142246.html . Does it make sense?
History
Date User Action Args
2015-11-28 20:06:01 Drekin set messages: + msg255549
2015-08-08 14:22:29 eryksun link issue12854 superseder
2015-05-15 22:42:49 haypo set nosy: - haypo
2015-05-12 05:09:39 ncoghlan link issue22555 dependencies
2015-05-11 08:07:25 ncoghlan link issue1602 dependencies
2015-05-10 14:48:39 paul.moore set nosy: + paul.moore
2015-04-28 10:06:35 Drekin set messages: + msg242173
2015-01-21 15:45:24 Drekin set messages: + msg234439
2014-09-15 19:04:15 Drekin set messages: + msg226933
2014-08-30 15:30:25 Drekin set messages: + msg226140
2014-08-30 07:59:55 haypo set messages: + msg226126
2014-08-29 23:01:02 pitrou set messages: + msg226100
2014-08-29 23:00:25 pitrou set messages: + msg226099
2014-08-29 22:57:47 Drekin set messages: + msg226098
2014-08-28 12:45:18 Drekin set messages: + msg226021
2014-07-31 11:40:34 Drekin set messages: + msg224397
2014-07-31 11:38:42 Drekin set messages: + msg224396
2014-07-30 18:31:53 gvanrossum set messages: + msg224338
2014-07-30 18:04:22 Drekin set messages: + msg224334
2014-07-30 17:49:57 gvanrossum set messages: + msg224330
2014-07-30 15:29:16 pitrou set messages: + msg224313
2014-07-30 15:16:12 gvanrossum set nosy: + gvanrossum; messages: + msg224312
2014-07-18 15:09:34 Drekin set messages: + msg223414
2014-06-21 15:15:37 steve.dower set nosy: brett.cannon, georg.brandl, ncoghlan, pitrou, haypo, benjamin.peterson, eric.araujo, tshepang, Drekin, steve.dower; messages: + msg221179
2014-06-21 12:29:27 ncoghlan set nosy: + steve.dower; messages: + msg221176
2013-07-28 10:40:56 Drekin set messages: + msg193815
2013-04-11 18:56:48 pitrou set type: behavior -> enhancement; stage: needs patch; messages: + msg186580; versions: - Python 3.3
2013-04-11 18:40:26 Drekin set messages: + msg186576; versions: - Python 2.7
2013-04-11 09:56:36 pitrou set nosy: + brett.cannon, georg.brandl, ncoghlan, benjamin.peterson; messages: + msg186553; versions: + Python 2.7
2013-04-10 14:40:15 ezio.melotti set nosy: + haypo
2013-04-06 23:18:07 pitrou set assignee: pitrou ->
2013-04-06 10:21:46 georg.brandl set assignee: pitrou; nosy: + pitrou
2013-04-06 09:41:21 Drekin set messages: + msg186121
2013-04-05 21:14:56 tshepang set nosy: + tshepang
2013-04-03 01:19:41 eric.araujo set nosy: + eric.araujo
2013-04-02 16:59:14 Drekin create