Issue17620
Created on 2013-04-02 16:59 by Drekin, last changed 2022-04-11 14:57 by admin.
Messages (29) | |||
---|---|---|---|
msg185848 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2013-04-02 16:59 | |
The Python interactive console doesn't actually use sys.stdin for input; it uses the standard C stdin. Is there any reason for this? And why does it then use the encoding attribute of sys.stdin? (Assigning to sys.stdin an object that doesn't have an encoding attribute freezes the interpreter.) If anything, wouldn't it make more sense to use sys.__stdin__.encoding instead of sys.stdin.encoding? sys.stdin is intended to be set by the user (it affects input() and code.interact(), which tries to mimic the standard interactive console). |
|||
msg186121 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2013-04-06 09:41 | |
Sorry for typos.
• interactive console doesn't use sys.stdin for input, why?
• it uses sys.stdin.encoding, shouldn't it rather use sys.__stdin__.encoding if anything?
• input() and hence code.interact() uses sys.stdin |
|||
msg186553 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-04-11 09:56 | |
> • interactive console doesn't use sys.stdin for input, why?

Modules/main.c calls PyRun_AnyFileFlags(stdin, "<stdin>", ...). At this point, sys.stdin *is* the same as C stdin by construction, so I'm not sure how you came to encounter the issue. However, it's also true that if you later redirect sys.stdin, it will be ignored and the original C stdin (as passed to PyRun_InteractiveLoopFlags) will continue to be used. On the other hand, the input() implementation has dedicated logic to find out whether sys.stdin is the same as C stdin. (By the way, the issue should also apply to 2.7.)

> • it uses sys.stdin.encoding, shouldn't it rather use sys.__stdin__.encoding if anything?

Assuming the previous bug gets fixed, then no :-) |
|||
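A minimal sketch of the asymmetry described in msg186553, seen from the input() side only: input() consults whatever object is currently bound to sys.stdin. The helper name `read_line_via_replaced_stdin` is hypothetical and only serves the illustration.

```python
import io
import sys


def read_line_via_replaced_stdin(text):
    """Temporarily bind sys.stdin to an in-memory stream and call input()."""
    saved = sys.stdin
    sys.stdin = io.StringIO(text)
    try:
        # StringIO has no usable fileno(), so input() cannot treat it as the
        # C-level stdin and falls back to calling sys.stdin.readline().
        return input()
    finally:
        sys.stdin = saved


if __name__ == "__main__":
    print(read_line_via_replaced_stdin("hello from a fake stdin\n"))
```

The interactive loop, by contrast, keeps reading from the FILE* it was started with, which is why the same replacement has no effect on where the REPL gets its input.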
msg186576 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2013-04-11 18:40 | |
I encountered it when I changed sys.stdin at runtime (I thought it was a supported feature) to affect the interactive console, see http://bugs.python.org/issue1602 . |
|||
msg186580 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-04-11 18:56 | |
Ok, I guess it would need a new API (PyRun_Stdio()?) to run the interactive loop from sys.stdin, rather than from a fixed FILE*. |
|||
msg193815 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2013-07-28 10:40 | |
Is there any chance the API will be added and used by python.exe? |
|||
msg221176 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2014-06-21 12:29 | |
Steve, another one to look at in the context of improving the Unicode handling situation at the Windows command prompt. |
|||
msg221179 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-06-21 15:15 | |
Thanks Nick, but this has a pretty clear scope that may help the Unicode situation in cmd but doesn't directly relate to it. |
|||
msg223414 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2014-07-18 15:09 | |
There is still the serious inconsistency that `sys.stdin` is not used for input by the interactive loop but its encoding is. So if I replace `sys.stdin` with a custom object that has its own `encoding` attribute, the standard interactive loop tries to use this encoding, which may result in an exception on any input. |
|||
msg224312 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-07-30 15:16 | |
Is this at all related to the use of GNU readline? |
|||
msg224313 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-07-30 15:29 | |
Yes, it is. GNU readline will use a FILE*. Apparently, one can customize this behaviour, see http://cnswww.cns.cwru.edu/php/chet/readline/readline.html#SEC25

"""
Variable: rl_getc_func_t * rl_getc_function
If non-zero, Readline will call indirectly through this pointer to get a character from the input stream. By default, it is set to rl_getc, the default Readline character input function (see section 2.4.8 Character Input). In general, an application that sets rl_getc_function should consider setting rl_input_available_hook as well.
"""

It is not obvious how that interacts with special keys, e.g. arrows. |
|||
msg224330 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-07-30 17:49 | |
I propose not to mess with GNU readline. But that doesn't mean we can't try to fix this issue by detecting that sys.stdin has changed and use it if it isn't referring to the original process stdin. It will be tricky however to make sure nothing breaks. (The passage quoted from the GNU readline docs seems to imply that it's in non-blocking mode, and that the FD is a raw tty device, probably with echo off. It will give escape sequences for e.g. arrow keys.) |
|||
msg224334 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2014-07-30 18:04 | |
My naive picture of the ideal situation looks like this: when the interactive loop wants input, it just calls sys.stdin.readline, which delegates to sys.stdin.buffer.raw.readinto or .read, and these can use GNU readline, if available, to get the data. May I ask what's wrong with my picture? |
|||
msg224338 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-07-30 18:31 | |
sys.stdin.readline() never delegates to GNU readline. The REPL calls GNU readline directly. There's clearly some condition that determines whether to call GNU readline or sys.stdin.readline, but it may not correspond to what you want (e.g. it may just test whether FD 0 is a tty). Can you find in the CPython source code where this determination is made? |
|||
msg224396 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2014-07-31 11:38 | |
I looked into the source code and found the following.

First, the code path by which the interactive loop gets its input is:

Python/pythonrun.c:PyRun_InteractiveLoopFlags
Python/pythonrun.c:PyRun_InteractiveOneObject
Python/pythonrun.c:PyParser_ASTFromFileObject
Parser/parsetok.c:PyParser_ParseFileObject
Parser/parsetok.c:parsetok
Parser/tokenizer.c:PyTokenizer_Get
Parser/tokenizer.c:tok_get
Parser/tokenizer.c:tok_nextc
Parser/myreadline.c:PyOS_Readline OR Parser/tokenizer.c:decoding_fgets

PyRun_InteractiveOneObject tries to get the input encoding via sys.stdin.encoding. The encoding is then passed along and finally stored in a tokenizer object. It is the tok_nextc function that gets the input. If the prompt is not NULL, it gets the data via PyOS_Readline and uses the encoding to recode it to UTF-8. This is unfortunate, since the encoding, which originates in sys.stdin.encoding, may have nothing to do with the data returned by PyOS_Readline. Also note that there is a hardcoded stdin argument to PyOS_Readline, but tok->fp == stdin probably holds, so it doesn't matter.

If the prompt in tok_nextc is NULL, the data are obtained by the decoding_fgets function, which uses either fp_readl > tok->decoding_readline or Objects/fileobject.c:Py_UniversalNewlineFgets, depending on the tokenizer state. The tok->decoding_readline handler may be set to io.open("isisOOO", fileno(tok->fp), …) (I have no idea what "isisOOO" might be).

The PyOS_Readline function calls either PyOS_StdioReadline or the function pointed to by PyOS_ReadlineFunctionPointer, which by default is again PyOS_StdioReadline but is usually set to support GNU readline by the code in Modules/readline.c. The PyOS_StdioReadline function uses my_fgets, which calls fgets.

Now for what the input() function does. input is implemented as Python/bltinmodule.c:builtin_input. It tests whether we are on a tty by comparing sys.stdin.fileno() to fileno(stdin) and testing isatty. Note that this may not be enough: if I install a custom sys.stdin but let it keep the standard fileno, the test may still succeed. If we are on a tty, PyOS_Readline is used (again together with sys.std*.encoding); if we aren't, Objects/fileobject.c:PyFile_WriteObject > sys.stdout.write (for the prompt) and :PyFile_GetLine > sys.stdin.readline are used.

As we can see, the API is rather FILE*-based. The only places where sys.std* objects are used are one branch of builtin_input and the lookup of the encoding used in the tokenizer. Would it be possible to configure the tokenizer so that it uses sys.stdin.readline for input, and also to rewrite builtin_input to always use sys.std*? Then it would be the responsibility of the sys.stdin.buffer.raw.read* methods to decide whether to use GNU readline, whatever PyOS_Readline uses, or something else (e.g. ReadConsoleW on a Windows tty), and also to check for Ctrl-C afterwards. |
|||
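The tty test that msg224396 describes for input() can be approximated in Python as follows. This is only a sketch of the stdin half of the check as stated in the message; the real test lives in C and may consider more than this. The names `would_take_tty_branch` and `FakeStdin` are hypothetical.

```python
import io
import os


def would_take_tty_branch(stdin_obj, c_stdin_fd=0):
    """Approximate the described check: same descriptor as C stdin, and a tty."""
    try:
        fd = stdin_obj.fileno()
    except (AttributeError, OSError, ValueError):
        # No usable descriptor: input() has to go through stdin_obj.readline().
        return False
    return fd == c_stdin_fd and os.isatty(fd)


class FakeStdin(io.StringIO):
    """Custom stream that is not the console but still reports descriptor 0."""

    def fileno(self):
        return 0


if __name__ == "__main__":
    print(would_take_tty_branch(io.StringIO("data\n")))   # always False
    print(would_take_tty_branch(FakeStdin("data\n")))     # True when fd 0 is a tty
```

`FakeStdin` illustrates the loophole mentioned in the message: a replacement stream that keeps the standard descriptor number passes the test whenever descriptor 0 is a terminal, even though its readline() would never be consulted.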
msg226021 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2014-08-28 12:45 | |
I have found another example of where the current interaction between readline and the Python core leads to confusion. It started with the following report on my package: https://github.com/Drekin/win-unicode-console/issues/2 . Basically, the IPython interactive console on Windows uses the pyreadline package, which provides GNU readline functionality. To get input from the user, it just calls input(prompt). input() calls readline both for writing the prompt and for reading the input. It interprets ANSI control sequences, so a colored prompt is displayed rather than garbage. And when the user types, things like auto-completion work. sys.stdin is not used at all and points to the standard object. One easily gets the impression that, since sys.stdin is bypassed, changing it doesn't matter, but it actually does. With a changed sys.stdin, input() now uses it rather than readline, and ANSI control sequences result in a mess. See https://github.com/ipython/ipython/issues/17#issuecomment-53696541 . I just think that it would be better if input() always delegated to sys.stdin and print() to sys.stdout, and this was the standard way to interact with the terminal. It would then be the responsibility of the sys.std* objects to do the right thing: to read from a file, to delegate to readline, to interact directly with the console in some way, to interpret or not interpret the ANSI control sequences. Solving issues like #1602 or #18597, or adding readline support to Windows, would then be just a matter of providing the right sys.std* implementation. |
|||
msg226098 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2014-08-29 22:57 | |
I realized that the behavior I want can be achieved by setting PyOS_ReadlineFunctionPointer to a function calling sys.stdin.readline(). However, I found another problem: the Python REPL just doesn't work when sys.stdin.encoding is UTF-16-LE. The tokenizer (Parser/tokenizer.c:tok_nextc) reads a line using PyOS_Readline and then tries to recode it to UTF-8. The problem is that PyOS_Readline returns just a plain char * and strlen() is used to determine its length when decoding, which makes no sense for a UTF-16-LE encoded line, since it's full of null bytes. Why does PyOS_Readline return char *, rather than a Python string object? In the situation where PyOS_ReadlineFunctionPointer points to something producing a Unicode string (e.g. my new approach to solving #1602, or the pyreadline package), the string must be encoded and cast to char * to be returned from PyOS_Readline; then it is decoded by the tokenizer and encoded again to UTF-8. |
|||
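The strlen() problem from msg226098 can be reproduced without any C code. The sketch below models strlen() as "count bytes up to the first NUL"; the helper name `c_strlen` is hypothetical and is not part of CPython.

```python
def c_strlen(buffer: bytes) -> int:
    """Length as C's strlen() would see it: the bytes before the first NUL."""
    nul = buffer.find(b"\x00")
    return len(buffer) if nul == -1 else nul


line = "print(1)\n"
utf8_line = line.encode("utf-8")
utf16_line = line.encode("utf-16-le")

if __name__ == "__main__":
    # UTF-8 never produces a NUL byte for non-NUL characters, so nothing is lost.
    print(len(utf8_line), c_strlen(utf8_line))
    # UTF-16-LE stores ASCII characters as (byte, 0x00), so strlen() stops early.
    print(len(utf16_line), c_strlen(utf16_line))
```

For ASCII text every second byte of the UTF-16-LE form is zero, so a NUL-terminated char * interface sees only the first byte of the line.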
msg226099 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-08-29 23:00 | |
> Why does PyOS_Readline return *char, rather than Python string object?

For historical reasons and now for compatibility: we can't change the hook's signature without breaking obvious applications, obviously. If necessary, we could add a new hook that would take precedence over the old one if defined. Feel free to post a patch for that. |
|||
msg226100 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-08-29 23:01 | |
> without breaking obvious applications

without breaking *existing* applications ;-) |
|||
msg226126 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-08-30 07:59 | |
The Python parser works well with UTF8. If you know the encoding, decode from your encoding and encode to UTF8. You should pass the UTF8 flag to the parser. |
|||
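The recoding step suggested in msg226126 amounts to a decode followed by an encode. A minimal sketch, assuming the source encoding is known; `recode_to_utf8` is a hypothetical helper name.

```python
def recode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    console_bytes = "x = 'ř'\n".encode("utf-16-le")
    utf8_bytes = recode_to_utf8(console_bytes, "utf-16-le")
    # The UTF-8 form has no embedded NUL bytes, so it survives a char * interface.
    print(utf8_bytes, b"\x00" in utf8_bytes)
```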
msg226140 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2014-08-30 15:30 | |
Antoine Pitrou: I understand. It would be nice to have that new Python string based readline hook. Its default implementation could be to call PyOS_Readline and decode the bytes using sys.stdin.encoding (as the tokenizer currently does). The tokenizer then wouldn't need to decode if it called the new hook.

Victor Stinner: I'm going to try the approach of reencoding my stream to UTF-8. So my UTF-16-LE encoded stream is decoded, then encoded to UTF-8 and interpreted as a null-terminated char *, which is returned to the tokenizer, which again decodes it and encodes it to UTF-8. I wonder if the last step could be short-circuited. What is this UTF8 flag to the Python parser? I couldn't find any information. |
|||
msg226933 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2014-09-15 19:04 | |
I have found another problem. PyOS_Readline can be called from two different places: from Parser/tokenizer.c:tok_nextc (by the REPL), which uses sys.stdin.encoding to encode the prompt argument, and from Python/bltinmodule.c:builtin_input_impl (by the input() function), which uses sys.stdout.encoding. So a readline hook cannot be implemented correctly if sys.stdin and sys.stdout don't have the same encoding. Either the tokenizer should have two encodings, one for input and one for output, or, better, it should have no encoding at all and use a Python string based alternative to PyOS_Readline, which could be added. |
|||
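To see why a bytes-based hook cannot cope with two callers that encode the prompt differently, consider the sketch below. The encodings are chosen only for illustration (UTF-8 standing in for sys.stdin.encoding, Latin-1 for sys.stdout.encoding), and `hook_decode_prompt` is a hypothetical stand-in for whatever a readline hook does with the prompt bytes it receives.

```python
def hook_decode_prompt(prompt_bytes: bytes, assumed_encoding: str) -> str:
    """A hook only sees bytes, so it has to guess one encoding for the prompt."""
    return prompt_bytes.decode(assumed_encoding)


prompt = "é> "
from_tokenizer = prompt.encode("utf-8")    # caller 1: encodes with the stdin encoding
from_input = prompt.encode("latin-1")      # caller 2: encodes with the stdout encoding

if __name__ == "__main__":
    print(hook_decode_prompt(from_tokenizer, "utf-8"))
    try:
        hook_decode_prompt(from_input, "utf-8")
    except UnicodeDecodeError as exc:
        print("same hook, other caller:", exc)
```

Whichever encoding the hook assumes, one of the two callers hands it bytes that either fail to decode or decode to the wrong characters.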
msg234439 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2015-01-21 15:45 | |
Unfortunately, I have little or no experience with Python C code and I don't even have a C compiler installed, so I cannot experiment. I'll just put my ideas on how to solve this here.
• Add a sys.__readlinehook__ attribute, which can be set to a function taking a prompt string and returning a line.
• Add a C function PyOS_UnicodeReadline (possibly with a better name) which has the same signature as sys.__readlinehook__ (in contrast with the signature of PyOS_Readline). If sys.__readlinehook__ is set, call it; otherwise encode the prompt string using the stdout encoding, delegate to PyOS_Readline, and decode the returned string using the stdin encoding.
• Change the tokenizer and the implementation of input() so that they use PyOS_UnicodeReadline rather than PyOS_Readline.
This would solve the problem that a UTF-16 encoded string cannot be given to the tokenizer, and it would also bypass the silent assumption that the stdin and stdout encodings are the same. Also, a readline hook could easily be set from Python code, with no need for ctypes. The pyreadline package could use this. Also, issue #1602 could then be solved just by changing the sys.std* streams and providing a trivial sys.__readlinehook__ delegating to sys.stdout.write and sys.stdin.readline. |
|||
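The dispatch proposed in msg234439 can be modelled in pure Python. Everything below is hypothetical: sys.__readlinehook__ and PyOS_UnicodeReadline are proposals from this thread, not existing APIs, so the hook and the legacy reader are passed in as plain arguments.

```python
import io


def unicode_readline(prompt, hook, legacy_readline, stdin_encoding, stdout_encoding):
    """Model of the proposed str-based entry point.

    hook            -- stands in for the proposed sys.__readlinehook__ (or None)
    legacy_readline -- stands in for PyOS_Readline: bytes prompt in, bytes line out
    """
    if hook is not None:
        return hook(prompt)
    raw_line = legacy_readline(prompt.encode(stdout_encoding))
    return raw_line.decode(stdin_encoding)


def make_stream_hook(stdin, stdout):
    """The 'trivial' hook from the proposal: write the prompt, read one line."""
    def hook(prompt):
        stdout.write(prompt)
        stdout.flush()
        return stdin.readline()
    return hook


if __name__ == "__main__":
    out = io.StringIO()
    hook = make_stream_hook(io.StringIO("1 + 1\n"), out)
    print(repr(unicode_readline(">>> ", hook, None, "utf-8", "utf-8")))
    print(repr(out.getvalue()))
```

Because both the prompt and the returned line are str objects here, the caller never has to know which encodings the underlying streams use.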
msg242173 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2015-04-28 10:06 | |
Note that under the status quo PyOS_Readline is called from two places: the tokenizer during an interactive session and the builtin function input(). The tokenizer passes the prompt string encoded in sys.stdin.encoding, while input() passes the prompt string encoded in sys.stdout.encoding, so it is not possible to implement a readline hook correctly when the encodings differ. This might be considered a bug. |
|||
msg255549 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2015-11-28 20:06 | |
I've formulated a proposal regarding this issue: https://mail.python.org/pipermail/python-dev/2015-November/142246.html . Does it make sense? |
|||
msg272681 - (view) | Author: Steve Dower (steve.dower) * | Date: 2016-08-14 16:30 | |
I'm working on this as part of my fix for issue1602. Not yet sure how this will come out - compatibility with GNU readline seems to be the biggest issue, as if we want to keep that then we can't allow embedded '\0' in the encoded text (i.e. UTF-16 cannot be used, which implies that sys.stdin.encoding cannot always be used directly). Adding __readlinehook__ as an alternative may be feasible, but a decent amount of work given how we call into the current readline implementation. Unfortunately, it looks like detecting when a readline hook has been added is going to involve significant changes to the tokenizer, which I really don't want to do. The easiest approach wrt issue1602 seems to be to special case the console by reencoding from utf-16-le to utf-8 and forcing the encoding in the tokenizer to utf-8 (instead of sys.stdin.encoding) in this case. I'll start here so that at least we can parse Unicode from the interactive prompt. |
|||
msg272683 - (view) | Author: Adam Bartoš (Drekin) * | Date: 2016-08-14 16:58 | |
> Unfortunately, it looks like detecting when a readline hook has been added is going to involve significant changes to the tokenizer, which I really don't want to do.

We don't need to detect the presence of a readline hook; it may be arranged so that there is always a readline hook. Whenever we have interactive stdio, and so PyOS_Readline would be called, the newly proposed API PyIO_Readline would be called instead. This would return a Unicode str as a PyObject*, so the result can be returned directly by input() and should be encoded somehow afterwards by the tokenizer (these are the only consumers of PyOS_Readline). We may even leave the tokenizer alone and redefine PyOS_Readline as a wrapper around PyIO_Readline, having full control of the encoding process there. So it would be enough to set up the tokenizer with UTF-8 encoding despite the fact that sys.std*.encoding would be UTF-16. (I hope that if the tokenizer were designed nowadays, it would operate on strings rather than bytes, so there wouldn't be any encoding problems at all.) Also, third parties would benefit from sys.readlinehook: at least win_unicode_console and pyreadline would just set the attribute rather than messing with ctypes. |
|||
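The wrapper idea from msg272683 (keep the bytes-based entry point, but implement it on top of a str-based reader with a fixed UTF-8 encoding) might look roughly like this in Python. `make_os_readline` and its `io_readline` argument are hypothetical names modelling the proposed PyOS_Readline/PyIO_Readline relationship; neither reflects an existing API.

```python
def make_os_readline(io_readline):
    """Wrap a str-based reader in a bytes-based interface pinned to UTF-8."""
    def os_readline(prompt_bytes: bytes) -> bytes:
        # The wrapper alone owns the encoding decisions, independent of the
        # encoding reported by the underlying stream objects.
        line = io_readline(prompt_bytes.decode("utf-8"))
        return line.encode("utf-8")
    return os_readline


if __name__ == "__main__":
    def fake_console_reader(prompt: str) -> str:
        # Stands in for a reader that obtains text from a UTF-16 console.
        return "print('ř')\n"

    os_readline = make_os_readline(fake_console_reader)
    print(os_readline(b">>> "))
```

With this arrangement the tokenizer can be configured for UTF-8 once, and the UTF-16 details stay inside the str-based reader.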
msg275242 - (view) | Author: Steve Dower (steve.dower) * | Date: 2016-09-09 03:27 | |
Unassigning this. I meant to close it with another fix, but that would be wrong as we really ought to keep this open until we solve it properly. All I've done is make it use the right APIs on Windows, but we still don't handle it properly when we change stdin. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:43 | admin | set | github: 61820 |
2021-03-15 21:02:22 | eryksun | link | issue24829 dependencies |
2021-03-15 21:01:39 | eryksun | set | versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.6 |
2020-03-06 20:05:16 | brett.cannon | set | nosy: - brett.cannon |
2018-06-09 11:03:02 | ncoghlan | unlink | issue22555 dependencies |
2016-09-09 16:42:53 | steve.dower | unlink | issue1602 dependencies |
2016-09-09 03:27:34 | steve.dower | set | assignee: steve.dower -> messages: + msg275242 |
2016-08-14 16:58:20 | Drekin | set | messages: + msg272683 |
2016-08-14 16:30:34 | steve.dower | set | assignee: steve.dower messages: + msg272681 versions: + Python 3.6, - Python 3.4 |
2015-11-28 20:06:01 | Drekin | set | messages: + msg255549 |
2015-08-08 14:22:29 | eryksun | link | issue12854 superseder |
2015-05-15 22:42:49 | vstinner | set | nosy: - vstinner |
2015-05-12 05:09:39 | ncoghlan | link | issue22555 dependencies |
2015-05-11 08:07:25 | ncoghlan | link | issue1602 dependencies |
2015-05-10 14:48:39 | paul.moore | set | nosy: + paul.moore |
2015-04-28 10:06:35 | Drekin | set | messages: + msg242173 |
2015-01-21 15:45:24 | Drekin | set | messages: + msg234439 |
2014-09-15 19:04:15 | Drekin | set | messages: + msg226933 |
2014-08-30 15:30:25 | Drekin | set | messages: + msg226140 |
2014-08-30 07:59:55 | vstinner | set | messages: + msg226126 |
2014-08-29 23:01:02 | pitrou | set | messages: + msg226100 |
2014-08-29 23:00:25 | pitrou | set | messages: + msg226099 |
2014-08-29 22:57:47 | Drekin | set | messages: + msg226098 |
2014-08-28 12:45:18 | Drekin | set | messages: + msg226021 |
2014-07-31 11:40:34 | Drekin | set | messages: + msg224397 |
2014-07-31 11:38:42 | Drekin | set | messages: + msg224396 |
2014-07-30 18:31:53 | gvanrossum | set | messages: + msg224338 |
2014-07-30 18:04:22 | Drekin | set | messages: + msg224334 |
2014-07-30 17:49:57 | gvanrossum | set | messages: + msg224330 |
2014-07-30 15:29:16 | pitrou | set | messages: + msg224313 |
2014-07-30 15:16:12 | gvanrossum | set | nosy: + gvanrossum messages: + msg224312 |
2014-07-18 15:09:34 | Drekin | set | messages: + msg223414 |
2014-06-21 15:15:37 | steve.dower | set | nosy: brett.cannon, georg.brandl, ncoghlan, pitrou, vstinner, benjamin.peterson, eric.araujo, tshepang, Drekin, steve.dower messages: + msg221179 |
2014-06-21 12:29:27 | ncoghlan | set | nosy: + steve.dower messages: + msg221176 |
2013-07-28 10:40:56 | Drekin | set | messages: + msg193815 |
2013-04-11 18:56:48 | pitrou | set | type: behavior -> enhancement stage: needs patch messages: + msg186580 versions: - Python 3.3 |
2013-04-11 18:40:26 | Drekin | set | messages: + msg186576 versions: - Python 2.7 |
2013-04-11 09:56:36 | pitrou | set | nosy: + brett.cannon, georg.brandl, ncoghlan, benjamin.peterson messages: + msg186553 versions: + Python 2.7 |
2013-04-10 14:40:15 | ezio.melotti | set | nosy: + vstinner |
2013-04-06 23:18:07 | pitrou | set | assignee: pitrou -> (no value) |
2013-04-06 10:21:46 | georg.brandl | set | assignee: pitrou nosy: + pitrou |
2013-04-06 09:41:21 | Drekin | set | messages: + msg186121 |
2013-04-05 21:14:56 | tshepang | set | nosy: + tshepang |
2013-04-03 01:19:41 | eric.araujo | set | nosy: + eric.araujo |
2013-04-02 16:59:14 | Drekin | create |