classification
Title: input function truncates prompt by NULL byte
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: kushal.das Nosy List: eryksun, kushal.das, serhiy.storchaka, terry.reedy, Костя Чолак
Priority: normal Keywords:

Created on 2017-05-22 15:09 by Костя Чолак, last changed 2021-03-04 19:49 by eryksun.

Pull Requests
URL Status Linked
PR 1738 open kushal.das, 2017-05-22 23:51
Messages (7)
msg294153 - Author: Костя Чолак (Костя Чолак) Date: 2017-05-22 15:09
input('some\x00 text')
displays the prompt `some` instead of `some text`.
msg294176 - Author: Kushal Das (kushal.das) * (Python committer) Date: 2017-05-22 19:58
I am picking this up.
msg294179 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-05-22 20:24
PyOS_Readline uses a null-terminated prompt string. So input() needs to raise a ValueError if the prompt contains null characters, e.g. if PyBytes_GET_SIZE(po) != strlen(promptstr).
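To illustrate the proposed check, here is a minimal Python sketch of what C code sees when it receives a null-terminated string: strlen() stops at the first null byte, so comparing it with the full encoded size detects truncation. The helper names `c_strlen` and `check_readline_prompt` are hypothetical, and UTF-8 is assumed as the prompt encoding; the real check would live in the C implementation of input(), not in Python.

```python
def c_strlen(data: bytes) -> int:
    """Length a C strlen() call would report: the bytes before the first NUL."""
    nul = data.find(b"\0")
    return len(data) if nul == -1 else nul


def check_readline_prompt(prompt: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical Python analogue of the proposed check in input().

    Encodes the prompt and raises ValueError if a NUL-terminated C string
    would silently drop part of it, i.e. if the full size != strlen().
    """
    data = prompt.encode(encoding)
    if len(data) != c_strlen(data):
        # Illustrative message only; not taken from CPython.
        raise ValueError("input() prompt contains null characters")
    return data


# What a NUL-terminated consumer such as PyOS_Readline effectively sees:
raw = "some\x00 text".encode("utf-8")
print(raw[:c_strlen(raw)])  # b'some'
```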
msg294589 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-05-27 14:12
Experiments running 3.6.1 on Windows in the console:

>python -c "print('some\0 text')"
some  text  # \0 displayed as ' ', as determined by
# >python -c "for c in 'some  text': print(ord(c))", which prints 32 32 for the two spaces

>python -c "input('some\0 text')"
some

In IDLE, both print the full string, including the actual null byte. As a result, attempting the ord() test above generates "SyntaxError: source code string cannot contain null bytes". Cutting the output from IDLE and pasting it here (Firefox) results in a truncated 'some'.

Conclusions: 
1. Python is writing \0 to stdout. That is what Python should do when asked to, as documented.
2. Currently, if one wants to prompt with strings containing \0, use IDLE or a similar GUI-based shell.
3. input() should *not* reject prompts with \0.
4. If \0 is a problem for a particular stdout, its handler could raise ValueError, replace \0 with r'\0', or replace \0 with ' ' (as with print to the Windows console).
5. When running in Windows console, the prompt output part of input(prompt) should treat \0 the same as print(prompt) does.  I am surprised it does not, as "input(prompt)" has been described as shorthand for "print(prompt, end=''); input()"
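Conclusion 4 can be illustrated with a minimal sketch of a stdout-like wrapper that applies one of the three policies listed. `NulPolicyWriter` and its `policy` values are hypothetical names, not part of Python's io module. The wrapper only affects text that actually passes through its write() method (print(), sys.stdout.write(), the fallback path of input()); as noted later in the thread, input() on a terminal with readline does not write the prompt through sys.stdout at all.

```python
import io
from typing import TextIO


class NulPolicyWriter:
    """Hypothetical stdout wrapper applying one of three policies to '\\0'.

    policy="raise"  -> reject text containing '\\0' with ValueError
    policy="escape" -> write a visible backslash-zero (r'\\0') instead
    policy="space"  -> write ' ' instead (what the Windows console shows)
    """

    def __init__(self, stream: TextIO, policy: str = "space") -> None:
        if policy not in ("raise", "escape", "space"):
            raise ValueError(f"unknown policy: {policy!r}")
        self._stream = stream
        self._policy = policy

    def write(self, text: str) -> int:
        if "\0" in text:
            if self._policy == "raise":
                raise ValueError("output contains a null character")
            replacement = r"\0" if self._policy == "escape" else " "
            text = text.replace("\0", replacement)
        return self._stream.write(text)

    def flush(self) -> None:
        self._stream.flush()


buf = io.StringIO()
NulPolicyWriter(buf, "escape").write("some\0 text")
print(buf.getvalue())  # some\0 text
```

With policy="raise", writing 'some\0 text' raises ValueError before anything reaches the underlying stream.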
msg294595 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-05-27 14:31
As near as I can tell, the patch introduces a regression. The doc for input(prompt) says "If the prompt argument is present, it is written to standard output without a trailing newline." The handling of the prompt is up to stdout.write and thence the physical display devices. Python does not specify what happens with any particular character, including \r, \t, and astral characters.

C:\Users\Terry>python -c "import sys; sys.stdout.write('some\0 text\n')"
some  text

The bug on Windows is that input('some\0 text\n') does not produce the same display.
msg294612 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-05-27 21:33
The solution to raise a ValueError for the PyOS_Readline case won't change the behavior of input() in cases where the prompt is written to sys.stdout.

As to copying what print() would do, I guess it's possible to fake it for common platforms. You'd have to process the prompt string to emulate the platform behavior, which commonly means either ignoring '\0' or replacing it with space. However, this can't emulate raw console or terminal modes that actually write '\0'. 

A Linux terminal ignores '\0' characters written to it, e.g. "some\0text" prints as "sometext". I didn't look into modifying the terminal settings, but it wouldn't surprise me if different behavior is possible. 

The Windows console replaces '\0' with a space when the screen buffer is in the default cooked mode (i.e. ENABLE_PROCESSED_OUTPUT). For raw output it writes '\0' to the screen buffer, which displays like a space, but copy and paste won't work as expected.
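A rough sketch of the "fake it for common platforms" idea, covering only the two default behaviors described above: a Linux terminal dropping '\0' and the Windows console in cooked mode showing it as a space. The function name and the mode labels are hypothetical; as the message notes, raw console or terminal modes that keep the NUL cannot be emulated by rewriting the string.

```python
def emulate_display(text: str, mode: str) -> str:
    """Approximate how a display renders '\\0' (hypothetical mode labels).

    mode="linux-terminal" -> NUL is ignored, so it simply disappears
    mode="windows-cooked" -> NUL is shown as a space (ENABLE_PROCESSED_OUTPUT)
    """
    if mode == "linux-terminal":
        return text.replace("\0", "")
    if mode == "windows-cooked":
        return text.replace("\0", " ")
    raise ValueError(f"unsupported mode: {mode!r}")


prompt = "some\0text"
print(emulate_display(prompt, "linux-terminal"))  # sometext
print(emulate_display(prompt, "windows-cooked"))  # some text
```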

> I am surprised it does not, as "input(prompt)" has been 
> described as shorthand for "print(prompt, end=''); input()"

That's how the fallback case works, such as in IDLE. In the readline case, it doesn't even write the prompt to C stdout. It gets written to stderr...

For Windows, if stderr is a console screen buffer, it transcodes the prompt from UTF-8 to UTF-16 and writes it via WriteConsoleW. Otherwise it calls fprintf to write it to stderr.
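The fallback case can be approximated in pure Python as "write the prompt to stdout, flush, read a line". This is a sketch under that assumption, not CPython's implementation: `fallback_input` is a hypothetical name, and the real input() additionally decides between this path and the readline path by checking whether stdin and stdout are terminals. It shows that on this path the prompt, NUL included, reaches the stdout object unchanged.

```python
import io
import sys
from typing import Optional, TextIO


def fallback_input(prompt: str = "",
                   stdin: Optional[TextIO] = None,
                   stdout: Optional[TextIO] = None) -> str:
    """Approximation of input()'s non-readline path."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    if prompt:
        stdout.write(prompt)  # prompt goes to the stdout object unchanged, NULs included
    stdout.flush()
    line = stdin.readline()
    if not line:
        raise EOFError("EOF when reading a line")
    # Strip only the trailing newline, as input() does.
    return line[:-1] if line.endswith("\n") else line


out = io.StringIO()
answer = fallback_input("some\0 text: ", stdin=io.StringIO("yes\n"), stdout=out)
print(repr(out.getvalue()), repr(answer))  # 'some\x00 text: ' 'yes'
```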
msg294630 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-28 07:56
If stdin and stdout are attached to a terminal, input() uses the readline library (if it is available) to display the prompt and read user input. This is a limitation of the readline library and of the Python C API, which work with null-terminated C strings. You also can't enter the null character.

UnicodeEncodeError is raised if the prompt contains characters not encodable with the stdout encoding. Consider a ValueError raised for a prompt containing the null character as a similar restriction.
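A minimal sketch of the two restrictions side by side, assuming a user-level guard rather than a change to input() itself. Both function names are hypothetical, and `would_use_terminal_path` only approximates the terminal check with isatty(); CPython's own decision also compares the streams' file descriptors. Since UnicodeEncodeError is a subclass of ValueError, a caller catching ValueError would handle both cases.

```python
from typing import TextIO


def would_use_terminal_path(stdin: TextIO, stdout: TextIO) -> bool:
    """Approximation: the readline path needs both streams to be terminals."""
    try:
        return stdin.isatty() and stdout.isatty()
    except (AttributeError, ValueError):
        # No isatty() method, or the stream is closed.
        return False


def validate_terminal_prompt(prompt: str, encoding: str) -> bytes:
    """Hypothetical check mirroring both restrictions for the terminal path."""
    data = prompt.encode(encoding)  # raises UnicodeEncodeError if not encodable
    if "\0" in prompt:  # check the str, since some encodings contain NUL bytes
        raise ValueError("input() prompt contains a null character")
    return data


for text, enc in [("some\0 text", "utf-8"), ("café", "ascii")]:
    try:
        validate_terminal_prompt(text, enc)
    except ValueError as exc:  # also catches UnicodeEncodeError
        print(type(exc).__name__)
# ValueError
# UnicodeEncodeError
```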
History
Date User Action Args
2021-03-04 19:49:59 eryksun set components: + Interpreter Core, - IO
versions: + Python 3.8, Python 3.9, Python 3.10, - Python 2.7, Python 3.6, Python 3.7
2017-05-28 07:56:05 serhiy.storchaka set messages: + msg294630
2017-05-27 21:33:23 eryksun set messages: + msg294612
2017-05-27 14:31:30 terry.reedy set messages: + msg294595
2017-05-27 14:12:10 terry.reedy set nosy: + terry.reedy
messages: + msg294589
2017-05-23 06:48:01 serhiy.storchaka set nosy: + serhiy.storchaka
stage: patch review
versions: - Python 3.3, Python 3.4, Python 3.5
2017-05-22 23:51:26 kushal.das set pull_requests: + pull_request1828
2017-05-22 20:24:41 eryksun set nosy: + eryksun
messages: + msg294179
2017-05-22 19:58:00 kushal.das set assignee: kushal.das
messages: + msg294176
2017-05-22 19:07:56 kushal.das set nosy: + kushal.das
2017-05-22 15:09:04 Костя Чолак create