input() with Unicode prompt produces mojibake on Windows #72520
In my setup (Python 3.6b1 on Windows), prompting with a non-ASCII character via input() results in mojibake. This is related to the recent fix of bpo-1602 and so is Windows-specific.
>>> input("α")
╬▒
The result corresponds to print("α".encode("utf-8").decode("cp852")); cp852 is the default terminal encoding in my locale. |
Same output with cp437. |
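The mojibake reported above can be reproduced without a Windows console by doing the round trip by hand. This is a minimal sketch, assuming the prompt reaches the console as UTF-8 bytes and the console interprets them in its OEM code page; the helper name `misdecode` is hypothetical and not part of CPython.

```python
def misdecode(text, console_encoding):
    """Return what a console using console_encoding would show
    if it were handed the UTF-8 bytes of text (assumed scenario)."""
    return text.encode("utf-8").decode(console_encoding)


# U+03B1 (Greek alpha) is two bytes in UTF-8, so it shows up as two
# unrelated characters in a single-byte code page.
shown_cp852 = misdecode("\u03b1", "cp852")
shown_cp437 = misdecode("\u03b1", "cp437")
```

Both code pages map the two bytes to the same pair of box-drawing characters, which matches the two reports above.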
This is a regression from 3.5.2, where input("α") displays "α". |
This may force bpo-17620 into 3.6 - we really ought to be getting and using sys.stdin and sys.stderr in PyOS_StdioReadline() rather than going directly to the raw streams. The problem here is that we're still using fprintf to output the prompt, even though we know (assume) the input is utf-8. I haven't looked closely at how safely we can use Python objects from this code, except to see that it's not obviously safe, but we should really figure out how to deal in Python str rather than C char* for the default readline implementation (and then only fall back on the GNU protocol when someone asks for it). The faster fix here would be to decode the prompt from utf-8 to utf-16-le in PyOS_StdioReadline and then write it using a wide-char output function. |
When I pointed this issue out in code reviews, I assumed you would add the relatively simple fix to decode the prompt and call WriteConsoleW. The long-term fix in bpo-17620 has to be worked out with cross-platform support, and ISTM that it can wait for 3.7. Off topic: I just noticed that you're not calling PyOS_InputHook in the new PyOS_StdioReadline code. Tkinter registers this function pointer to call its EventHook. Do you want a separate issue for this, or is there a reason it was omitted? |
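The "decode the prompt and call WriteConsoleW" idea can be sketched from Python with ctypes. This is only an illustration of the approach, not the C code that went into PyOS_StdioReadline: the function names `prompt_to_wide` and `write_prompt` are hypothetical, and writing to the standard error handle is an assumption here.

```python
import ctypes
import sys


def prompt_to_wide(prompt_utf8):
    """Decode a UTF-8 prompt; return (text, number of UTF-16 code units)."""
    text = prompt_utf8.decode("utf-8")
    units = len(text.encode("utf-16-le")) // 2
    return text, units


def write_prompt(prompt_utf8):
    """Write the prompt via WriteConsoleW; return True on success.

    Only attempts the write on Windows and returns False elsewhere.
    """
    if sys.platform != "win32":
        return False
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        wintypes.LPDWORD, wintypes.LPVOID,
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    STD_ERROR_HANDLE = 0xFFFFFFF4  # (DWORD)-12; assumed target stream
    handle = kernel32.GetStdHandle(STD_ERROR_HANDLE)
    text, units = prompt_to_wide(prompt_utf8)
    written = wintypes.DWORD(0)
    ok = kernel32.WriteConsoleW(handle, text, units,
                                ctypes.byref(written), None)
    return bool(ok)
```

WriteConsoleW fails when the handle is not a console (for example when output is redirected to a file), so a caller would still need some other path for that case.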
I'm sure Steve already has this covered, but FWIW here's a patch to call WriteConsoleW. Here's the result with the patch applied:
>>> sys.ps1 = '»»» '
»»» input("αβψδ: ")
αβψδ: spam
'spam'
and with interactive stdin and stdout/stderr redirected to a file:
>type out.txt
Python 3.6.0b1+ (default, Oct 7 2016, 23:47:58)
[MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> αβψδ: 'spam'
>>>
If it can't write the prompt for some reason (e.g. out of memory, decoding fails, WriteConsole fails), it doesn't fall back on fprintf to write the prompt. Should it? This should also get a test that calls ReadConsoleOutputCharacter to verify that the correct prompt is written. |
New changeset faf5493e6f61 by Steve Dower in branch '3.6': New changeset cb62e921bd06 by Steve Dower in branch 'default': |
New changeset 63ceadf8410f by Steve Dower in branch '3.6': New changeset d76c8f9ea787 by Steve Dower in branch 'default': |
I made some minor tweaks to the patch (no need for strlen() - passing -1 works equivalently), but otherwise it's exactly what I would have done so I committed it. We currently have no tests to check which characters are written to a console output buffer. bpo-28217 was tracking those, but considering how little code we have on top of output I don't think it's worth blocking anything on automating those tests. |
MultiByteToWideChar includes the trailing NUL when it gets the string length, so the WriteConsoleW call needs to use (wlen - 1). |
Not sure how I missed it originally, but that extra 1 char is actually very important:
Python 3.6.0b2 (v3.6.0b2:b9fadc7d1c3f, Oct 10 2016, 20:36:51) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.ps1='> '
> sys
The extra space is because of that. Really ought to fix this before the next beta. |
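The off-by-one described above can be illustrated without calling the Windows API. This is a sketch that only mimics the counting behaviour of MultiByteToWideChar when the source length is passed as -1 (the terminating NUL is converted and counted); the helper name `wide_len_including_nul` is hypothetical.

```python
def wide_len_including_nul(prompt_utf8):
    """Mimic the length reported when the input length is given as -1:
    the result counts the terminating NUL as one extra UTF-16 unit."""
    text = prompt_utf8.decode("utf-8") + "\0"
    return len(text.encode("utf-16-le")) // 2


prompt = b"> "
wlen = wide_len_including_nul(prompt)  # '>', ' ', and the NUL
chars_to_write = wlen - 1              # what the console write should get
```

Passing `wlen` instead of `wlen - 1` to the console write would send the NUL as well, which is consistent with the stray extra character after the prompt in the session above.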
I forgot to include the link to the python-list thread where this came up: https://mail.python.org/pipermail/python-list/2016-October/715428.html |
New changeset 6b46c3deea2c by Steve Dower in branch '3.6': New changeset 44d15ba67d2e by Steve Dower in branch 'default': |