Issue37275
Created on 2019-06-14 05:46 by methane, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (4) | |||
---|---|---|---|
msg345551 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-14 05:46 | |
When stdout is redirected to a file and cp65001 is used, the stdout encoding is unexpected:

    # Power Shell 6 (use cp65001 by default)
    PS C:¥> python3 -c "print('おはよう')" > ps.txt

    # cmd.exe
    C:¥> chcp 65001
    C:¥> python3 -c "print('おはよう')" > cmd.txt

Now ps.txt is encoded in UTF-8, but cmd.txt is encoded in cp932 (ACP). This is because:

* TextIOWrapper tries `os.device_encoding(1)`
* `os.device_encoding(1)` uses GetConsoleOutputCP() without checking whether stdout is a console

In the example above, a console is attached when python is called from PowerShell 6, but it is not attached when python is called from cmd.exe. I think using GetConsoleOutputCP() for a non-console stream is an abuse.

---

There is a related issue: UTF-8 mode doesn't override the stdin, stdout, and stderr encoding when a console is attached.

On Unix, os.device_encoding() uses the locale encoding, and UTF-8 mode overrides the locale encoding. Good. But on Windows, os.device_encoding() uses GetConsole(Output)CP(), and UTF-8 mode doesn't override it.

If we stop abusing GetConsoleOutputCP(), this issue is fixed automatically. But if we keep using GetConsoleOutputCP() for a stdout that is not a console, UTF-8 mode should override it. |
|||
msg345602 - (view) | Author: Steve Dower (steve.dower) * | Date: 2019-06-14 15:54 | |
Isn't the point that device_encoding(FD) gets the encoding of the specified file? In this case stdout? It seems odd that chcp doesn't actually update the console code page here, as that is its entire purpose. Perhaps TextIOWrapper is actually getting ACP rather than the console encoding through some other path? |
|||
msg345620 - (view) | Author: Eryk Sun (eryksun) * | Date: 2019-06-14 17:43 | |
> # Power Shell 6 (use cp65001 by default)
> PS C:¥> python3 -c "print('おはよう')" > ps.txt

PowerShell standard I/O redirection is different from any shell I've ever used. In this case, it runs Python with StandardOutput set to a handle for a pipe instead of a handle for the file. It decodes Python's output using whatever encoding is configured for input and re-encodes it with whatever encoding is configured for output. To see what Python is actually writing, try using Start-Process with StandardOutput redirected to the file. For example:

    Start-Process -FilePath python3.exe -ArgumentList "-c `"print('おはよう')`"" -NoNewWindow -Wait -RedirectStandardOutput ps.txt

> # cmd.exe
> C:¥> chcp 65001
> C:¥> python3 -c "print('おはよう')" > cmd.txt

CMD uses simple redirection, like every other shell I've ever used. It runs python3.exe with a handle for the file as its StandardOutput. So "cmd.txt" contains exactly what Python writes.

> * TextIOWrapper tries `os.device_encoding(1)`
> * `os.device_encoding(1)` uses GetConsoleOutputCP() without checking whether stdout is a console

No, _Py_device_encoding returns None if isatty(fd) is false, i.e. for a pipe or disk file. In this case, TextIOWrapper defaults to the encoding from locale.getpreferredencoding(). The current preferred encoding is the system ANSI codepage from GetACP(). Changing the default to UTF-8 would be disruptive. You can use UTF-8 mode (i.e. -X utf8). Or, to override just stdin, stdout, and stderr, set the environment variable "PYTHONIOENCODING=utf-8".

> In the example above, a console is attached when python is called from
> Power Shell 6, but it is not attached when python is called from
> cmd.exe.

In both cases the Python process inherits the console of the parent shell. The only way to run python.exe without a console is to use the CreateProcess creation flag DETACHED_PROCESS.

> There is a relating issue: UTF-8 mode doesn't override
> stdin,stdout,stderr encoding when console is attached.

It works for me. For example, testing with stdout redirected to a pipe:

    C:\>python -c "import sys;print(sys.stdout.encoding)" | more
    cp1252

    C:\>python -X utf8 -c "import sys;print(sys.stdout.encoding)" | more
    utf-8 |
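The pipe test above can also be driven from Python itself. The following is a minimal sketch, not part of the original comment: the helper name `probe_piped_stdout` and the `PROBE` snippet are hypothetical, and it only assumes that the child interpreter reports its own view of file descriptor 1 when stdout is a pipe.

```python
import json
import os
import subprocess
import sys

# Code run in the child interpreter: report what it sees for its own stdout.
# json.dumps escapes non-ASCII by default, so the report is plain ASCII.
PROBE = (
    "import json, os, sys; "
    "print(json.dumps({'isatty': os.isatty(1), "
    "'device_encoding': os.device_encoding(1), "
    "'stdout_encoding': sys.stdout.encoding}))"
)


def probe_piped_stdout(*interpreter_args, env=None):
    """Run a child Python with stdout attached to a pipe and return its report.

    Hypothetical helper for illustration; interpreter_args are options such
    as "-X", "utf8" placed before "-c".
    """
    cmd = [sys.executable, *interpreter_args, "-c", PROBE]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, env=env, check=True)
    return json.loads(result.stdout.decode("ascii"))


if __name__ == "__main__":
    # Drop PYTHONIOENCODING so it cannot mask the default behaviour.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONIOENCODING"}
    print("default :", probe_piped_stdout(env=env))
    print("-X utf8 :", probe_piped_stdout("-X", "utf8", env=env))
```

For a pipe, `device_encoding` should come back as `None` in both runs, and only `stdout_encoding` should change between the default run and the `-X utf8` run. The default value depends on the machine's locale or ANSI code page, so it is not shown here.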
|||
msg345646 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-14 23:22 | |
On Sat, Jun 15, 2019 at 2:43 AM Eryk Sun <report@bugs.python.org> wrote:
>
> Eryk Sun <eryksun@gmail.com> added the comment:
>
> > # Power Shell 6 (use cp65001 by default)
> > PS C:¥> python3 -c "print('おはよう')" > ps.txt
>
> PowerShell standard I/O redirection is different from any shell I've ever used. In this case, it runs Python with StandardOutput set to a handle for a pipe instead of a handle for the file. It decodes Python's output using whatever encoding is configured for input and re-encodes it with whatever encoding is configured for output.

I'm sorry, I mixed up my assumptions. I checked `os.device_encoding()` in cmd, but forgot to check it in PowerShell. Everything I said about PowerShell was just my assumption, and it was wrong. Thank you for clarifying.

I confirmed that writing UTF-8 to the pipe causes mojibake, because PowerShell decodes it using cp932.

```
PS C:\Users\inada-n> python3 -Xutf8 -c "import os,sys; print(os.device_encoding(1), sys.stdout.encoding, file=sys.stderr); print('こんにちは')" >x
None utf-8
PS C:\Users\inada-n> type x
```

Hmm, how can I tell PowerShell that python3 is using UTF-8 for stdout? It seems cmd.exe with chcp 65001 and PYTHONUTF8=1 is better than PowerShell when I want to use UTF-8 on Windows.

Anyway, nothing is wrong with Python. I just didn't understand PowerShell at all. |
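The mojibake described above can be reproduced without PowerShell. This is a rough model only, not PowerShell's actual implementation: the hypothetical `reencode` function stands in for the decode-then-re-encode step on the pipe, and the encodings passed to it (cp932 for decoding, UTF-8 for the file) are assumptions taken from this thread.

```python
def reencode(data: bytes, decode_as: str, encode_as: str) -> bytes:
    """Model a shell that decodes a child's output and re-encodes it.

    Hypothetical helper; errors="replace" keeps the sketch from raising
    when the bytes are not valid in the assumed input encoding.
    """
    text = data.decode(decode_as, errors="replace")
    return text.encode(encode_as, errors="replace")


original = "こんにちは"
written_by_python = original.encode("utf-8")  # what python3 -X utf8 writes

# Shell assumes the same encoding Python used: the text survives.
matched = reencode(written_by_python, "utf-8", "utf-8")

# Shell assumes cp932 while Python wrote UTF-8: the text is garbled.
mismatched = reencode(written_by_python, "cp932", "utf-8")

print(matched.decode("utf-8") == original)     # True
print(mismatched.decode("utf-8") == original)  # False
```

The point of the sketch is that the bytes Python writes are correct in both cases. The damage happens in the intermediate decode step, which is why the fix has to be made on the shell side rather than in Python.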
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:16 | admin | set | github: 81456 |
2019-06-14 23:22:05 | methane | set | messages: + msg345646 |
2019-06-14 23:21:54 | methane | set | status: open -> closed resolution: not a bug stage: resolved |
2019-06-14 17:43:31 | eryksun | set | messages: + msg345620 |
2019-06-14 15:54:20 | steve.dower | set | messages: + msg345602 |
2019-06-14 06:30:26 | xtreak | set | nosy: + eryksun |
2019-06-14 05:46:30 | methane | create |