classification
Title: GetConsole(Output)CP is used even when stdin/stdout is redirected
Type: behavior
Stage: resolved
Components: Windows
Versions: Python 3.9, Python 3.8
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: eryksun, methane, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords:

Created on 2019-06-14 05:46 by methane, last changed 2019-06-14 23:22 by methane. This issue is now closed.

Messages (4)
msg345551 - Author: Inada Naoki (methane) (Python committer) Date: 2019-06-14 05:46
When stdout is redirected to a file and cp65001 is used, the stdout encoding is unexpected:

  # PowerShell 6 (uses cp65001 by default)
  PS C:\> python3 -c "print('おはよう')" > ps.txt

  # cmd.exe
  C:\> chcp 65001
  C:\> python3 -c "print('おはよう')" > cmd.txt

Now ps.txt is encoded in UTF-8, but cmd.txt is encoded in cp932 (the ANSI code page, ACP).


This is because:

* TextIOWrapper tries `os.device_encoding(1)`
* `os.device_encoding(1)` uses GetConsoleOutputCP() without checking whether stdout is a console

In the example above, a console is attached when python is called from PowerShell 6, but it is not attached when python is called from cmd.exe.

I think using GetConsoleOutputCP() for something that is not a console is a misuse.

---

There is a related issue: UTF-8 mode doesn't override the stdin, stdout, and stderr encodings when a console is attached.

On Unix, os.device_encoding() uses locale encoding and UTF-8 mode overrides locale encoding.  Good.

But on Windows, os.device_encoding() uses GetConsole(Output)CP().  UTF-8 mode doesn't override it.

If we stop misusing GetConsoleOutputCP(), this issue is fixed automatically.
But if we keep using GetConsoleOutputCP() for a stdout that is not a console, UTF-8 mode should override it.
msg345602 - Author: Steve Dower (steve.dower) (Python committer) Date: 2019-06-14 15:54
Isn't the point that device_encoding(FD) gets the encoding of the specified file? In this case stdout?

It seems odd that chcp doesn't actually update the console code page here, as that is its entire purpose. Perhaps TextIOWrapper is actually getting ACP rather than the console encoding through some other path?
msg345620 - Author: Eryk Sun (eryksun) (Python triager) Date: 2019-06-14 17:43
> # PowerShell 6 (uses cp65001 by default)
> PS C:\> python3 -c "print('おはよう')" > ps.txt

PowerShell standard I/O redirection is different from any shell I've ever used. In this case, it runs Python with StandardOutput set to a handle for a pipe instead of a handle for the file. It decodes Python's output using whatever encoding is configured for input and re-encodes it with whatever encoding is configured for output. 

To see what Python is actually writing, try using Start-Process with StandardOutput redirected to the file. For example:

    Start-Process -FilePath python3.exe -ArgumentList "-c `"print('おはよう')`"" -NoNewWindow -Wait -RedirectStandardOutput ps.txt

> # cmd.exe
> C:\> chcp 65001
> C:\> python3 -c "print('おはよう')" > cmd.txt

CMD uses simple redirection, like every other shell I've ever used. It runs python3.exe with a handle for the file as its StandardOutput. So "cmd.txt" contains exactly what Python writes.

> * TextIOWrapper tries `os.device_encoding(1)`
> * `os.device_encoding(1)` uses GetConsoleOutputCP() without checking whether stdout is a console

No, _Py_device_encoding returns None if isatty(fd) is false, i.e. for a pipe or disk file. In this case, TextIOWrapper defaults to the encoding from locale.getpreferredencoding().
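
The fallback described above can be sketched in Python. This is only an illustration of the selection order, not the actual TextIOWrapper implementation; the helper name `guess_text_encoding` is hypothetical.

```python
import locale
import os


def guess_text_encoding(fd):
    """Hypothetical helper: mimic the encoding fallback described above."""
    # os.device_encoding() returns None when fd is not a terminal/console,
    # e.g. for a pipe or a disk file.
    encoding = os.device_encoding(fd)
    if encoding is None:
        # Fall back to the preferred (locale/ANSI) encoding.
        encoding = locale.getpreferredencoding(False)
    return encoding


# A pipe is never a console, so the fallback branch is taken here.
read_fd, write_fd = os.pipe()
pipe_encoding = guess_text_encoding(write_fd)
os.close(read_fd)
os.close(write_fd)
print(pipe_encoding)
```

On Windows without UTF-8 mode, the printed value would be expected to be the ANSI code page rather than the console code page set by chcp.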

The current preferred encoding is the system ANSI codepage from GetACP(). Changing the default to UTF-8 would be disruptive. You can use UTF-8 mode (i.e. -X utf8). Or, to override just stdin, stdout, and stderr, set the environment variable "PYTHONIOENCODING=utf-8".
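
A minimal sketch for comparing the two overrides mentioned above, assuming a Python 3.7+ interpreter is available as `sys.executable`. The helper `child_stdout_encoding` is hypothetical; it runs a child interpreter with stdout connected to a pipe and reports the encoding the child chose.

```python
import os
import subprocess
import sys

CHILD_CODE = "import sys; print(sys.stdout.encoding)"


def child_stdout_encoding(extra_args=(), extra_env=None):
    """Hypothetical helper: sys.stdout.encoding of a child writing to a pipe."""
    env = dict(os.environ)
    # Drop inherited settings so they don't skew the comparison.
    env.pop("PYTHONIOENCODING", None)
    env.pop("PYTHONUTF8", None)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    # Encoding names are plain ASCII, so decoding as ASCII is safe here.
    return result.stdout.decode("ascii").strip()


print("default:         ", child_stdout_encoding())
print("-X utf8:         ", child_stdout_encoding(extra_args=["-X", "utf8"]))
print("PYTHONIOENCODING:", child_stdout_encoding(extra_env={"PYTHONIOENCODING": "utf-8"}))
```

The first line depends on the system's preferred encoding; the other two should report UTF-8 regardless of the code page.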

> In the example above, a console is attached when python is called from
> PowerShell 6, but it is not attached when python is called from
> cmd.exe.

In both cases the Python process inherits the console of the parent shell. The only way to run python.exe without a console is to use the CreateProcess creation flag DETACHED_PROCESS.

> There is a related issue: UTF-8 mode doesn't override the
> stdin, stdout, and stderr encodings when a console is attached.

It works for me. For example, testing with stdout redirected to a pipe:

    C:\>python -c "import sys;print(sys.stdout.encoding)" | more
    cp1252

    C:\>python -X utf8 -c "import sys;print(sys.stdout.encoding)" | more
    utf-8
msg345646 - Author: Inada Naoki (methane) (Python committer) Date: 2019-06-14 23:22
On Sat, Jun 15, 2019 at 2:43 AM Eryk Sun <report@bugs.python.org> wrote:
>
> Eryk Sun <eryksun@gmail.com> added the comment:
>
> > # PowerShell 6 (uses cp65001 by default)
> > PS C:\> python3 -c "print('おはよう')" > ps.txt
>
> PowerShell standard I/O redirection is different from any shell I've ever used. In this case, it runs Python with StandardOutput set to a handle for a pipe instead of a handle for the file. It decodes Python's output using whatever encoding is configured for input and re-encodes it with whatever encoding is configured for output.

I'm sorry, I mixed up my assumptions with what I had actually checked. I checked
`os.device_encoding()` in cmd, but forgot to check it in PowerShell. Everything I
said about PowerShell was just an assumption, and it was wrong. Thank you for clarifying.

I confirmed that writing UTF-8 to the pipe causes mojibake, because PowerShell
decodes it using cp932.

```
PS C:\Users\inada-n> python3 -Xutf8 -c "import os,sys;
print(os.device_encoding(1), sys.stdout.encoding, file=sys.stderr);
print('こんにちは')" >x
None utf-8
PS C:\Users\inada-n> type x
```
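
The mismatch can be reproduced in isolation. This is a small sketch that assumes the reader of the pipe decodes with cp932 while the writer used UTF-8; it does not involve PowerShell itself.

```python
text = "こんにちは"

# What Python writes to the pipe in UTF-8 mode.
utf8_bytes = text.encode("utf-8")

# Assumption: the reader of the pipe decodes with cp932 instead of UTF-8.
# errors="replace" keeps the sketch from raising on invalid byte sequences.
garbled = utf8_bytes.decode("cp932", errors="replace")

# Decoding with the encoding that was actually used recovers the text.
recovered = utf8_bytes.decode("utf-8")

print(garbled == text)
print(recovered == text)
```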

Hmm, how can I tell PowerShell that python3 is using
UTF-8 for stdout?
It seems cmd.exe with chcp 65001 and PYTHONUTF8=1 works better
than PowerShell when I want to use UTF-8 on Windows.

Anyway, nothing is wrong with Python. I just didn't understand
PowerShell at all.
History
Date                 User         Action  Args
2019-06-14 23:22:05  methane      set     messages: + msg345646
2019-06-14 23:21:54  methane      set     status: open -> closed
                                          resolution: not a bug
                                          stage: resolved
2019-06-14 17:43:31  eryksun      set     messages: + msg345620
2019-06-14 15:54:20  steve.dower  set     messages: + msg345602
2019-06-14 06:30:26  xtreak       set     nosy: + eryksun
2019-06-14 05:46:30  methane      create