Author u36959
Recipients paul.moore, steve.dower, tim.golden, u36959, zach.ware
Date 2020-12-21.18:59:30

First of all, I hope this has not already been discussed (I searched the bug tracker, but it might have been discussed elsewhere) and that it really is a bug.

I spent today trying to understand why a simple file redirection would not work properly (encoding issues), and I think I finally understand the whole thing.

There's an I/O code page set on Windows consoles (`chcp` for cmd, `[Console]::InputEncoding; [Console]::OutputEncoding` for PowerShell; `chcp` does not work in PowerShell, even though it reports that it set the CP), 850 for my locale.
When there's no redirection / piping, PyWindowsConsoleIO takes care of the encoding (UTF-8, it seems), but when there's redirection or piping, the encoding falls back to the ANSI CP (from config_get_locale_encoding).
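
To observe this from inside Python, here is a minimal sketch. The helper name `describe_stream` is only an illustration (it is not part of CPython), and the report goes to stderr so it stays visible when stdout is piped or redirected:

```python
import locale
import sys


def describe_stream(stream):
    """Return (is_a_tty, encoding) for a text stream such as sys.stdout."""
    return stream.isatty(), stream.encoding


if __name__ == "__main__":
    tty, encoding = describe_stream(sys.stdout)
    # stderr is used on purpose: stdout may be a pipe or a file here.
    print(f"stdout isatty={tty} encoding={encoding}", file=sys.stderr)
    print(f"locale encoding={locale.getpreferredencoding(False)}", file=sys.stderr)
```

Running it once directly and once with `| more` should show whether the stdout encoding changes along with `isatty`.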

This behavior seems incorrect and breaks things. An example:
* (file encoded as utf-8)
#!/usr/bin/env python3
# -*- coding: utf-8


* using cmd:
# Test condition
Page de codes active : 850

# We're fine here
L:\Cop>py -c "import sys; print(sys.stdout.encoding)"

# Now with piping
L:\Cop>py -c "import sys; print(sys.stdout.encoding)" | more

L:\Cop>py | more
L:\Cop>py > lol && type lol

# If we adjust cmd CP, it's fine too:
L:\Cop>chcp 1252
Page de codes active : 1252
L:\Cop>py | more

* with pwsh:
PS L:\Cop> ([Console]::InputEncoding, [Console]::OutputEncoding) | select CodePage


# Fine without redirection
PS L:\Cop> py .\

# Here, write-host expects cp850
PS L:\Cop> py .\ | write-output
# Same with Out-file (used by ">")
PS L:\Cop> py .\ > lol; Get-Content lol

PS L:\Cop> py .\ | more
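
The mismatch itself can be reproduced without a console. This is a minimal sketch, assuming the writer uses the ANSI CP (1252) and the reader uses the console CP (850); the function name `show_mismatch` and the sample character are my own choices for illustration:

```python
def show_mismatch(text, written_as="cp1252", read_as="cp850"):
    """Encode text with one code page and decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    original = "\u00e9"  # "e" with acute accent, an assumed sample character
    # ascii() keeps the output printable whatever the current code page is.
    print(ascii(original), "->", ascii(show_mismatch(original)))
```

Pure ASCII text survives the round trip, which is probably why the problem only shows up with accented characters.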

By reading some sources today to solve my issue, I found many solutions:
* in PS `[Console]::OutputEncoding = [Text.Utf8Encoding]::new($false); $env:PYTHONIOENCODING="utf8"` or `[Console]::OutputEncoding = [Text.Encoding]::GetEncoding(1252)`
* in CMD `chcp 65001 && set PYTHONIOENCODING=utf8` (but this seems to break more) or `chcp 1252`
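
A script-side alternative to the shell workarounds above might look like the sketch below. It relies on `TextIOWrapper.reconfigure` (available since Python 3.7); the helper name `force_encoding` is hypothetical, and hardcoding a code page is of course only acceptable as a local workaround:

```python
import io
import sys


def force_encoding(stream, encoding, errors="replace"):
    """Switch the encoding of an existing text stream in place."""
    # reconfigure() changes the codec used for subsequent writes.
    stream.reconfigure(encoding=encoding, errors=errors)
    return stream
```

It could be called as `force_encoding(sys.stdout, "cp850")` when `sys.stdout.isatty()` is false, so that the console path is left untouched.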

But reading (and trusting) (, I understand that Python should either read the current CP (from GetConsoleOutputCP, like or use the default OEM CP, rather than assume the ANSI CP for stdio:
> * the OEM code page for use by legacy console applications,
> * the ANSI code page for use by legacy GUI applications.
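
What I have in mind could be sketched as follows. This is not how CPython does it today and not a proposed patch; `codepage_to_encoding` and `guess_stdio_encoding` are hypothetical names, and I assume that every code page of interest has a matching `cpNNN` codec in Python, which is not guaranteed:

```python
import ctypes
import locale
import sys


def codepage_to_encoding(codepage):
    """Map a Windows code page number to a Python codec name."""
    # 65001 is the UTF-8 code page; others are assumed to map to "cpNNN".
    return "utf-8" if codepage == 65001 else f"cp{codepage}"


def guess_stdio_encoding():
    """Hypothetical helper: console CP first, then OEM CP, then locale."""
    if sys.platform != "win32":
        return locale.getpreferredencoding(False)
    kernel32 = ctypes.windll.kernel32
    # GetConsoleOutputCP returns 0 on failure, e.g. with no console attached.
    codepage = kernel32.GetConsoleOutputCP() or kernel32.GetOEMCP()
    return codepage_to_encoding(codepage)
```

The OEM fallback follows the quoted rule that legacy console applications use the OEM code page.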

The init path I could trace : 
> init_sys_streams
>> create_stdio (
>>> open.raw :
>> fallback to init_sys_streams encoding
> config_init_stdio_encoding
> config_get_locale_encoding
> GetACP()

Some tests with GetConsoleCP:
L:\Cop>py -c "import os; print(os.device_encoding(0), os.device_encoding(1))" | more
cp850 None

L:\Cop>type nul | py -c "import os; print(os.device_encoding(0), os.device_encoding(1))"
None cp850

L:\Cop>type nul | py -c "import ctypes; print(ctypes.windll.kernel32.GetConsoleCP(), ctypes.windll.kernel32.GetConsoleOutputCP())"
850 850

L:\Cop>py -c "import ctypes; print(ctypes.windll.kernel32.GetConsoleCP(), ctypes.windll.kernel32.GetConsoleOutputCP())" | more
850 850

Some links / documentations, if useful:
* Maybe related:
* (will probably break things :) )

Please note that I took the time to write this issue as best I could. I hope it won't be closed without an explanation of why the current behavior is normal (not that I expect this to happen, I just don't know how people react here :) ).

Thanks a lot for Python, I really enjoy using it, 
Date                 User    Action  Args
2020-12-21 18:59:31  u36959  set     recipients: + u36959, paul.moore, tim.golden, zach.ware, steve.dower
2020-12-21 18:59:31  u36959  set     messageid: <>
2020-12-21 18:59:31  u36959  link    issue42707 messages
2020-12-21 18:59:30  u36959  create