classification
Title: Documentation for sys.stdout encoding does not reflect the new Windows behavior in Python 3.6+
Type: behavior Stage:
Components: Documentation, Windows Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: easy

Created on 2017-05-20 08:58 by paul.moore, last changed 2017-05-21 00:53 by eryksun.

Messages (4)
msg294020 - Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-05-20 08:58
The documentation for the encoding of sys.stdin, sys.stdout, and sys.stderr (see https://docs.python.org/3.6/library/sys.html#sys.stdout) does not reflect the Python 3.6 change on Windows: console streams now go through the console Unicode APIs, and hence use UTF-8 for the encoding.
msg294046 - Author: Eryk Sun (eryksun) * Date: 2017-05-20 18:37
How about this?

    The character encoding is platform-dependent. Non-Windows 
    platforms use the locale encoding (see 
    locale.getpreferredencoding()).

    On Windows, UTF-8 is used for console character
    devices (i.e. CON, CONIN$, and CONOUT$). However, this
    can be overridden to use the console as a generic
    character device by setting the environment variable
    PYTHONLEGACYWINDOWSSTDIO before starting Python.
    Non-character devices such as disk files and pipes use
    the system locale encoding (i.e. the ANSI codepage).
    Character devices such as NUL (i.e. those for which
    isatty() returns True) use the console input codepage
    for stdin and the console output codepage for
    stdout/stderr, as read at startup. Both default to the
    system locale encoding if the process is not initially
    attached to a console.

    Under all platforms, you can override this value by
    setting the PYTHONIOENCODING environment variable before
    starting Python. However, for the Windows console, this
    only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
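
To check which of the cases above applies to a given run, a small script can report what the proposed text describes. This is only a sketch: the helper names describe_stream and describe_environment are hypothetical, and it reports the inputs to the decision (encoding, error handler, isatty(), the two environment variables) rather than re-implementing the interpreter's logic.

```python
import io
import os
import sys


def describe_stream(stream):
    """Report the properties that determine how text on `stream` is encoded."""
    try:
        is_tty = stream.isatty()
    except (AttributeError, ValueError):
        # The stream may be None (no console attached) or already closed.
        is_tty = False
    return {
        "encoding": getattr(stream, "encoding", None),
        "errors": getattr(stream, "errors", None),
        "isatty": is_tty,
    }


def describe_environment(environ=os.environ):
    """Report the two environment variables named in the proposed text."""
    names = ("PYTHONIOENCODING", "PYTHONLEGACYWINDOWSSTDIO")
    return {name: environ.get(name) for name in names}


if __name__ == "__main__":
    for name in ("stdin", "stdout", "stderr"):
        print(name, describe_stream(getattr(sys, name)), file=sys.stderr)
    print(describe_environment(), file=sys.stderr)
```

Running it once at a console, once with output redirected to a file, and once redirected to NUL should show the three cases side by side.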
msg294061 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-05-20 23:59
Looks great, though I wonder whether the rest of the paragraph after "Character devices such as NUL" would be more confusing than it's worth?

Can you create a PR? (And having links to the environment variable docs would be great.)
msg294063 - Author: Eryk Sun (eryksun) * Date: 2017-05-21 00:53
I discussed character devices mostly because of the NUL device. It could be surprising that Python dies on an encoding error when output is redirected to NUL:

    C:\>chcp 1252
    Active code page: 1252

    C:\>python -c "print('\u20ac')" > nul

    C:\>chcp 437
    Active code page: 437

    C:\>python -c "print('\u20ac')" > nul
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Program Files\Python36\lib\encodings\cp437.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
    character maps to <undefined>

Unix has a similar problem:

    $ LANG=C python3 -c 'print("\u20ac")' > /dev/null
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 0:
    ordinal not in range(128)

The difference is that /dev/null isn't a TTY. Also, it's rare nowadays for the locale encoding on Unix systems to be anything other than UTF-8.
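
The failures in both transcripts come down to whether the selected codec has a mapping for U+20AC. The sketch below tries the same character against the codecs involved, independent of any console or redirection; the helper name try_encode is hypothetical.

```python
EURO = "\u20ac"


def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


if __name__ == "__main__":
    for codec in ("cp1252", "cp437", "ascii", "utf-8"):
        result = try_encode(EURO, codec)
        status = "ok" if isinstance(result, bytes) else "fails"
        print(f"{codec:8} {status:6} {result!r}")
```

cp1252 and UTF-8 can represent the euro sign, while cp437 and ASCII cannot, which matches the chcp 1252 / chcp 437 and LANG=C results above.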

It would be useful if we special-cased NUL like we do for the Windows console, but just to make it use the backslashreplace error handler. Unfortunately I don't know how to do that without calling NtQueryObject, for which ObjectNameInformation (1) can't be used because it's undocumented [1]. GetFinalPathNameByHandle also can't be used because it requires file-system devices. As a crude workaround, we could lump together all non-console character devices (i.e. isatty() but not a console). That will affect serial devices, too, but I can't think of a good reason someone would redirect stdout or stderr to a COM port.
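
Until something like that exists in the interpreter, a script can approximate the crude workaround itself. The following is a minimal sketch, assuming Python 3.7+ (for TextIOWrapper.reconfigure); the names relax_errors_if_tty and FakeCharDevice are hypothetical, the fake device only stands in for NUL so the example runs anywhere, and no attempt is made to exclude the real console.

```python
import io


class FakeCharDevice(io.BytesIO):
    """Stand-in for a character device such as NUL: isatty() reports True."""

    def isatty(self):
        return True


def relax_errors_if_tty(stream):
    """Switch a text stream to backslashreplace if it reports isatty()."""
    if stream.isatty():
        # reconfigure() is available on io.TextIOWrapper from Python 3.7.
        stream.reconfigure(errors="backslashreplace")
    return stream


# A cp437 stream on a fake character device: the euro sign is escaped
# instead of raising UnicodeEncodeError.
out = relax_errors_if_tty(io.TextIOWrapper(FakeCharDevice(), encoding="cp437"))
out.write("\u20ac")
out.flush()
written = out.buffer.getvalue()

# A stream that is not a character device keeps the strict error handler.
plain = relax_errors_if_tty(io.TextIOWrapper(io.BytesIO(), encoding="cp437"))
```

In a real script the call would be relax_errors_if_tty(sys.stdout), and likewise for sys.stderr, early in startup.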

[1]: https://msdn.microsoft.com/en-us/library/ff550964
History
Date                 User         Action  Args
2017-05-21 00:53:24  eryksun      set     messages: + msg294063
2017-05-20 23:59:14  steve.dower  set     messages: + msg294061
2017-05-20 18:37:01  eryksun      set     nosy: + eryksun; messages: + msg294046
2017-05-20 08:58:49  paul.moore   create