classification
Title: Documentation for sys.stdout encoding does not reflect the new Windows behavior in Python 3.6+
Type: behavior
Stage: patch review
Components: Documentation, Windows
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: docs@python, eryksun, lys.nikolaou, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords: easy, patch

Created on 2017-05-20 08:58 by paul.moore, last changed 2018-12-02 22:06 by lys.nikolaou.

Pull Requests
URL Status Linked
PR 10264 open lys.nikolaou, 2018-10-31 20:01
Messages (9)
msg294020 - Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-05-20 08:58
The documentation for the encoding of sys.stdin/out/err (see https://docs.python.org/3.6/library/sys.html#sys.stdout) does not reflect the change in Python 3.6 on Windows to use the console Unicode APIs, and hence UTF-8 for the encoding.
msg294046 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-05-20 18:37
How about this?

    The character encoding is platform-dependent. Non-Windows 
    platforms use the locale encoding (see 
    locale.getpreferredencoding()).

    On Windows, UTF-8 is used for console character 
    devices (i.e. CON, CONIN$, and CONOUT$). However, this
    can be overridden to use the console as a generic 
    character device by setting the environment variable 
    PYTHONLEGACYWINDOWSSTDIO before starting Python. Non-
    character devices such as disk files and pipes use the 
    system locale encoding (i.e. the ANSI codepage). 
    Character devices such as NUL (i.e. isatty() returns 
    True) use the value of the console input and output
    codepages at startup, respectively for stdin and
    stdout/stderr. This defaults to the system locale
    encoding if the process is not initially attached to a
    console.

    Under all platforms, you can override this value by
    setting the PYTHONIOENCODING environment variable before
    starting Python. However, for the Windows console, this
    only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
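
To see which of the cases above applies on a given machine, a small inspection script can help. This is only a sketch: `describe_stream` is a hypothetical helper name, not part of the standard library, and it reports nothing beyond what the stream objects themselves expose.

```python
import io
import sys


def describe_stream(stream):
    """Return the text-layer settings of a stream as a plain dict."""
    return {
        "name": getattr(stream, "name", None),
        "encoding": getattr(stream, "encoding", None),
        "errors": getattr(stream, "errors", None),
        "isatty": stream.isatty(),
    }


if __name__ == "__main__":
    # The standard streams can be None (for example under pythonw.exe).
    for stream in (sys.stdin, sys.stdout, sys.stderr):
        if stream is not None:
            print(describe_stream(stream), file=sys.stderr)

    # An in-memory stream, for comparison with a redirected file or pipe.
    memory = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
    print(describe_stream(memory), file=sys.stderr)
```

Running it once interactively, once with output redirected to a file, and once with PYTHONLEGACYWINDOWSSTDIO or PYTHONIOENCODING set should show how the reported encoding changes.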
msg294061 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-05-20 23:59
Looks great, though I wonder whether the rest of the paragraph after "Character devices such as NUL" would be more confusing than it's worth?

Can you create a PR? (And having links to the environment variable docs would be great.)
msg294063 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-05-21 00:53
I discussed character devices mostly because of the NUL device. It could be surprising that Python dies on an encoding error when output is redirected to NUL:

    C:\>chcp 1252
    Active code page: 1252

    C:\>python -c "print('\u20ac')" > nul

    C:\>chcp 437
    Active code page: 437

    C:\>python -c "print('\u20ac')" > nul
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Program Files\Python36\lib\encodings\cp437.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
    character maps to <undefined>

Unix has a similar problem:

    $ LANG=C python3 -c 'print("\u20ac")' > /dev/null
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 0:
    ordinal not in range(128)

The difference is that /dev/null isn't a TTY. Also, it's rare nowadays for the locale encoding on Unix systems to be anything other than UTF-8.

It would be useful if we special-cased NUL like we do for the Windows console, but just to make it use the backslashreplace error handler. Unfortunately I don't know how to do that without calling NtQueryObject, for which ObjectNameInformation (1) can't be used because it's undocumented [1]. GetFinalPathNameByHandle also can't be used because it requires file-system devices. As a crude workaround, we could lump together all non-console character devices (i.e. isatty() but not a console). That will affect serial devices, too, but I can't think of a good reason someone would redirect stdout or stderr to a COM port.

[1]: https://msdn.microsoft.com/en-us/library/ff550964
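
The failure above, and the effect the backslashreplace handler would have, can be reproduced without a console or a NUL redirect by encoding the string directly. This is a minimal sketch; `encode_for_device` is a hypothetical helper used only for illustration, and it shows the codec behaviour, not a proposed patch.

```python
def encode_for_device(text, encoding, errors="strict"):
    """Encode text as a text stream with these settings would."""
    return text.encode(encoding, errors)


if __name__ == "__main__":
    euro = "\u20ac"

    # cp1252 has a euro sign, so the strict handler succeeds.
    print(encode_for_device(euro, "cp1252"))

    # cp437 has no euro sign, so the strict handler raises.
    try:
        encode_for_device(euro, "cp437")
    except UnicodeEncodeError as exc:
        print("strict cp437 fails:", exc.reason)

    # backslashreplace substitutes an escape sequence instead of failing.
    print(encode_for_device(euro, "cp437", errors="backslashreplace"))
```

On Python 3.7 and later, a script that wants this behaviour today can call `sys.stdout.reconfigure(errors="backslashreplace")` itself, which changes the error handler without touching the encoding.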
msg328766 - Author: Lysandros Nikolaou (lys.nikolaou) * Date: 2018-10-28 22:24
Shall I create a PR for this?
msg328798 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-29 10:34
Please do!
msg330764 - Author: Lysandros Nikolaou (lys.nikolaou) * Date: 2018-11-30 09:38
Ping.
msg330765 - Author: Paul Moore (paul.moore) * (Python committer) Date: 2018-11-30 09:58
The proposed wording seems a bit over-complex to me. Maybe the following re-wording would be easier to understand?

    The character encoding is platform-dependent. Non-Windows 
    platforms use the locale encoding (see 
    locale.getpreferredencoding()).

    On Windows, UTF-8 is used for the console device.  Non-character
    devices such as disk files and pipes use the system locale
    encoding (i.e. the ANSI codepage).  Non-console character
    devices such as NUL (i.e. where isatty() returns True) use the
    value of the console input and output codepages at startup,
    respectively for stdin and stdout/stderr. This defaults to the
    system locale encoding if the process is not initially attached
    to a console.

    The special behaviour of the console can be overridden
    by setting the environment variable PYTHONLEGACYWINDOWSSTDIO
    before starting Python. In that case, the console codepages are
    used as for any other character device.

    Under all platforms, you can override this value by
    setting the PYTHONIOENCODING environment variable before
    starting Python. However, for the Windows console, this
    only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
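
The rules in this wording can also be read as a small decision function. The sketch below is a model of the proposed text only, not of CPython's actual startup code: `expected_stdio_encoding` and all of its parameters are hypothetical names, and PYTHONIOENCODING is simplified to a bare encoding name (the real variable also accepts an optional `:errorhandler` suffix).

```python
def expected_stdio_encoding(
    platform,
    is_console=False,
    is_char_device=False,
    legacy_stdio=False,
    io_encoding=None,
    locale_encoding="utf-8",
    console_codepage=None,
):
    """Return the encoding the proposed wording predicts for a stream.

    console_codepage is None when the process was not attached to a
    console at startup, in which case the locale encoding is used.
    """
    if platform != "win32":
        # Non-Windows: PYTHONIOENCODING wins, else the locale encoding.
        return io_encoding or locale_encoding

    if is_console and not legacy_stdio:
        # Windows console: UTF-8, and PYTHONIOENCODING is not applied.
        return "utf-8"

    if io_encoding:
        # Every other Windows case honours PYTHONIOENCODING.
        return io_encoding

    if is_console or is_char_device:
        # Legacy console mode, or a character device such as NUL.
        return console_codepage or locale_encoding

    # Disk files and pipes: the ANSI codepage.
    return locale_encoding


if __name__ == "__main__":
    print(expected_stdio_encoding("win32", is_console=True))
    print(expected_stdio_encoding(
        "win32", is_char_device=True,
        console_codepage="cp437", locale_encoding="cp1252"))
```

Writing the rules out this way makes the one asymmetry easy to spot: the console case is the only one where PYTHONIOENCODING has no effect unless PYTHONLEGACYWINDOWSSTDIO is also set.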
msg330901 - Author: Lysandros Nikolaou (lys.nikolaou) * Date: 2018-12-02 22:06
I updated the PR with the new wording by Paul, since I found it easier to understand as well.
History
Date                 User          Action  Args
2018-12-02 22:06:32  lys.nikolaou  set     messages: + msg330901
2018-11-30 09:58:49  paul.moore    set     messages: + msg330765
2018-11-30 09:38:07  lys.nikolaou  set     messages: + msg330764
2018-10-31 20:01:49  lys.nikolaou  set     keywords: + patch; stage: patch review; pull_requests: + pull_request9575
2018-10-29 10:34:17  steve.dower   set     messages: + msg328798; versions: + Python 3.8
2018-10-28 22:24:59  lys.nikolaou  set     nosy: + lys.nikolaou; messages: + msg328766
2017-05-21 00:53:24  eryksun       set     messages: + msg294063
2017-05-20 23:59:14  steve.dower   set     messages: + msg294061
2017-05-20 18:37:01  eryksun       set     nosy: + eryksun; messages: + msg294046
2017-05-20 08:58:49  paul.moore    create