classification
Title: Documentation for sys.stdout encoding does not reflect the new Windows behavior in Python 3.6+
Type: behavior
Stage: resolved
Components: Documentation, Windows
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Mariatta, docs@python, eryksun, lys.nikolaou, miss-islington, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords: easy, patch

Created on 2017-05-20 08:58 by paul.moore, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 10264 merged lys.nikolaou, 2018-10-31 20:01
PR 11860 merged miss-islington, 2019-02-14 23:35
Messages (12)
msg294020 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-05-20 08:58
The documentation for the encoding of sys.stdin/out/err (see https://docs.python.org/3.6/library/sys.html#sys.stdout) does not reflect the change in Python 3.6 on Windows to use the console Unicode APIs, and hence UTF-8 for the encoding.
msg294046 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-05-20 18:37
How about this?

    The character encoding is platform-dependent. Non-Windows 
    platforms use the locale encoding (see 
    locale.getpreferredencoding()).

    On Windows, UTF-8 is used for console character 
    devices (i.e. CON, CONIN$, and CONOUT$). However, this
    can be overridden to use the console as a generic 
    character device by setting the environment variable 
    PYTHONLEGACYWINDOWSSTDIO before starting Python. Non-
    character devices such as disk files and pipes use the 
    system locale encoding (i.e. the ANSI codepage). 
    Character devices such as NUL (i.e. isatty() returns 
    True) use the value of the console input and output
    codepages at startup, respectively for stdin and
    stdout/stderr. This defaults to the system locale
    encoding if the process is not initially attached to a
    console.

    Under all platforms, you can override this value by
    setting the PYTHONIOENCODING environment variable before
    starting Python. However, for the Windows console, this
    only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
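
To see which of the cases in the wording above applies to a given process, the streams can be inspected directly. This is only a minimal sketch: the helper name `describe_stdio` is hypothetical and not part of the proposed documentation, and the values it reports depend on the platform, on redirection, and on the two environment variables.

```python
import os
import sys


def describe_stdio():
    """Return encoding details for the three standard streams."""
    report = {}
    for name in ("stdin", "stdout", "stderr"):
        stream = getattr(sys, name)
        if stream is None:
            # Some launchers (e.g. GUI ones) start without standard streams.
            report[name] = None
            continue
        try:
            is_tty = stream.isatty()
        except (AttributeError, ValueError):
            # Replaced or closed streams may not support isatty().
            is_tty = None
        report[name] = {
            "encoding": getattr(stream, "encoding", None),
            "errors": getattr(stream, "errors", None),
            "isatty": is_tty,
        }
    return report


if __name__ == "__main__":
    for name, info in describe_stdio().items():
        print(name, info)
    # The two environment variables discussed in the proposed wording.
    for var in ("PYTHONIOENCODING", "PYTHONLEGACYWINDOWSSTDIO"):
        print(var, "=", os.environ.get(var))
```

Running the script once attached to a console and once with output redirected to a file or to NUL should show the differences described above.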
msg294061 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-05-20 23:59
Looks great, though I wonder whether the rest of the paragraph after "Character devices such as NUL" would be more confusing than it's worth?

Can you create a PR? (And having links to the environment variable docs would be great.)
msg294063 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-05-21 00:53
I discussed character devices mostly because of the NUL device. It could be surprising that Python dies on an encoding error when output is redirected to NUL:

    C:\>chcp 1252
    Active code page: 1252

    C:\>python -c "print('\u20ac')" > nul
    C:\>chcp 437
    Active code page: 437

    C:\>python -c "print('\u20ac')" > nul
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Program Files\Python36\lib\encodings\cp437.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
    character maps to <undefined>

Unix has a similar problem:

    $ LANG=C python3 -c 'print("\u20ac")' > /dev/null
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 0:
    ordinal not in range(128)

Except /dev/null isn't a TTY. Also, it's rare nowadays for the locale encoding in Unix systems to be something other than UTF-8.

It would be useful if we special-cased NUL like we do for the Windows console, but just to make it use the backslashreplace error handler. Unfortunately I don't know how to do that without calling NtQueryObject, for which ObjectNameInformation (1) can't be used because it's undocumented [1]. GetFinalPathNameByHandle also can't be used because it requires file-system devices. As a crude workaround, we could lump together all non-console character devices (i.e. isatty() but not a console). That will affect serial devices, too, but I can't think of a good reason someone would redirect stdout or stderr to a COM port.

[1]: https://msdn.microsoft.com/en-us/library/ff550964
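
For illustration, the effect of the backslashreplace error handler suggested above can be sketched with an in-memory stream standing in for the NUL device. The helper name `write_with_fallback` and the default of cp437 are assumptions chosen to mirror the traceback earlier in this message; this is not how the interpreter sets up its standard streams.

```python
import io


def write_with_fallback(text, encoding="cp437"):
    """Encode text as a stream using backslashreplace would, and return the bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(
        raw,
        encoding=encoding,
        errors="backslashreplace",  # escape unencodable characters instead of raising
        newline="\n",               # keep newlines untranslated for a predictable result
    )
    stream.write(text)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    # The euro sign has no cp437 mapping, so it is written as an escape sequence.
    print(write_with_fallback("\u20ac"))
```

On Python 3.7 and later, a script can apply the same handler to its own output with `sys.stdout.reconfigure(errors="backslashreplace")`, without any change to the interpreter defaults.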
msg328766 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2018-10-28 22:24
Shall I create a PR for this?
msg328798 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-29 10:34
Please do!
msg330764 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2018-11-30 09:38
Ping.
msg330765 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2018-11-30 09:58
The proposed wording seems a bit over-complex to me. Maybe the following re-wording would be easier to understand?

    The character encoding is platform-dependent. Non-Windows 
    platforms use the locale encoding (see 
    locale.getpreferredencoding()).

    On Windows, UTF-8 is used for the console device.  Non-character
    devices such as disk files and pipes use the system locale
    encoding (i.e. the ANSI codepage).  Non-console character
    devices such as NUL (i.e. where isatty() returns True) use the
    value of the console input and output codepages at startup,
    respectively for stdin and stdout/stderr. This defaults to the
    system locale encoding if the process is not initially attached
    to a console.

    The special behaviour of the console can be overridden
    by setting the environment variable PYTHONLEGACYWINDOWSSTDIO
    before starting Python. In that case, the console codepages are
    used as for any other character device.

    Under all platforms, you can override this value by
    setting the PYTHONIOENCODING environment variable before
    starting Python. However, for the Windows console, this
    only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
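
The PYTHONIOENCODING override in the last paragraph can be checked with a child process whose stdout is a pipe, so the Windows console special case does not come into play. This is a rough sketch: the helper names `child_stdout_encoding` and `same_codec` are hypothetical, and codec names are compared through `codecs.lookup()` because the exact spelling reported by `sys.stdout.encoding` is not assumed here.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(ioencoding):
    """Start a child Python with PYTHONIOENCODING set and return its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=ioencoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,  # a pipe, not a console device
        check=True,
    )
    return result.stdout.decode("ascii").strip()


def same_codec(a, b):
    """Compare two encoding names after alias normalization."""
    return codecs.lookup(a).name == codecs.lookup(b).name


if __name__ == "__main__":
    print(child_stdout_encoding("latin-1"))
```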
msg330901 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2018-12-02 22:06
I updated the PR with the new wording by Paul, since I found it easier to understand as well.
msg335573 - (view) Author: miss-islington (miss-islington) Date: 2019-02-14 23:35
New changeset 5723263a3a39a05b6a2f567e0e7771792e6e2f5b by Miss Islington (bot) (Lysandros Nikolaou) in branch 'master':
bpo-30410: Documentation of sys.stdin/out/err update to reflect change in 3.6 (GH-10264)
https://github.com/python/cpython/commit/5723263a3a39a05b6a2f567e0e7771792e6e2f5b
msg335574 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2019-02-14 23:36
Fixed in 3.8 and 3.7.
Thanks!
msg335575 - (view) Author: miss-islington (miss-islington) Date: 2019-02-14 23:45
New changeset b8bcec35e01cac018f6ccfc8323d35886340efe0 by Miss Islington (bot) in branch '3.7':
bpo-30410: Documentation of sys.stdin/out/err update to reflect change in 3.6 (GH-10264)
https://github.com/python/cpython/commit/b8bcec35e01cac018f6ccfc8323d35886340efe0
History
Date  User  Action  Args
2022-04-11 14:58:46  admin  set  github: 74595
2019-02-14 23:45:23  miss-islington  set  messages: + msg335575
2019-02-14 23:36:54  Mariatta  set  status: open -> closed; nosy: + Mariatta; messages: + msg335574; resolution: fixed; stage: patch review -> resolved
2019-02-14 23:35:48  miss-islington  set  pull_requests: + pull_request11893
2019-02-14 23:35:28  miss-islington  set  nosy: + miss-islington; messages: + msg335573
2018-12-02 22:06:32  lys.nikolaou  set  messages: + msg330901
2018-11-30 09:58:49  paul.moore  set  messages: + msg330765
2018-11-30 09:38:07  lys.nikolaou  set  messages: + msg330764
2018-10-31 20:01:49  lys.nikolaou  set  keywords: + patch; stage: patch review; pull_requests: + pull_request9575
2018-10-29 10:34:17  steve.dower  set  messages: + msg328798; versions: + Python 3.8
2018-10-28 22:24:59  lys.nikolaou  set  nosy: + lys.nikolaou; messages: + msg328766
2017-05-21 00:53:24  eryksun  set  messages: + msg294063
2017-05-20 23:59:14  steve.dower  set  messages: + msg294061
2017-05-20 18:37:01  eryksun  set  nosy: + eryksun; messages: + msg294046
2017-05-20 08:58:49  paul.moore  create