
Documentation for sys.stdout encoding does not reflect the new Windows behavior in Python 3.6+ #74595

Closed
pfmoore opened this issue May 20, 2017 · 12 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes docs Documentation in the Doc dir easy OS-windows type-bug An unexpected behavior, bug, or error

Comments

@pfmoore
Member

pfmoore commented May 20, 2017

BPO 30410
Nosy @pfmoore, @tjguk, @zware, @eryksun, @zooba, @Mariatta, @lysnikolaou, @miss-islington
PRs
  • bpo-30410: Documentation of sys.stdin/out/err update to reflect change in 3.6 #10264
  • [3.7] bpo-30410: Documentation of sys.stdin/out/err update to reflect change in 3.6 (GH-10264) #11860
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-02-14.23:36:54.782>
    created_at = <Date 2017-05-20.08:58:49.633>
    labels = ['easy', 'type-bug', '3.8', '3.7', 'OS-windows', 'docs']
    title = 'Documentation for sys.stdout encoding does not reflect the new Windows behavior in Python 3.6+'
    updated_at = <Date 2019-02-14.23:45:23.357>
    user = 'https://github.com/pfmoore'

    bugs.python.org fields:

    activity = <Date 2019-02-14.23:45:23.357>
    actor = 'miss-islington'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2019-02-14.23:36:54.782>
    closer = 'Mariatta'
    components = ['Documentation', 'Windows']
    creation = <Date 2017-05-20.08:58:49.633>
    creator = 'paul.moore'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30410
    keywords = ['patch', 'easy']
    message_count = 12.0
    messages = ['294020', '294046', '294061', '294063', '328766', '328798', '330764', '330765', '330901', '335573', '335574', '335575']
    nosy_count = 9.0
    nosy_names = ['paul.moore', 'tim.golden', 'docs@python', 'zach.ware', 'eryksun', 'steve.dower', 'Mariatta', 'lys.nikolaou', 'miss-islington']
    pr_nums = ['10264', '11860']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue30410'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @pfmoore
    Member Author

    pfmoore commented May 20, 2017

    The documentation for the encoding of sys.stdin/out/err (see https://docs.python.org/3.6/library/sys.html#sys.stdout) does not reflect the change in Python 3.6 on Windows to use the console Unicode APIs, and hence UTF-8 for the encoding.
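For reference, the values the documentation is describing can be inspected directly. The sketch below (the helper name `describe_stdio` is made up for illustration) reports each standard stream's `encoding`, `errors` and `isatty()` alongside `locale.getpreferredencoding(False)`. Going by the behaviour discussed in this issue, on Windows with Python 3.6+ attached to a console, stdout should report UTF-8, while with output redirected to a file or pipe it should report the ANSI code page.

```python
import locale
import sys


def describe_stdio() -> dict:
    """Collect encoding, error handler and TTY status for each std stream."""
    report = {}
    for name in ("stdin", "stdout", "stderr"):
        stream = getattr(sys, name)
        if stream is None:  # e.g. pythonw.exe starts without std streams
            continue
        try:
            tty = stream.isatty()
        except (AttributeError, ValueError):  # closed or non-standard stream
            tty = None
        report[name] = {
            "encoding": getattr(stream, "encoding", None),
            "errors": getattr(stream, "errors", None),
            "isatty": tty,
        }
    # The locale encoding that the docs refer to for non-Windows platforms
    # (and for files and pipes on Windows).
    report["locale"] = locale.getpreferredencoding(False)
    return report


if __name__ == "__main__":
    for key, value in describe_stdio().items():
        print(key, value)
```

Running it once in a console and once with `> out.txt` makes the Windows difference visible.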

    @pfmoore pfmoore added the 3.7 (EOL) end of life label May 20, 2017
    @pfmoore pfmoore added docs Documentation in the Doc dir OS-windows easy type-bug An unexpected behavior, bug, or error labels May 20, 2017
    @eryksun
    Contributor

    eryksun commented May 20, 2017

    How about this?

    The character encoding is platform-dependent. Non-Windows 
    platforms use the locale encoding (see 
    locale.getpreferredencoding()).
    
    On Windows, UTF-8 is used for console character 
    devices (i.e. CON, CONIN$, and CONOUT$). However, this
    can be overridden to use the console as a generic 
    character device by setting the environment variable 
    PYTHONLEGACYWINDOWSSTDIO before starting Python. Non-
    character devices such as disk files and pipes use the 
    system locale encoding (i.e. the ANSI codepage). 
    Character devices such as NUL (i.e. isatty() returns 
    True) use the value of the console input and output
    codepages at startup, respectively for stdin and
    stdout/stderr. This defaults to the system locale
    encoding if the process is not initially attached to a
    console.
    
    Under all platforms, you can override this value by
    setting the PYTHONIOENCODING environment variable before
    starting Python. However, for the Windows console, this
    only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
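The last paragraph of the proposal can be checked from Python itself. This sketch (the helper `child_stdout_encoding` is hypothetical) starts a fresh interpreter with `PYTHONIOENCODING` set and reads back its `sys.stdout.encoding`. Because the child's stdout is a pipe rather than the Windows console, the override should take effect on every platform, including Windows without `PYTHONLEGACYWINDOWSSTDIO`.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(env_overrides: dict) -> str:
    """Run a new interpreter with extra environment variables and return
    the encoding it chose for sys.stdout (stdout is a pipe here)."""
    env = dict(os.environ, **env_overrides)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    # The encoding name itself is plain ASCII, whatever codec the child used.
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    enc = child_stdout_encoding({"PYTHONIOENCODING": "latin-1"})
    print(enc, codecs.lookup(enc).name)
```

Names are compared through `codecs.lookup` because the interpreter may report an alias (for example `latin-1` rather than `iso8859-1`).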
    

    @zooba
    Member

    zooba commented May 20, 2017

    Looks great, though I wonder whether the rest of the paragraph after "Character devices such as NUL" would be more confusing than it's worth?

    Can you create a PR? (And having links to the environment variable docs would be great.)

    @eryksun
    Contributor

    eryksun commented May 21, 2017

    I discussed character devices mostly because of the NUL device. It could be surprising that Python dies on an encoding error when output is redirected to NUL:

        C:\>chcp 1252
        Active code page: 1252

        C:\>python -c "print('\u20ac')" > nul

        C:\>chcp 437
        Active code page: 437

        C:\>python -c "print('\u20ac')" > nul
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "C:\Program Files\Python36\lib\encodings\cp437.py", line 19, in encode
            return codecs.charmap_encode(input,self.errors,encoding_map)[0]
        UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
        character maps to <undefined>

    Unix has a similar problem:

        $ LANG=C python3 -c 'print("\u20ac")' > /dev/null
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
        UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 0:
        ordinal not in range(128)

    Except /dev/null isn't a TTY. Also, it's rare nowadays for the locale encoding in Unix systems to be something other than UTF-8.
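Both failures come down to the codec chosen for the stream, not to the device itself. The same outcome can be reproduced without a console by encoding U+20AC (the euro sign) with each codec under the default `strict` error handler; `can_encode` is an illustrative helper, not a standard function.

```python
EURO = "\u20ac"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` encodes with `encoding` under the strict handler."""
    try:
        text.encode(encoding)  # "strict" is the default error handler
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # cp1252 and cp437 are the two console code pages from the Windows
    # transcript; ascii is what LANG=C gave in the Unix example.
    for codec in ("cp1252", "cp437", "ascii", "utf-8"):
        print(codec, can_encode(EURO, codec))
```

cp1252 maps the euro sign to byte 0x80, while cp437 and ASCII have no slot for it, which matches the transcripts above.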

    It would be useful if we special-cased NUL like we do for the Windows console, but only to make it use the backslashreplace error handler. Unfortunately I don't know how to detect NUL without calling NtQueryObject, and its ObjectNameInformation class (1) can't be used because it's undocumented [1]. GetFinalPathNameByHandle also can't be used because it requires file-system devices. As a crude workaround, we could lump together all non-console character devices (i.e. isatty() but not a console). That would affect serial devices, too, but I can't think of a good reason someone would redirect stdout or stderr to a COM port.
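The suggested special case amounts to swapping the `strict` error handler for `backslashreplace` on those streams. Its effect can be seen by pushing the euro sign through an `io.TextIOWrapper` configured like a cp437 stdout. `write_through` is a hypothetical helper for illustration, not how CPython would implement the special case.

```python
import io


def write_through(text: str, encoding: str, errors: str) -> bytes:
    """Write `text` through a text layer like sys.stdout; return the raw bytes."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    wrapper.write(text)  # with errors="strict" this raises UnicodeEncodeError
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # release the BytesIO without closing it
    return data


if __name__ == "__main__":
    # Instead of dying, the unencodable character is written as an escape.
    print(write_through("price: \u20ac5", "cp437", "backslashreplace"))
```

From the user side, a script can opt into the same behaviour for itself with `sys.stdout.reconfigure(errors="backslashreplace")`, available since Python 3.7.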

    @lysnikolaou
    Contributor

    Shall I create a PR for this?

    @zooba
    Member

    zooba commented Oct 29, 2018

    Please do!

    @zooba zooba added the 3.8 only security fixes label Oct 29, 2018
    @lysnikolaou
    Contributor

    Ping.

    @pfmoore
    Member Author

    pfmoore commented Nov 30, 2018

    The proposed wording seems a bit over-complex to me. Maybe the following re-wording would be easier to understand?

    The character encoding is platform-dependent. Non-Windows 
    platforms use the locale encoding (see 
    locale.getpreferredencoding()).
    
    On Windows, UTF-8 is used for the console device.  Non-character
    devices such as disk files and pipes use the system locale
    encoding (i.e. the ANSI codepage).  Non-console character
    devices such as NUL (i.e. where isatty() returns True) use the
    value of the console input and output codepages at startup,
    respectively for stdin and stdout/stderr. This defaults to the
    system locale encoding if the process is not initially attached
    to a console.
    
    The special behaviour of the console can be overridden
    by setting the environment variable PYTHONLEGACYWINDOWSSTDIO
    before starting Python. In that case, the console codepages are
    used as for any other character device.
    
    Under all platforms, you can override this value by
    setting the PYTHONIOENCODING environment variable before
    starting Python. However, for the Windows console, this
    only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
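The rules in this wording can be summarised as a small decision function. This is a model of the proposed documentation text only, not CPython's startup code; the function name and its parameters are hypothetical, and `pythonioencoding` is assumed to hold just an encoding name (no `:errors` suffix).

```python
from typing import Optional


def documented_stdio_encoding(
    stream: str,                  # "stdin", "stdout" or "stderr"
    windows: bool,
    is_console: bool,             # the Windows console (CONIN$ / CONOUT$)
    is_char_device: bool,         # isatty() is True: console, NUL, COM ports...
    locale_encoding: str,         # e.g. locale.getpreferredencoding() -> "cp1252"
    console_input_cp: Optional[str] = None,   # code page at startup, None if no console
    console_output_cp: Optional[str] = None,
    legacy_stdio: bool = False,   # PYTHONLEGACYWINDOWSSTDIO is set
    pythonioencoding: Optional[str] = None,   # PYTHONIOENCODING, if set
) -> str:
    """Model of the proposed sys.stdout documentation rules."""
    if not windows:
        return pythonioencoding or locale_encoding
    if is_console and not legacy_stdio:
        # Console Unicode APIs; PYTHONIOENCODING is ignored in this case.
        return "utf-8"
    if pythonioencoding:
        return pythonioencoding
    if is_char_device:
        # Legacy console, NUL and other character devices use the console
        # code page at startup: input CP for stdin, output CP otherwise.
        cp = console_input_cp if stream == "stdin" else console_output_cp
        return cp or locale_encoding
    # Disk files and pipes use the ANSI code page.
    return locale_encoding


if __name__ == "__main__":
    # The failing transcript: stdout redirected to NUL after `chcp 437`.
    print(documented_stdio_encoding(
        "stdout", windows=True, is_console=False, is_char_device=True,
        locale_encoding="cp1252", console_input_cp="cp437",
        console_output_cp="cp437"))
```

Under this model the NUL case resolves to cp437, which cannot encode the euro sign, consistent with the traceback earlier in the thread.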
    

    @lysnikolaou
    Contributor

    I updated the PR with the new wording by Paul, since I found it easier to understand as well.

    @miss-islington
    Contributor

    New changeset 5723263 by Miss Islington (bot) (Lysandros Nikolaou) in branch 'master':
    bpo-30410: Documentation of sys.stdin/out/err update to reflect change in 3.6 (GH-10264)

    @Mariatta
    Member

    Fixed in 3.8 and 3.7.
    Thanks!

    @miss-islington
    Contributor

    New changeset b8bcec3 by Miss Islington (bot) in branch '3.7':
    bpo-30410: Documentation of sys.stdin/out/err update to reflect change in 3.6 (GH-10264)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022