IDLE: make sys.stdxxx.encoding always be utf-8 #85324

Closed
terryjreedy opened this issue Jun 28, 2020 · 10 comments
Labels: 3.8 (only security fixes), 3.9 (only security fixes), 3.10 (only security fixes), topic-IDLE, type-bug (an unexpected behavior, bug, or error)

Comments

@terryjreedy
Member

BPO 41152
Nosy @terryjreedy, @taleinat, @ned-deily, @serhiy-storchaka, @miss-islington
PRs
  • bpo-41152: Revise setting idlelib.iomenu.encoding #21206
  • bpo-41152: IDLE: always use UTF-8 for standard IO streams #21214
  • [3.9] bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214) #21225
  • [3.8] bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214) #21226
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/terryjreedy'
    closed_at = <Date 2020-06-30.12:23:13.900>
    created_at = <Date 2020-06-28.17:56:42.823>
    labels = ['expert-IDLE', 'type-bug', '3.8', '3.9', '3.10']
    title = 'IDLE: make sys.stdxxx.encoding always be utf-8'
    updated_at = <Date 2020-06-30.12:23:13.897>
    user = 'https://github.com/terryjreedy'

    bugs.python.org fields:

    activity = <Date 2020-06-30.12:23:13.897>
    actor = 'terry.reedy'
    assignee = 'terry.reedy'
    closed = True
    closed_date = <Date 2020-06-30.12:23:13.900>
    closer = 'terry.reedy'
    components = ['IDLE']
    creation = <Date 2020-06-28.17:56:42.823>
    creator = 'terry.reedy'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41152
    keywords = ['patch']
    message_count = 10.0
    messages = ['372527', '372528', '372540', '372543', '372546', '372567', '372642', '372643', '372644', '372684']
    nosy_count = 5.0
    nosy_names = ['terry.reedy', 'taleinat', 'ned.deily', 'serhiy.storchaka', 'miss-islington']
    pr_nums = ['21206', '21214', '21225', '21226']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue41152'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @terryjreedy
    Member Author

    When testing and on Windows, iomenu.encoding and .errors are set to utf-8 and surrogateescape*. When running otherwise, these are set with baroque code I don't understand. (Currently lines 31 to 61.)

    1. Combine the two conditional statements for testing and Windows.

    2. Ned, on my Catalina Macbook, the 30-line 'else' section sets encoding, errors to 'utf-8', 'strict'. Should there ever be any other result on Mac that we care about? If not, I would like to set them directly, as on Windows.

    3. Serhiy, does the 'baroque code' look right to you, for Linux (or *nix in general)?

    @terryjreedy terryjreedy added the 3.10 only security fixes label Jun 28, 2020
    @terryjreedy terryjreedy self-assigned this Jun 28, 2020
    @terryjreedy terryjreedy added topic-IDLE type-bug An unexpected behavior, bug, or error labels Jun 28, 2020
    @serhiy-storchaka
    Member

    I think it makes sense if we want to use the locale encoding for IO streams.

    On the other hand, it may be worth dropping support for a locale-dependent and configurable IO encoding and always using UTF-8. It is the IO encoding always used on Windows and the encoding of most locales on modern Linux and macOS.
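As a rough illustration of the locale dependence under discussion (this is not the idlelib code; `is_utf8` is a hypothetical helper), the following sketch asks the platform for its preferred encoding and checks whether that name resolves to Python's UTF-8 codec:

```python
import codecs
import locale


def is_utf8(name):
    """Return True if an encoding name resolves to Python's UTF-8 codec."""
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Unknown or empty encoding names are treated as "not UTF-8".
        return False


# The encoding the current locale prefers for text data; platform dependent.
locale_encoding = locale.getpreferredencoding(False)
print(locale_encoding, is_utf8(locale_encoding))
```

On most modern Linux and macOS setups this is expected to report a UTF-8 locale, which is the argument for hard-coding UTF-8 instead of computing it.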

    @terryjreedy
    Member Author

    The PR is for 1. The *nix code is a bit clearer without the Windows code in the middle.

    Is there a good reason why when encoding is 'utf-8', errors should be 'surrogateescape' on Windows and 'strict' elsewhere? Surrogateescape seems like it is made for when using ascii or other limited encoding.

    @terryjreedy
    Member Author

    The main use for the iomenu settings is for the socket-transport file classes, in run.py. The default encoding='utf-8' and errors='strict' are not used but are overridden with the iomenu values or, for stderr, with 'backslashreplace'.

    Since user code can print any unicode, I think the defaults should be used as is to transparently pass on and possibly display anything the user sends. Such a change should have no back-compatibility issues.

    Thinking more about errors. With utf-8 encoding of proper strings, there should never be any, but Python does allow construction of 'improper' strings with, say, single surrogates. The transport mechanism should never raise, so maybe surrogateescape or backslashreplace should always be used.
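To make the point about 'improper' strings concrete, here is a minimal sketch (not IDLE code; `try_encode` and the sample string are illustrative names) of how the standard error handlers treat a lone surrogate when encoding to UTF-8:

```python
# A string containing a lone surrogate: legal in Python, not encodable
# as strict UTF-8.
improper = "ok \udc80 end"


def try_encode(text, errors):
    """Return the UTF-8 bytes, or None if encoding raises UnicodeEncodeError."""
    try:
        return text.encode("utf-8", errors=errors)
    except UnicodeEncodeError:
        return None


strict_result = try_encode(improper, "strict")              # None (raises)
escape_result = try_encode(improper, "surrogateescape")     # b'ok \x80 end'
replace_result = try_encode(improper, "backslashreplace")   # b'ok \\udc80 end'
pass_result = try_encode(improper, "surrogatepass")         # b'ok \xed\xb2\x80 end'
```

Only 'strict' raises here; the other three handlers always produce bytes, which is the property a transport layer that "should never raise" needs.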

    What do you two think?

    Another use is for writing bytes to an OutputWindow, as with find-in-files. But I can think of no case where IDLE sends bytes to an OutputWindow. User files are all opened in an editor.

    I believe these are all the uses of 'iomenu.encoding' outside of iomenu. 'from iomenu ...' is never used.

    Within iomenu, the only use is part of reading an encoding cookie.
    # The only use of 'encoding' below is in _decode as initial value
    # of deprecated block asking user for encoding.
    I am not sure if this use can be reached now. Even if so, I believe this code duplicates code elsewhere in the stdlib that might be used.

    So maybe the encoding calculation is not really needed.

    @terryjreedy
    Member Author

    I got the 'within iomenu' part a bit wrong. To open a file to edit, iomenu.IOBinding('IO').open tells filelist to use IO.loadfile. This reads bytes 'so that we can handle end-of-line convention ourselves'. (I suspect that this predates 3.x and might not be needed any more.) IO.loadfile calls IO._decode, which looks for a utf-8 BOM, looks for a coding cookie, tries ascii (not needed in 3.x), tries utf-8, and finally asks the user for an encoding, using iomenu.encoding as the initial value in the query box. This box is deprecated in the sense that for 3.x, a python file should either be utf-8 or have an encoding cookie.
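The lookup order described above can be approximated with the standard library alone. This is a hedged sketch, not the actual `IO._decode`: `sketch_decode` is a hypothetical name, the ascii step is skipped, and the "ask the user" step is replaced by decoding with a fallback encoding and `errors="replace"`.

```python
import codecs
import io
import tokenize


def sketch_decode(data, fallback="utf-8"):
    """Return (text, encoding_used) for the bytes of a Python source file."""
    # 1. A UTF-8 BOM settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    # 2. Look for a coding cookie; with no cookie this reports 'utf-8'.
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
        return data.decode(encoding), encoding
    except (SyntaxError, UnicodeDecodeError, LookupError):
        pass
    # 3. Assumption: stand-in for the query box, which IDLE would
    #    pre-fill with iomenu.encoding.
    return data.decode(fallback, errors="replace"), fallback


cookie_text, cookie_enc = sketch_decode(b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n")
plain_text, plain_enc = sketch_decode(b"x = '\xc3\xa9'\n")
bom_text, bom_enc = sketch_decode(codecs.BOM_UTF8 + b"x = 1\n")
```

`tokenize.detect_encoding` is the stdlib routine that reads encoding cookies, which is the kind of duplicated logic the earlier comment alludes to.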

    @serhiy-storchaka
    Member

    PR 21214 sets the encoding of stdin/stdout/stderr to 'utf-8'. The error handler is set to 'surrogatepass' or 'surrogateescape' because these are the error handlers used when converting strings between Python and Tcl. This guarantees that reading from stdin and writing back to stdout will never fail, even if you paste garbage from the clipboard. Printing file paths will never fail either.
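The round-trip guarantee can be illustrated without IDLE. In this sketch an `io.TextIOWrapper` over an in-memory buffer stands in for IDLE's socket-based stream classes (an assumption made purely for demonstration), and the sample file name is invented:

```python
import io

# Bytes that are not valid UTF-8, e.g. a Latin-1 encoded file name.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape never fails: the stray byte 0xE9
# becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Writing the string back through a UTF-8 stream with the same error
# handler restores the original bytes exactly.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="utf-8", errors="surrogateescape")
stream.write(text)
stream.flush()
round_trip = buffer.getvalue()
```

With errors='strict' the `write` call would instead raise UnicodeEncodeError once the data is encoded, which is the failure mode the PR avoids.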

    @terryjreedy
    Member Author

    New changeset 2515a28 by Serhiy Storchaka in branch 'master':
    bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214)
    2515a28

    @miss-islington
    Contributor

    New changeset 01638ce by Miss Islington (bot) in branch '3.9':
    bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214)
    01638ce

    @miss-islington
    Contributor

    New changeset 00fd04b by Miss Islington (bot) in branch '3.8':
    bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214)
    00fd04b

    @terryjreedy
    Member Author

    Thank you for this and the next patch.

    @terryjreedy terryjreedy added 3.8 only security fixes 3.9 only security fixes labels Jun 30, 2020
    @terryjreedy terryjreedy changed the title IDLE: revise setting of iomenu.encoding and .errors IDLE: make sys.stdxxx.encoding always be utf-8 Jun 30, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022