classification
Title: IDLE: make sys.stdxxx.encoding always be utf-8
Type: behavior Stage: resolved
Components: IDLE Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: miss-islington, ned.deily, serhiy.storchaka, taleinat, terry.reedy
Priority: normal Keywords: patch

Created on 2020-06-28 17:56 by terry.reedy, last changed 2020-06-30 12:23 by terry.reedy. This issue is now closed.

Pull Requests
URL       Status  Linked
PR 21206  closed  terry.reedy, 2020-06-29 04:06
PR 21214  merged  serhiy.storchaka, 2020-06-29 11:52
PR 21225  merged  miss-islington, 2020-06-30 00:18
PR 21226  merged  miss-islington, 2020-06-30 00:18
Messages (10)
msg372527 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-06-28 17:56
When testing and on Windows, iomenu.encoding and .errors are set to utf-8 and surrogateescape*.  When running otherwise, these are set with baroque code I don't understand.  (Currently lines 31 to 61.)

1. Combine the two conditional statements for testing and Windows.

2. Ned, on my Catalina Macbook, the 30-line 'else' section sets encoding and errors to 'utf-8' and 'strict'.  Should there ever be any other result on Mac that we care about?  If not, I would like to set them directly, as on Windows.

3. Serhiy, does the 'baroque code' look right to you, for Linux (or *nix in general)?
msg372528 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-06-28 18:10
I think it makes sense if we want to use the locale encoding for IO streams.

But on the other hand, it may be worth dropping support for a locale-dependent and configurable IO encoding and always using UTF-8. It is the IO encoding always used on Windows and the encoding of most locales on modern Linux and macOS.
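To make the two options concrete, here is a minimal sketch of a locale-based choice with a UTF-8 fallback. The helper name locale_io_encoding is hypothetical and is not taken from iomenu.py; it only uses locale.getpreferredencoding and codecs.lookup from the stdlib.

```python
import codecs
import locale

def locale_io_encoding(default='utf-8'):
    """Hypothetical helper: return the locale's preferred encoding,
    falling back to `default` if the locale reports an unknown codec."""
    try:
        name = locale.getpreferredencoding(False)
        # codecs.lookup raises LookupError for names Python does not know.
        return codecs.lookup(name).name
    except (LookupError, TypeError):
        return default

chosen = locale_io_encoding()
print(chosen)
```

The alternative proposed above would replace all of this with the constant 'utf-8'.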
msg372540 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-06-29 04:11
The PR is for 1.  The *nix code is a bit clearer without the Windows code in the middle.

Is there a good reason why, when encoding is 'utf-8', errors should be 'surrogateescape' on Windows and 'strict' elsewhere?  Surrogateescape seems to be meant for use with ascii or another limited encoding.
msg372543 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-06-29 05:02
The main use for the iomenu settings is for the socket-transport file classes, in run.py.  The default encoding='utf-8' and errors='strict' are not used but are overridden with the iomenu values or, for stderr, with 'backslashreplace'.

Since user code can print any unicode, I think the defaults should be used as is to transparently pass on and possibly display anything the user sends.  Such a change should have no back-compatibility issues.

Thinking more about errors.  With utf-8 encoding of proper strings, there should never be any, but Python does allow construction of 'improper' strings with, say, single surrogates.  The transport mechanism should never raise, so maybe surrogateescape or backslashreplace should always be used. 
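
As an illustration of the point about 'improper' strings, the following sketch encodes a string holding a single low surrogate with three error handlers. The helper try_encode and the sample string are made up for this example; only str.encode and the standard handler names are real.

```python
# A string containing a lone (unpaired) low surrogate, which Python allows.
improper = 'abc\udc80'

def try_encode(text, errors):
    """Return the encoded bytes, or the exception class name on failure."""
    try:
        return text.encode('utf-8', errors)
    except UnicodeEncodeError as exc:
        return type(exc).__name__

results = {errors: try_encode(improper, errors)
           for errors in ('strict', 'surrogateescape', 'backslashreplace')}

for errors, result in results.items():
    print(f'{errors:17} {result!r}')
```

With 'strict' the encode raises UnicodeEncodeError, while the other two handlers return bytes for this input.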

What do you two think?

Another use is for writing bytes to an OutputWindow, as with find-in-files.  But I can think of no case where IDLE sends bytes to an OutputWindow.  User files are all opened in an editor.

I believe these are all the uses of 'iomenu.encoding' outside of iomenu.  'from iomenu ...' is never used.

Within iomenu, the only use is part of reading an encoding cookie.
    # The only use of 'encoding' below is in _decode as initial value
    # of deprecated block asking user for encoding.
I am not sure if this use can be reached now.  Even if so, I believe this code duplicates code elsewhere in the stdlib that might be used.

So maybe the encoding calculation is not really needed.
msg372546 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-06-29 05:36
I got the 'within iomenu' part a bit wrong.  To open a file to edit, iomenu.IOBinding('IO').open tells filelist to use IO.loadfile.  This reads bytes 'so that we can handle end-of-line convention ourselves'.  (I suspect that this predates 3.x and might not be needed any more.)  IO.loadfile calls IO._decode, which looks for a utf-8 BOM, looks for a coding cookie, tries ascii (not needed in 3.x), tries utf-8, and finally asks the user for an encoding, using iomenu.encoding as the initial value in the query box.  This box is deprecated in the sense that for 3.x, a python file should either be utf-8 or have an encoding cookie.
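
For comparison, the BOM, cookie, and utf-8 steps can be sketched with tokenize.detect_encoding from the stdlib. This is only an assumption about which existing stdlib code could cover the same ground; it is not the code in IO._decode, and the helper name guess_source_encoding is hypothetical.

```python
import io
import tokenize

def guess_source_encoding(data: bytes) -> str:
    """Hypothetical helper: report the encoding of Python source bytes.

    tokenize.detect_encoding checks for a UTF-8 BOM, then for a coding
    cookie on the first two lines, and otherwise defaults to 'utf-8'.
    It raises SyntaxError for an invalid cookie or undecodable first lines.
    """
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding

with_bom = guess_source_encoding(b'\xef\xbb\xbfx = 1\n')
with_cookie = guess_source_encoding(b'# -*- coding: latin-1 -*-\nx = 1\n')
plain = guess_source_encoding(b'x = 1\n')
print(with_bom, with_cookie, plain)
```

A sketch like this has no step that asks the user; a file that is neither utf-8 nor carries a valid cookie surfaces as a SyntaxError instead.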
msg372567 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-06-29 12:15
PR 21214 sets the encoding of stdin/stdout/stderr to 'utf-8'. The error handler is set to 'surrogatepass' or 'surrogateescape' because these error handlers are used when converting strings between Python and Tcl. This guarantees that reading from stdin and writing back to stdout will never fail, even if you paste garbage from the clipboard. Printing file paths will never fail either.
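
The round-trip property described here can be sketched with ordinary io.TextIOWrapper objects standing in for stdin and stdout. This is an assumption made for illustration: IDLE's real streams are its own classes, and the byte string below is invented sample data.

```python
import io

# Bytes that are not valid UTF-8, standing in for pasted garbage.
garbage = b'caf\xe9 \xff\xfe ok\n'

# "stdin": decode with utf-8 + surrogateescape; newline='' disables translation.
reader = io.TextIOWrapper(io.BytesIO(garbage), encoding='utf-8',
                          errors='surrogateescape', newline='')
text = reader.read()          # undecodable bytes become lone surrogates

# "stdout": encode the same text back with the same settings.
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding='utf-8',
                          errors='surrogateescape', newline='')
writer.write(text)
writer.flush()
round_tripped = sink.getvalue()

print(round_tripped == garbage)
```

With errors='strict' in place of 'surrogateescape', the read step would raise UnicodeDecodeError on this input.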
msg372642 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-06-30 00:18
New changeset 2515a28230b1a011205f30263da6b01c6bd167a3 by Serhiy Storchaka in branch 'master':
bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214)
https://github.com/python/cpython/commit/2515a28230b1a011205f30263da6b01c6bd167a3
msg372643 - (view) Author: miss-islington (miss-islington) Date: 2020-06-30 00:36
New changeset 01638ce51a63afe5af3f778e7403702703bb41b9 by Miss Islington (bot) in branch '3.9':
bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214)
https://github.com/python/cpython/commit/01638ce51a63afe5af3f778e7403702703bb41b9
msg372644 - (view) Author: miss-islington (miss-islington) Date: 2020-06-30 00:39
New changeset 00fd04b9b7537c473c3f9396a861868b8ddd3bb2 by Miss Islington (bot) in branch '3.8':
bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214)
https://github.com/python/cpython/commit/00fd04b9b7537c473c3f9396a861868b8ddd3bb2
msg372684 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-06-30 12:23
Thank you for this and the next patch.
History
Date                 User              Action  Args
2020-06-30 12:23:13  terry.reedy       set     status: open -> closed
                                               versions: + Python 3.8, Python 3.9
                                               title: IDLE: revise setting of iomenu.encoding and .errors -> IDLE: make sys.stdxxx.encoding always be utf-8
                                               messages: + msg372684
                                               resolution: fixed
                                               stage: patch review -> resolved
2020-06-30 00:39:05  miss-islington    set     messages: + msg372644
2020-06-30 00:36:54  miss-islington    set     messages: + msg372643
2020-06-30 00:18:55  miss-islington    set     pull_requests: + pull_request20379
2020-06-30 00:18:47  miss-islington    set     nosy: + miss-islington
                                               pull_requests: + pull_request20378
2020-06-30 00:18:39  terry.reedy       set     messages: + msg372642
2020-06-29 12:15:20  serhiy.storchaka  set     messages: + msg372567
2020-06-29 11:52:29  serhiy.storchaka  set     stage: needs patch -> patch review
                                               pull_requests: + pull_request20367
2020-06-29 05:36:21  terry.reedy       set     messages: + msg372546
2020-06-29 05:02:27  terry.reedy       set     messages: + msg372543
2020-06-29 04:11:40  terry.reedy       set     messages: + msg372540
                                               stage: patch review -> needs patch
2020-06-29 04:06:52  terry.reedy       set     keywords: + patch
                                               stage: needs patch -> patch review
                                               pull_requests: + pull_request20361
2020-06-28 18:10:28  serhiy.storchaka  set     messages: + msg372528
2020-06-28 17:56:42  terry.reedy       create