classification
Title: PEP 597: Implement encoding="locale" option and EncodingWarning
Type: Stage: resolved
Components: IO Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, lemburg, methane, vstinner
Priority: normal Keywords: patch

Created on 2021-03-16 04:19 by methane, last changed 2021-04-06 03:46 by methane. This issue is now closed.

Pull Requests
URL Status Linked
PR 19481 merged methane, 2021-03-16 04:25
PR 25103 merged methane, 2021-03-31 04:21
PR 25107 closed methane, 2021-03-31 05:53
PR 25108 merged methane, 2021-03-31 06:24
PR 25146 merged methane, 2021-04-02 07:25
Messages (15)
msg388809 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-03-16 04:19
PEP 597 is accepted.
msg389092 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-19 14:16
I replied to INADA-san message on bpo-43552:
https://bugs.python.org/issue43552#msg389091

> I had forgotten to consider the UTF-8 mode while finishing PEP 597. If possible, I want `encoding="locale"` to ignore the UTF-8 mode starting with Python 3.10.

In this case, the PEP 597 statement that open(filename, encoding="locale") is the same as open(filename) would be wrong. Would it mean that users who have the UTF-8 Mode enabled (implicitly or explicitly) would switch to a legacy encoding like latin1, rather than UTF-8, if they add encoding="locale" to their open() calls?

Since the final goal is to move everybody towards UTF-8, I'm not sure how this is a good thing.
msg389094 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-03-19 14:28
> Since the final goal is to move everybody towards UTF-8, I'm not sure how this is a good thing.

The final goal (the third motivation of PEP 597) is changing the default encoding (i.e. the encoding used when none is specified) to UTF-8.

But forcing people to use UTF-8 even when they specify the locale encoding explicitly is not the goal. That's why I want to ignore the UTF-8 mode when `encoding="locale"` is specified.

I think this is almost a Windows-only issue, and "mbcs" can already be used on Windows. This is documented in https://docs.python.org/3/using/windows.html#utf-8-mode

So this is not a blocker. Just my preference.
msg389095 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-19 14:31
I see different cases when open() is called with no encoding argument:

(A) User wants to use UTF-8: add encoding="utf-8"

(B) Windows user wants to use the ANSI code page of their computer, for a local file not intended to be shared with other computers: add encoding="mbcs". This makes the code Windows-specific (the "mbcs" alias doesn't exist on Unix).

(C) User wants to use the locale encoding and is fine with the UTF-8 Mode: add encoding=locale.getpreferredencoding(False)

(D) Unix user wants to use the locale encoding but not the UTF-8 Mode: encoding=get_current_locale_encoding() (function proposed in bpo-43552) or locale.nl_langinfo(locale.CODESET) (should work on any Python version). I don't know if nl_langinfo(CODESET) is available on Windows.

(E) User has no idea what they are doing and doesn't understand anything about Unicode: please trust us and specify UTF-8 explicitly :-)
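The cases above can be sketched as one helper that returns the explicit encoding each case would pass to open(). This is a minimal illustration under stated assumptions, not code from the issue: `encoding_for` and the case letters are hypothetical, and case (D) relies on `locale.nl_langinfo(locale.CODESET)`, which is not available on every platform.

```python
import locale
import sys


def encoding_for(case: str) -> str:
    """Return the explicit encoding for cases (A)-(D) above.

    Hypothetical helper for illustration only; it is not part of the stdlib.
    """
    if case == "A":
        # (A) Always UTF-8, regardless of the locale or the UTF-8 Mode.
        return "utf-8"
    if case == "B":
        # (B) Windows ANSI code page; the "mbcs" codec only exists on Windows.
        if sys.platform != "win32":
            raise LookupError("'mbcs' is only available on Windows")
        return "mbcs"
    if case == "C":
        # (C) Locale encoding, but "UTF-8" when the UTF-8 Mode is enabled.
        return locale.getpreferredencoding(False)
    if case == "D":
        # (D) Locale encoding from the C library, unaffected by the UTF-8 Mode.
        if not hasattr(locale, "nl_langinfo"):
            raise LookupError("locale.nl_langinfo() is not available here")
        return locale.nl_langinfo(locale.CODESET)
    # (E) is simply "use (A)".
    raise ValueError(f"unknown case: {case!r}")


if __name__ == "__main__":
    for case in "ABCD":
        try:
            print(case, encoding_for(case))
        except LookupError as exc:
            print(case, "unavailable:", exc)
```

The result is machine-dependent for (C) and (D); only (A) is the same everywhere, which is the point of case (E).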

Apart from the encoding="utf-8" case, I understand that there are two main complex cases:

(1) "UTF-8" in the UTF-8 Mode, or the locale encoding
(2) Always use the locale encoding, ignore the UTF-8 Mode

What I don't expect is the current behavior, before PEP 597. Who uses open() without specifying an encoding but always wants to use the locale encoding (case 2)? So is this use case already broken when the UTF-8 Mode is enabled explicitly?
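To see why case (2) breaks once the UTF-8 Mode is on, one can ask fresh interpreters which encoding a bare open() picks with the mode forced on (`-X utf8`) and off (`-X utf8=0`). A minimal sketch: `default_open_encoding` is a hypothetical helper, and the value reported with the mode off is whatever the machine's locale gives.

```python
import os
import subprocess
import sys
import tempfile

# Child program: open a text file without an encoding and print the codec chosen.
CHILD = "import sys\nwith open(sys.argv[1], 'w') as f:\n    print(f.encoding)"


def default_open_encoding(utf8_mode: bool) -> str:
    """Return the encoding a bare open() uses in a child interpreter."""
    flag = "utf8" if utf8_mode else "utf8=0"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.txt")
        # With -c, the extra argument becomes sys.argv[1] in the child.
        result = subprocess.run(
            [sys.executable, "-X", flag, "-c", CHILD, path],
            capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()


if __name__ == "__main__":
    print("UTF-8 Mode on: ", default_open_encoding(True))   # a UTF-8 codec
    print("UTF-8 Mode off:", default_open_encoding(False))  # the locale encoding
```

Code that relies on the implicit locale encoding silently gets UTF-8 as soon as the mode is enabled, which is the breakage discussed here.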
msg389099 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-03-19 14:57
> (1) "UTF-8" in the UTF-8 Mode, or the locale encoding
> (2) Always use the locale encoding, ignore the UTF-8 Mode
>
> What I don't expect is the current behavior, before PEP 597. Who uses open() without specifying an encoding but always wants to use the locale encoding (case 2)? So is this use case already broken when the UTF-8 Mode is enabled explicitly?

Yes, it is already broken, so they cannot use the UTF-8 mode.

If `encoding="locale"` ignores the UTF-8 mode, it saves this use case: they can add `encoding="locale"` where they need the locale/GetACP encoding and still enable the UTF-8 mode.

That's why it matters if we enable the UTF-8 mode by default in the future.
msg389656 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-03-29 03:28
New changeset 4827483f47906fecee6b5d9097df2a69a293a85c by Inada Naoki in branch 'master':
bpo-43510: Implement PEP 597 opt-in EncodingWarning. (GH-19481)
https://github.com/python/cpython/commit/4827483f47906fecee6b5d9097df2a69a293a85c
msg389685 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-29 11:53
Yeah! Congrats INADA-san for implementing your PEP!
msg389796 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-03-30 07:25
I created bpo-43651 to track fixing EncodingWarning in the Python stdlib.
I'm closing this issue for now.
msg389822 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-03-30 12:07
In bpo-43651, I found a code pattern where it's difficult to use io.text_encoding():

    class OpenWrapper:
        def __new__(cls, *args, **kwargs):
            return open(*args, **kwargs)

`kwargs["encoding"] = text_encoding(kwargs.get("encoding"))` doesn't work because `open(filename, "rb", encoding="locale")` raises `ValueError: binary mode doesn't take an encoding argument`.

I think we should accept `encoding="locale"` even in binary mode. That makes it easy to use `text_encoding()` together with `encoding="locale"`.
msg389873 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-03-31 05:26
New changeset ff3c9739bd69aa8b58007e63c9e40e6708b4761e by Inada Naoki in branch 'master':
bpo-43510: PEP 597: Accept `encoding="locale"` in binary mode (GH-25103)
https://github.com/python/cpython/commit/ff3c9739bd69aa8b58007e63c9e40e6708b4761e
msg389877 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-03-31 06:23
I'm sorry, I was wrong. Allowing `encoding="locale"` didn't help OpenWrapper. See GH-25107.

If we use `encoding = text_encoding(encoding)` in binary mode, `open(filename, "rb")` will emit a warning. This doesn't make sense at all.

Adding a `mode` parameter to `text_encoding()` doesn't make sense either, because it is used by functions wrapping not only open() but also TextIOWrapper().

So we must not call `text_encoding()` in binary mode, and allowing `encoding="locale"` in binary mode doesn't make anything easier. I will revert GH-25103.
msg389879 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-31 09:30
To me, it sounds really weird to accept an encoding when a file is opened in binary mode. open(filename, "rb", encoding="locale") looks like a bug.
msg389881 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-03-31 09:36
On 31.03.2021 11:30, STINNER Victor wrote:
> 
> To me, it sounds really weird to accept an encoding when a file is opened in binary mode. open(filename, "rb", encoding="locale") looks like a bug.

Same here.

If encoding is used as an argument and then not used, this is a bug,
not a feature :-)
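With GH-25103 reverted, giving any encoding in binary mode stays an error, so an unused encoding argument is rejected rather than silently ignored. A quick check; `binary_open_error` is a hypothetical helper.

```python
import os
import tempfile


def binary_open_error(encoding: str) -> str:
    """Return the ValueError message open() raises for an encoding in "rb" mode."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"\x00\x01")
        try:
            open(path, "rb", encoding=encoding).close()
        except ValueError as exc:
            return str(exc)
    raise AssertionError("open() accepted an encoding in binary mode")


if __name__ == "__main__":
    print(binary_open_error("locale"))
    # binary mode doesn't take an encoding argument
```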
msg389882 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-03-31 09:49
New changeset cfa176685a5e788bafc7749d7a93f43ea3e4de9f by Inada Naoki in branch 'master':
Revert "bpo-43510: PEP 597: Accept `encoding="locale"` in binary mode (GH-25103)" (#25108)
https://github.com/python/cpython/commit/cfa176685a5e788bafc7749d7a93f43ea3e4de9f
msg390044 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-04-02 08:39
New changeset bec8c787ec72d73b39011bde3f3a93e9bb1174b7 by Inada Naoki in branch 'master':
bpo-43510: Fix emitting EncodingWarning from _io module. (GH-25146)
https://github.com/python/cpython/commit/bec8c787ec72d73b39011bde3f3a93e9bb1174b7
History
Date User Action Args
2021-04-06 03:46:15 methane set status: open -> closed
  resolution: fixed
  stage: patch review -> resolved
2021-04-02 08:39:19 methane set messages: + msg390044
2021-04-02 07:25:32 methane set pull_requests: + pull_request23893
2021-03-31 09:49:47 methane set messages: + msg389882
2021-03-31 09:36:33 lemburg set nosy: + lemburg
  messages: + msg389881
2021-03-31 09:30:53 vstinner set messages: + msg389879
2021-03-31 07:11:56 eryksun set messages: - msg389828
2021-03-31 06:24:20 methane set pull_requests: + pull_request23852
2021-03-31 06:23:33 methane set messages: + msg389877
2021-03-31 05:53:00 methane set pull_requests: + pull_request23851
2021-03-31 05:26:15 methane set messages: + msg389873
2021-03-31 04:21:53 methane set stage: resolved -> patch review
  pull_requests: + pull_request23848
2021-03-31 04:20:19 methane link issue43651 superseder
2021-03-30 14:38:08 eryksun set nosy: + eryksun
  messages: + msg389828
2021-03-30 12:07:35 methane set status: closed -> open
  resolution: fixed -> (no value)
  messages: + msg389822
2021-03-30 07:25:41 methane set status: open -> closed
  resolution: fixed
  messages: + msg389796
  stage: patch review -> resolved
2021-03-29 11:53:04 vstinner set messages: + msg389685
2021-03-29 03:28:22 methane set messages: + msg389656
2021-03-19 14:57:12 methane set messages: + msg389099
2021-03-19 14:31:58 vstinner set messages: + msg389095
2021-03-19 14:28:46 methane set messages: + msg389094
2021-03-19 14:16:06 vstinner set nosy: + vstinner
  messages: + msg389092
2021-03-16 04:25:21 methane set keywords: + patch
  stage: patch review
  pull_requests: + pull_request23653
2021-03-16 04:19:49 methane create