Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 597: Implemente encoding="locale" option and EncodingWarning #87676

Closed
methane opened this issue Mar 16, 2021 · 15 comments
Closed

PEP 597: Implemente encoding="locale" option and EncodingWarning #87676

methane opened this issue Mar 16, 2021 · 15 comments
Labels
3.10 only security fixes topic-IO

Comments

@methane
Copy link
Member

methane commented Mar 16, 2021

BPO 43510
Nosy @malemburg, @vstinner, @methane, @eryksun
PRs
  • bpo-43510: Implement PEP 597 opt-in EncodingWarning. #19481
  • bpo-43510: PEP 597: Accept encoding="locale" in binary mode. #25103
  • bpo-43510: Remove _pyio.OpenWrapper class #25107
  • Revert "bpo-43510: PEP 597: Accept encoding="locale" in binary mode." #25108
  • bpo-43510: Fix emitting EncodingWarning from _io module. #25146
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-04-06.03:46:15.688>
    created_at = <Date 2021-03-16.04:19:49.156>
    labels = ['3.10', 'expert-IO']
    title = 'PEP 597: Implemente encoding="locale" option and EncodingWarning'
    updated_at = <Date 2021-04-06.03:46:15.688>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2021-04-06.03:46:15.688>
    actor = 'methane'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-04-06.03:46:15.688>
    closer = 'methane'
    components = ['IO']
    creation = <Date 2021-03-16.04:19:49.156>
    creator = 'methane'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43510
    keywords = ['patch']
    message_count = 15.0
    messages = ['388809', '389092', '389094', '389095', '389099', '389656', '389685', '389796', '389822', '389873', '389877', '389879', '389881', '389882', '390044']
    nosy_count = 4.0
    nosy_names = ['lemburg', 'vstinner', 'methane', 'eryksun']
    pr_nums = ['19481', '25103', '25107', '25108', '25146']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue43510'
    versions = ['Python 3.10']

    @methane
    Copy link
    Member Author

    methane commented Mar 16, 2021

    PEP-597 is accepted.

    @methane methane added 3.10 only security fixes topic-IO labels Mar 16, 2021
    @vstinner
    Copy link
    Member

    I replied to INADA-san message on bpo-43552:
    https://bugs.python.org/issue43552#msg389091

    I had forgot to consider about UTF-8 mode while finishing PEP-597. If possible, I want to ignore UTF-8 mode when encoding="locale" is specified from Python 3.10.

    In this case, the PEP-597 statement that open(filename, encoding="locale") is the same than open(filename) is wrong. It would mean that users which got the UTF-8 Mode enabled (implicitly or explicitly) would switch to a legacy encoding like latin1 rather than using the UTF-8 encoding, if they add encoding="locale" to their open() calls?

    Since the final goal is to move everybody towards to UTF-8, I'm not sure how it's a good thing.

    @methane
    Copy link
    Member Author

    methane commented Mar 19, 2021

    Since the final goal is to move everybody towards to UTF-8, I'm not sure how it's a good thing.

    The final goal (the third motivation of the PEP-597) is changing the default encoding (i.e. encoding used when it is not specified) to UTF-8.

    But forcing people to use UTF-8 even they specify locale encoding explicitly is not the goal. That's why I want to ignore UTF-8 mode when encoding="locale" is specified.

    I think this is almost Windows-only issue, and "mbcs" can be used in Windows already. It is documented in https://docs.python.org/3/using/windows.html#utf-8-mode

    So this is not a blocker. Just my preference.

    @vstinner
    Copy link
    Member

    I see different cases when open() is called with no encoding argument:

    (A) User wants to use UTF-8: add encoding="utf-8"

    (B) Windows user wants to use the ANSI code page of their computer, local file not intended to be shared with other computers: add encoding="mbcs". This makes the code specific to Windows ("mbcs" alias doesn't exist on Unix).

    (C) User wants to use the locale encoding and is fine with the UTF-8 Mode: add encoding=getpreferredencoding(False)

    (D) Unix user wants to use the locale encoding but not the UTF-8 Mode: encoding=get_current_locale_encoding() (function proposed in bpo-43552) or nl_langinfo(CODESET) (should work on any Python version). I don't know if nl_langinfo(CODESET) is available on Windows.

    (E) User has no idea of what they are doing and don't understand anything to Unicode: please trust us and specify explicitly UTF-8 :-)

    Apart the encoding="utf-8" case, I understand that they are two main complex cases:

    (1) "UTF-8" in the UTF-8 Mode, or the locale encoding
    (2) Always use the locale encoding, ignore the UTF-8 Mode

    What I don't expect is the current behavior, before PEP-597. Who uses open() without specifying an encoding but always want to use the locale encoding? (case 2) So this use case is already broken when the UTF-8 Mode is enabled explicitly?

    @methane
    Copy link
    Member Author

    methane commented Mar 19, 2021

    (1) "UTF-8" in the UTF-8 Mode, or the locale encoding
    (2) Always use the locale encoding, ignore the UTF-8 Mode

    What I don't expect is the current behavior, before PEP-597. Who uses open() without specifying an encoding but always want to use the locale encoding? (case 2) So this use case is already broken when the UTF-8 Mode is enabled explicitly?

    Yes, it is broken already. So they can not use UTF-8 mode.

    If encoding="locale" ignore UTF-8 mode, it save the use case. They can add encoding="locale" where they need to use locale/GetACP encoding and enable UTF-8 mode.

    That's why it is important If we enable UTF-8 mode by default in the future.

    @methane
    Copy link
    Member Author

    methane commented Mar 29, 2021

    New changeset 4827483 by Inada Naoki in branch 'master':
    bpo-43510: Implement PEP-597 opt-in EncodingWarning. (GH-19481)
    4827483

    @vstinner
    Copy link
    Member

    Yeah! Congrats INADA-san for implementing your PEP!

    @methane
    Copy link
    Member Author

    methane commented Mar 30, 2021

    I created bpo-43651 to track fixing EncodingError in Python stdlibs.
    I close this issue for now.

    @methane
    Copy link
    Member Author

    methane commented Mar 30, 2021

    In bpo-43651, I found code pattern that it's difficult to use io.text_encoding():

        class OpenWrapper:
            def __new__(cls, *args, **kwargs):
                return open(*args, **kwargs)

    kwargs["encoding"] = text_encoding(kwargs.get("encoding) doesn't work because open(filename, "b", encoding="locale") raises ValueError: binary mode doesn't take an encoding argument.

    I think we should accept encoding="locale" even in binary mode. It makes easy to use text_encoding() and encoding="locale".

    @methane methane reopened this Mar 30, 2021
    @methane methane reopened this Mar 30, 2021
    @methane
    Copy link
    Member Author

    methane commented Mar 31, 2021

    New changeset ff3c973 by Inada Naoki in branch 'master':
    bpo-43510: PEP-597: Accept encoding="locale" in binary mode (GH-25103)
    ff3c973

    @methane
    Copy link
    Member Author

    methane commented Mar 31, 2021

    I'm sorry, I was wrong. Allowing encoding="locale" didn't help OpenWrapper. See #69294.

    If we use encoding = text_encoding(encoding) in binary mode, open(filename, "rb") will be warned. This doesn't make sense at all.

    Adding mode parameter to the text_encoding() doesn't make sense too. Because it is used for functions wrapping not only open(), but also TextIOWrapper().

    So we must not call text_encoding() in binary mode. Allowing encoding="locale" in binary mode doesn't make it easy. I will revert #69290.

    @vstinner
    Copy link
    Member

    To me, it sounds really weird to accept an encoding when a file is opened in binary mode. open(filename, "rb", encoding="locale") looks like a bug.

    @malemburg
    Copy link
    Member

    On 31.03.2021 11:30, STINNER Victor wrote:

    To me, it sounds really weird to accept an encoding when a file is opened in binary mode. open(filename, "rb", encoding="locale") looks like a bug.

    Same here.

    If encoding is used as an argument and then not used, this is a bug,
    not a feature :-)

    @methane
    Copy link
    Member Author

    methane commented Mar 31, 2021

    New changeset cfa1766 by Inada Naoki in branch 'master':
    Revert "bpo-43510: PEP-597: Accept encoding="locale" in binary mode (GH-25103)" (bpo-25108)
    cfa1766

    @methane
    Copy link
    Member Author

    methane commented Apr 2, 2021

    New changeset bec8c78 by Inada Naoki in branch 'master':
    bpo-43510: Fix emitting EncodingWarning from _io module. (GH-25146)
    bec8c78

    @methane methane closed this as completed Apr 6, 2021
    @methane methane closed this as completed Apr 6, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes topic-IO
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants