classification
Title: Deprecate sys._enablelegacywindowsfsencoding()
Type:
Stage: resolved
Components: Windows
Versions: Python 3.9
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: ZackerySpytz, eryksun, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords: patch

Created on 2019-06-26 15:15 by vstinner, last changed 2020-02-12 09:38 by steve.dower. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18396 closed vstinner, 2020-02-07 08:18
Messages (9)
msg346631 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-26 15:15
sys._enablelegacywindowsfsencoding() was added late in the PEP 529 design "just in case" something went wrong, but I'm not aware of anyone using it. Do we want to keep supporting the *legacy* Windows filesystem encoding (ANSI code page) forever? IMHO using UTF-8 is a much more practical way to design portable applications that work unmodified on Windows *and* Unix. Well, that's the purpose of PEP 529.

I propose to deprecate sys._enablelegacywindowsfsencoding() and the PYTHONLEGACYWINDOWSFSENCODING environment variable in Python 3.9 and remove them in Python 3.10. Calling sys._enablelegacywindowsfsencoding() would emit a DeprecationWarning in 3.9.
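As a rough illustration of what callers would observe under this proposal, here is a minimal sketch. `enable_legacy_fs_encoding_stub` is a hypothetical pure-Python stand-in, not CPython code (the real function is implemented in C), and the warning text is an assumption.

```python
import warnings


def enable_legacy_fs_encoding_stub():
    """Hypothetical stand-in for the proposed 3.9 behaviour (not CPython code)."""
    warnings.warn(
        "sys._enablelegacywindowsfsencoding() is deprecated",  # assumed wording
        DeprecationWarning,
        stacklevel=2,
    )


# Record warnings instead of printing them, so the caller can inspect them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    enable_legacy_fs_encoding_stub()

categories = [w.category for w in caught]
print(categories)
```

DeprecationWarning is hidden by default outside of `__main__`, so a project relying on the function would most likely only notice the warning when running with `-W default` or under a test runner.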

I dislike sys._enablelegacywindowsfsencoding() because it can lead to mojibake: filenames decoded from the ANSI code page but then encoded to UTF-8. In PEP 587 "Python Initialization Configuration" I tried to ensure that encodings are set early, in a new "pre-initialization" phase. Encodings should not change after the pre-initialization.
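The mojibake scenario can be reproduced on any platform with explicit codecs. This is only a sketch: cp1252 is assumed as the ANSI code page (the real one depends on the Windows locale), and the filename is made up.

```python
# cp1252 is a stand-in for the ANSI code page (assumption; locale dependent).
ANSI = "cp1252"

raw_name = b"caf\xe9.txt"                      # filename bytes in the ANSI code page
decoded = raw_name.decode(ANSI)                # decoded correctly to str
reencoded = decoded.encode("utf-8")            # later encoded with a different encoding
seen_by_ansi_reader = reencoded.decode(ANSI)   # an ANSI consumer misreads the UTF-8 bytes

# ascii() keeps the output printable whatever the console encoding is.
print(ascii(decoded))
print(reencoded)
print(ascii(seen_by_ansi_reader))
```

The last value contains two characters where the original name had one, which is the kind of corruption that appears when the encoding changes between decoding and encoding.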

--

By the way, I'm not aware of any issue with io._WindowsConsoleIO. Should we also deprecate the PYTHONLEGACYWINDOWSSTDIO environment variable, which opts out of the new io._WindowsConsoleIO?

Extract of open() code in Modules/_io/_iomodule.c:

    /* Create the Raw file stream */
    {
        PyObject *RawIO_class = (PyObject *)&PyFileIO_Type;
#ifdef MS_WINDOWS
        PyConfig *config = &_PyInterpreterState_GET_UNSAFE()->config;
        if (!config->legacy_windows_stdio && _PyIO_get_console_type(path_or_fd) != '\0') {
            RawIO_class = (PyObject *)&PyWindowsConsoleIO_Type;
            encoding = "utf-8";
        }
#endif
        raw = PyObject_CallFunction(RawIO_class,
                                    "OsiO", path_or_fd, rawmode, closefd, opener);
    }
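To see from Python which raw class was selected for a stream, something like the following sketch may help. `raw_stream_class_name` is a hypothetical helper introduced here for illustration; it only walks the `buffer` and `raw` attributes and returns None when a layer is missing.

```python
import io
import sys


def raw_stream_class_name(stream):
    """Return the class name of the raw layer under a text stream, or None."""
    buffered = getattr(stream, "buffer", None)
    raw = getattr(buffered, "raw", None)
    return type(raw).__name__ if raw is not None else None


# On a Windows console without PYTHONLEGACYWINDOWSSTDIO this is expected to
# report _WindowsConsoleIO; elsewhere it is typically FileIO, or None when
# stdout has been replaced by an object without these layers.
print(raw_stream_class_name(sys.stdout))

# Platform-independent check using an in-memory stack of I/O layers.
demo = io.TextIOWrapper(io.BufferedWriter(io.BytesIO()), encoding="utf-8")
print(raw_stream_class_name(demo))
```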
msg346632 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-26 15:16
> By the way, I'm not aware of any issue with io._WindowsConsoleIO. Should we also deprecate the PYTHONLEGACYWINDOWSSTDIO environment variable, which opts out of the new io._WindowsConsoleIO?

It was added to Python 3.6 by PEP 528 "Change Windows console encoding to UTF-8".
msg346640 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-06-26 15:50
It's probably worth at least a post to python-dev to expand the audience, but unless someone speaks up with some really good reason why they must update to Python 3.10 without updating their own code then let's deprecate it.

FWIW, even with the flag the behaviour changed in 3.6. Previously it would use the *A APIs and let Windows do the encoding, but then it changed to the *W APIs and Python would encode with our own mbcs:replace, where mbcs is just calling through to the WideCharToMultiByte API. So there isn't really anything to maintain long term apart from the additional options.
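The effect of encoding with the "replace" error handler can be sketched portably. The mbcs codec only exists on Windows, so cp1252 is assumed here as a stand-in for the active ANSI code page, and the filename is invented for the example.

```python
# cp1252 stands in for the ANSI code page; on Windows the "mbcs" codec
# would select the active code page instead (assumption for portability).
ANSI = "cp1252"

name = "caf\xe9_\u2603.txt"   # U+2603 (snowman) has no cp1252 mapping

encoded = name.encode(ANSI, "replace")   # unencodable characters become b"?"
print(encoded)

try:
    name.encode(ANSI, "strict")
except UnicodeEncodeError as exc:
    print("strict failed:", exc.reason)
```

With "replace" the call never fails, but the resulting bytes no longer identify the original file, since every unencodable character collapses to the same placeholder.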
msg361530 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-07 02:14
See also bpo-29241.
msg361539 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-07 08:12
One issue with sys._enablelegacywindowsfsencoding() is that os.fsdecode() and os.fsencode() are not updated: they continue to use UTF-8. Example on Windows:

>>> import sys, os
>>> sys.getfilesystemencoding()
'utf-8'
>>> os.fsencode('\xe9')
b'\xc3\xa9'
>>> sys._enablelegacywindowsfsencoding()
>>> sys.getfilesystemencoding()
'mbcs'
>>> os.fsencode('\xe9')
b'\xc3\xa9'

See bpo-29241 for larger issues caused by this function.
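A likely explanation, assuming os.fsencode() and os.fsdecode() read the filesystem encoding once when the os module is imported and keep it in a closure, is sketched below. `make_fscodec` and the `state` dictionary are hypothetical names used only to model that pattern.

```python
def make_fscodec(get_encoding):
    """Build fsencode() the way a module-level factory might:
    read the encoding once and keep it in a closure."""
    encoding = get_encoding()          # captured a single time

    def fsencode(name):
        return name.encode(encoding)

    return fsencode


# Hypothetical mutable setting standing in for the interpreter state.
state = {"fs_encoding": "utf-8"}

fsencode = make_fscodec(lambda: state["fs_encoding"])
before = fsencode("\xe9")

state["fs_encoding"] = "cp1252"               # simulate switching to a legacy code page
after = fsencode("\xe9")                      # still uses the captured "utf-8"
fresh = "\xe9".encode(state["fs_encoding"])   # reads the setting at call time

print(before, after, fresh)
```

Under this model, any code that caches the encoding before the switch keeps producing UTF-8 bytes, while code that queries it afterwards produces code page bytes, so the two disagree within one process.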

--

The main reason to deprecate this function is that it sounds dangerous to me and it doesn't seem to be used.

I only found one project which used it, temporarily, until they fixed their code to encode/decode filenames on Windows. It was used to work around a bug in their code.
msg361541 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-07 08:19
See also draft PEP 597 which proposes to use UTF-8 by default on Windows in Python 3.10:
https://www.python.org/dev/peps/pep-0597/
msg361545 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-07 09:01
> It's probably worth at least a post to python-dev to expand the audience, but unless someone speaks up with some really good reason why they must update to Python 3.10 without updating their own code then let's deprecate it.

I don't think that it deserves to be discussed on python-dev. I propose PR 18396 to add a deprecation in Python 3.9. We can open a discussion once we reach the point of actually removing the feature. It's easy to revert a deprecation if needed.
msg361556 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-07 10:16
Oh, INADA-san found a user of this function: Mercurial. I am closing this issue and I closed my PR. We can reconsider deprecating the function once Mercurial stops using it.

Copy of INADA-san message:

"""
I think we should keep this several years.

Currently, mercurial depends on it.
https://www.mercurial-scm.org/repo/hg/file/tip/mercurial/pycompat.py#l103

There is a plan for moving to UTF-8 path, and legacyfsencoding is not needed if this plan is implemented. But I don't know about current progress of this plan.
https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
"""

https://github.com/python/cpython/pull/18396#issuecomment-583284632
msg361870 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-02-12 09:38
I think we can deprecate it but leave it there - the original idea (courtesy of Guido) was to enable apps to transition to the change on their timeline, but it certainly should not be considered a core CPython feature for the rest of time.

I don't have a problem with Mercurial using it, though. The risk was that libraries would use it, which is somewhat helped by the fact that it doesn't work reliably if you don't call it early enough :)
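For an application that does opt in, calling the function as early as possible might look like the sketch below. `enable_legacy_fs_encoding` is a hypothetical wrapper; the attribute lookup is there because the function is only present on Windows builds.

```python
import sys


def enable_legacy_fs_encoding():
    """Call sys._enablelegacywindowsfsencoding() if this interpreter has it.

    Returns True when the function was called, False otherwise.
    """
    func = getattr(sys, "_enablelegacywindowsfsencoding", None)
    if func is None:        # non-Windows builds, or a release without it
        return False
    func()
    return True


# Intended to run at the very top of the program's entry point, before any
# filename has been encoded or decoded.
called = enable_legacy_fs_encoding()
print(called, sys.getfilesystemencoding())
```

A library cannot make the same guarantee about call order, since it is imported after the application has already started, which fits the concern about libraries using the function.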

On the console encoding, I haven't heard of any issues either. Deprecating that environment variable is also fine, in my opinion.

Neither of these flags needs to be its own special initialisation option. Embedders have always had other (better) ways to change these settings - unfortunately, PEP 587 didn't spend long enough gathering requirements before being implemented to avoid committing these design flaws...
History
Date                 User         Action  Args
2020-02-12 09:38:30  steve.dower  set     messages: + msg361870
2020-02-07 10:16:48  vstinner     set     status: open -> closed
                                          resolution: rejected
                                          messages: + msg361556
                                          stage: patch review -> resolved
2020-02-07 09:01:25  vstinner     set     messages: + msg361545
2020-02-07 08:19:43  vstinner     set     title: Deprecate sys._enablelegacywindowsfsencoding()? -> Deprecate sys._enablelegacywindowsfsencoding()
2020-02-07 08:19:35  vstinner     set     messages: + msg361541
2020-02-07 08:18:08  vstinner     set     keywords: + patch
                                          stage: patch review
                                          pull_requests: + pull_request17772
2020-02-07 08:12:22  vstinner     set     messages: + msg361539
2020-02-07 02:14:28  vstinner     set     messages: + msg361530
2019-06-26 15:50:13  steve.dower  set     messages: + msg346640
2019-06-26 15:16:43  vstinner     set     messages: + msg346632
2019-06-26 15:15:51  vstinner     create