Classification
Title: built-in open() doesn't use locale.getpreferredencoding() as the default encoding
Type: behavior
Stage: resolved
Components: Documentation, IO, Library (Lib)
Versions: Python 3.10, Python 3.9

Process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: docs@python, eryksun, smallbigcake, terry.reedy
Priority: normal
Keywords:

Created on 2021-02-06 05:16 by smallbigcake, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (5)
msg386544 - (view) Author: Cake Xu (smallbigcake) Date: 2021-02-06 05:16
The documentation for the built-in function open() (https://docs.python.org/3/library/functions.html#open) says:
"encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings."
However, I found that after changing the locale with locale.setlocale(), the default encoding used by open() also changed, which sometimes led to a UnicodeDecodeError.

So I think the default encoding used by open() is actually the second element of the tuple returned by locale.getlocale().
msg386547 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-06 06:39
On most platforms, unless UTF-8 mode is enabled, locale.getpreferredencoding(False) returns the LC_CTYPE encoding of the current locale. For example, in Linux:

    >>> import locale
    >>> locale.setlocale(locale.LC_CTYPE, 'en_US.UTF-8')
    'en_US.UTF-8'
    >>> locale.getpreferredencoding(False)
    'UTF-8'
    >>> locale.setlocale(locale.LC_CTYPE, 'en_US.iso-88591')
    'en_US.iso-88591'
    >>> locale.getpreferredencoding(False)
    'ISO-8859-1'

If the designers of the io module had wanted the preferred encoding to always be the default encoding from setlocale(LC_CTYPE, ""), they would have used and documented locale.getpreferredencoding(True).
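
A minimal sketch for checking this on a given system: it compares the encoding that open() picks when no encoding argument is passed with what locale.getpreferredencoding(False) reports. The helper names default_open_encoding and same_codec are made up for this illustration, and the printed values depend on the platform and the current locale.

```python
import codecs
import locale
import os
import tempfile


def default_open_encoding():
    """Return the encoding open() picks when no encoding argument is given."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path) as f:  # text mode, encoding omitted on purpose
            return f.encoding
    finally:
        os.remove(path)


def same_codec(a, b):
    """Compare two encoding names after normalizing aliases and case."""
    return codecs.lookup(a).name == codecs.lookup(b).name


preferred = locale.getpreferredencoding(False)
used_by_open = default_open_encoding()
print("getpreferredencoding(False):", preferred)
print("open() default encoding:    ", used_by_open)
print("same codec:", same_codec(preferred, used_by_open))
```

The names are compared through codecs.lookup() because the two calls may spell the same codec differently (for example in letter case).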

---

In Windows, locale.getpreferredencoding(False) always returns the default encoding from locale.getdefaultlocale(), which is the process active (ANSI) code page. Changing it to track the LC_CTYPE locale would be convenient for applications and scripts running in Windows 10, for which the CRT's POSIX locale implementation has supported UTF-8 since spring of 2018.

The base behavior can't be changed at this point, but a -X option and/or environment variable could enable locale.getpreferredencoding(False) -- i.e. locale._get_locale_encoding() -- to return the current LC_CTYPE encoding in Windows, as it does in POSIX.
msg386550 - (view) Author: Cake Xu (smallbigcake) Date: 2021-02-06 07:53
Thanks, Eryk, for answering my question.

I get it now. I am running this on Linux.

If my understanding is right, the open() will invoke locale.getpreferredencoding() by setting the do_setlocale=False -- i.e. locale.getpreferredencoding(False) -- to avoid invoking setlocale(LC_CTYPE, "").

Previously I had thought that locale.getpreferredencoding(True) was the call being made.
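
For code that must not depend on the current LC_CTYPE locale at all, one common way to avoid the UnicodeDecodeError described in the first message is to pass encoding explicitly to open(). This is only a sketch; the choice of UTF-8 and the sample text are assumptions for illustration, not something prescribed in this thread.

```python
import os
import tempfile

text = "caf\u00e9 \u2603"  # sample non-ASCII text (illustrative)

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # An explicit encoding makes the result independent of setlocale() calls.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_tripped = f.read()
finally:
    os.remove(path)

print(round_tripped == text)
```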
msg386879 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-02-12 21:16
Eryk, are you suggesting that this should be closed as 'Not a bug'?
msg386884 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-12 22:12
> If my understanding is right, the open() will invoke 
> locale.getpreferredencoding() by setting the do_setlocale=False 
> -- i.e. locale.getpreferredencoding(False) -- to avoid invoking 
> setlocale(LC_CTYPE, "").

Yes, that's the case in POSIX systems. With do_setlocale=False, getpreferredencoding() gets the current locale's LC_CTYPE codeset via nl_langinfo(CODESET). This is thread safe, whereas calling setlocale(LC_CTYPE, "") beforehand is not thread safe.

In Windows, locale.getpreferredencoding() always returns the encoding of the default locale, regardless of do_setlocale. It's needlessly inconsistent with POSIX.
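
The POSIX lookup described above can be sketched directly with the locale module. The helper name current_ctype_codeset is hypothetical; locale.nl_langinfo and locale.CODESET are not available on every platform (notably Windows), so the sketch returns None when they are missing.

```python
import locale


def current_ctype_codeset():
    """Return nl_langinfo(CODESET) for the current LC_CTYPE locale.

    Returns None where nl_langinfo is unavailable (for example on Windows).
    Unlike getpreferredencoding(True), this does not call setlocale().
    """
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        return locale.nl_langinfo(locale.CODESET)
    return None


print("LC_CTYPE codeset:", current_ctype_codeset())
print("getpreferredencoding(False):", locale.getpreferredencoding(False))
```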

> are you suggesting that this should be closed as 'Not a bug'?

Sorry, Terry. I forgot to close the issue.
History
Date | User | Action | Args
2022-04-11 14:59:41 | admin | set | github: 87306
2021-02-12 22:12:24 | eryksun | set | status: open -> closed; resolution: not a bug; messages: + msg386884; stage: resolved
2021-02-12 21:16:13 | terry.reedy | set | nosy: + terry.reedy; messages: + msg386879
2021-02-06 07:53:32 | smallbigcake | set | messages: + msg386550
2021-02-06 06:42:38 | eryksun | set | title: build-in open() doesn't use whatever locale.getpreferredencoding() returns as default encoding. -> built-in open() doesn't use locale.getpreferredencoding() as the default encoding; components: + Library (Lib), IO; versions: + Python 3.10
2021-02-06 06:39:41 | eryksun | set | nosy: + eryksun; messages: + msg386547
2021-02-06 05:16:32 | smallbigcake | set | type: behavior
2021-02-06 05:16:14 | smallbigcake | create |