
Author vstinner
Recipients lemburg, vstinner
Date 2021-03-19.10:17:35
Message-id <1616149055.42.0.727843855431.issue43552@roundup.psfhosted.org>
In-reply-to
Content
I created this issue while reviewing the implementation of PEP 597: PR 19481.

Below is a copy of my comments on the PR that relate to this issue.


_locale.get_locale_encoding() calls _Py_GetLocaleEncoding() which returns UTF-8 if the Python UTF-8 Mode is enabled.

Maybe the function could have a flag: please don't lie to me and return the current locale encoding ;-)

Or we could add a function to get the *current* locale encoding: **locale.get_current_locale_encoding()**.

This one would ignore the UTF-8 Mode and call nl_langinfo(CODESET). There are APIs to use the *current* locale encoding: PyUnicode_EncodeLocale/PyUnicode_DecodeLocale and _Py_EncodeLocaleEx/_Py_DecodeLocaleEx with current_locale=1. You can see which functions use it:

* decode tm_zone field of localtime_r() and gmtime()
* decode tzname[0] and tzname[1] strings
* decode setlocale() result
* decode some localeconv() fields (this function has to temporarily switch to a different locale encoding, which is bad!)
* decode nl_langinfo() result
* decode gettext(), dgettext(), dcgettext(), textdomain(), bindtextdomain(), bind_textdomain_codeset() result
* decode strerror() and dlerror() result
* encode/decode in the readline module
* encode format string for strftime() in time.strftime() (only used on Windows, Unix provides wcsftime) and then decode strftime() result
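
To make the proposal concrete, here is a minimal sketch of what a "current locale encoding" helper could look like in pure Python. The function name get_current_locale_encoding() is only the one suggested above and does not exist in the locale module; the UTF-8 fallback for an empty CODESET and the non-Unix fallback are assumptions made for illustration.

```python
import locale


def get_current_locale_encoding():
    """Hypothetical helper: encoding of the *current* LC_CTYPE locale.

    Unlike locale.getpreferredencoding(False), this sketch does not
    look at the Python UTF-8 Mode at all.
    """
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        # Unix: ask the C library about the current LC_CTYPE locale.
        codeset = locale.nl_langinfo(locale.CODESET)
        # Assumption: treat an empty answer as UTF-8.
        return codeset or "UTF-8"
    # Assumption: without nl_langinfo() (e.g. Windows) fall back to the
    # public API, which may still be influenced by the UTF-8 Mode.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print("current locale encoding:", get_current_locale_encoding())
    print("getpreferredencoding(False):", locale.getpreferredencoding(False))
```

Running this with and without -X utf8 is one way to see whether the two values diverge on a given platform.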


> encoding="locale" : Uses locale encoding regardless UTF-8 mode.

Currently, open(encoding=None) doesn't work like that. For example, on macOS, Android and VxWorks, it always uses UTF-8. And if the UTF-8 Mode is enabled, UTF-8 is used on every platform.
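
A quick way to check what open() picks on a given machine is to look at the encoding attribute of the file object it returns. The sketch below is only an illustration; the helper name describe_default_text_encoding() is made up for this example.

```python
import locale
import os
import sys
import tempfile


def describe_default_text_encoding():
    """Report which encoding open() selects when encoding is omitted."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # No encoding argument: this is the encoding=None case.
        with open(path, "w") as f:
            chosen = f.encoding
    finally:
        os.unlink(path)
    return {
        "open() default": chosen,
        "utf8_mode": bool(sys.flags.utf8_mode),
        "getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in describe_default_text_encoding().items():
        print(f"{key}: {value}")
```

Comparing the output of "python script.py" and "python -X utf8 script.py" shows the effect of the UTF-8 Mode on the chosen encoding.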

In PEP 597, I read that encoding="locale" is the same as encoding=None but doesn't emit an EncodingWarning. Where does PEP 597 change the chosen encoding for the encoding=None case? The PEP says "locale encoding" without specifying exactly what it is. In Python, it means different things depending on the context. There is a subtle difference between the **current** locale encoding and "the locale encoding". I agree that it needs some clarification :-)

While we are discussing encodings, I never understood why open() gets the current locale encoding from nl_langinfo(CODESET), an encoding which can change at runtime while Python is running. For example, if thread A calls open(filename, encoding=None), thread B calls locale.localeconv(), and the LC_MONETARY locale uses a different encoding than the LC_CTYPE locale, thread A can get the LC_MONETARY encoding because of how locale.localeconv() is currently implemented: it temporarily changes LC_CTYPE to LC_MONETARY to decode the monetary fields of the localeconv() result.
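
The window described above can be imitated from Python. The sketch below is not how localeconv() is written in C; it only reproduces the same pattern (switch LC_CTYPE, do some work, switch back) to show that nl_langinfo(CODESET) follows whatever LC_CTYPE is at that instant. It assumes a Unix platform where locale.nl_langinfo() is available, and the helper name codeset_under() is invented for this example.

```python
import locale


def codeset_under(locale_name):
    """Return nl_langinfo(CODESET) while LC_CTYPE is temporarily set to
    locale_name, or None if that locale is not available."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
    except locale.Error:
        return None
    try:
        # Any other thread asking for the locale encoding right now
        # would also see the codeset of locale_name.
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print("process locale :", locale.nl_langinfo(locale.CODESET))
    # Locale names other than "C" are examples; they may not be installed.
    for name in ("C", "en_US.ISO8859-1", "ja_JP.eucJP"):
        print(f"{name!r:20} ->", codeset_under(name))
```

If one of the non-C locales is installed and uses a different codeset, the printed values differ, which is exactly what a concurrent open(encoding=None) could observe.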

I would prefer that Python use the same encoding for the whole lifetime of the process, from beginning to end. The Python filesystem encoding is a good choice for that. It's the same as locale.getpreferredencoding(False) (currently used by open() and friends), but becomes different if LC_CTYPE is changed (temporarily or permanently).
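
As a small illustration of that stability, the following sketch changes LC_CTYPE and checks that sys.getfilesystemencoding() does not move. The helper name filesystem_encoding_is_stable() is hypothetical, and "C" is just a locale that should exist everywhere.

```python
import locale
import sys


def filesystem_encoding_is_stable(other_locale="C"):
    """Check that the filesystem encoding ignores LC_CTYPE changes."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    fs_before = sys.getfilesystemencoding()
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, other_locale)
        except locale.Error:
            # Locale not installed: nothing changed, the check is trivial.
            pass
        fs_during = sys.getfilesystemencoding()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
    return fs_before == fs_during


if __name__ == "__main__":
    print("filesystem encoding:", sys.getfilesystemencoding())
    print("stable across setlocale():", filesystem_encoding_is_stable())
```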
History
Date                 User      Action  Args
2021-03-19 10:17:35  vstinner  set     recipients: + vstinner, lemburg
2021-03-19 10:17:35  vstinner  set     messageid: <1616149055.42.0.727843855431.issue43552@roundup.psfhosted.org>
2021-03-19 10:17:35  vstinner  link    issue43552 messages
2021-03-19 10:17:35  vstinner  create