
Author vstinner
Recipients vstinner
Date 2021-03-19.09:17:13
Message-id <1616145433.55.0.350148222552.issue43552@roundup.psfhosted.org>
I propose to add two new functions:

* locale.get_locale_encoding(): exactly the same as locale.getpreferredencoding(False).

* locale.get_current_locale_encoding(): always get the current locale encoding, by reading the ANSI code page on Windows or nl_langinfo(CODESET) on other platforms. It ignores the UTF-8 Mode and, unlike locale.getpreferredencoding(False), does not always return "UTF-8" on macOS, Android and VxWorks.
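A minimal sketch of the two proposed functions, written with the public APIs available today. The names mirror the proposal, but these are hypothetical pure-Python stand-ins, not the actual implementation. It assumes ctypes.windll.kernel32.GetACP() for the Windows ANSI code page and locale.nl_langinfo(locale.CODESET) on other platforms.

```python
import locale
import sys


def get_locale_encoding() -> str:
    # Hypothetical stand-in for the proposed locale.get_locale_encoding():
    # by definition it matches getpreferredencoding(False), so it honors the
    # UTF-8 Mode and the platform special cases. It never calls setlocale().
    return locale.getpreferredencoding(False)


def get_current_locale_encoding() -> str:
    # Hypothetical stand-in for the proposed
    # locale.get_current_locale_encoding(): report the real locale encoding,
    # ignoring the UTF-8 Mode and any per-platform "UTF-8" override.
    if sys.platform == "win32":
        import ctypes
        # ANSI code page of the process, e.g. 1252 -> "cp1252"
        return f"cp{ctypes.windll.kernel32.GetACP()}"
    # Codeset of the current LC_CTYPE locale, as seen by the C library
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print("get_locale_encoding():", get_locale_encoding())
    print("get_current_locale_encoding():", get_current_locale_encoding())
```

Run under "python -X utf8", the first function returns UTF-8, while the second still reports the codeset of whatever LC_CTYPE locale is active.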


Technically, locale.get_locale_encoding() would simply expose _locale.get_locale_encoding(), which I added recently. It calls the new private _Py_GetLocaleEncoding() function (which takes no arguments).

By the way, building Python requires nl_langinfo(CODESET). It's not a new requirement of Python 3.10, but I wanted to mention it since I noticed it while implementing _locale.get_locale_encoding() :-)


Python has a bad habit of lying to the user: locale.getpreferredencoding(False) is *NOT* the current locale encoding in multiple cases.

* locale.getpreferredencoding(False) always returns "UTF-8" on macOS, Android and VxWorks
* locale.getpreferredencoding(False) always returns "UTF-8" if the UTF-8 Mode is enabled
* otherwise, it returns the current locale encoding: the ANSI code page on Windows, or nl_langinfo(CODESET) on other platforms
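To observe the UTF-8 Mode case from the list above, one can compare what a child interpreter reports with and without -X utf8. This is a minimal sketch, assuming a Unix-like system where LC_ALL=C selects a non-UTF-8 locale; probe() and CHILD are illustrative names, not part of the proposal. On Windows, the LC_ALL variable has no effect and the codeset column prints "n/a".

```python
import os
import subprocess
import sys

# Script run in a child interpreter: print what Python reports, what the
# C library reports for LC_CTYPE (Unix only), and whether UTF-8 Mode is on.
CHILD = "\n".join([
    "import locale, sys",
    "print(locale.getpreferredencoding(False))",
    "print(locale.nl_langinfo(locale.CODESET)"
    " if hasattr(locale, 'nl_langinfo') else 'n/a')",
    "print(sys.flags.utf8_mode)",
])


def probe(utf8_mode: bool, lc_all: str = "C") -> tuple:
    """Run CHILD under LC_ALL=lc_all, forcing the UTF-8 Mode on or off."""
    env = dict(os.environ, LC_ALL=lc_all)
    # -X utf8 / -X utf8=0 override PYTHONUTF8 and the platform default
    flag = "utf8" if utf8_mode else "utf8=0"
    result = subprocess.run(
        [sys.executable, "-X", flag, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    # (reported encoding, C library codeset, utf8_mode flag)
    return tuple(result.stdout.splitlines()[:3])


if __name__ == "__main__":
    for mode in (False, True):
        reported, codeset, flag = probe(mode)
        print(f"utf8_mode={flag}: getpreferredencoding(False)={reported!r}, "
              f"nl_langinfo(CODESET)={codeset!r}")
```

With the UTF-8 Mode enabled, the reported encoding is UTF-8 regardless of what nl_langinfo(CODESET) says, which is exactly the mismatch described above.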


Even though locale.getpreferredencoding(False) already exists, I propose to add locale.get_locale_encoding() because I dislike the locale.getpreferredencoding() API. By default, this function temporarily sets LC_CTYPE to the user's preferred locale. This can cause mojibake in other threads, since setlocale(LC_CTYPE, "") affects all threads :-( Calling locale.getpreferredencoding() rather than locale.getpreferredencoding(False) does not do what most people expect: the API is easy to misuse.
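The thread-safety problem comes from a save/switch/restore pattern around setlocale(). The sketch below reproduces that pattern with the public locale module; it is an assumption that it roughly mirrors what getpreferredencoding() does by default, and the function name is hypothetical. It is Unix-only, since locale.nl_langinfo() is not available on Windows.

```python
import locale


def getpreferredencoding_setlocale_sketch() -> str:
    # Hypothetical sketch of the default getpreferredencoding() behavior.
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        try:
            # Process-wide change: until it is restored below, every other
            # thread also sees the user's preferred LC_CTYPE locale.
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            pass  # invalid user locale settings: keep the current locale
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore previous locale


if __name__ == "__main__":
    print(getpreferredencoding_setlocale_sketch())
```

The window between the two setlocale() calls is where another thread encoding or decoding text can pick up the wrong locale; a pure "get" function has no such window.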

On the other hand, locale.get_locale_encoding() does exactly what its name says: it only *gets* the encoding, and doesn't temporarily *set* the locale to something else.

By the way, the locale.localeconv() function can temporarily change the LC_CTYPE locale to the LC_MONETARY locale, which can cause other threads to use the wrong LC_CTYPE locale! But that is a different issue.
History
Date                 User      Action  Args
2021-03-19 09:17:13  vstinner  set     recipients: + vstinner
2021-03-19 09:17:13  vstinner  set     messageid: <1616145433.55.0.350148222552.issue43552@roundup.psfhosted.org>
2021-03-19 09:17:13  vstinner  link    issue43552 messages
2021-03-19 09:17:13  vstinner  create