
Author vstinner
Recipients lemburg, vstinner
Date 2021-03-19.11:05:29
Message-id <1616151929.89.0.714213045925.issue43552@roundup.psfhosted.org>
In-reply-to
Content
Attached encodings.py lists the different "locale encodings" used by Python. Example:
---
$ LANG=fr_FR ./python -X utf8 encodings.py fr_FR@euro
Set LC_CTYPE to 'fr_FR@euro'

LC_ALL env var: ''
LC_CTYPE env var: ''
LANG env var: 'fr_FR'
LC_CTYPE locale: 'fr_FR@euro'
Coerce C locale: 0
Python UTF-8 Mode: 1

(1) Python FS encoding
sys.getfilesystemencoding(): 'utf-8'

(2) Python locale encoding
_locale._get_locale_encoding(): 'UTF-8'
locale.getpreferredencoding(False): 'UTF-8'

(3) Current locale encoding
locale.get_current_locale_encoding(): 'ISO-8859-15'

(4) And more encodings for more fun!
locale.getdefaultlocale()[1]: 'ISO8859-1'
locale.getpreferredencoding(True): 'UTF-8'
---

Python starts with the LC_CTYPE locale set to fr_FR (ISO-8859-1), then the script sets the LC_CTYPE locale to fr_FR@euro (ISO-8859-15). The Python UTF-8 Mode is enabled explicitly. We get a funny combination of no fewer than 3 encodings!

* UTF-8
* ISO-8859-1
* ISO-8859-15

Which one is the correct one? Well... It depends :-)

(1) The Python filesystem encoding is used to call almost all operating system functions: encode to the OS and decode from the OS. Filenames, environment variables, command line options, etc.
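
To make (1) concrete, here is a minimal sketch of the round trip through the filesystem encoding, using only os.fsencode() and os.fsdecode(). The file name "report.txt" is a made-up example, not something from the attached script.

```python
import os
import sys

# Encoding and error handler Python uses when talking to the OS.
fs_encoding = sys.getfilesystemencoding()
fs_errors = sys.getfilesystemencodeerrors()

# Hypothetical file name, chosen only for illustration.
name = "report.txt"

# str -> bytes (what is passed to the OS) and bytes -> str (what comes back).
encoded = os.fsencode(name)
decoded = os.fsdecode(encoded)

print("filesystem encoding:", fs_encoding, "errors:", fs_errors)
print("encoded:", encoded, "decoded:", decoded)
```

The same pair of functions is what Python applies to filenames, environment variables and command line arguments, so it is independent of any later setlocale() call.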

(2) The "Python" locale encoding is used by open() when no encoding is specified.
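
A small sketch to check (2) on a given interpreter: open a temporary file without encoding= and compare the encoding it picked with locale.getpreferredencoding(False). The helper name default_open_encoding() is hypothetical, and the comparison goes through codecs.lookup() on the assumption that the two calls may spell the same codec differently ('UTF-8' vs 'utf-8').

```python
import codecs
import locale
import os
import tempfile


def default_open_encoding():
    """Return the encoding open() picks when encoding= is omitted (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:  # no encoding= on purpose
            return f.encoding
    finally:
        os.remove(path)


opened = default_open_encoding()
preferred = locale.getpreferredencoding(False)

# Compare canonical codec names rather than raw strings.
same = codecs.lookup(opened).name == codecs.lookup(preferred).name
print("open():", opened, "getpreferredencoding(False):", preferred, "same:", same)
```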

(3) The current locale encoding is used by a limited number of functions that I listed in msg389063. Most users should not use it.
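
For (3), a rough sketch of how the encoding of the active LC_CTYPE locale can be queried on platforms that provide nl_langinfo(). It is only a stand-in for the locale.get_current_locale_encoding() name shown in the output above; the helper name current_locale_encoding() is an assumption, and it returns None where nl_langinfo() is not available (for example on Windows).

```python
import locale


def current_locale_encoding():
    """Encoding of the LC_CTYPE locale active right now, or None if unavailable.

    This asks the C library directly, so the Python UTF-8 Mode does not change it.
    """
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        return locale.nl_langinfo(locale.CODESET)
    return None


# setlocale() with a single argument only queries the current setting.
print("LC_CTYPE locale:", locale.setlocale(locale.LC_CTYPE))
print("current locale encoding:", current_locale_encoding())
```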

(4) locale.getpreferredencoding(True) is a weird beast. It is the Python locale encoding until setlocale(LC_CTYPE, locale) is called for the first time. But it can stay the same if the Python UTF-8 Mode is enabled, as in the example above. I'm not sure in which category we should put this function :-(

(4 bis) locale.getdefaultlocale()[1] is the only function returning the ISO-8859-1 encoding. This encoding is not used by any function. I'm not sure of the purpose of this function. It sounds confusing.
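
One way to see where an encoding can come from without any setlocale() call: locale.normalize() maps a locale name through the alias table shipped in the locale module. The sketch below assumes this is the same table as the "internal database" discussed in this message; the helper name encoding_from_locale_name() is hypothetical.

```python
import locale


def encoding_from_locale_name(name):
    """Split the result of locale.normalize(name) into (normalized, encoding).

    Only the alias table of the locale module is consulted; the C library
    locale is not queried and setlocale() is not called.
    """
    normalized = locale.normalize(name)
    _, sep, encoding = normalized.partition(".")
    encoding = encoding.partition("@")[0]  # drop a trailing modifier, if any
    return normalized, (encoding if sep else None)


# 'fr_FR' is the value of LANG in the example above.
normalized, encoding = encoding_from_locale_name("fr_FR")
print("normalized:", normalized, "encoding:", encoding)
```

Whatever this prints is derived from the name alone, which is why it can disagree with both the Python locale encoding and the current locale encoding.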


I suggest deprecating locale.getpreferredencoding(True).

I'm not sure what to do with locale.getdefaultlocale(). Should we deprecate it? I never used this function. How is it used? For which purpose?

I understand that in 2000, locale.getdefaultlocale() was interesting to avoid calling setlocale(LC_CTYPE, ""). But Python 3 calls setlocale(LC_CTYPE, "") by default at startup since the early versions, and it's now called on all platforms since Python 3.8. Moreover, its internal database seems to be outdated and is painful to maintain (especially if we consider all platforms supported by Python, not only Linux: there are many issues on macOS).
History
Date                 User      Action  Args
2021-03-19 11:05:29  vstinner  set     recipients: + vstinner, lemburg
2021-03-19 11:05:29  vstinner  set     messageid: <1616151929.89.0.714213045925.issue43552@roundup.psfhosted.org>
2021-03-19 11:05:29  vstinner  link    issue43552 messages
2021-03-19 11:05:29  vstinner  create