Issue 30755: locale.normalize() and getdefaultlocale() convert C.UTF-8 to en_US.UTF-8


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/74940

classification

Title:	locale.normalize() and getdefaultlocale() convert C.UTF-8 to en_US.UTF-8
Type:		Stage:	patch review
Components:		Versions:	Python 3.8, Python 3.7, Python 3.6, Python 3.4, Python 3.5

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Jeffrey.Kintscher, benjamin.peterson, gordonmessmer, hroncok, lemburg, mattheww, ncoghlan, serhiy.storchaka
Priority:	normal	Keywords:	patch

Created on 2017-06-25 16:58 by mattheww, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL	Status	Linked	Edit
PR 14925	open	gordonmessmer, 2019-07-24 02:57

Messages (8)
msg296828 - (view)	Author: Matthew Woodcraft (mattheww)	Date: 2017-06-25 16:58
I have a system where the default locale is C.UTF-8, and en_US.UTF-8 is not installed.

But locale.normalize() unhelpfully converts "C.UTF-8" to "en_US.UTF-8".

So the following crashes for me:

    python3.6 -c "import locale;locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))"

Similarly getdefaultlocale() returns ('en_US', 'UTF-8'), so this crashes too:

    export LANG=C.UTF-8
    unset LC_CTYPE
    unset LC_ALL
    unset LANGUAGE
    python3.6 -c "import locale;locale.setlocale(locale.LC_ALL, locale.getdefaultlocale())"

This behaviour is caused by a locale_alias entry in Lib/locale.py. https://bugs.python.org/issue20076 documents its addition but doesn't provide a rationale.

I can see that it might be helpful to provide such a conversion if C.UTF-8 doesn't exist and en_US.UTF-8 does, but the current code is breaking modern correctly-configured systems for the benefit of old misconfigured ones (C.UTF-8 shouldn't really be in the environment if it isn't available on the system, after all).
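
A minimal sketch for checking how a given interpreter treats these names, assuming only the documented locale.normalize() call. The helper name describe_normalization is hypothetical and not part of the locale module. No expected output is shown because the result for "C.UTF-8" depends on the locale_alias table shipped with the Python version in use.

```python
import locale


def describe_normalization(name):
    """Return (name, normalized, changed) for a locale name string.

    Hypothetical helper: it only reports what locale.normalize() does and
    never changes the process locale.
    """
    normalized = locale.normalize(name)
    return name, normalized, normalized != name


if __name__ == "__main__":
    for candidate in ("C.UTF-8", "C", "en_US.UTF-8"):
        original, normalized, changed = describe_normalization(candidate)
        marker = "rewritten" if changed else "unchanged"
        print(f"{original!r} -> {normalized!r} ({marker})")
```

If "C.UTF-8" is reported as rewritten to "en_US.UTF-8", the alias described in this report is present in that interpreter.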
msg297342 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2017-06-30 02:20
I'm honestly not sure how our Python level locale handling really works (I've mainly worked on the lower level C locale manipulation), so adding folks to the nosy list based on #20076 and #29571. I agree we shouldn't be aliasing C.UTF-8 to en_US.UTF-8 though - we took en_US.UTF-8 out of the locale coercion fallback list in PEP 538 because it wasn't really right.
msg302981 - (view)	Author: Matthew Woodcraft (mattheww)	Date: 2017-09-25 22:37
I've investigated a bit more.

First, I've tried with Python 3.7.0a1. As you'd expect, PEP 537 means this behaviour now also occurs when no locale environment variables at all are set.

Second, I've looked through locale.py a bit. I believe what it calls the "aliasing engine" is applied for:

 - getlocale()
 - getdefaultlocale()
 - setlocale() when passed a tuple, but not when passed a string

This leads to some rather odd results.

With 3.7.0a1 and no locale environment variables:

    >>> import locale
    >>> locale.getlocale()
    ('en_US', 'UTF-8')
    # getlocale() is lying: the effective locale is really C.UTF-8
    >>> sorted("abcABC", key=locale.strxfrm)
    ['A', 'B', 'C', 'a', 'b', 'c']

Third, I've checked on a system which does have en_US.UTF-8 installed, and (as you'd expect) instead of crashing it gives wrong results:

    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))
    'en_US.UTF-8'
    >>> locale.getlocale()
    ('en_US', 'UTF-8')
    # now getlocale() is telling the truth, and the user isn't getting the
    # collation they requested
    >>> sorted("abcABC", key=locale.strxfrm)
    ['a', 'A', 'b', 'B', 'c', 'C']
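
The string-versus-tuple difference described above can be probed with a small sketch like the following. The helper probe_setlocale is hypothetical, and the outcome depends on which locales the system has installed and on the Python version, so no output is claimed here. It restores the previous setting so that probing has no lasting effect.

```python
import locale


def probe_setlocale(value, category=locale.LC_ALL):
    """Try setlocale(category, value); return the result string or None.

    Hypothetical helper. `value` may be a string such as "C.UTF-8" or a
    (language, encoding) tuple such as ("C", "UTF-8"). The previous
    setting is restored before returning.
    """
    # Calling setlocale() with no value only queries the current setting
    # and returns the raw string, without going through the alias table.
    saved = locale.setlocale(category)
    try:
        return locale.setlocale(category, value)
    except locale.Error:
        # The requested locale is not available on this system.
        return None
    finally:
        locale.setlocale(category, saved)


if __name__ == "__main__":
    print("string form:", probe_setlocale("C.UTF-8"))
    print("tuple form: ", probe_setlocale(("C", "UTF-8")))
```

On an affected interpreter the two lines may differ: the string is handed to the C library as written, while the tuple is first passed through the alias table.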
msg302982 - (view)	Author: Matthew Woodcraft (mattheww)	Date: 2017-09-25 22:39
(For PEP 537 please read PEP 538, sorry)
msg347520 - (view)	Author: Gordon Messmer (gordonmessmer) *	Date: 2019-07-09 05:44
> I can see that it might be helpful to provide such a conversion if
> C.UTF-8 doesn't exist and en_US.UTF-8 does

That can't happen. The "C" locale describes the behavior defined in the ISO C standard. It's built-in to glibc (and should be for all other libc implementations). All other locales require external support (i.e. /usr/lib/locale/<locale>)

https://www.gnu.org/software/libc/manual/html_node/Standard-Locales.html#Standard-Locales
msg347521 - (view)	Author: Gordon Messmer (gordonmessmer) *	Date: 2019-07-09 06:10
> I agree we shouldn't be aliasing C.UTF-8 to en_US.UTF-8 though

What can we do about reverting that change? Python's current behavior causes unexpected exceptions, especially in containers.

I'm currently debugging test failures in a Python application that occur in Fedora rawhide containers. Those containers don't have any locales installed. The test software saves its current locale, changes the locale in order to run a test, and then restores the original. Because Python is incorrectly reporting the original locale as "en_US", restoring the original fails.
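
One possible way for test code to save and restore the locale without going through the alias table is to save the raw string returned by setlocale(category) instead of the tuple returned by getlocale(). The context manager below is a hedged sketch of that idea; the name temporary_locale is hypothetical and this is not the code of the application mentioned above.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(name, category=locale.LC_ALL):
    """Switch to locale `name` for the duration of a with-block.

    Hypothetical helper. The saved value is the raw string reported by
    the C library, so restoring it does not depend on locale.normalize().
    """
    saved = locale.setlocale(category)  # query only, returns a string
    try:
        # Raises locale.Error if `name` is not available on this system.
        yield locale.setlocale(category, name)
    finally:
        locale.setlocale(category, saved)


if __name__ == "__main__":
    with temporary_locale("C") as active:
        print("inside the block:", active)
    print("restored:", locale.setlocale(locale.LC_ALL))
```

Because the saved value is passed back as a string, setlocale() hands it to the C library unchanged.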
msg347528 - (view)	Author: Miro Hrončok (hroncok) *	Date: 2019-07-09 08:42
>> C.UTF-8 doesn't exist and en_US.UTF-8 does

> That can't happen

It certainly can. Take for example RHEL 7 or 6.
msg348367 - (view)	Author: Gordon Messmer (gordonmessmer) *	Date: 2019-07-24 04:24
As an example, let's consider dnf's i18n setup:

    try:
        dnf.pycomp.setlocale(locale.LC_ALL, '')
    except locale.Error:
        # default to C.UTF-8 or C locale if we got a failure.
        try:
            dnf.pycomp.setlocale(locale.LC_ALL, 'C.UTF-8')
            os.environ['LC_ALL'] = 'C.UTF-8'
        except locale.Error:
            dnf.pycomp.setlocale(locale.LC_ALL, 'C')
            os.environ['LC_ALL'] = 'C'

If setting the environment-specified locale fails, dnf will attempt to set the locale to C.UTF-8, and if that fails it will set the locale to C.

This seems like an ideal process. If the expected locale is missing, dnf will attempt to at least use UTF-8, before falling back to the C locale.

Unfortunately, because of the alias, this process will be unable to set the 'C.UTF-8' locale on systems which do not have the 'en_US' locale installed. This renders system support for 'C.UTF-8' unusable when no locales are installed.

History
Date	User	Action	Args
2022-04-11 14:58:48	admin	set	github: 74940
2019-07-29 15:54:44	vstinner	set	nosy: - vstinner
2019-07-24 04:24:56	gordonmessmer	set	messages: + msg348367
2019-07-24 02:57:25	gordonmessmer	set	keywords: + patch stage: patch review pull_requests: + pull_request14696
2019-07-11 20:17:50	Jeffrey.Kintscher	set	nosy: + Jeffrey.Kintscher
2019-07-09 08:42:49	hroncok	set	nosy: + vstinner, hroncok messages: + msg347528 versions: + Python 3.8
2019-07-09 06:10:15	gordonmessmer	set	messages: + msg347521
2019-07-09 05:44:38	gordonmessmer	set	nosy: + gordonmessmer messages: + msg347520
2017-09-25 22:39:15	mattheww	set	messages: + msg302982
2017-09-25 22:37:16	mattheww	set	messages: + msg302981 versions: + Python 3.7
2017-06-30 02:20:13	ncoghlan	set	nosy: + lemburg, benjamin.peterson, serhiy.storchaka messages: + msg297342
2017-06-29 18:35:14	r.david.murray	set	nosy: + ncoghlan
2017-06-25 16:58:59	mattheww	create