classification
Title: locale.normalize() and getdefaultlocale() convert C.UTF-8 to en_US.UTF-8
Type:
Stage: patch review
Components:
Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.4, Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Jeffrey.Kintscher, benjamin.peterson, gordonmessmer, hroncok, lemburg, mattheww, ncoghlan, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2017-06-25 16:58 by mattheww, last changed 2019-07-29 15:54 by vstinner.

Pull Requests
URL       Status  Linked
PR 14925  open    gordonmessmer, 2019-07-24 02:57
Messages (8)
msg296828 - Author: Matthew Woodcraft (mattheww) Date: 2017-06-25 16:58
I have a system where the default locale is C.UTF-8, and en_US.UTF-8 is
not installed.

But locale.normalize() unhelpfully converts "C.UTF-8" to "en_US.UTF-8".

So the following crashes for me:

  python3.6 -c "import locale;locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))"


Similarly getdefaultlocale() returns ('en_US', 'UTF-8'), so this crashes too:

  export LANG=C.UTF-8
  unset LC_CTYPE
  unset LC_ALL
  unset LANGUAGE
  python3.6 -c "import locale;locale.setlocale(locale.LC_ALL, locale.getdefaultlocale())"


This behaviour is caused by a locale_alias entry in Lib/locale.py.

https://bugs.python.org/issue20076 documents its addition but doesn't
provide a rationale.

I can see that it might be helpful to provide such a conversion if
C.UTF-8 doesn't exist and en_US.UTF-8 does, but the current code is
breaking modern correctly-configured systems for the benefit of old
misconfigured ones (C.UTF-8 shouldn't really be in the environment if it
isn't available on the system, after all).
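
One way to check which of the names discussed above the C library actually accepts is to probe them with string arguments. This is only a minimal sketch: `locale_is_available` is a hypothetical helper, not part of the `locale` module, and it assumes that the string returned by a query-only `setlocale()` call can be passed back to restore the previous setting.

```python
import locale


def locale_is_available(name, category=locale.LC_ALL):
    """Return True if the C library accepts `name` for `category`.

    Hypothetical helper for illustration; not part of the stdlib.
    """
    # With no second argument setlocale() only queries the current setting.
    saved = locale.setlocale(category)
    try:
        # A string argument is passed on without going through normalize().
        locale.setlocale(category, name)
    except locale.Error:
        return False
    finally:
        # Put back whatever was active before the probe.
        locale.setlocale(category, saved)
    return True


for candidate in ("C", "C.UTF-8", "en_US.UTF-8"):
    print(candidate, locale_is_available(candidate))
```

The results depend on which locales are installed, so no particular output is implied here.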
msg297342 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-06-30 02:20
I'm honestly not sure how our Python level locale handling really works (I've mainly worked on the lower level C locale manipulation), so adding folks to the nosy list based on #20076 and #29571.

I agree we shouldn't be aliasing C.UTF-8 to en_US.UTF-8 though - we took en_US.UTF-8 out of the locale coercion fallback list in PEP 538 because it wasn't really right.
msg302981 - Author: Matthew Woodcraft (mattheww) Date: 2017-09-25 22:37
I've investigated a bit more.

First, I've tried with Python 3.7.0a1. As you'd expect, PEP 537 means
this behaviour now also occurs when no locale environment variables at
all are set.


Second, I've looked through locale.py a bit. I believe what it calls the
"aliasing engine" is applied for:

 - getlocale()
 - getdefaultlocale()
 - setlocale() when passed a tuple, but not when passed a string
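
The difference between the tuple and string forms can be sketched roughly as follows. `resolve_tuple` is a hypothetical helper that approximates how a `(language, encoding)` tuple is joined into a name and then run through `locale.normalize()`; it is an illustration under that assumption, not the module's actual code path.

```python
import locale


def resolve_tuple(localetuple):
    """Approximate the name derived from a (language, encoding) tuple.

    Hypothetical helper: joins the tuple into a single name and applies
    locale.normalize(), which is where the alias table is consulted.
    """
    language, encoding = localetuple
    if language is None:
        language = "C"
    name = language if encoding is None else f"{language}.{encoding}"
    return locale.normalize(name)


requested = ("C", "UTF-8")
# The tuple form goes through the alias table before reaching the C library.
print("tuple form resolves to:", resolve_tuple(requested))
# The string form is handed over as written.
print("string form is passed as:", ".".join(requested))
```

Whether the two printed names differ depends on the alias table of the Python version in use.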


This leads to some rather odd results.

With 3.7.0a1 and no locale environment variables:

  >>> import locale
  >>> locale.getlocale()
  ('en_US', 'UTF-8')

  # getlocale() is lying: the effective locale is really C.UTF-8
  >>> sorted("abcABC", key=locale.strxfrm)
  ['A', 'B', 'C', 'a', 'b', 'c']


Third, I've checked on a system which does have en_US.UTF-8 installed,
and (as you'd expect) instead of crashing it gives wrong results:

  >>> import locale
  >>> locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))
  'en_US.UTF-8'
  >>> locale.getlocale()
  ('en_US', 'UTF-8')

  # now getlocale() is telling the truth, and the user isn't getting the
  # collation they requested
  >>> sorted("abcABC", key=locale.strxfrm)
  ['a', 'A', 'b', 'B', 'c', 'C']
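
To see whether getlocale() agrees with the locale actually in effect, the parsed value can be compared with the raw name returned by a query-only `setlocale()` call. A minimal sketch, with `reported_vs_effective` as a hypothetical helper name:

```python
import locale


def reported_vs_effective(category=locale.LC_COLLATE):
    """Return (effective, reported) for one locale category.

    Hypothetical helper for illustration only.
    """
    # Raw name as the C library reports it; no aliasing is applied.
    effective = locale.setlocale(category)
    # Parsed (language, encoding) tuple; this goes through the alias table
    # and may raise ValueError for names it cannot parse.
    reported = locale.getlocale(category)
    return effective, reported


effective, reported = reported_vs_effective()
print("effective:", effective)
print("reported: ", reported)
```

Note that for the plain "C" locale getlocale() reports `(None, None)`, so a mismatch is only meaningful when a language is actually reported.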
msg302982 - Author: Matthew Woodcraft (mattheww) Date: 2017-09-25 22:39
(For PEP 537 please read PEP 538, sorry)
msg347520 - Author: Gordon Messmer (gordonmessmer) * Date: 2019-07-09 05:44
> I can see that it might be helpful to provide such a conversion if
> C.UTF-8 doesn't exist and en_US.UTF-8 does

That can't happen.  The "C" locale describes the behavior defined in the ISO C standard.  It's built in to glibc (and should be for all other libc implementations).  All other locales require external support (i.e. /usr/lib/locale/<locale>).

https://www.gnu.org/software/libc/manual/html_node/Standard-Locales.html#Standard-Locales
msg347521 - Author: Gordon Messmer (gordonmessmer) * Date: 2019-07-09 06:10
> I agree we shouldn't be aliasing C.UTF-8 to en_US.UTF-8 though

What can we do about reverting that change?  Python's current behavior causes unexpected exceptions, especially in containers.

I'm currently debugging test failures in a Python application that occur in Fedora rawhide containers.  Those containers don't have any locales installed.  The test software saves its current locale, changes the locale in order to run a test, and then restores the original.  Because Python is incorrectly reporting the original locale as "en_US", restoring the original fails.
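
A save-and-restore pattern that avoids the parsed tuple altogether might look like the sketch below. It keeps the raw string from a query-only `setlocale()` call and passes that same string back afterwards. `temporary_locale` is a hypothetical name, and the sketch assumes the saved string is accepted when handed back to the C library.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(name, category=locale.LC_ALL):
    """Switch to `name` for the duration of a with-block (hypothetical helper)."""
    # Query only: returns the current setting as a raw string.
    saved = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
        yield
    finally:
        # Restore from the raw string rather than from getlocale(),
        # so the alias table is never involved.
        locale.setlocale(category, saved)


with temporary_locale("C"):
    print(locale.setlocale(locale.LC_ALL))
```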
msg347528 - Author: Miro Hrončok (hroncok) * Date: 2019-07-09 08:42
>> C.UTF-8 doesn't exist and en_US.UTF-8 does
> That can't happen

It certainly can. Take for example RHEL 7 or 6.
msg348367 - Author: Gordon Messmer (gordonmessmer) * Date: 2019-07-24 04:24
As an example, let's consider dnf's i18n setup:

    try:
        dnf.pycomp.setlocale(locale.LC_ALL, '')
    except locale.Error:
        # default to C.UTF-8 or C locale if we got a failure.
        try:
            dnf.pycomp.setlocale(locale.LC_ALL, 'C.UTF-8')
            os.environ['LC_ALL'] = 'C.UTF-8'
        except locale.Error:
            dnf.pycomp.setlocale(locale.LC_ALL, 'C')
            os.environ['LC_ALL'] = 'C'

If setting the environment-specified locale fails, dnf will attempt to set the locale
to C.UTF-8, and if that fails it will set the locale to C.  This seems like an ideal
process.  If the expected locale is missing, dnf will attempt to at least use UTF-8,
before falling back to the C locale.

Unfortunately, because of the alias, this process will be unable to set the 'C.UTF-8'
locale on systems which do not have the 'en_US' locale installed.  This renders
system support for 'C.UTF-8' unusable when no locales are installed.
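
For experimenting with the same fallback order outside dnf, a self-contained variant could look like this. It is a sketch under stated assumptions: it calls `locale.setlocale()` directly with string names in place of `dnf.pycomp.setlocale` (whose implementation is not shown here), it leaves `os.environ` untouched, and `set_first_available` is a hypothetical helper name.

```python
import locale


def set_first_available(candidates, category=locale.LC_ALL):
    """Set the first locale name the C library accepts and return it.

    Hypothetical helper; raises locale.Error if every candidate fails.
    """
    for name in candidates:
        try:
            # setlocale() returns the name of the locale now in effect.
            return locale.setlocale(category, name)
        except locale.Error:
            continue
    raise locale.Error(f"none of {list(candidates)!r} could be set")


# "" asks for the environment-specified locale, then the fallbacks follow.
chosen = set_first_available(["", "C.UTF-8", "C"])
print("locale in effect:", chosen)
```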
History
Date                 User               Action  Args
2019-07-29 15:54:44  vstinner           set     nosy: - vstinner
2019-07-24 04:24:56  gordonmessmer      set     messages: + msg348367
2019-07-24 02:57:25  gordonmessmer      set     keywords: + patch
                                                stage: patch review
                                                pull_requests: + pull_request14696
2019-07-11 20:17:50  Jeffrey.Kintscher  set     nosy: + Jeffrey.Kintscher
2019-07-09 08:42:49  hroncok            set     nosy: + vstinner, hroncok
                                                messages: + msg347528
                                                versions: + Python 3.8
2019-07-09 06:10:15  gordonmessmer      set     messages: + msg347521
2019-07-09 05:44:38  gordonmessmer      set     nosy: + gordonmessmer
                                                messages: + msg347520
2017-09-25 22:39:15  mattheww           set     messages: + msg302982
2017-09-25 22:37:16  mattheww           set     messages: + msg302981
                                                versions: + Python 3.7
2017-06-30 02:20:13  ncoghlan           set     nosy: + lemburg, benjamin.peterson, serhiy.storchaka
                                                messages: + msg297342
2017-06-29 18:35:14  r.david.murray     set     nosy: + ncoghlan
2017-06-25 16:58:59  mattheww           create