classification
Title: locale.getdefaultlocale() envvars default code and documentation mismatch
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: a.badger, crosser, doko, eric.araujo, lemburg, loewis, scop, sdaoden
Priority: normal Keywords:

Created on 2009-09-23 19:35 by scop, last changed 2020-11-07 01:04 by iritkatriel.

Messages (7)
msg93048 - Author: Ville Skyttä (scop) * Date: 2009-09-23 19:35
The default envvars list of locale.getdefaultlocale() is documented to be
the one used by GNU gettext; in the source docs in Python 2.7 trunk:

    "envvars defaults to the search path used in GNU gettext; it must
     always contain the variable name 'LANG'."

...and at http://docs.python.org/dev/library/locale.html in addition to
that:

    "The GNU gettext search path contains 'LANGUAGE', 'LC_ALL',
     'LC_CTYPE', and 'LANG', in that order."

This is correct, cf.
http://www.gnu.org/software/gettext/manual/gettext.html#Locale-Environment-Variables

However, the code in locale.py does not match the documentation; the
patch in issue #1166948 (svn r39572) moved LANGUAGE to the end of the
list.  I suggest putting it back at the beginning as documented (the
other change in r39572 is ok).

The py3k branch appears to have the same problem.
msg130667 - Author: Eugene Crosser (crosser) Date: 2011-03-12 10:28
I don't know if the solution suggested in the report is right, but I can confirm that the current logic of getdefaultlocale() produces wrong results.

I have
  LANG=en_US.UTF-8
  LANGUAGE=en_US:en
  LC_CTYPE=ru_RU.UTF-8
  LC_COLLATE=ru_RU.UTF-8
which means, according to the documentation, "Do everything in English, but recognize Russian words and sort according to Russian alphabet".

All other software honors those semantics, except Python, which returns Russian as the overall default locale:

Python 2.7.1+ (r271:86832, Feb 24 2011, 15:00:15) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> print locale.getdefaultlocale()
('ru_RU', 'UTF8')

I believe that because LC_CTYPE controls only one specific aspect of the locale, it either should not be used at all, or used only as the last resort when locale cannot be determined from LANG or LANGUAGE. I think that the current search order "envvars=('LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE')" is wrong.
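
To make the effect of the search order concrete, here is a minimal sketch of a "first non-empty variable wins" lookup, applied to the environment above. The helper name first_locale_name and the constants are hypothetical; this only approximates the lookup loop in locale.py and does not parse or normalize the result the way getdefaultlocale() does.

```python
def first_locale_name(env, envvars):
    """Return (variable, value) for the first variable in envvars that is
    set to a non-empty value, or (None, "C") if none is set."""
    for variable in envvars:
        value = env.get(variable)
        if value:
            if variable == "LANGUAGE":
                # LANGUAGE is a colon-separated priority list; keep the first entry.
                value = value.split(":")[0]
            return variable, value
    return None, "C"


# The environment from the report above.
ENV = {
    "LANG": "en_US.UTF-8",
    "LANGUAGE": "en_US:en",
    "LC_CTYPE": "ru_RU.UTF-8",
    "LC_COLLATE": "ru_RU.UTF-8",
}

CURRENT_ORDER = ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")      # order in the code
DOCUMENTED_ORDER = ("LANGUAGE", "LC_ALL", "LC_CTYPE", "LANG")   # order in the docs

print(first_locale_name(ENV, CURRENT_ORDER))     # ('LC_CTYPE', 'ru_RU.UTF-8')
print(first_locale_name(ENV, DOCUMENTED_ORDER))  # ('LANGUAGE', 'en_US')
```

With the order used in the code, LC_CTYPE is hit before LANG, which matches the ('ru_RU', 'UTF8') result shown in the session above. With the documented order, LANGUAGE wins, but its entry carries no encoding.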
msg130671 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-12 12:20
Eugene: i disagree.  The semantics are correct according to C standards:

    http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html
msg130674 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-12 12:39
(Eugene, whereas i still disagree (i'm a C programmer in daily life),

Python 3.3a0 (default, Mar 10 2011, 11:50:55) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> print(locale.getdefaultlocale())
('en_GB', 'UTF8')

So it seems i'm still too new to give you a hint here, sorry..)
msg130675 - Author: Eugene Crosser (crosser) Date: 2011-03-12 12:54
Steffen: can you please be more specific?
As I read section 8.2 of the cited document, I do not see a disparity with my statement. There is even an example:

"""
For example, if a user wanted to interact with the system in French, but required to sort German text files, LANG and LC_COLLATE could be defined as:

    LANG=Fr_FR
    LC_COLLATE=De_DE
"""

which is (almost) exactly my case. I have LANG set to en_US to tell the system that I want to interact in English, and LC_CTYPE - to Russian to tell it that "classification of characters" needs to be Russian-specific.

Note that I do *not* have LC_ALL set, because it takes precedence over all other LC_* variables which is *not* what I want.
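
The per-category precedence described in section 8.2 of that document (LC_ALL first, then the category's own variable, then LANG) can be sketched as follows. The function name effective_category is hypothetical, and falling back to "C" when nothing is set is an assumption made here; the standard leaves that default to the implementation. LANGUAGE is deliberately ignored because it is a GNU extension, not part of this precedence.

```python
def effective_category(env, category):
    """Resolve one locale category (e.g. "LC_CTYPE") following the
    LC_ALL > LC_<category> > LANG precedence."""
    for variable in ("LC_ALL", category, "LANG"):
        value = env.get(variable)
        if value:
            return value
    return "C"  # assumption: the real default is implementation-defined


env = {
    "LANG": "en_US.UTF-8",
    "LC_CTYPE": "ru_RU.UTF-8",
    "LC_COLLATE": "ru_RU.UTF-8",
}

print(effective_category(env, "LC_MESSAGES"))  # en_US.UTF-8 (falls through to LANG)
print(effective_category(env, "LC_CTYPE"))     # ru_RU.UTF-8
print(effective_category(env, "LC_COLLATE"))   # ru_RU.UTF-8

# Setting LC_ALL would override every category, which is not what I want:
env_all = dict(env, LC_ALL="de_DE.UTF-8")
print(effective_category(env_all, "LC_CTYPE"))  # de_DE.UTF-8
```

In this reading only LC_CTYPE and LC_COLLATE are Russian; everything else, including messages, stays English.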

I believe that the correct "guessing order", according to the document that you cited, would be:
  LANG
  LC_ALL
then possibly (possibly because it does not have encoding info)
  LANGUAGE
then optionally, as a last resort
  LC_CTYPE and other LC_* variables.
msg130676 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-12 13:39
On Sat, Mar 12, 2011 at 12:54:08PM +0000, Eugene Crosser wrote:
> Steffen: can you please be more specific?

I can't, dear Eugene, because you are completely right and i am 
completely wrong.

> I believe that the correct "guessing order" [...]

Well, thank you ... but it would have been better for me to go to 
the kitchen and drink a cup of nice tea before my post!

> I believe that because LC_CTYPE controls only one specific aspect 
> of the locale, it either should not be used at all, or used only 
> as the last resort when locale cannot be determined from LANG or 
> LANGUAGE. I think that the current search order 
> "envvars=('LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE')" is wrong

I really agree with you.
Except for LC_ALL and LANG (and LANGUAGE, but this is non-standard 
for C; it seems to have been pushed forward by the GNU people and 
is documented by Python, however) nothing should be part of the 
evaluation of a getdefaultlocale(). 

(I am personally not so sure about that getdefaultlocale() at all. 
That is, whereas i *may* understand the existence of a 
locale.getpreferredencoding() (but with different semantics as in 
#11022 but that's not of interest here), because you need to setup 
your I/O layer so that it works in the system's environment anyway, 
i personally think that getdefaultlocale() is too much.

I.e. getdefaultlocale() behaves as if setlocale() has been used 
but without setlocale() being used yet. 
This is a fine hint for an application that wants to use the user's 
locale settings without using the user's locale settings!
Just do a grep(1) on Lib/ ...)

Thanks once again for your kindness!
msg341942 - Author: Toshio Kuratomi (a.badger) * Date: 2019-05-08 20:45
Hey doko, I was just looking through the oldest gettext bugs and found this bug open. It was caused by your commits here: https://bugs.python.org/issue1166948. It feels like we have a few choices:

* Revert the LANGUAGE ordering change, which would take us back to the 2.6 behaviour.
* Update the documentation to reflect the new ordering. [Since the change has been around for so long, I think this is my personal favorite.]
* Remove LANGUAGE from setting the default locale, because the GNU gettext usage of this variable is actually very different from what we're doing here. It seems like it should only affect LC_MESSAGES, and should affect those only as a fallback.
* Move LANGUAGE back to the beginning of the list, but remove it from consideration as a source for the *encoding*.

What do you think?
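
For illustration, the last option might look roughly like the sketch below. The names split_locale and guess_default are hypothetical, the variable orders are one possible reading of that option, and none of this is existing locale.py code.

```python
def split_locale(name):
    """Split 'lang_TERR.codeset@modifier' into (language, codeset).
    The modifier is dropped; a missing codeset comes back as ''."""
    name = name.split("@", 1)[0]
    language, _, codeset = name.partition(".")
    return language, codeset


def guess_default(env):
    """Take the language from LANGUAGE first, but never the encoding."""
    language = None
    for variable in ("LANGUAGE", "LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(variable)
        if value:
            if variable == "LANGUAGE":
                # Colon-separated priority list; keep the first entry.
                value = value.split(":")[0]
            language = split_locale(value)[0]
            break

    encoding = None
    for variable in ("LC_ALL", "LC_CTYPE", "LANG"):  # LANGUAGE excluded on purpose
        codeset = split_locale(env.get(variable) or "")[1]
        if codeset:
            encoding = codeset
            break
    return language, encoding


env = {
    "LANG": "en_US.UTF-8",
    "LANGUAGE": "en_US:en",
    "LC_CTYPE": "ru_RU.UTF-8",
    "LC_COLLATE": "ru_RU.UTF-8",
}
print(guess_default(env))  # ('en_US', 'UTF-8')
```

For the environment reported earlier in this issue, this would give English as the language while still picking up the encoding from LC_CTYPE.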
History
Date                 User          Action  Args
2020-11-07 01:04:10  iritkatriel   set     versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.1, Python 2.7, Python 3.2
2019-05-08 20:45:57  a.badger      set     nosy: + doko, a.badger; messages: + msg341942
2011-03-12 13:39:06  sdaoden       set     nosy: lemburg, loewis, scop, eric.araujo, sdaoden, crosser; messages: + msg130676
2011-03-12 12:54:07  crosser       set     nosy: lemburg, loewis, scop, eric.araujo, sdaoden, crosser; messages: + msg130675
2011-03-12 12:39:00  sdaoden       set     nosy: lemburg, loewis, scop, eric.araujo, sdaoden, crosser; messages: + msg130674
2011-03-12 12:34:49  pitrou        set     nosy: + eric.araujo
2011-03-12 12:20:01  sdaoden       set     nosy: + sdaoden; messages: + msg130671
2011-03-12 10:28:55  crosser       set     nosy: + crosser; messages: + msg130667
2010-07-11 09:45:20  BreamoreBoy   set     nosy: + lemburg, loewis; versions: + Python 3.1, Python 3.2
2009-09-23 19:35:02  scop          create