locale.getlocale() seems wrong when the locale is yet unset (python3 on linux) #78115

Open
nicolashainaux mannequin opened this issue Jun 21, 2018 · 8 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@nicolashainaux
Mannequin

nicolashainaux mannequin commented Jun 21, 2018

BPO 33934
Nosy @ncoghlan, @vstinner, @ned-deily, @nicolashainaux

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2018-06-21.16:00:37.625>
labels = ['3.7', 'type-bug', 'library']
title = 'locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)'
updated_at = <Date 2018-06-25.10:50:11.842>
user = 'https://github.com/nicolashainaux'

bugs.python.org fields:

activity = <Date 2018-06-25.10:50:11.842>
actor = 'ncoghlan'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2018-06-21.16:00:37.625>
creator = 'zezollo'
dependencies = []
files = []
hgrepos = []
issue_num = 33934
keywords = []
message_count = 8.0
messages = ['320192', '320264', '320298', '320315', '320348', '320397', '320404', '320412']
nosy_count = 5.0
nosy_names = ['ncoghlan', 'vstinner', 'ned.deily', 'docs@python', 'zezollo']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue33934'
versions = ['Python 3.4', 'Python 3.5', 'Python 3.6', 'Python 3.7']

@nicolashainaux
Mannequin Author

nicolashainaux mannequin commented Jun 21, 2018

Expected behaviour:

When the locale has not been set yet, the locale in use is C (as stated in the Python documentation), and locale.getlocale() returns (None, None) on Linux with python2.7 and on Windows with python2.7 and python3.6 (at least):

$ python2
Python 2.7.15 (default, May  1 2018, 20:16:04) 
[GCC 7.3.1 20180406] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getlocale()
(None, None)
>>> 

Issue:

But when using python3.4+ on Linux, instead of (None, None), locale.getlocale() returns the same value as locale.getdefaultlocale():

$ python
Python 3.6.3 (default, Oct 24 2017, 14:48:20) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getlocale()
('fr_FR', 'UTF-8')
>>> locale.localeconv()
{'int_curr_symbol': '', 'currency_symbol': '', 'mon_decimal_point': '', 'mon_thousands_sep': '', 'mon_grouping': [], 'positive_sign': '', 'negative_sign': '', 'int_frac_digits': 127, 'frac_digits': 127, 'p_cs_precedes': 127, 'p_sep_by_space': 127, 'n_cs_precedes': 127, 'n_sep_by_space': 127, 'p_sign_posn': 127, 'n_sign_posn': 127, 'decimal_point': '.', 'thousands_sep': '', 'grouping': []}
>>> locale.str(2.5)
'2.5'

However, the locale actually in use is still C, as shown above by the output of locale.localeconv() and confirmed by the result of locale.str(2.5), which uses a dot as the decimal point and not a comma (as would be expected with fr_FR.UTF-8).

I observed this confusing behaviour on Linux with python3.4, 3.5, 3.6 and 3.7 (rc1), and also on FreeBSD with python3.6.1.

A problematic consequence of this behaviour is that it becomes impossible to detect whether the locale has already been set by the user, or not.

I could not find any other similar issue and hope this is not a duplicate.

@nicolashainaux nicolashainaux mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jun 21, 2018
@ned-deily
Member

Can you say on which Linux platform/release you see this behavior and with which Python 3.6.3, i.e. from the platform distributor or built yourself? If I understand your concern correctly, I cannot reproduce that behavior on a current Debian test system using either the Debian-supplied 3.6.6rc1 or with a 3.6.3 built from source. With either LANG unset or set to C (and with no LC* env vars set), I see:

$ unset LC_ALL LC_CTYPE LANG LANGUAGE
$ ./python
Python 3.6.3 (tags/v3.6.3:2c5fed86e0, Jun 22 2018, 16:08:11)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
(None, None)

Note that, as documented, locale.getdefaultlocale() checks several env vars: 'LC_ALL', 'LC_CTYPE', 'LANG' and 'LANGUAGE'. Are you certain that all of those env vars are unset when you run this test?

https://docs.python.org/3.6/library/locale.html#locale.getdefaultlocale
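
To make the lookup order concrete, here is a minimal sketch of which environment variable would win. It is not the stdlib implementation: first_locale_env is a hypothetical helper, and it only assumes the documented default order (LC_ALL, LC_CTYPE, LANG, LANGUAGE), skipping unset or empty variables.

```python
import os

# Order documented for the default `envvars` argument of locale.getdefaultlocale().
ENV_VARS = ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")


def first_locale_env(environ=None):
    """Hypothetical helper: return (name, value) of the first non-empty
    locale-related variable, or None if none of them is set."""
    environ = os.environ if environ is None else environ
    for name in ENV_VARS:
        value = environ.get(name)
        if value:  # unset and empty variables are both skipped
            return name, value
    return None


if __name__ == "__main__":
    print(first_locale_env())
```

Running this in the shell that starts python shows at a glance whether any of the four variables is still set.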

@ned-deily ned-deily removed the 3.7 (EOL) end of life label Jun 22, 2018
@nicolashainaux
Mannequin Author

nicolashainaux mannequin commented Jun 23, 2018

Sorry, I did not realize that using the word "unset" was completely misleading: I only meant "before any use of locale.setlocale() in python". So I'll rephrase this all, and add details about the python versions and platforms in this message.

So, first, I do not unset the environment variables from the shell before running python.

The only steps required to reproduce this behaviour are to open a terminal and run python3:

Python 3.6.5 (default, May 11 2018, 04:00:52) 
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getlocale()
('fr_FR', 'UTF-8')  # Wrong: the C locale is actually in use, so (None, None) is expected

Explanation: when python starts, it runs using the C locale, on any platform (Windows, Linux, BSD), any python version (2, 3...), until locale.setlocale() is used to set another locale. This is expected (the doc says so in the getdefaultlocale() paragraph that you mentioned) and can be confirmed by the outputs of locale.localeconv() and locale.str().

So, before any use of locale.setlocale(), locale.getlocale() should return (None, None) (as this value matches the C locale).

This is the case on Windows, python2 and 3, and on Linux and FreeBSD python2.

But on Linux and FreeBSD, python>=3.4 (could not test 3.0<=python<=3.3), locale.getlocale() returns the value deduced from the environment variables instead, like locale.getdefaultlocale() already does, e.g. ('fr_FR', 'UTF-8').

All python versions I tested are from the platform distributors (3.7 only is compiled, but it's an automatic build from an AUR). Here is a more detailed list of the python versions and Linux and BSD platforms where I could observe this behaviour:

The problem with this behaviour on Linux and FreeBSD with python>=3.4 is, first, that it is not consistent across platforms, and second, that it makes it impossible for a python library to tell from locale.getlocale() whether the user (a python app) has set the locale or is actually still using the C locale. (It is still possible to rely on locale.localeconv() to get correct elements.)
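
As a rough illustration of relying on locale.localeconv(), a library could compare the active numeric conventions with the C locale's values shown earlier. This is only a heuristic sketch: numeric_conventions_look_like_c is a hypothetical name, and a non-C locale that happens to use the same conventions would be indistinguishable.

```python
import locale


def numeric_conventions_look_like_c():
    """Hypothetical heuristic: True if the active numeric conventions
    match those of the C locale ('.' as decimal point, no grouping)."""
    conv = locale.localeconv()
    return (
        conv["decimal_point"] == "."
        and conv["thousands_sep"] == ""
        and conv["grouping"] == []
    )


if __name__ == "__main__":
    print("getlocale():", locale.getlocale())
    print("numeric conventions look like C:", numeric_conventions_look_like_c())
```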

Hope this message made things clear now :-)

@nicolashainaux nicolashainaux mannequin added the 3.7 (EOL) end of life label Jun 23, 2018
@ned-deily
Member

Thanks for the more detailed explanation. I think you are right that the behavior does not match the documentation but which is to be preferred does not necessarily have an easy answer. Also, this whole area has been undergoing revision, for example, with new features in 3.7. Nick and/or Victor, can you address Nicolas's query?

@ncoghlan
Contributor

This statement is no longer correct: "when python starts, it runs using the C locale, on any platform (Windows, Linux, BSD), any python version (2, 3...), until locale.setlocale() is used to set another locale."

The Python 3 text model doesn't work properly in the legacy C locale due to the assumption of ASCII as the preferred text encoding, so we run setlocale(LC_ALL, "") early in the startup sequence in order to switch to something more sensible. In Python 3.7+, we're even more opinionated about that, and explicitly coerce the C locale to a UTF-8 based one if there's one available.

If our docs are still saying otherwise anywhere, then our docs are outdated, and need to be fixed.

@ncoghlan ncoghlan added docs Documentation in the Doc dir and removed stdlib Python modules in the Lib dir labels Jun 24, 2018
@nicolashainaux
Mannequin Author

nicolashainaux mannequin commented Jun 25, 2018

I understand that the statement "when python starts, it runs using the C locale..." should no longer be correct (and the doc should then be updated), but in fact this statement is still true on the systems I tested; only the output of locale.getlocale() at start contradicts the locale that is really in effect.

It looks like the setting done by setlocale(LC_ALL, "") at an early stage is lost at some point (only locale.getlocale() seems to "remember" it).

For instance, my box locale is 'fr_FR.UTF-8', so the decimal point is a comma, but when starting python 3.7:

>>> import locale
>>> locale.str(2.4)
'2.4'                     # Wrong: if the locale in use is 'fr_FR.UTF-8', then '2,4' is expected instead
>>> locale.getlocale()
('fr_FR', 'UTF-8')
>>> locale.localeconv()
{'int_curr_symbol': '', 'currency_symbol': '', 'mon_decimal_point': '', 'mon_thousands_sep': '', 'mon_grouping': [], 'positive_sign': '', 'negative_sign': '', 'int_frac_digits': 127, 'frac_digits': 127, 'p_cs_precedes': 127, 'p_sep_by_space': 127, 'n_cs_precedes': 127, 'n_sep_by_space': 127, 'p_sign_posn': 127, 'n_sign_posn': 127, 'decimal_point': '.', 'thousands_sep': '', 'grouping': []}
>>>

Note that the output of localeconv() does match C locale, not 'fr_FR.UTF-8'.

Compare this with the outputs of locale.str() and locale.localeconv() when the locale is explicitly set at start:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'LC_CTYPE=fr_FR.utf8;LC_NUMERIC=fr_FR.UTF-8;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.utf8;LC_MONETARY=fr_FR.UTF-8;LC_MESSAGES=fr_FR.utf8;LC_PAPER=fr_FR.UTF-8;LC_NAME=fr_FR.UTF-8;LC_ADDRESS=fr_FR.UTF-8;LC_TELEPHONE=fr_FR.UTF-8;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=fr_FR.UTF-8'
>>> locale.str(2.4)
'2,4'                       # Correct!
>>> locale.localeconv()     # Output of localeconv() does match 'fr_FR.UTF-8' locale
{'int_curr_symbol': 'EUR ', 'currency_symbol': '€', 'mon_decimal_point': ',', 'mon_thousands_sep': '\u202f', 'mon_grouping': [3, 0], 'positive_sign': '', 'negative_sign': '-', 'int_frac_digits': 2, 'frac_digits': 2, 'p_cs_precedes': 0, 'p_sep_by_space': 1, 'n_cs_precedes': 0, 'n_sep_by_space': 1, 'p_sign_posn': 1, 'n_sign_posn': 1, 'decimal_point': ',', 'thousands_sep': '\u202f', 'grouping': [3, 0]}
>>>
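
The same comparison can be scripted without leaving the process in a changed state. This is a sketch only: temporary_locale is a hypothetical helper, and it relies on locale.setlocale(category) with no second argument returning the current setting without changing it.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Hypothetical helper: switch one category, then restore the previous setting."""
    previous = locale.setlocale(category)  # query only, nothing changes
    locale.setlocale(category, name)       # may raise locale.Error
    try:
        yield
    finally:
        locale.setlocale(category, previous)


if __name__ == "__main__":
    print("at start:        ", locale.str(2.4))
    try:
        # "" asks for the user's default locale, taken from the environment.
        with temporary_locale(locale.LC_NUMERIC, ""):
            print("user's locale:   ", locale.str(2.4))
    except locale.Error as exc:
        print("could not apply the user's locale:", exc)
    print("after restoring: ", locale.str(2.4))
```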

Maybe the title of this issue should be turned to "at start, the C locale is in use in spite of locale.getlocale()'s output (python3 on linux)"?

As to the behaviour on Windows, I guess this is another topic (locales belonging to another world on Windows)... but it may be interesting to note that it complies with the current documentation: at start python 3.6 also uses the C locale, and the output of locale.getlocale() is consistent with that. Here is a test on Windows 10:

Python 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32

>>> import locale
>>> locale.getlocale()
(None, None)
>>> locale.localeconv()
{'decimal_point': '.', 'thousands_sep': '', 'grouping': [], 'int_curr_symbol': '', 'currency_symbol': '', 'mon_decimal_point': '', 'mon_thousands_sep': '', 'mon_grouping': [], 'positive_sign': '', 'negative_sign': '', 'int_frac_digits': 127, 'frac_digits': 127, 'p_cs_precedes': 127, 'p_sep_by_space': 127, 'n_cs_precedes': 127, 'n_sep_by_space': 127, 'p_sign_posn': 127, 'n_sign_posn': 127}
>>> locale.str(2.4)
'2.4'
>>> locale.getdefaultlocale()
('fr_FR', 'cp1252')

@nicolashainaux nicolashainaux mannequin added stdlib Python modules in the Lib dir and removed docs Documentation in the Doc dir labels Jun 25, 2018
@vstinner
Member

When testing this issue, I found a bug in Python :-(

I opened bpo-33954: float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error.

@ncoghlan
Contributor

Ah, part of the confusion is that I misremembered the command we run implicitly during startup - it's only setlocale(LC_CTYPE, ""), not setlocale(LC_ALL, "").

However, the default category for locale.getlocale() is LC_CTYPE, so it reports the text encoding locale configured during startup, not the C level default.

The difference on Windows is expected - the startup code that implicitly runs setlocale(LC_CTYPE, "") doesn't get compiled in there.

So I think we have a few different potential ways of viewing this bug report:

  1. As a docs issue, where we advise users to run locale.getlocale(locale.LC_MESSAGES) to find out whether or not a specific locale really has been configured (vs the interpreter's default text encoding change that runs implicitly on startup)
  2. As a defaults change for 3.8+, where we switch locale.getlocale() over to checking locale.LC_MESSAGES instead of locale.LC_CTYPE, since the interpreter always sets the latter on startup, so it doesn't convey much useful information.
  3. As (1) for maintenance releases, and as (2) for 3.8+
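
Option 1 might look roughly like the sketch below. It is not a proposed patch: configured_categories and user_locale_configured are hypothetical helpers, and since locale.LC_MESSAGES may be missing on some platforms, the sketch falls back to LC_NUMERIC there (an arbitrary choice made for illustration).

```python
import locale

CATEGORY_NAMES = (
    "LC_CTYPE", "LC_NUMERIC", "LC_TIME",
    "LC_COLLATE", "LC_MONETARY", "LC_MESSAGES",
)


def configured_categories():
    """Hypothetical helper: map each available category name to its raw
    current setting, queried without changing anything."""
    report = {}
    for name in CATEGORY_NAMES:
        category = getattr(locale, name, None)  # LC_MESSAGES may be absent
        if category is not None:
            report[name] = locale.setlocale(category)  # query only
    return report


def user_locale_configured(category_name="LC_MESSAGES"):
    """Hypothetical helper: True if the category is no longer the C locale.
    Falls back to LC_NUMERIC when the requested category is unavailable."""
    category = getattr(locale, category_name, locale.LC_NUMERIC)
    return locale.getlocale(category) != (None, None)


if __name__ == "__main__":
    for name, setting in configured_categories().items():
        print(name, "=", setting)
    print("locale configured by the application:", user_locale_configured())
```

On the Linux systems described in this issue, only the LC_CTYPE entry would be expected to differ from C right after startup.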
