locale.getlocale() returns a non RFC1766 language code #82986

Open
mgrandi mannequin opened this issue Nov 14, 2019 · 4 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@mgrandi

mgrandi mannequin commented Nov 14, 2019

BPO 38805
Nosy @malemburg, @mgrandi

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-11-14.23:45:29.271>
labels = ['3.8', 'type-bug', 'library']
title = 'locale.getlocale() returns a non RFC1766 language code'
updated_at = <Date 2020-07-18.12:11:43.308>
user = 'https://github.com/mgrandi'

bugs.python.org fields:

activity = <Date 2020-07-18.12:11:43.308>
actor = 'ricpol'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2019-11-14.23:45:29.271>
creator = 'markgrandi'
dependencies = []
files = []
hgrepos = []
issue_num = 38805
keywords = []
message_count = 2.0
messages = ['356637', '373896']
nosy_count = 3.0
nosy_names = ['lemburg', 'markgrandi', 'ricpol']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue38805'
versions = ['Python 3.8']

@mgrandi

mgrandi mannequin commented Nov 14, 2019

It seems that something changed in Windows 10, Python 3.8, or both: locale.getlocale() now returns strange results.

According to the documentation: https://docs.python.org/3/library/locale.html?highlight=locale%20getlocale#locale.getlocale , the language code should be in RFC1766 format:

Language-Tag = Primary-tag *( "-" Subtag )
Primary-tag = 1*8ALPHA
Subtag = 1*8ALPHA
Whitespace is not allowed within the tag.

but in Python 3.8, I am getting a language code that doesn't conform to RFC 1766:

PS C:\Users\auror> py -3
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform; platform.platform()
'Windows-10-10.0.18362-SP0'
>>> import locale; locale.getlocale(); locale.getdefaultlocale()
('English_United States', '1252')
('en_US', 'cp1252')
>>>

on the same machine, with python 3.7.4:

PS C:\Python37> .\python.exe
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform; platform.platform()
'Windows-10-10.0.18362-SP0'
>>> import locale; locale.getlocale(); locale.getdefaultlocale()
(None, None)
('en_US', 'cp1252')
>>>
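
The RFC 1766 grammar quoted above can be checked mechanically. The following is a minimal sketch, not part of the locale module: the helper name `is_rfc1766_tag` and its `allow_underscore` flag are assumptions made for illustration. The flag exists because POSIX-style names such as 'en_US' use an underscore where the RFC requires a hyphen.

```python
import re

# RFC 1766: Language-Tag = Primary-tag *( "-" Subtag ),
# where each part is 1 to 8 ASCII letters and no whitespace is allowed.
_RFC1766_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_rfc1766_tag(tag, allow_underscore=False):
    """Return True if `tag` matches the grammar quoted in this issue.

    `allow_underscore` is a hypothetical convenience flag: it treats
    '_' as '-' so that POSIX-style names like 'en_US' can be checked.
    """
    if allow_underscore:
        tag = tag.replace("_", "-")
    return _RFC1766_TAG.fullmatch(tag) is not None


for candidate in ("en-US", "en_US", "English_United States"):
    print(candidate,
          is_rfc1766_tag(candidate),
          is_rfc1766_tag(candidate, allow_underscore=True))
```

'English_United States' fails either way because of the embedded space, which is the core of the complaint here.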

It is also interesting that in Python 3.8 the encoding differs between locale.getlocale() and locale.getdefaultlocale(): '1252' versus 'cp1252'. This might not be related, though, as it was also present in Python 3.7.4.
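
For code that only needs to know whether the two encoding names refer to the same codec, one possible approach is to compare them through the codecs registry rather than as raw strings. This is a sketch under that assumption; `same_encoding` is a hypothetical helper name, not something the locale module provides.

```python
import codecs


def same_encoding(a, b):
    """Return True if both names resolve to the same Python codec.

    Hypothetical helper: codecs.lookup() resolves aliases, so the
    comparison is made on the canonical codec name.
    """
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # At least one of the names is unknown to the codecs registry.
        return False


print(same_encoding("1252", "cp1252"))
```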

These issues might be related; I found them when searching for 'locale' bugs:

https://bugs.python.org/issue26024
https://bugs.python.org/issue37945

@mgrandi mgrandi mannequin added 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Nov 14, 2019
@ricpol

ricpol mannequin commented Jul 18, 2020

> locale.getlocale() is now returning strange results

Not really "strange results" - the fact is that "getlocale()" now returns the locale name *as if* it were already set from the beginning (because it is, at least in part).

Before:

>>> import locale  # Python 3.7, new shell
>>> locale.getlocale()
(None, None)
>>> locale.setlocale(locale.LC_ALL, '') # say Hi from Italy
'Italian_Italy.1252'
>>> locale.getlocale()
('Italian_Italy', '1252')

now:

>>> import locale  # Python 3.8, new shell
>>> locale.getlocale()
('Italian_Italy', '1252')

As for why returned locale names are "a little different" on Windows, I found no better explanation than Eryk Sun's essays in https://bugs.python.org/issue37945. Long story short, it's not even a bug anymore... it's a hot mess and it won't be solved anytime soon.
But that's not the problem at hand here. Returned locale names have not changed between 3.7 and 3.8.

What *has* changed, though, is that Python on Windows now appears to set the locale, implicitly, right from the start.
Except - maybe it does not, really:

>>> import locale  # Python 3.8, new shell
>>> locale.getlocale()
('Italian_Italy', '1252')
>>> locale.localeconv()
{'int_curr_symbol': '', 'currency_symbol': '', 'mon_decimal_point': '', 'mon_thousands_sep': '', 'mon_grouping': [], 'positive_sign': '', 'negative_sign': '', 'int_frac_digits': 127, 'frac_digits': 127, 'p_cs_precedes': 127, 'p_sep_by_space': 127, 'n_cs_precedes': 127, 'n_sep_by_space': 127, 'p_sign_posn': 127, 'n_sign_posn': 127, 'decimal_point': '.', 'thousands_sep': '', 'grouping': []}

As you can see, we have an Italian locale only in the name: the conventions are still those of the default C locale.
If we explicitly set the locale, on the other hand...

>>> locale.setlocale(locale.LC_ALL, '')
'Italian_Italy.1252'
>>> locale.localeconv()
{'int_curr_symbol': 'EUR', 'currency_symbol': '€', ... ... }

... now we enjoy a real Italian locale - pizza, pasta, gelato and all.
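
The difference between the two localeconv() results above can be tested in code. The following is a rough sketch: the helper name `conventions_look_like_c` and the choice of which three fields to inspect are assumptions made for illustration, based only on the values shown in the output above (empty currency symbol, 127 placeholders, '.' as decimal point).

```python
import locale


def conventions_look_like_c(conv=None):
    """Heuristically report whether the conventions are those of the C locale.

    Hypothetical helper: it inspects a few localeconv() fields only.
    In the C locale the currency symbol is empty and the numeric
    placeholders equal locale.CHAR_MAX (127 in the output above).
    """
    if conv is None:
        conv = locale.localeconv()
    return (
        conv["currency_symbol"] == ""
        and conv["int_frac_digits"] == locale.CHAR_MAX
        and conv["decimal_point"] == "."
    )


print(conventions_look_like_c())
```

Checking the conventions directly avoids relying on the name returned by getlocale(), which, as shown above, may say "Italian" while the conventions are still the default ones.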

What happened?
Unfortunately, this change of behaviour is NOT documented, except for a passing note here: https://docs.python.org/3/whatsnew/changelog.html#id144. It's buried *very* deep:
"""
bpo-34485: On Windows, the LC_CTYPE is now set to the user preferred locale at startup. Previously, the LC_CTYPE locale was "C" at startup, but changed when calling setlocale(LC_CTYPE, "") or setlocale(LC_ALL, "").
"""
This explains... something. Python now pre-sets *only* the LC_CTYPE category, and that's why the other conventions remain unchanged.
Unfortunately, setting *that* one category changes the result yielded by locale.getlocale(). But this is not a bug either, because it's the same behaviour you would have in Python 3.7:

>>> locale.setlocale(locale.LC_CTYPE, '')  # Python 3.7
'Italian_Italy.1252'
>>> locale.getlocale()
('Italian_Italy', '1252')

...and that's because locale.getlocale() with no arguments defaults, wait for it, to getlocale(category=LC_CTYPE), as documented!

So, why does Python 3.8 now pre-set LC_CTYPE on Windows? Apparently, bpo-34485 is part of the ongoing Shakespearean feud between Victor Stinner and the Python locale code. If you squint hard enough, you will see the answer here: https://vstinner.github.io/locale-bugfixes-python3.html but at this point, I don't know if anyone is still keeping score.

To sum up:

  • there's nothing new about locale names - still the same mess;
  • if locale names as returned by locale.getlocale() bother you, you should follow Victor's advice here: https://bugs.python.org/issue37945#msg361806. Use locale.setlocale(category, None) instead;
  • if you relied on getlocale() with no arguments to test your locale, assuming that either a locale is unset or it is "fully set", then you should stop now. A locale can also be "partially set", and in fact it's just what happens now on Windows by default. You should test for a specific category instead;
  • changing the way the locale is set by default on Windows can be... rather surprising and can lead to misunderstandings. I would certainly add a note in the locale documentation to explain this new behaviour.
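
The advice in the list above (query with locale.setlocale(category, None), and test specific categories rather than assuming "unset" or "fully set") can be sketched as follows. The names `CATEGORIES`, `current_settings` and `is_partially_set` are hypothetical, and the category list is an assumption: it contains only constants that should be available on both Windows and POSIX systems.

```python
import locale

# Categories assumed to exist on every platform (LC_MESSAGES is omitted
# because it is not available on Windows).
CATEGORIES = ("LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_MONETARY", "LC_NUMERIC")


def current_settings():
    """Return the raw setting string of each category.

    setlocale(category, None) only queries the current setting; it does
    not change the locale and does not try to parse the name.
    """
    return {name: locale.setlocale(getattr(locale, name), None)
            for name in CATEGORIES}


def is_partially_set(settings):
    """Return True if the categories do not all share the same setting."""
    return len(set(settings.values())) > 1


settings = current_settings()
for name, value in settings.items():
    print(name, "=", value)
print("partially set:", is_partially_set(settings))
```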

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@danny0838

danny0838 commented Dec 11, 2022

I stumbled upon this issue. Currently the doc still says locale.getlocale returns an RFC 1766 formatted language code, which is not true.

What is the working way to get the default locale code from Windows as locale.getdefaultlocale can do?
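
One possible direction, offered as an untested-on-Windows sketch rather than a confirmed answer: the locale module exposes a `windows_locale` table mapping Windows language identifiers (LCIDs) to POSIX-style names, and the Win32 function GetUserDefaultLCID can be called through ctypes. The helper names `lcid_to_posix_name` and `user_default_locale_name` are hypothetical, and whether this matches what locale.getdefaultlocale returns in every configuration is an assumption.

```python
import locale
import sys


def lcid_to_posix_name(lcid):
    """Look up a Windows LCID in locale.windows_locale, or return None."""
    return locale.windows_locale.get(lcid)


def user_default_locale_name():
    """Return a POSIX-style name for the user's default locale on Windows.

    Hypothetical helper: returns None on other platforms or when the
    LCID is missing from the table.
    """
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; windll only exists on Windows
    lcid = ctypes.windll.kernel32.GetUserDefaultLCID()
    return lcid_to_posix_name(lcid)


print(lcid_to_posix_name(0x0409))   # LCID for English (United States)
print(user_default_locale_name())
```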

@ennoborg

There are lots of entries missing in the locale_alias table, which you can inspect here for Python 3.11:

https://github.com/python/cpython/blob/3.11/Lib/locale.py#L779

italian_italy is just one of those, and I know many, many more, like french_belgium, dutch_aruba, dutch_belgium, and dutch_netherlands.

I'm not sure why the 'English_United States' lookup fails. The comment above the table says that underscores and dashes are removed before lookup, but the handling of spaces is a bit obscure to me.
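
To see which names are absent from the table on a given Python version, a direct key check is enough. This is a simplified sketch: `missing_aliases` is a hypothetical helper that only lowercases each name before the lookup, whereas locale.normalize() does additional processing (encodings, modifiers), so the two can disagree.

```python
import locale


def missing_aliases(names):
    """Return the names whose lowercased form is not a locale_alias key.

    Simplified on purpose: this is a plain dictionary membership test,
    not a reimplementation of locale.normalize().
    """
    return [name for name in names if name.lower() not in locale.locale_alias]


print(missing_aliases(["en_US", "Italian_Italy", "English_United States"]))
```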
