locale documentation doesn't mention that LC_CTYPE is changed at startup #50452
In the Library Reference section 22.2.1 for locale, it states: "Initially, when a program is started, the locale is the C locale, no …" This is the case for Python 2.x:
$ export LANG=en_US.UTF-8
$ python2.5
Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>>
but not for 3.1:
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>>
Either the code is incorrect in 3.1 or the documentation should be …
Confirmed for 3.1; 3.0 still returns (None, None).
Deferring to Martin on which one is correct :)
This is definitely a bug in 3.1, for the same reason that a C program …
I have a memory of this being reported before somewhere and someone …
For some reason only LC_CTYPE is affected:
>>> locale.getlocale(locale.LC_CTYPE)
('fr_FR', 'UTF8')
>>> locale.getlocale(locale.LC_MESSAGES)
(None, None)
>>> locale.getlocale(locale.LC_TIME)
(None, None)
>>> locale.getlocale(locale.LC_NUMERIC)
(None, None)
>>> locale.getlocale(locale.LC_COLLATE)
(None, None)
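The per-category checks above can be reproduced in one loop. This is a minimal sketch, assuming a Unix-like system. It queries each category with locale.setlocale(category) (no second argument, so nothing is changed) and returns the raw locale names, because getlocale() can raise ValueError for names it cannot parse. The helper name startup_categories() is hypothetical, and results depend on the environment and the Python version, so no output is shown.

```python
import locale

# Categories to inspect; some (e.g. LC_MESSAGES) do not exist on every platform.
CATEGORY_NAMES = ("LC_CTYPE", "LC_COLLATE", "LC_TIME",
                  "LC_NUMERIC", "LC_MONETARY", "LC_MESSAGES")


def startup_categories():
    """Hypothetical helper: map category name -> currently active locale name."""
    result = {}
    for name in CATEGORY_NAMES:
        category = getattr(locale, name, None)
        if category is None:  # category not defined on this platform
            continue
        # With no second argument, setlocale() only queries the current setting.
        result[name] = locale.setlocale(category)
    return result


if __name__ == "__main__":
    for name, value in startup_categories().items():
        print(f"{name:<12} {value}")
```

Run right after interpreter startup with LANG set, a Python 3 interpreter would be expected to show a non-"C" value only for LC_CTYPE, matching the session above.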
Ah, I can tell you exactly why that is, then. I noticed this in …
#ifdef HAVE_SETLOCALE
/* Set up the LC_CTYPE locale, so we can obtain
the locale's charset without having to switch
locales. */
setlocale(LC_CTYPE, "");
#endif
SVN blames Martin in r56922, so this case is assigned appropriately.
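The comment in the C snippet above gives the motivation: once LC_CTYPE has been set, the locale's charset can be read without switching locales again. A minimal Python-level sketch of that read follows, assuming a Unix-like platform where locale.nl_langinfo() and locale.CODESET exist; elsewhere it falls back to locale.getpreferredencoding(False), which also leaves the locale alone. The function name active_ctype_codeset() is hypothetical.

```python
import locale


def active_ctype_codeset():
    """Hypothetical helper: charset of the LC_CTYPE locale that is active now.

    It never passes a new locale to setlocale(), so process-wide locale
    state is not modified.
    """
    if hasattr(locale, "nl_langinfo"):
        # Unix: ask the C library for the codeset of the current LC_CTYPE.
        return locale.nl_langinfo(locale.CODESET)
    # Other platforms: do_setlocale=False means "do not switch locales".
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(active_ctype_codeset())
```

If LC_CTYPE were still "C", this would report the C locale's charset rather than the user's, which is why the startup code sets LC_CTYPE first.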
It would still be better if it were unset afterwards. Third-party …
In principle, they could, yes, but what specific behavior might that …
OK, so I suppose we could leave the code as-is.
Since it controls what is considered to be whitespace, it is possible …
To add a little bit more analysis: posix.device_encoding requires that …
So for 3.1, it seems that Python must set LC_CTYPE. If somebody can …
open() indirectly (via locale.getpreferredencoding()) changes the locale temporarily (it sets LC_CTYPE to "") if the file is not a TTY; if it is a TTY, device_encoding() calls nl_langinfo(CODESET) without changing the current locale. If setlocale() is not thread-safe, we (maybe?) have a problem here. See also bpo-11022: a report from a user who did not understand why setlocale() does not affect the encoding used by open() (TextIOWrapper). A quick solution is to call locale.getpreferredencoding(False), which doesn't change the locale.
Do you really need os.device_encoding()? If we change TextIOWrapper to call locale.getpreferredencoding(False), os.device_encoding() and locale.getpreferredencoding(False) will give the same result, except on Windows: os.device_encoding() uses GetConsoleCP() if fd == 0 and GetConsoleOutputCP() if fd is 1 or 2. But we can call GetConsoleCP() and GetConsoleOutputCP() directly in initstdio(). If someone closes sys.std* and recreates them later, os.device_encoding() can be used explicitly to keep the previous behaviour.
If Python is embedded, it should not change the locale. Even when it is not embedded, it may be better never to set LC_CTYPE. It is too late to touch such a critical point in Python 3.2, but we may change it in Python 3.3.
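The approach proposed above for open() (use the terminal's encoding when there is one, otherwise the locale encoding obtained without switching locales) can be sketched as follows. This is a minimal illustration under that assumption, not TextIOWrapper's actual logic; stream_encoding() is a hypothetical name. os.device_encoding() returns None when the descriptor is not connected to a terminal.

```python
import locale
import os


def stream_encoding(fd):
    """Hypothetical helper: encoding to use for a text stream on file descriptor fd."""
    # Terminal: os.device_encoding() reports its encoding (on Windows, a
    # console code page for fds 0, 1 and 2). Not a terminal: it returns None.
    encoding = os.device_encoding(fd)
    if encoding is None:
        # do_setlocale=False: read the current LC_CTYPE encoding without
        # calling setlocale(), so the process locale is never switched.
        encoding = locale.getpreferredencoding(False)
    return encoding


if __name__ == "__main__":
    for fd in (0, 1, 2):
        print(fd, os.device_encoding(fd), stream_encoding(fd))
```

Keeping os.device_encoding() as the first choice preserves the terminal-specific behaviour, while the fallback never touches the locale.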
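Python can be embedded into other applications and unconditionally …
If at all, Python should be more careful using this call (pseudo-code):
/* Query the current LC_CTYPE and copy it: the string returned by
   setlocale() may be overwritten by the next setlocale() call. */
const char *cur = setlocale(LC_CTYPE, NULL);
char *lc_ctype = cur ? strdup(cur) : NULL;
if (lc_ctype == NULL || strcmp(lc_ctype, "") == 0 || strcmp(lc_ctype, "C") == 0) {
    const char *env = setlocale(LC_CTYPE, "");   /* locale from LANG / LC_* */
    char *env_lc_ctype = env ? strdup(env) : NULL;
    if (lc_ctype != NULL)
        setlocale(LC_CTYPE, lc_ctype);           /* revert to the previous locale */
    free(lc_ctype);
    lc_ctype = env_lc_ctype;
}
Then use lc_ctype to figure out encodings, etc. While this is not thread-safe, it at least reverts the change back …
A clean alternative would be adding LC_* variable parsing code to …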
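For comparison, here is a Python-level sketch of the same query/switch/restore pattern, assuming a Unix-like platform for nl_langinfo() (with a getpreferredencoding(False) fallback elsewhere). Python's setlocale() returns a new str, so the buffer-reuse concern of the C version does not arise; like the C version, it is not thread-safe while the temporary switch is in effect. The function names are hypothetical, and "POSIX" is treated as a synonym for "C".

```python
import locale


def _codeset():
    # Charset of the LC_CTYPE locale that is active at this moment.
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    return locale.getpreferredencoding(False)


def ctype_codeset_for_encodings():
    """Hypothetical helper mirroring the pseudo-code above.

    If the host application already chose an LC_CTYPE locale, use it as-is.
    Otherwise briefly switch to the user's locale from the environment,
    read its charset, and switch back.
    """
    current = locale.setlocale(locale.LC_CTYPE)  # query only
    if current not in ("C", "POSIX"):
        return _codeset()
    try:
        locale.setlocale(locale.LC_CTYPE, "")  # locale from LANG / LC_* variables
    except locale.Error:
        return _codeset()  # environment names an unavailable locale; nothing changed
    try:
        return _codeset()
    finally:
        locale.setlocale(locale.LC_CTYPE, current)  # revert, even on error
```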
That would be highly non-portable, and would repeat the mistakes of …
Martin v. Löwis wrote:
You say that often, but I don't really know why. It's certainly portable …
BTW: For Windows, you can adjust setlocale() to work thread-based …
Perhaps we ought to expose this in _locale and use it in …
No, it's absolutely not portable across Unix platforms. Looking at …
Other systems may use other databases to map a locale name to locale …
Unless you know exactly what version of C library is running on …
Martin v. Löwis: …
More likely, it's my email reader. Sorry about that.
Could it be useful to state in the documentation that getlocale() is not intended for finding out what the system's locale is? This is not explained currently, so it's a bit surprising that getlocale() returns (None, None) even when your locale environment variables are set.
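The distinction being asked about can be shown directly: getlocale() (or a bare setlocale() query) describes what the program is using right now, while setlocale(category, "") is how a program adopts the user's settings. A minimal sketch, assuming LC_TIME as the example category because the session above shows it still at the C locale; program_and_user_locale() is a hypothetical name, and it restores the original setting before returning.

```python
import locale


def program_and_user_locale(category=locale.LC_TIME):
    """Hypothetical helper: (locale in use now, locale the environment selects).

    Raw setlocale() names are returned because getlocale() raises ValueError
    for names it cannot parse.
    """
    current = locale.setlocale(category)  # query only
    try:
        locale.setlocale(category, "")  # what LANG / LC_ALL / LC_TIME select
        user = locale.setlocale(category)
    except locale.Error:
        user = None  # the environment names a locale that is not installed
    finally:
        locale.setlocale(category, current)  # leave the program's locale as it was
    return current, user


if __name__ == "__main__":
    print(program_and_user_locale())
```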
This issue is about the fact that it doesn't return (None, None). We should probably decide what we are going to do about that before changing the docs, if they need it.
I see two different things here: …
My last remark is about the second one. Maybe I should start a new issue …
Yes, a new issue would be more appropriate.
If the thread safety of setlocale() is a problem, does anybody know how portable uselocale() is? It sets the locale of the current thread only, so it's safe to temporarily change the locale and then set it back.
Leaving LC_CTYPE unchanged (using the "C" locale, which is ASCII in most …
Setting LC_CTYPE to the user's preferred encoding is just very …
So it's just a documentation issue: see my attached patch.
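What the "C" locale means for LC_CTYPE on a particular system can be checked with a short sketch, assuming a Unix-like platform (locale.nl_langinfo() is not available on Windows); codeset_of_c_locale() is a hypothetical name. On glibc the reported codeset is typically "ANSI_X3.4-1968", a name for ASCII; other C libraries may report something else.

```python
import locale


def codeset_of_c_locale():
    """Hypothetical helper: charset the C library reports for LC_CTYPE="C"."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only
    try:
        locale.setlocale(locale.LC_CTYPE, "C")
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous LC_CTYPE


if __name__ == "__main__":
    print(codeset_of_c_locale())
```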
LGTM
New changeset 113cdce4663c by Victor Stinner in branch 'default': …