msg88932 |
Author: Ned Deily (ned.deily) * |
Date: 2009-06-05 10:56 |
In the Library Reference section 22.2.1 for locale, it states:
"Initially, when a program is started, the locale is the C locale, no
matter what the user’s preferred locale is. The program must explicitly
say that it wants the user’s preferred locale settings by calling
setlocale(LC_ALL, '')."
This is the case for Python 2.x:
$ export LANG=en_US.UTF-8
$ python2.5
Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>>
but not for 3.1:
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>>
Either the code is incorrect in 3.1 or the documentation should be
updated.
|
msg89016 |
Author: Ezio Melotti (ezio.melotti) * |
Date: 2009-06-06 21:00 |
Confirmed for 3.1, 3.0 still returns (None, None).
|
msg89077 |
Author: Georg Brandl (georg.brandl) * |
Date: 2009-06-08 13:29 |
Deferring to Martin which one is correct :)
|
msg89084 |
Author: R. David Murray (r.david.murray) * |
Date: 2009-06-08 16:01 |
This is definitely a bug in 3.1, for the same reason that a C program
uses the C locale until an explicit setlocale is done: otherwise, a
non-locale-aware program can run into bugs resulting from locale issues
when run under a different locale than that of the program author.
I have a memory of this being reported before somewhere and someone
tracking it down to a change in python initialization, but I can't find
a bug report and my Google-fu is failing me.
|
msg89088 |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-06-08 16:17 |
For some reason only LC_CTYPE is affected:
>>> locale.getlocale(locale.LC_CTYPE)
('fr_FR', 'UTF8')
>>> locale.getlocale(locale.LC_MESSAGES)
(None, None)
>>> locale.getlocale(locale.LC_TIME)
(None, None)
>>> locale.getlocale(locale.LC_NUMERIC)
(None, None)
>>> locale.getlocale(locale.LC_COLLATE)
(None, None)
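A minimal Python sketch of the same per-category check, assuming a platform where these category constants exist (LC_MESSAGES is missing on some platforms, hence the getattr). Calling locale.setlocale() with only a category is a pure query: it reports the raw current setting without changing it, unlike getlocale(), which parses the name into a tuple. The helper name current_locales() is hypothetical.

```python
import locale

# Categories to inspect; LC_MESSAGES is not defined on every platform,
# so each constant is looked up defensively.
CATEGORIES = ["LC_CTYPE", "LC_MESSAGES", "LC_TIME", "LC_NUMERIC", "LC_COLLATE"]

def current_locales():
    """Hypothetical helper: {category name: locale string}, read-only."""
    result = {}
    for name in CATEGORIES:
        category = getattr(locale, name, None)
        if category is None:
            continue
        # setlocale(category) without a locale argument only queries.
        result[name] = locale.setlocale(category)
    return result

if __name__ == "__main__":
    for name, value in current_locales().items():
        print(f"{name:12} {value}")
```

The printed values depend on the environment, so no output is shown here.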
|
msg89089 |
Author: R. David Murray (r.david.murray) * |
Date: 2009-06-08 16:22 |
Ah, I can tell you exactly why that is, then. I noticed this in
pythonrun.c while grepping the source:
#ifdef HAVE_SETLOCALE
/* Set up the LC_CTYPE locale, so we can obtain
the locale's charset without having to switch
locales. */
setlocale(LC_CTYPE, "");
#endif
SVN blames Martin in r56922, so this case is assigned appropriately.
Perhaps changing only LC_CTYPE is safe? I must admit to ignorance as to
what all the LC variables mean/control.
|
msg89090 |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-06-08 16:26 |
It would still be better if it was unset afterwards. Third-party
extensions could have LC_CTYPE-dependent behaviour.
|
msg89101 |
Author: Martin v. Löwis (loewis) * |
Date: 2009-06-08 19:39 |
> It would still be better if it was unset afterwards. Third-party
> extensions could have LC_CTYPE-dependent behaviour.
In principle, they could, yes - but what specific behavior might that
be? What will change is character classification, which I consider
fairly harmless. Also, multi-byte conversion routines will change, which
is the primary reason for leaving it modified.
|
msg89102 |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-06-08 19:43 |
> In principle, they could, yes - but what specific behavior might that
> be? What will change is character classification, which I consider
> fairly harmless. Also, multi-byte conversion routines will change, which
> is the primary reason for leaving it modified.
Ok, so I suppose we could leave the code as-is.
|
msg89120 |
Author: R. David Murray (r.david.murray) * |
Date: 2009-06-08 21:51 |
Since it controls what is considered to be whitespace, it is possible
this will lead to subtle bugs, but I agree that it seems relatively
benign, especially considering 3.x's unicode orientation. So, this
becomes a doc bug...
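A small illustration of why this is fairly benign under 3.x's Unicode orientation: Python 3's str classification methods consult the Unicode character database rather than the C library's LC_CTYPE tables, and bytes methods are ASCII-only, so neither follows the process locale. This sketch assumes only that the "C" locale is available, which it always is.

```python
import locale

# U+00A0 NO-BREAK SPACE has Unicode category Zs, so str.isspace() accepts it.
print("\u00a0".isspace())        # True
# bytes.isspace() only recognises ASCII whitespace, so 0xA0 is rejected.
print(b"\xa0".isspace())         # False

# Switching LC_CTYPE to "C" does not change the str result.
saved = locale.setlocale(locale.LC_CTYPE)     # query the current setting
try:
    locale.setlocale(locale.LC_CTYPE, "C")
    print("\u00a0".isspace())    # True
finally:
    locale.setlocale(locale.LC_CTYPE, saved)  # restore the original setting
```

C extensions that call isspace() and friends directly are a different matter, since those do follow LC_CTYPE.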
|
msg89136 |
Author: Martin v. Löwis (loewis) * |
Date: 2009-06-09 07:10 |
To add a little bit more analysis: posix.device_encoding requires that
the LC_CTYPE is set. Setting it just in this function would not be
possible, as setlocale is not thread-safe.
So for 3.1, it seems that Python must set LC_CTYPE. If somebody can
propose a patch that avoids that for 3.2, I'd be certainly in favor.
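For reference, the function mentioned here is exposed as os.device_encoding() in Python 3. It returns an encoding only when the file descriptor is connected to a terminal and None otherwise. A quick check, assuming a platform that provides os.pipe(); the describe() helper is hypothetical.

```python
import os

def describe(fd):
    """Hypothetical helper: report what os.device_encoding() says about fd."""
    enc = os.device_encoding(fd)
    return "not a terminal" if enc is None else f"terminal, encoding {enc}"

# A pipe is never a terminal, so no device encoding is reported for it.
read_fd, write_fd = os.pipe()
try:
    print(describe(read_fd))    # not a terminal
finally:
    os.close(read_fd)
    os.close(write_fd)

# stdin may or may not be a terminal, depending on how the script is run.
print(describe(0))
```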
|
msg127180 |
Author: STINNER Victor (vstinner) * |
Date: 2011-01-27 11:40 |
> To add a little bit more analysis: posix.device_encoding requires that
> the LC_CTYPE is set. Setting it just in this function would not be
> possible, as setlocale is not thread-safe.
open() indirectly (through locale.getpreferredencoding()) changes the locale temporarily (it sets LC_CTYPE to "") if the file is not a TTY; if it is a TTY, device_encoding() calls nl_langinfo(CODESET) without changing the current locale. If setlocale() is not thread-safe, we (maybe?) have a problem here. See also #11022: a report from a user who did not understand why setlocale() doesn't affect the encoding used by open() (TextIOWrapper). A quick solution is to call locale.getpreferredencoding(False), which doesn't change the locale.
Do you really need os.device_encoding()? If we change TextIOWrapper to call locale.getpreferredencoding(False), os.device_encoding() and locale.getpreferredencoding(False) will give the same result, except on Windows, where os.device_encoding() uses GetConsoleCP() if fd == 0 and GetConsoleOutputCP() if fd is 1 or 2. But we can call GetConsoleCP() and GetConsoleOutputCP() directly in initstdio(). If someone closes sys.std* and recreates them later, os.device_encoding() can be used explicitly to keep the previous behaviour.
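The property this proposal relies on can be checked directly: with do_setlocale=False, locale.getpreferredencoding() reports the encoding of the LC_CTYPE locale already in effect and does not call setlocale() itself, so the LC_CTYPE setting before and after the call is identical. A minimal sketch; the encoding name it prints depends on the platform and environment, so none is shown.

```python
import locale

# Query the current LC_CTYPE setting without modifying it.
before = locale.setlocale(locale.LC_CTYPE)

# do_setlocale=False: report the encoding of the already-active LC_CTYPE
# locale, without switching locales internally.
encoding = locale.getpreferredencoding(False)

after = locale.setlocale(locale.LC_CTYPE)
print(encoding)
print(before == after)   # True: the call did not touch the process locale
```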
> It would still be better if it was unset afterwards. Third-party
> extensions could have LC_CTYPE-dependent behaviour.
If Python is embedded, it should not change the locale. Even if it is not embedded, it may be better never to set LC_CTYPE.
It is too late to touch such critical point in Python 3.2, but we may change it in Python 3.3.
|
msg127262 |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2011-01-28 09:27 |
Python can be embedded into other applications and unconditionally
changing the locale (esp. the LC_CTYPE) is not good practice, since
it's not thread-safe and affects the entire process. An application
may have set LC_CTYPE (or the locale) to something completely
different.
If at all, Python should be more careful using this call (pseudo
code):
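/* setlocale() may reuse a static buffer, so copy each result */
char *lc_ctype = setlocale(LC_CTYPE, NULL);
lc_ctype = lc_ctype ? strdup(lc_ctype) : NULL;
if (lc_ctype == NULL || !strcmp(lc_ctype, "") || !strcmp(lc_ctype, "C")) {
    char *env_lc_ctype = setlocale(LC_CTYPE, "");
    env_lc_ctype = env_lc_ctype ? strdup(env_lc_ctype) : NULL;
    if (lc_ctype != NULL)
        setlocale(LC_CTYPE, lc_ctype);  /* revert to the original setting */
    free(lc_ctype);
    lc_ctype = env_lc_ctype;
}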
Then use lc_ctype to figure out encodings, etc.
While this is not thread-safe, it at least reverts the change back
to the original setting and only applies the change if needed. That's
still not optimal, but better than nothing.
A clean alternative would be adding LC_* variable parsing code to
Python to avoid the setlocale() call altogether.
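The save, query, restore idea from the pseudocode can be sketched at the Python level with locale.setlocale(). The helper name env_ctype_locale() is hypothetical, and, as the message says, this is still not thread-safe: another thread can observe the temporary LC_CTYPE setting between the calls. Note also that under Python 3 LC_CTYPE is normally already set at startup, so in an ordinary interpreter the helper usually takes the early-return path.

```python
import locale

def env_ctype_locale():
    """Hypothetical helper: return the LC_CTYPE locale named by the
    environment, switching to it temporarily only while the process is
    still in the default "C" locale, then restoring the original."""
    current = locale.setlocale(locale.LC_CTYPE)      # query only
    if current not in ("C", "POSIX"):
        # The application already chose a locale; leave it alone.
        return current
    try:
        from_env = locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return current          # the environment names an unusable locale
    finally:
        # Not thread-safe: the temporary change is visible process-wide.
        locale.setlocale(locale.LC_CTYPE, current)
    return from_env
```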
|
msg127265 |
Author: Martin v. Löwis (loewis) * |
Date: 2011-01-28 09:33 |
> A clean alternative would be adding LC_* variable parsing code to
> Python to avoid the setlocale() call altogether.
That would be highly non-portable, and repeat the mistakes of
getdefaultlocale.
|
msg127283 |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2011-01-28 11:05 |
Martin v. Löwis wrote:
>> A clean alternative would be adding LC_* variable parsing code to
>> Python to avoid the setlocale() call altogether.
> That would be highly non-portable, and repeat the mistakes of
> getdefaultlocale.
You say that often, but I don't really know why. It's certainly portable
between various Unix platforms, perhaps not Windows, but then i18n
on Windows is a different story altogether.
BTW: On Windows, you can make setlocale() affect only the calling thread
using _configthreadlocale()
(http://msdn.microsoft.com/de-de/library/26c0tb7x(v=vs.80).aspx)
Perhaps we ought to expose this in _locale and use it in
getdefaultlocale() on Windows to query the locale settings
via the pseudocode I posted.
|
msg127347 |
Author: Martin v. Löwis (loewis) * |
Date: 2011-01-28 21:22 |
>> That would be highly non-portable, and repeat the mistakes of
>> getdefaultlocale.
>
> You say that often, but I don't really know why. It's certainly portable
> between various Unix platforms, perhaps not Windows, but then i18n
> on Windows is a different story altogether.
No, it's absolutely not portable across Unix platforms. Looking at
LANG or LC_ALL does *not* allow you to infer the region name, or
the locale's character set. For example, using glibc, in some
installations, /etc/locale.alias is considered to map a value of LANG
to the final locale name. As an option, glibc also considers a
LOCALE_ALIAS_PATH that may point to a (colon-separated) path of
files to search for locale aliases.
Other systems may use other databases to map a locale name to locale
properties.
Unless you know exactly what version of C library is running on
a system, parsing environment variables yourself is doomed to fail.
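To make the objection concrete, this is roughly what parsing the environment by hand amounts to: a sketch of the POSIX precedence (LC_ALL, then LC_CTYPE, then LANG, skipping unset or empty variables). The function name is hypothetical. It can only return the raw name; it cannot resolve aliases such as those in /etc/locale.alias, check that the locale is installed, or determine its character set, which is the information only the C library can supply.

```python
import os

def naive_ctype_from_env(env=None):
    """Hypothetical helper: pick the LC_CTYPE locale name from environment
    variables using POSIX precedence. Returns the raw, unresolved name."""
    if env is None:
        env = os.environ
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:                 # unset or empty variables are skipped
            return value
    return "C"                    # nothing set: the POSIX default locale

print(naive_ctype_from_env({"LANG": "en_US.UTF-8"}))                      # en_US.UTF-8
print(naive_ctype_from_env({"LC_ALL": "de_DE", "LANG": "en_US.UTF-8"}))   # de_DE
# An alias name such as "french" comes back verbatim; mapping it to a real
# locale and charset needs the C library's alias tables.
print(naive_ctype_from_env({"LANG": "french"}))                           # french
```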
|
msg127350 |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * |
Date: 2011-01-28 21:36 |
Martin v. Löwis:
It seems that your web browser replaces ", " with ",\t" in the title (where "\t" is a tab character) each time you add a comment.
|
msg127351 |
Author: Martin v. Löwis (loewis) * |
Date: 2011-01-28 21:38 |
More likely, it's my email reader. Sorry about that.
|
msg127417 |
Author: Steffen Daode Nurpmeso (sdaoden) |
Date: 2011-01-29 13:51 |
User lemburg pointed me to this, but no, I've posted msg127416 to Issue 11022.
|
msg141830 |
Author: Alexis Metaireau (alexis) * |
Date: 2011-08-09 15:53 |
Would it be useful to specify in the documentation that getlocale() is not intended for finding out what the system's locale is?
This isn't explained currently, so it's a bit surprising that getlocale() returns (None, None) even when your locale environment variables are set.
|
msg141847 |
Author: R. David Murray (r.david.murray) * |
Date: 2011-08-10 00:24 |
This issue is about the fact that it doesn't return (None, None). We should probably decide what we are going to do about that before changing the docs if they need it.
|
msg141872 |
Author: Alexis Metaireau (alexis) * |
Date: 2011-08-10 16:05 |
I see two different things here:
1) the fact that getlocale() doesn't return (None, None) on some Python
versions
2) the fact that having it return (None, None) by default is a bit
misleading, as users may think that getlocale() is tied to environment
variables. That is what prompted #12699.
My last remark is about the second point. Should I start a new issue
for this?
|
msg141890 |
Author: R. David Murray (r.david.murray) * |
Date: 2011-08-11 01:25 |
Yes a new issue would be more appropriate.
|
msg147174 |
Author: Petri Lehtinen (petri.lehtinen) * |
Date: 2011-11-06 19:48 |
If the thread safety of setlocale() is a problem, does anybody know how portable uselocale() is? It sets the locale of the current thread only, so it's safe to temporarily change the locale and then set it back.
|
msg162340 |
Author: STINNER Victor (vstinner) * |
Date: 2012-06-05 12:02 |
> Either the code is incorrect in 3.1
> or the documentation should be updated.
Leaving LC_CTYPE unchanged (that is, using the "C" locale, which is ASCII in
most cases) at Python startup would be a major change in Python 3. I don't
want to change this. You would see a lot of mojibake in your GUIs and get a lot of ugly surrogate characters in filenames (because of PEP
383) if we no longer set LC_CTYPE to the user's preferred encoding at startup.
Setting LC_CTYPE to the user's preferred encoding is just very
convenient and helps Python talk to the user through the console,
to the filesystem, and to subprocesses through their command-line
arguments, etc. For example, if your current LC_CTYPE locale is C
(ASCII), you cannot pass non-ASCII characters (such as characters typed
by the user in your GUI) to a subprocess: you get a UnicodeEncodeError.
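Both symptoms can be reproduced without touching the locale by using the ASCII codec directly, which approximates what a "C" LC_CTYPE implies on many systems: encoding non-ASCII text raises UnicodeEncodeError, and decoding a non-ASCII byte with the surrogateescape error handler (the PEP 383 mechanism used for filenames) yields a lone surrogate such as '\udce9'. This is a minimal sketch, not Python's actual startup code.

```python
# Encoding user-typed text to ASCII, as for a subprocess argument under a
# "C" LC_CTYPE, fails on the first non-ASCII character.
try:
    "café".encode("ascii")
except UnicodeEncodeError as exc:
    print(exc.reason, exc.start)        # ordinal not in range(128) 3

# Decoding a byte outside ASCII (0xE9, "é" in Latin-1) with surrogateescape
# keeps it as a lone surrogate instead of failing.
name = b"caf\xe9".decode("ascii", "surrogateescape")
print(ascii(name))                      # 'caf\udce9'
# The original bytes round-trip when encoded with the same handler.
print(name.encode("ascii", "surrogateescape") == b"caf\xe9")   # True
```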
So it's just a documentation issue: see my attached patch.
|
msg162355 |
Author: Ned Deily (ned.deily) * |
Date: 2012-06-05 16:24 |
LGTM
|
msg162380 |
Author: Roundup Robot (python-dev) |
Date: 2012-06-05 23:39 |
New changeset 113cdce4663c by Victor Stinner in branch 'default':
Close #6203: Document that Python 3 sets LC_CTYPE at startup to the user's preferred locale encoding
http://hg.python.org/cpython/rev/113cdce4663c
|
Date | User | Action | Args |
2022-04-11 14:56:49 | admin | set | github: 50452 |
2012-06-05 23:39:39 | python-dev | set | status: open -> closed
nosy:
+ python-dev messages:
+ msg162380
resolution: fixed stage: needs patch -> resolved |
2012-06-05 16:24:06 | ned.deily | set | messages:
+ msg162355 |
2012-06-05 12:03:57 | vstinner | set | title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> locale documentation doesn't mention that LC_CTYPE is changed at startup components:
+ Unicode versions:
+ Python 3.2 |
2012-06-05 12:02:58 | vstinner | set | files:
+ locale_doc.patch keywords:
+ patch messages:
+ msg162340
|
2011-11-06 19:48:10 | petri.lehtinen | set | nosy:
+ petri.lehtinen messages:
+ msg147174
|
2011-08-11 01:25:33 | r.david.murray | set | messages:
+ msg141890 |
2011-08-10 16:05:48 | alexis | set | messages:
+ msg141872 title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
2011-08-10 00:24:02 | r.david.murray | set | messages:
+ msg141847 |
2011-08-09 15:53:51 | alexis | set | nosy:
+ alexis messages:
+ msg141830
|
2011-08-05 21:34:37 | ned.deily | link | issue12699 superseder |
2011-01-29 13:51:48 | sdaoden | set | nosy:
+ sdaoden messages:
+ msg127417
|
2011-01-28 21:38:45 | loewis | set | nosy:
lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray messages:
+ msg127351 |
2011-01-28 21:36:54 | Arfrever | set | nosy:
lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray messages:
+ msg127350 |
2011-01-28 21:22:14 | loewis | set | nosy:
lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray messages:
+ msg127347 title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
2011-01-28 15:01:17 | Arfrever | set | nosy:
lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
2011-01-28 11:05:45 | lemburg | set | nosy:
lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray messages:
+ msg127283 |
2011-01-28 09:33:39 | loewis | set | nosy:
lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray messages:
+ msg127265 title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
2011-01-28 09:27:54 | lemburg | set | nosy:
+ lemburg messages:
+ msg127262
|
2011-01-27 16:58:10 | Arfrever | set | nosy:
+ Arfrever
|
2011-01-27 11:40:07 | vstinner | set | nosy:
+ vstinner
messages:
+ msg127180 versions:
+ Python 3.3, - Python 3.2 |
2010-10-29 10:07:21 | admin | set | assignee: georg.brandl -> docs@python |
2009-12-30 01:46:52 | r.david.murray | set | versions:
+ Python 3.2, - Python 3.1 |
2009-06-09 10:43:42 | pitrou | set | assignee: georg.brandl |
2009-06-09 07:10:25 | loewis | set | assignee: loewis -> (no value) messages:
+ msg89136 |
2009-06-08 21:51:50 | r.david.murray | set | priority: release blocker -> high
messages:
+ msg89120 components:
- Library (Lib) nosy:
loewis, georg.brandl, pitrou, ned.deily, ezio.melotti, r.david.murray |
2009-06-08 19:43:09 | pitrou | set | messages:
+ msg89102 title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
2009-06-08 19:39:29 | loewis | set | messages:
+ msg89101 title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
2009-06-08 16:26:25 | pitrou | set | messages:
+ msg89090 |
2009-06-08 16:22:10 | r.david.murray | set | messages:
+ msg89089 |
2009-06-08 16:17:53 | pitrou | set | nosy:
+ pitrou messages:
+ msg89088
|
2009-06-08 16:01:05 | r.david.murray | set | priority: normal -> release blocker
nosy:
+ r.david.murray messages:
+ msg89084
stage: needs patch |
2009-06-08 13:29:54 | georg.brandl | set | assignee: georg.brandl -> loewis
messages:
+ msg89077 nosy:
+ loewis |
2009-06-06 21:00:39 | ezio.melotti | set | priority: normal
nosy:
+ ezio.melotti messages:
+ msg89016
components:
+ Library (Lib) |
2009-06-05 10:56:37 | ned.deily | create | |