locale.getpreferredencoding() must not set temporary LC_CTYPE #55231
Comments
This bug may be based on same problem as bpo-6203.
|
Yes, this is the expected behaviour with the current code. TextIOWrapper indirectly uses locale.getpreferredencoding() to choose your file encoding. If the locale module has the CODESET constant, this function temporarily sets LC_CTYPE to "" and uses nl_langinfo(CODESET) to get the locale encoding. locale.getpreferredencoding() has an option not to set LC_CTYPE to "": locale.getpreferredencoding(False).

Example:

$ python3.1
Type "help", "copyright", "credits" or "license" for more information.
>>> from locale import getpreferredencoding, setlocale, LC_CTYPE
>>> from locale import nl_langinfo, CODESET
>>> setlocale(LC_CTYPE, None)
'fr_FR.utf8'
>>> getpreferredencoding()
'UTF-8'
>>> getpreferredencoding(False)
'UTF-8'
>>> setlocale(LC_CTYPE, 'fr_FR.iso88591')
'fr_FR.iso88591'
>>> nl_langinfo(CODESET)
'ISO-8859-1'
>>> getpreferredencoding()
'UTF-8'
>>> getpreferredencoding(False)
'ISO-8859-1'

Setting LC_CTYPE directly changes the nl_langinfo(CODESET) result, but not the getpreferredencoding() result, because getpreferredencoding() ignores the current locale: it uses its own LC_CTYPE value (""). getpreferredencoding(False) uses the current locale and gives the expected result.
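
The two code paths described above can be sketched roughly as follows. This is only an illustration of the behaviour discussed in this comment, not the actual implementation in the locale module; the function names encoding_from_environment and encoding_from_current_locale are hypothetical, and the sketch assumes a Unix platform where locale.nl_langinfo and locale.CODESET are available.

```python
import locale


def encoding_from_environment():
    """Sketch of the default path: temporarily set LC_CTYPE to "" (the
    value from the environment variables), query CODESET, then restore."""
    old = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            # The environment names an unsupported locale: keep the old one.
            pass
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # The temporary change is process-wide until this line runs.
        locale.setlocale(locale.LC_CTYPE, old)


def encoding_from_current_locale():
    """Sketch of the getpreferredencoding(False) path: no setlocale() call,
    only a query against whatever LC_CTYPE is currently in effect."""
    return locale.nl_langinfo(locale.CODESET)
```

The first function changes process-wide state between the two setlocale() calls, which is the part this issue objects to; the second only reads.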
Setting LC_ALL works because getpreferredencoding() sets LC_CTYPE to "", which reads the current value of the "LC_ALL" and "LC_CTYPE" environment variables.

Actually, TextIOWrapper doesn't use the current locale; it only uses (indirectly) the environment variables. I don't know which behaviour is better. If you would like TextIOWrapper to use your current locale, use: open(filename, encoding=locale.getpreferredencoding(False)). Anyway, I don't understand why you change your locale, since you know that your file encoding is Latin1. Why don't you directly use open(filename, encoding='latin1')? |
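
A minimal sketch of both suggestions, for illustration only. The helper names open_with_current_locale and roundtrip_latin1 are hypothetical and not part of any library; the second helper just demonstrates that passing encoding='latin1' explicitly makes the result independent of any locale setting.

```python
import locale
import os
import tempfile


def open_with_current_locale(path, mode="r"):
    """Open a text file using the encoding of the *current* LC_CTYPE,
    without letting getpreferredencoding() touch the locale."""
    return open(path, mode, encoding=locale.getpreferredencoding(False))


def roundtrip_latin1(text):
    """Write and read back text with an explicit Latin-1 encoding."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="latin1") as f:
            f.write(text)
        with open(path, encoding="latin1") as f:
            return f.read()
    finally:
        os.remove(path)
```

When the file encoding is already known, the explicit form is the simpler choice, since it does not depend on the locale or the environment at all.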
Nope, the two issues are different. Here you want TextIOWrapper to read your current locale, and not your environment variables. Issue bpo-6203 asks why LC_CTYPE is not C by default but is instead the user's locale LC_CTYPE (read from the LC_ALL or LC_CTYPE environment variables). |
Fortunately bpo-9124 is being solved soon due to the very active My misunderstanding was based upon an old project of mine, If Python would be my project, i would change this code, What i really have to say is that the (3.1) implementation of getpreferredencoding() is horror, not only in respect to SMP
|
I don't think it's intentional. I would be +1 on changing to getpreferredencoding(False). |
That won't actually work. If you always use the C library's locale |
Set version to 3.3: I think it is too late to change such critical code in Python 3.2. |
Python 3 does something like that: Py_InitializeEx() calls setlocale(LC_CTYPE, ""). But I (and others) consider that a bug (see the bpo-6203 discussion): Python should not do that implicitly (nor should any library), but a *program* can do that explicitly, once, at startup. |
Well, is it any different from today? That's an innocent question: I |
STINNER Victor wrote:
Agreed. See the discussion on the ticket for more details. setlocale() should only be called by applications, not by libraries. |
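
As a rough sketch of that division of responsibility: an application can adopt the user's locale once, explicitly, at startup, while library code leaves the locale alone. The function name init_locale is hypothetical, and falling back to the current locale when the environment names an unsupported one is an assumption made for this example.

```python
import locale


def init_locale():
    """Application-level startup hook: adopt the user's LC_CTYPE once.

    Returns the name of the locale now in effect."""
    try:
        return locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unsupported locale in the environment: keep the current one.
        return locale.setlocale(locale.LC_CTYPE, None)


if __name__ == "__main__":
    # Called by the program itself, never by an imported library.
    print(init_locale())
```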
Also in respect to bpo-6203 i could talk about a project which did not link against anything in the end, only ld(1) and syscalls and the undocumented third 'char **envp' arg to UNIX main()s.
Conclusion: you need a locale.
So - what are you all talking about? I would indeed insist on the following:
After the end: |
Attached patch replaces locale.getpreferredencoding() by locale.getpreferredencoding(False) in _io.TextIOWrapper and _pyio.TextIOWrapper. |
Steffan: I'm not sure what your post means, but I think there is a chance you might be confused about something. Python should *never* change the locale from the C locale. A Python *program* can do so, by calling setlocale, but Python itself should not. This is because when an arbitrary Python program is run, it needs to run in the C locale *unless it chooses otherwise*. To do anything else would produce a myriad portability problems for any code that is affected by locale settings (especially when the programmer doesn't know that it is so affected). This is orthogonal to the issue of deciding what encoding to use for various bits of I/O, where Python may need to discover what locale the user has chosen as a default. It's too bad libc makes this so hard to do safely. |
Most of this is much too loud for a newbie who is about to read PEP-7 anyway. And if this community has chosen to try (?!?) not to break compatibility with code which does not have a notion of a locale setting (i.e. naively uses other code in that spirit), you know, then this is simply the way it is. Thus: you're right. I do agree with what you say, we here have a (8-bit) C++ library which does this in its setup():
(Like i said: we here went completely crazy and avoid system libraries whenever possible and at least directly, doing the stuff ourselves and only with syscalls.) Besides that i would agree with me that unthreaded init, optional embedder locale argument, cleanup of .getprefer...() and other drops of setlocale() are/would be good design decisions. And of course: "keeping the thing simple and understandable" is a thing to keep in mind in respect to a normal user. After the end (i have to excuse myself once again for a book):
|
I think it's absolutely necessary that text files, by default, are opened in the encoding of the user's locale, whether the script has called setlocale or not. There are reasons for C to not automatically call setlocale at startup (mostly backwards compatibility), but they don't apply to Python. |
New changeset 2587328c7c9c by Victor Stinner in branch 'default': |
New changeset 6651c932d014 by Florent Xicluna in branch 'default': |