Issue24968
Created on 2015-08-31 08:42 by rsc1975, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (7) | |||
---|---|---|---|
msg249390 - (view) | Author: Roberto Sánchez (rsc1975) | Date: 2015-08-31 08:42 | |
System: Python 3.4.2 on Linux Fedora 22

This issue is closely related to http://bugs.python.org/issue19846, but it isn't exactly the same case.

When I connect from my Mac OS X machine (using Terminal.app) to a Linux host running Fedora through ssh, the terminal session is forced to the OS X locale (default behavior in Terminal.app):

    [rob@fedora22 ~]$ locale
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory
    LANG=es_ES.UTF-8
    LC_CTYPE="es_ES.UTF-8"
    LC_NUMERIC="es_ES.UTF-8"
    LC_TIME="es_ES.UTF-8"
    LC_COLLATE="es_ES.UTF-8"
    LC_MONETARY="es_ES.UTF-8"
    LC_MESSAGES="es_ES.UTF-8"
    LC_PAPER="es_ES.UTF-8"
    LC_NAME="es_ES.UTF-8"
    LC_ADDRESS="es_ES.UTF-8"
    LC_TELEPHONE="es_ES.UTF-8"
    LC_MEASUREMENT="es_ES.UTF-8"
    LC_IDENTIFICATION="es_ES.UTF-8"
    LC_ALL=

However, the locales installed on Fedora are:

    [rob@fedora22 ~]$ localectl list-locales
    en_US
    en_US.iso88591
    en_US.iso885915
    en_US.utf8 <-- This is the default one

And if I launch python3 I get:

    [rob@fedora22 ~]$ python3
    Python 3.4.2 (default, Jul 9 2015, 17:24:30)
    [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os, codecs, sys, locale
    >>> locale.getpreferredencoding()
    'ANSI_X3.4-1968'
    >>> codecs.lookup(locale.getpreferredencoding()).name
    'ascii'
    >>> locale.getdefaultlocale()
    ('es_ES', 'UTF-8')
    >>> sys.stdout.encoding
    'ANSI_X3.4-1968'
    >>> sys.getfilesystemencoding()
    'ascii'
    >>> print('España')
      File "<stdin>", line 0
        ^
    SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 11: ordinal not in range(128)

So, if I understand correctly, when the current locale is not supported by the system, Python falls back to ASCII.

I can understand this behavior when the supported locales and the current one have different encodings, but if both of them are UTF-8 it seems reasonable for locale.getpreferredencoding() to return 'utf-8'. This case causes programs with a CLI (command-line interface) to fail. If you are using a third-party library such as click, a RuntimeError is raised by the library itself. I learned the hard way that Python 3 CLI programs need a valid encoding to deal with stdin/stdout, and in this case every system seems correctly configured as far as the encoding goes. This is a real case, with no manual changes to the locale configuration. IMHO the current behavior seems a bit strict. |
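The mismatch reported in this message, where the environment names a UTF-8 locale while Python reports ASCII, can be detected from inside a script. The following is only a minimal sketch and not something proposed in this issue: `declared_codeset`, `normalized`, and `encodings_disagree` are hypothetical helper names, and the lookup assumes the POSIX precedence of LC_ALL, then LC_CTYPE, then LANG.

```python
import codecs
import locale
import os


def declared_codeset(environ):
    """Codeset named by the environment, e.g. 'UTF-8' for LANG=es_ES.UTF-8.

    Hypothetical helper. The first non-empty variable among LC_ALL,
    LC_CTYPE and LANG decides (POSIX precedence).
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if value:
            # "es_ES.UTF-8@euro" -> "UTF-8"; "C" or "es_ES" -> None
            _, dot, rest = value.partition(".")
            if not dot:
                return None
            return rest.partition("@")[0] or None
    return None


def normalized(name):
    """Canonical Python codec name, or None if the name is missing or unknown."""
    if not name:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def encodings_disagree(environ, effective):
    """True when the environment names a codec that differs from `effective`."""
    declared = normalized(declared_codeset(environ))
    return declared is not None and declared != normalized(effective)


if __name__ == "__main__":
    effective = locale.getpreferredencoding()
    if encodings_disagree(os.environ, effective):
        print("environment asks for", declared_codeset(os.environ),
              "but Python uses", effective)
```

With the values from the session above (`LANG=es_ES.UTF-8` and a preferred encoding of `'ANSI_X3.4-1968'`), `encodings_disagree` reports a mismatch, which a CLI program could turn into a clearer error message.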
|||
msg249399 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-08-31 11:38 | |
It's not a bug in Python, but a bug on your system.

> New submission from Roberto Sánchez:
> [rob@fedora22 ~]$ locale
> locale: Cannot set LC_CTYPE to default locale: No such file or directory

This message means that the chosen locale doesn't exist.

> LANG=es_ES.UTF-8 ...
> [rob@fedora22 ~]$ localectl list-locales
> ....
> en_US.utf8 <-- This is the default one

LANG must be en_US.utf8. |
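Whether a locale name "exists" in the sense used in this reply can be checked from Python by asking the C library to switch to it. This is a rough sketch under stated assumptions: `locale_is_available` is a hypothetical helper name, it only probes the LC_CTYPE category, and because `locale.setlocale` changes process-wide state it is not safe to call while other threads depend on the locale.

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE.

    Hypothetical helper: the current setting is saved first and
    restored afterwards, whether or not the probe succeeds.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
```

On the Fedora host from the report, one would expect `locale_is_available("en_US.utf8")` to succeed and `locale_is_available("es_ES.UTF-8")` to fail, matching the `Cannot set LC_CTYPE to default locale` message.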
|||
msg249400 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2015-08-31 13:02 | |
CPython inherits this behaviour from glibc's locale handling, so it's potentially worth raising the question further upstream. If anyone wanted to pursue that, looking at http://www.gnu.org/software/libc/development.html suggests to me that the appropriate starting point would be to email libc-help@sourceware.org and ask for advice. |
|||
msg249401 - (view) | Author: Roberto Sánchez (rsc1975) | Date: 2015-08-31 13:03 | |
OK, I already knew that "it is not a bug", but the scenario seems quite common: connecting to a Linux host from a Mac with Terminal.app and different locales (default behavior). A bit of "magic" when the encoding part of the locale is correct would help to deal with some Unicode issues in Python 3 scripts. I'm just saying that it would be a desirable enhancement, but I have no idea how complex it would be to change the current behavior; maybe it isn't worth the effort. |
|||
msg249404 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2015-08-31 15:28 | |
I believe there is at least one open issue about Python adopting utf8 as the default instead of ASCII, and in any case, several conversations about how to deal with all this better. This is just one example of a class of issues caused by the ASCII/C posix default locale, in different contexts. |
|||
msg249441 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2015-09-01 00:05 | |
Looking again at the *specific* bug report here, I'm moving the resolution to "out of date", as it's actually the one we addressed in 3.5 by enabling surrogateescape by default on all of the standard streams when the OS claims the locale encoding is ASCII, not just stderr: http://bugs.python.org/issue19977

That allows us to at least correctly roundtrip data, even if the OS has given us bad encoding settings.

The problem with forcing UTF-8 more generally when the OS claims ASCII is that it may be the wrong thing to do and result in data corruption, especially on systems using East Asian codecs. Querying /etc/locale.conf [1] instead of relying on the nominal glibc locale settings should reliably give us correct encoding/locale information on modern Linux systems in cases like this one, where SSH has forwarded mismatched locale settings from a client system to a server shell session.

Another issue with relevant background discussion is issue #23993, which speculated on extending the "default to surrogateescape" idea to all open() calls when glibc claims the locale encoding is ASCII.

[1] http://www.freedesktop.org/software/systemd/man/locale.conf.html |
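The round-tripping that surrogateescape provides can be illustrated with plain `bytes` and `str` objects, independent of the standard streams. This is a small sketch rather than the code path CPython uses for its streams; the variable names are illustrative only.

```python
# Bytes as a UTF-8 terminal would send them for the text 'España'.
raw = "España".encode("utf-8")

# A strict ASCII decode fails on the first non-ASCII byte (0xc3).
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With surrogateescape, each undecodable byte becomes a lone surrogate
# in the range U+DC80..U+DCFF instead of raising an error.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler turns the surrogates back into the
# original bytes, so nothing is lost even though the codec was "wrong".
restored = text.encode("ascii", errors="surrogateescape")
```

The intermediate string is not readable as 'España', but the bytes written back out are identical to the bytes read in, which is the "correctly roundtrip data" property mentioned above.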
|||
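The idea of querying /etc/locale.conf, raised in the preceding message, could be prototyped roughly as follows. This is a simplified sketch and not what CPython does: `parse_locale_conf` and `read_system_locale` are hypothetical names, and the parser handles only plain `KEY=value` lines, `#` comments, and one pair of surrounding quotes, not full shell quoting or escapes.

```python
def parse_locale_conf(text):
    """Parse locale.conf-style text into a dict (simplified, hypothetical)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def read_system_locale(path="/etc/locale.conf"):
    """Return the parsed settings, or an empty dict if the file is unreadable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return parse_locale_conf(handle.read())
    except OSError:
        return {}
```

A script could compare `read_system_locale().get("LANG")` with the `LANG` value forwarded over SSH to decide whether the session's locale is one the server actually provides.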
msg249464 - (view) | Author: Roberto Sánchez (rsc1975) | Date: 2015-09-01 07:47 | |
OK, that makes sense. Besides, David pointed me to another open issue that could help solve cases like this: http://bugs.python.org/issue15216 If the encoding is wrong because of the environment but we can easily change the initial stream encodings (of stdin/stdout), we have a powerful tool to adapt our scripts and patch broken locales like the ones generated by SSH sessions. |
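One way a script can override a stream's encoding, in the spirit of the message above, is to wrap the underlying binary stream in a new `io.TextIOWrapper`. The sketch below is an illustration, not the change made for issue15216: `as_utf8_text` is a hypothetical helper name, and an in-memory `io.BytesIO` stands in for `sys.stdout.buffer` so the result can be inspected.

```python
import io


def as_utf8_text(binary_stream, errors="strict"):
    """Wrap a binary stream in a text layer that always uses UTF-8.

    Hypothetical helper; the locale-derived default encoding is ignored.
    """
    return io.TextIOWrapper(
        binary_stream, encoding="utf-8", errors=errors, line_buffering=True
    )


# Stand-in for sys.stdout.buffer so the written bytes can be examined.
buffer = io.BytesIO()
out = as_utf8_text(buffer)
out.write("España")
out.flush()
written = buffer.getvalue()

# In a real script the equivalent would be, for example:
#     sys.stdout = as_utf8_text(sys.stdout.buffer)
```

Python 3.7 and later also provide `sys.stdout.reconfigure(encoding="utf-8")`, which changes the encoding of the existing stream object in place, so the wrapper approach is mainly of interest for older versions such as the 3.4 release discussed here.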
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:20 | admin | set | github: 69156 |
2015-09-01 07:47:23 | rsc1975 | set | messages: + msg249464 |
2015-09-01 00:05:15 | ncoghlan | set | resolution: not a bug -> out of date messages: + msg249441 |
2015-08-31 15:28:38 | r.david.murray | set | nosy: + r.david.murray messages: + msg249404 |
2015-08-31 13:03:13 | rsc1975 | set | messages: + msg249401 |
2015-08-31 13:02:23 | ncoghlan | set | messages: + msg249400 |
2015-08-31 11:38:43 | vstinner | set | status: open -> closed resolution: not a bug messages: + msg249399 |
2015-08-31 09:02:46 | serhiy.storchaka | set | nosy: + lemburg, loewis, ncoghlan, serhiy.storchaka |
2015-08-31 08:42:12 | rsc1975 | create |