Issue45232
Created on 2021-09-17 12:30 by od-cea, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (6) | |||
---|---|---|---|
msg402046 - (view) | Author: Olivier Delhomme (od-cea) | Date: 2021-09-17 12:30 | |
$ python3 --version
Python 3.6.4

Setting LANG to en_US.UTF8 works like a charm:

$ export LANG=en_US.UTF8
$ python3
Python 3.6.4 (default, Jan 11 2018, 16:45:55)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> machaine='ééééhelp me if you can'
>>> print('{}'.format(machaine))
ééééhelp me if you can

Unsetting the LANG shell variable makes the program fail:

$ unset LANG
$ python3
Python 3.6.4 (default, Jan 11 2018, 16:45:55)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> machaine='ééééhelp me if you can'
  File "<stdin>", line 0

    ^
SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

Setting LANG inside the program does not change this behavior:

$ unset LANG
$ python3
Python 3.6.4 (default, Jan 11 2018, 16:45:55)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['LANG'] = 'en_US.UTF8'
>>> machaine='ééééhelp me if you can'
  File "<stdin>", line 0

    ^
SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

Is this expected behavior? How can I force the UTF-8 codec? |
|||
msg402051 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2021-09-17 13:19 | |
Yes, this is intended. ASCII is used as a fallback in case Python cannot determine the I/O encoding to use during startup. This is also the reason why later changes to the environment have no effect on this: the determination of the encoding has already been applied.

You can force UTF-8 by enabling the UTF-8 mode:

export PYTHONUTF8=1

This will then have Python use UTF-8 regardless of the LANG env var setting. |
|||
msg402054 - (view) | Author: Olivier Delhomme (od-cea) | Date: 2021-09-17 13:45 | |
Hi Marc-Andre,

Please note that setting PYTHONUTF8 with "export PYTHONUTF8=1":

* Is external to the program and user dependent
* Does not seem to work in my use case:

$ unset LANG
$ export PYTHONUTF8=1
$ python3
Python 3.6.4 (default, Jan 11 2018, 16:45:55)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> machaine='ééééhelp me if you can'
  File "<stdin>", line 0

    ^
SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

Regards,
Olivier. |
|||
msg402058 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2021-09-17 14:10 | |
On 17.09.2021 15:45, Olivier Delhomme wrote:
>
> Olivier Delhomme <olivier.delhomme@cea.fr> added the comment:
>
> Hi Marc-Andre,
>
> Please note that setting PYTHONUTF8 with "export PYTHONUTF8=1":
>
> * Is external to the program and user dependent
> * Does not seem to work in my use case:
>
> $ unset LANG
> $ export PYTHONUTF8=1
> $ python3
> Python 3.6.4 (default, Jan 11 2018, 16:45:55)
> [GCC 4.8.5] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> machaine='ééééhelp me if you can'
> File "<stdin>", line 0
>
> ^
> SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

UTF-8 mode is only supported in Python 3.7 and later:

https://docs.python.org/3/whatsnew/3.7.html#whatsnew37-pep540

--
Marc-Andre Lemburg
eGenix.com |
|||
msg402059 - (view) | Author: Olivier Delhomme (od-cea) | Date: 2021-09-17 14:54 | |
>> Hi Marc-Andre,
>>
>> Please note that setting PYTHONUTF8 with "export PYTHONUTF8=1":
>>
>> * Is external to the program and user dependent
>> * Does not seem to work in my use case:
>>
>> $ unset LANG
>> $ export PYTHONUTF8=1
>> $ python3
>> Python 3.6.4 (default, Jan 11 2018, 16:45:55)
>> [GCC 4.8.5] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> machaine='ééééhelp me if you can'
>> File "<stdin>", line 0
>>
>> ^
>> SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
>
> UTF-8 mode is only supported in Python 3.7 and later:
>
> https://docs.python.org/3/whatsnew/3.7.html#whatsnew37-pep540

Oh. Thanks.

$ unset LANG
$ export PYTHONUTF8=1
$ python3
Python 3.7.5 (default, Dec 24 2019, 08:52:13)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> machaine='ééééhelp me if you can'
>>>

From the code point of view:

$ unset LANG
$ unset PYTHONUTF8
$ python3
Python 3.7.5 (default, Dec 24 2019, 08:52:13)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['PYTHONUTF8'] = '1'
>>> machaine='ééééhelp me if you can'
>>>

Even better:

$ unset LANG
$ unset PYTHONUTF8
$ python3
Python 3.7.5 (default, Dec 24 2019, 08:52:13)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> machaine='ééééhelp me if you can'
>>>

Works as expected. Thank you very much. You can close this bug report.

Regards,
Olivier. |
|||
msg402062 - (view) | Author: Eryk Sun (eryksun) * | Date: 2021-09-17 15:32 | |
Python 3.7+ doesn't need to explicitly enable UTF-8 mode in this case on POSIX systems. If the locale encoding is the "POSIX" or "C" locale, and "C" locale coercion is not disabled via LC_ALL or PYTHONCOERCECLOCALE=0, the interpreter tries to coerce the LC_CTYPE locale to "C.UTF-8", "C.utf8", or "UTF-8". If these attempts fail, or if coercion is disabled, the interpreter will automatically enable UTF-8 mode, unless that's also explicitly disabled.

For example:

$ unset LANG
$ unset LC_ALL
$ unset PYTHONCOERCECLOCALE
$ unset PYTHONUTF8
$ python -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
$ PYTHONCOERCECLOCALE=0 python -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
$ PYTHONUTF8=0 python -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
$ PYTHONCOERCECLOCALE=0 PYTHONUTF8=0 python -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968 |
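
The points made in this thread can be checked from Python itself. What follows is a minimal sketch, not an official recipe. It assumes Python 3.7 or later for a meaningful `sys.flags.utf8_mode` value (on older versions the flag is reported as `None` or `0`), and it assumes `sys.executable` points at a usable interpreter. The names `ENCODING_VARS`, `describe_text_encoding` and `utf8_mode_in_child` are hypothetical helpers made up for this illustration. The first helper reports what the running interpreter decided at startup; the second starts a fresh interpreter with chosen environment variables, which is the only point at which those variables are read.

```python
import locale
import os
import subprocess
import sys

# Environment variables discussed in this issue.
ENCODING_VARS = ("LANG", "LC_ALL", "LC_CTYPE", "PYTHONUTF8", "PYTHONCOERCECLOCALE")


def describe_text_encoding():
    """Return the settings that decide how this interpreter decodes text.

    Hypothetical helper for illustration; not part of the standard library.
    """
    return {
        # sys.flags.utf8_mode only exists on Python 3.7+, hence the getattr.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
        "preferred_encoding": locale.getpreferredencoding(False),
        # sys.stdin / sys.stdout can be None in some embedded setups.
        "stdin_encoding": getattr(sys.stdin, "encoding", None),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "environment": {name: os.environ.get(name) for name in ENCODING_VARS},
    }


def utf8_mode_in_child(overrides):
    """Start a fresh interpreter and return its UTF-8 mode flag as text.

    The child gets the current environment minus ENCODING_VARS, plus the
    variables given in `overrides`.
    """
    env = {k: v for k, v in os.environ.items() if k not in ENCODING_VARS}
    env.update(overrides)
    # Falls back to 0 when the attribute is missing (Python 3.6 and older).
    code = "import sys; print(int(getattr(sys.flags, 'utf8_mode', 0)))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    before = describe_text_encoding()["utf8_mode"]
    os.environ["PYTHONUTF8"] = "1"  # too late for the running interpreter
    after = describe_text_encoding()["utf8_mode"]
    print("flag in this process before/after editing os.environ:", before, after)
    print("flag in a child started with PYTHONUTF8=1:",
          utf8_mode_in_child({"PYTHONUTF8": "1"}))
```

The two values printed for the current process should be identical, since the flag is fixed during startup, while the child started with `PYTHONUTF8=1` should report the mode as enabled on Python 3.7+. If the encoding has to be controlled from inside a program, launching the real work in a child process with the desired environment is one option this sketch suggests; whether that fits a given deployment is a judgment call.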
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:50 | admin | set | github: 89395 |
2021-09-17 15:32:18 | eryksun | set | nosy: + eryksun; messages: + msg402062 |
2021-09-17 15:09:12 | lemburg | set | status: open -> closed; resolution: not a bug; stage: resolved |
2021-09-17 14:54:29 | od-cea | set | messages: + msg402059 |
2021-09-17 14:10:40 | lemburg | set | messages: + msg402058 |
2021-09-17 13:45:45 | od-cea | set | messages: + msg402054 |
2021-09-17 13:19:59 | lemburg | set | nosy: + lemburg; messages: + msg402051 |
2021-09-17 12:30:29 | od-cea | create |