classification
Title: PYTHONCOERCECLOCALE no longer being respected
Type: behavior
Stage: resolved
Components:
Versions: Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: anthony shaw, pmpp, vstinner
Priority: normal
Keywords:

Created on 2018-05-02 06:18 by anthony shaw, last changed 2018-05-02 08:47 by vstinner. This issue is now closed.

Messages (6)
msg316045 - Author: anthony shaw (anthony shaw) Date: 2018-05-02 06:18
I'm observing behaviour on Python 3.7 b2 that doesn't match what's documented in PEP 538.

PEP 538 states that the locale coercion behaviour can be disabled through the PYTHONCOERCECLOCALE environment variable.
With coercion disabled, I would expect the stdin encoding to be the same as in Python 3.6 when the C locale is specified with no encoding value.

bash-3.2$ LANG=C python3.6 -c "import sys; print(sys.stdin.encoding)"
US-ASCII
bash-3.2$ LANG=C python3.7 -c "import sys; print(sys.stdin.encoding)"
utf-8
bash-3.2$ PYTHONCOERCECLOCALE=0 LANG=C python3.7 -c "import sys; print(sys.stdin.encoding)"
utf-8

LC_ALL is not set

bash-3.2$ locale
LANG="C"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

While digging into why the environment variable isn't disabling the behaviour, I found some changes made after the PEP that look to have broken the originally implemented behaviour:

https://github.com/python/cpython/commit/9454060e84a669dde63824d9e2fcaf295e34f687
msg316049 - Author: pmp-p (pmpp) * Date: 2018-05-02 07:08
Indeed, a3+ says:
PYTHONCOERCECLOCALE=0 LANG=C python3.7 -c "import sys; print(sys.stdin.encoding)"
ANSI_X3.4-1968


but I can reproduce it on b3:
PYTHONCOERCECLOCALE=0 LANG=C python3.7 -c "import sys; print(sys.stdin.encoding)"
utf-8
msg316051 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-02 07:50
> PYTHONCOERCECLOCALE=0 LANG=C python3.7 -c "import sys; print(sys.stdin.encoding)"

Are you aware of the PEP 540 "UTF-8 Mode"? It is also enabled automatically when the locale is C or POSIX, so disabling locale coercion alone does not turn UTF-8 off. If you hate UTF-8, you have to use:

PYTHONCOERCECLOCALE=0 python3.7 -X utf8=0
or
PYTHONCOERCECLOCALE=0 PYTHONUTF8=0 python3.7
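
The combinations above can be compared side by side with a small script. This is only a rough sketch, assuming Python 3.7 or later; the helper names `build_env` and `stdin_encoding` are made up for illustration, and the encodings reported depend on the platform and the locales installed on it.

```python
import os
import subprocess
import sys

# Variables that influence locale coercion (PEP 538) and UTF-8 Mode (PEP 540).
# PYTHONIOENCODING is dropped too, since it would override the stdio encoding.
LOCALE_VARS = (
    "LC_ALL",
    "LC_CTYPE",
    "LANG",
    "PYTHONCOERCECLOCALE",
    "PYTHONUTF8",
    "PYTHONIOENCODING",
)


def build_env(overrides):
    """Copy os.environ, drop the locale-related variables, then apply overrides."""
    env = {k: v for k, v in os.environ.items() if k not in LOCALE_VARS}
    env.update(overrides)
    return env


def stdin_encoding(overrides, python=sys.executable):
    """Run a child interpreter and return its sys.stdin.encoding."""
    result = subprocess.run(
        [python, "-c", "import sys; print(sys.stdin.encoding)"],
        env=build_env(overrides),
        # Redirect stdin so the child does not depend on the parent's terminal.
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    cases = [
        {"LANG": "C"},
        {"LANG": "C", "PYTHONCOERCECLOCALE": "0"},
        {"LANG": "C", "PYTHONUTF8": "0"},
        {"LANG": "C", "PYTHONCOERCECLOCALE": "0", "PYTHONUTF8": "0"},
    ]
    for case in cases:
        print(case, "->", stdin_encoding(case))
```

Passing a different interpreter path as `python` (for example a 3.6 and a 3.7 binary) would allow the same comparison across versions.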
msg316052 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-02 07:58
I cannot reproduce the issue with the future Python 3.7 beta4:

vstinner@apu$ PYTHONCOERCECLOCALE=0 LANG=C ./python -X utf8=0 -c "import sys; print(sys.stdin.encoding)"
ANSI_X3.4-1968
vstinner@apu$ LANG=C ./python -X utf8=0 -c "import sys; print(sys.stdin.encoding)"
UTF-8
vstinner@apu$ ./python
Python 3.7.0b3+ (heads/3.7:887b5f8fc6, May  2 2018, 09:54:18) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

The master branch also works as expected.
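
To tell from inside an interpreter whether UTF-8 Mode or locale coercion is what produced a given encoding, something like the following may help. It is a minimal sketch, not part of the fix; the function name `describe_text_io_setup` is hypothetical, and `sys.flags.utf8_mode` only exists on Python 3.7 and later.

```python
import locale
import os
import sys


def describe_text_io_setup():
    """Collect the settings that decide the stdio text encoding."""
    return {
        # 1 when UTF-8 Mode (PEP 540) is active; the flag exists on 3.7+ only.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
        # Locale coercion (PEP 538) sets LC_CTYPE in the environment when it coerces.
        "LC_CTYPE in environment": os.environ.get("LC_CTYPE"),
        # Passing None queries the current LC_CTYPE locale without changing it.
        "current LC_CTYPE locale": locale.setlocale(locale.LC_CTYPE, None),
        "stdin encoding": getattr(sys.stdin, "encoding", None),
        "filesystem encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_text_io_setup().items():
        print(f"{name}: {value}")
```

Running it under the same environment settings as the commands above shows which mechanism kicked in for each case.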
msg316053 - Author: pmp-p (pmpp) * Date: 2018-05-02 08:44
b3 is also OK with the -X parameter:
PYTHONCOERCECLOCALE=0 LANG=C python3.7 -X utf8=0 -c "import sys; print(sys.stdin.encoding)"
ANSI_X3.4-1968
msg316054 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-02 08:47
OK, pmpp confirmed that the 3.7b2 bug has been fixed in 3.7b3.

Thank you for your bug report, Anthony Shaw!
History
Date                 User          Action  Args
2018-05-02 08:47:16  vstinner      set     status: open -> closed; resolution: fixed; messages: + msg316054; stage: resolved
2018-05-02 08:44:00  pmpp          set     messages: + msg316053
2018-05-02 07:58:59  vstinner      set     messages: + msg316052
2018-05-02 07:50:12  vstinner      set     messages: + msg316051
2018-05-02 07:08:29  pmpp          set     nosy: + pmpp; messages: + msg316049
2018-05-02 06:28:32  ned.deily     set     nosy: + vstinner
2018-05-02 06:18:14  anthony shaw  create