classification
Title: test_warnings fails with PYTHONFSENCODING=latin-1 on UNIX/BSD
Type: Stage:
Components: Tests, Unicode Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, pjenvey, vstinner
Priority: normal Keywords:

Created on 2010-09-29 17:23 by vstinner, last changed 2010-10-13 22:22 by vstinner. This issue is now closed.

Messages (7)
msg117631 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-29 17:23
$ PYTHONFSENCODING=latin-1 ./python Lib/test/test_warnings.py 
...
======================================================================
FAIL: test_nonascii (__main__.CEnvironmentVariableTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_warnings.py", line 731, in test_nonascii
    "['ignore:DeprecaciónWarning']".encode('utf-8'))
AssertionError: b"['ignore:Deprecaci\\udcf3nWarning']" != b"['ignore:Deprecaci\xc3\xb3nWarning']"

======================================================================
FAIL: test_nonascii (__main__.PyEnvironmentVariableTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_warnings.py", line 731, in test_nonascii
    "['ignore:DeprecaciónWarning']".encode('utf-8'))
AssertionError: b"['ignore:Deprecaci\\udcf3nWarning']" != b"['ignore:Deprecaci\xc3\xb3nWarning']"

----------------------------------------------------------------------

The problem is that subprocess encodes the value of the PYTHONWARNINGS environment variable with the filesystem encoding, whereas Py_main() decodes it with the locale encoding.
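
The mismatch can be reproduced in pure Python. This is only a sketch: it assumes the filesystem encoding is latin-1 (as with PYTHONFSENCODING=latin-1), that the locale encoding is UTF-8, and that the decoder uses the surrogateescape error handler; the variable names are illustrative.

```python
# Value passed by the test through PYTHONWARNINGS.
value = "ignore:DeprecaciónWarning"

# Parent side (subprocess): encode with the filesystem encoding,
# assumed here to be latin-1.
encoded = value.encode("latin-1")

# Child side (Py_main): decode with the locale encoding, assumed here
# to be UTF-8 with undecodable bytes escaped as lone surrogates.
decoded = encoded.decode("utf-8", "surrogateescape")

# The byte 0xf3 is not valid UTF-8 on its own, so it becomes U+DCF3.
print(ascii(decoded))  # 'ignore:Deprecaci\udcf3nWarning'
```

The escaped character U+DCF3 is the same one that shows up in the AssertionError above.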

History of how the variable is read in py3k:
 - #7301: r79880 creates this variable, using mbstowcs() and PySys_AddWarnOption()
 - #7301: r80066 uses setlocale(LC_ALL, ""), and replaces mbstowcs() by _Py_char2wchar() (to support surrogates)
 - #8589: r81358 creates PySys_AddWarnOptionUnicode() and replaces _Py_char2wchar() by PyUnicode_DecodeFSDefault()
 - #8589: r84694 replaces PyUnicode_DecodeFSDefault() by _Py_char2wchar() + PyUnicode_FromWideChar() "because the PyCodec machinery is not ready yet"
 - #8589: r84731 uses PyUnicode_FromString() (utf-8) on Mac OS X
msg117648 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-09-29 20:16
OK, so who's messing up: subprocess or Py_main()?
msg117658 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-29 21:42
> OK, so who's messing up: subprocess or Py_main()?

Well, this is the real question :-)

The locale encoding is used to decode command line arguments (sys.argv), while the filesystem encoding is used to decode environment variables and to encode subprocess arguments and environment variables.

It means that we have something funny if a Python process creates a Python subprocess: child process arguments are encoded using the filesystem encoding, whereas the arguments are decoded using the locale encoding. If the two encodings differ (e.g. if PYTHONFSENCODING is used by at least the parent process), we have a bug similar to #4388. See also issue #8775.
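
To check whether a given interpreter is exposed to this, the two encodings can be compared at runtime. A minimal sketch using only standard library calls; the helper name same_codec is made up for this example, and on current Python versions the two encodings normally agree.

```python
import codecs
import locale
import sys


def same_codec(name_a, name_b):
    """Return True if both names resolve to the same codec.

    codecs.lookup() normalizes aliases, so "latin-1" and "iso8859-1"
    compare equal. (Hypothetical helper, not part of any Python API.)
    """
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


# Encoding used for file names and environment variables.
fs_encoding = sys.getfilesystemencoding()
# Encoding of the current locale, without changing the locale.
locale_encoding = locale.getpreferredencoding(False)

print("filesystem:", fs_encoding)
print("locale:    ", locale_encoding)
print("same codec:", same_codec(fs_encoding, locale_encoding))
```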
msg117660 - (view) Author: Philip Jenvey (pjenvey) * (Python committer) Date: 2010-09-29 21:48
It sounds like you had PYTHONWARNINGS using the fs encoding before r84694, but reverted it due to bootstrapping issues.

Indeed, the fs encoding isn't initialized until later in Py_InitializeEx. Maybe the PYTHONWARNINGS code should be moved there instead?
msg117671 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-29 22:52
> Indeed, the fs encoding isn't initialized until later in
> Py_InitializeEx. Maybe the PYTHONWARNINGS code should be moved 
> there instead?

sys.warnoptions should be filled early because it is used to initialize the _warnings module, and the _warnings module has to be initialized before loading any other non-builtin module, because importing a module may emit a warning.

> OK, so who's messing up: subprocess or Py_main()?

I opened issue #9992 to decide which encoding should be used to encode and decode command line arguments: locale or filesystem encoding.
msg117677 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-30 00:12
> Maybe the PYTHONWARNINGS code should be moved there instead?

sys.warnoptions is read by the warnings module (not the _warnings module) when that module is loaded. The warnings module is loaded by Py_InitializeEx() if the sys.warnoptions list is not empty.

It might be possible to read the PYTHONWARNINGS env var after initfsencoding() but before loading the warnings module. But we have to ensure that Py_InitializeEx() can still be called multiple times.
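
The link between PYTHONWARNINGS and sys.warnoptions can be observed from a child interpreter, roughly the way test_nonascii does it. This is a sketch rather than the actual test code: the helper name child_warnoptions is made up, and an ASCII-only value is used so that the result does not depend on any encoding.

```python
import os
import subprocess
import sys


def child_warnoptions(value):
    """Run a child Python with PYTHONWARNINGS=value and return the
    repr of its sys.warnoptions list. (Hypothetical helper.)"""
    env = dict(os.environ)
    env["PYTHONWARNINGS"] = value
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.warnoptions)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child reports the option it picked up from the environment.
print(child_warnoptions("ignore::DeprecationWarning"))
```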
msg118596 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-13 22:22
Fixed by r85430 (remove PYTHONFSENCODING), see #9992.
History
Date                 User          Action  Args
2010-10-13 22:22:03  vstinner      set     status: open -> closed; resolution: fixed; messages: + msg118596
2010-09-30 00:12:36  vstinner      set     messages: + msg117677
2010-09-29 22:52:58  vstinner      set     messages: + msg117671
2010-09-29 21:48:43  pjenvey       set     messages: + msg117660
2010-09-29 21:42:46  vstinner      set     messages: + msg117658
2010-09-29 20:25:11  pjenvey       set     nosy: + pjenvey
2010-09-29 20:16:33  brett.cannon  set     messages: + msg117648
2010-09-29 19:43:14  pitrou        set     nosy: + brett.cannon
2010-09-29 17:23:21  vstinner      create