classification
Title: test_sys.test_ioencoding_nonascii() fails with ASCII locale encoding
Type: behavior Stage: patch review
Components: Tests Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: pitrou, r.david.murray, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2013-09-20 21:29 by pitrou, last changed 2015-10-14 16:33 by vstinner.

Files
File name  Uploaded  Description
sys_test_ioencoding_locale.patch  serhiy.storchaka, 2013-09-25 09:40
sys_test_ioencoding.patch  serhiy.storchaka, 2013-09-28 20:56
Messages (13)
msg198174 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2013-09-20 21:29
The test added in issue18818 fails on the new OS X buildbot:

======================================================================
FAIL: test_ioencoding_nonascii (test.test_sys.SysModuleTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/buildbot/buildarea/3.x.murray-snowleopard/build/Lib/test/test_sys.py", line 581, in test_ioencoding_nonascii
    self.assertEqual(out, os.fsencode(test.support.FS_NONASCII))
AssertionError: b'' != b'\xc3\xa6'

http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/4
msg198175 - Author: STINNER Victor (vstinner) (Python committer) Date: 2013-09-20 22:16
The test fails with an ASCII locale encoding (e.g. LANG= on Linux).

The test should not try to display a non-ASCII character; it should check the encoding (sys.stdout.encoding) instead. The test should ensure that sys.stdout.encoding is the same with PYTHONIOENCODING unset (Python started with the -E option and the current environment) and with the variable set to an empty value.
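A minimal sketch of the check described above, assuming a Python 3 with subprocess.run(). The helper name child_stdout_encoding() is hypothetical, and stripping PYTHON* variables from the environment is an added assumption so that the -E child (which ignores all of them) and the explicit-environment child start from the same baseline:

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_args, env):
    """Run a child interpreter and return the sys.stdout.encoding it reports."""
    cmd = [sys.executable, *extra_args, "-c", "import sys; print(sys.stdout.encoding)"]
    out = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True).stdout
    return out.decode("ascii").strip()


# Assumption: drop PYTHON* variables so that both children see the same settings.
base_env = {k: v for k, v in os.environ.items() if not k.startswith("PYTHON")}

# PYTHONIOENCODING ignored via -E vs. PYTHONIOENCODING set to an empty value.
unset = child_stdout_encoding(["-E"], base_env)
empty = child_stdout_encoding([], dict(base_env, PYTHONIOENCODING=""))
assert unset == empty, (unset, empty)
```

The test never prints a non-ASCII character, so it does not depend on whether the locale encoding can represent one.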
msg198181 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2013-09-21 02:09
I set LC_CTYPE to en_US.utf-8 on the buildbot, which I think is the better setting for that buildbot, so the test doesn't fail there anymore.  However, the test should still be fixed (and maybe we should have a buildbot running with no language set at all).
msg198369 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-09-24 22:04
Shouldn't FS_NONASCII be None with ASCII locale encoding?
msg198370 - Author: STINNER Victor (vstinner) (Python committer) Date: 2013-09-24 22:08
> Shouldn't FS_NONASCII be None with ASCII locale encoding?

See the description of the variable in test.support:

# FS_NONASCII: non-ASCII character encodable by os.fsencode(),
# or None if there is no such character.

The filesystem encoding and the locale encoding can be different... especially when PYTHONIOENCODING is used.

The test should not use FS_NONASCII.
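To illustrate that these encodings are independent, the sketch below (the helper names report_encodings() and canonical() are hypothetical) starts a child interpreter with PYTHONIOENCODING=latin-1 and reports the filesystem encoding, the locale encoding and sys.stdout.encoding:

```python
import codecs
import os
import subprocess
import sys

# Child program: print the three encodings, one per line.
CHILD = (
    "import locale, sys\n"
    "print(sys.getfilesystemencoding())\n"
    "print(locale.getpreferredencoding(False))\n"
    "print(sys.stdout.encoding)\n"
)


def report_encodings(ioencoding):
    """Return the filesystem, locale and stdout encodings of a child
    interpreter started with PYTHONIOENCODING=ioencoding."""
    env = dict(os.environ, PYTHONIOENCODING=ioencoding)
    out = subprocess.run([sys.executable, "-c", CHILD], env=env,
                         stdout=subprocess.PIPE, check=True).stdout
    fs, loc, stdout = out.decode("ascii").splitlines()
    return {"filesystem": fs, "locale": loc, "stdout": stdout}


def canonical(name):
    """Map any alias of a codec to the codec's canonical name."""
    return codecs.lookup(name).name


report = report_encodings("latin-1")
print(report)
```

Only report["stdout"] follows PYTHONIOENCODING; the other two values depend on the platform and the locale, which is why a character chosen for the filesystem encoding says nothing about the standard streams.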
msg198373 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2013-09-25 01:05
Also note that on OS X I believe the filesystem encoding is always UTF-8, but the locale can of course be something else.
msg198379 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-09-25 09:40
Indeed.

Here is a patch. It uses the same algorithm to obtain an encodable non-ASCII string as for FS_NONASCII, but with the locale encoding. It also adds new tests and simplifies existing tests.
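The approach can be sketched as follows. The candidate list and the helper find_encodable_nonascii() are hypothetical (the real list in test.support may differ); the helper returns the first candidate that survives an encode/decode round trip through the given encoding, or None, mirroring how FS_NONASCII is None when no such character exists:

```python
import locale

# Hypothetical candidates, in the spirit of the list used for FS_NONASCII.
CANDIDATES = (
    "\u00e6", "\u0130", "\u0141", "\u03c6", "\u041a",
    "\u05d0", "\u060c", "\u062a", "\u0e01", "\u00a0", "\u20ac",
)


def find_encodable_nonascii(encoding):
    """Return the first candidate that round-trips through *encoding*, or None."""
    for ch in CANDIDATES:
        try:
            if ch.encode(encoding).decode(encoding) == ch:
                return ch
        except UnicodeError:
            # Not representable in this encoding; try the next candidate.
            continue
    return None


# Same idea as FS_NONASCII, but driven by the locale encoding.
LOCALE_NONASCII = find_encodable_nonascii(locale.getpreferredencoding(False))
```

With an ASCII locale encoding the result is None, so any test that depends on it has to be skipped in exactly the configuration this issue is about.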
msg198544 - Author: STINNER Victor (vstinner) (Python committer) Date: 2013-09-28 19:23
> Here is a patch. It uses the same algorithm to obtain an encodable
> non-ASCII string as for FS_NONASCII, but with the locale encoding.
> It also adds new tests and simplifies existing tests.

I don't like your patch. The purpose of PYTHONIOENCODING is to set the sys.stdin/stdout/stderr encodings. Your patch does not check sys.stdout.encoding; it checks the codec directly. Two codecs may encode the same character as the same byte sequence.
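The point that output bytes alone cannot identify the codec is easy to demonstrate: Latin-1 and cp1252 agree on every character of "héllo", and only a character such as "€" (0x80 in cp1252, not encodable in Latin-1) tells them apart:

```python
text = "h\u00e9llo"
# Both codecs map U+00E9 to the single byte 0xE9.
assert text.encode("latin-1") == text.encode("cp1252") == b"h\xe9llo"

# U+20AC (euro sign) is 0x80 in cp1252 but cannot be encoded in Latin-1.
assert "\u20ac".encode("cp1252") == b"\x80"
try:
    "\u20ac".encode("latin-1")
except UnicodeEncodeError:
    latin1_has_euro = False
else:
    latin1_has_euro = True
assert not latin1_has_euro
```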

Your test is skipped if the locale encoding is ASCII, whereas the purpose of PYTHONIOENCODING is to write non-ASCII characters without having to care about the locale encoding.

I would really prefer to simply check sys.stdin.encoding, sys.stdout.encoding and sys.stderr.encoding attributes.
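A sketch of checking the three attributes directly, assuming subprocess.run() and a hypothetical helper std_encodings(). The comparison goes through codecs.lookup() so that an alias spelling of the same codec still matches:

```python
import codecs
import os
import subprocess
import sys

CHILD = "import sys; print(sys.stdin.encoding, sys.stdout.encoding, sys.stderr.encoding)"


def std_encodings(ioencoding):
    """Return the (stdin, stdout, stderr) encodings of a child interpreter
    started with PYTHONIOENCODING=ioencoding."""
    env = dict(os.environ, PYTHONIOENCODING=ioencoding)
    out = subprocess.run([sys.executable, "-c", CHILD], env=env,
                         stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
                         check=True).stdout
    return out.decode("ascii").split()


expected = codecs.lookup("cp1252").name
names = std_encodings("cp1252")
for name in names:
    assert codecs.lookup(name).name == expected, name
```

Nothing non-ASCII is written here, so the check works even when the locale encoding is ASCII.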

If you really want to check the codec itself, you should use a known sequence, e.g. 'héllo€'.encode('cp1252') gives b'h\xe9llo\x80'.
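A sketch of the known-sequence check, with a hypothetical helper child_stdout_bytes() that makes a child interpreter write a string to stdout under a given PYTHONIOENCODING and returns the raw bytes:

```python
import os
import subprocess
import sys


def child_stdout_bytes(text, ioencoding):
    """Return the bytes a child interpreter writes for *text* when
    started with PYTHONIOENCODING=ioencoding."""
    env = dict(os.environ, PYTHONIOENCODING=ioencoding)
    # %a embeds an ASCII-only repr, so the -c argument itself stays ASCII.
    code = "import sys; sys.stdout.write(%a)" % text
    return subprocess.run([sys.executable, "-c", code], env=env,
                          stdout=subprocess.PIPE, check=True).stdout


out = child_stdout_bytes("h\u00e9llo\u20ac", "cp1252")
assert out == b"h\xe9llo\x80", out
```

The string contains no newline, so text-mode newline translation cannot alter the compared bytes.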
msg198552 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-09-28 20:56
Here is a patch which directly checks sys.std* attributes.
msg198553 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-09-28 21:02
> Your test is skipped if the locale encoding is ASCII, whereas the purpose of PYTHONIOENCODING is to write non-ASCII characters without having to care about the locale encoding.

This case is covered by the previous test.

> If you really want to check the codec itself, you should use a known sequence, e.g. 'héllo€'.encode('cp1252') gives b'h\xe9llo\x80'.

We can't be sure that the OS supports a cp1252 (or any other non-default) locale.
msg198554 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-09-28 21:06
> Your patch does not check sys.stdout.encoding; it checks the codec directly. Two codecs may encode the same character as the same byte sequence.

Checking the encoding name is too rigid. The Python interpreter can normalize the encoding name before assigning it to the standard streams. This is an implementation detail.
msg252514 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-10-08 06:36
What can you say about the recent patch, Victor?
msg253008 - Author: STINNER Victor (vstinner) (Python committer) Date: 2015-10-14 16:33
> What can you say about the recent patch, Victor?

I'm not sure that it works in all cases. io.TextIOWrapper doesn't normalize the encoding name. You should use something like:

   encoding = codecs.lookup(encoding).name

Otherwise, the test can fail if you use one of the various aliases of an encoding, for example "UTF-8" vs "utf8" vs "utf-8".
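A minimal sketch of the normalization suggested here; same_encoding() is a hypothetical helper name:

```python
import codecs


def same_encoding(a, b):
    """Compare two encoding names by canonical codec name, so aliases match."""
    return codecs.lookup(a).name == codecs.lookup(b).name


# "UTF-8", "utf8" and "utf-8" are different strings but the same codec.
assert same_encoding("UTF-8", "utf8")
assert same_encoding("utf8", "utf-8")
assert not same_encoding("utf-8", "latin-1")
```

A test can then compare sys.stdout.encoding against the value given in PYTHONIOENCODING without depending on which spelling the interpreter or io.TextIOWrapper happens to keep.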
History
Date  User  Action  Args
2015-10-14 16:33:57  vstinner  set  messages: + msg253008
2015-10-08 06:36:41  serhiy.storchaka  set  messages: + msg252514
2013-09-28 21:06:28  serhiy.storchaka  set  messages: + msg198554
2013-09-28 21:02:10  serhiy.storchaka  set  messages: + msg198553
2013-09-28 20:56:59  serhiy.storchaka  set  files: + sys_test_ioencoding.patch; messages: + msg198552
2013-09-28 19:23:29  vstinner  set  messages: + msg198544
2013-09-25 09:40:27  serhiy.storchaka  set  files: + sys_test_ioencoding_locale.patch; keywords: + patch; messages: + msg198379; stage: needs patch -> patch review
2013-09-25 01:05:28  r.david.murray  set  messages: + msg198373
2013-09-24 22:08:30  vstinner  set  messages: + msg198370
2013-09-24 22:04:01  serhiy.storchaka  set  messages: + msg198369
2013-09-24 21:58:02  vstinner  set  title: test_ioencoding_nonascii (test_sys) fails on Snow Leopard -> test_sys.test_ioencoding_nonascii() fails with ASCII locale encoding
2013-09-21 02:09:51  r.david.murray  set  messages: + msg198181
2013-09-20 22:16:09  vstinner  set  messages: + msg198175
2013-09-20 21:29:35  pitrou  create