classification
Title: [2.7] unittest triggers UnicodeEncodeError with non-ASCII character in the docstring of the test function
Type: behavior Stage: test needed
Components: Tests, Unicode Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: michael.foord Nosy List: eric.araujo, ezio.melotti, jammon, jfinkels, michael.foord, r.david.murray, serhiy.storchaka, xtreak
Priority: normal Keywords: patch

Created on 2010-11-14 13:04 by jammon, last changed 2019-04-17 20:30 by serhiy.storchaka.

Pull Requests
URL Status Linked Edit
PR 12829 open xtreak, 2019-04-14 17:44
Messages (18)
msg121193 - Author: Johannes Ammon (jammon) Date: 2010-11-14 13:04
When there is a non-ASCII character in the docstring of a test function, unittest triggers a UnicodeEncodeError when called with "--verbose".

I have this file unicodetest.py:
-----------------------------------------
# -*- coding: utf-8 -*-
import unittest

class UnicodeTest(unittest.TestCase):
    def test_unicode_docstring(self):
        u"""täst - docstring with unicode character"""
        self.assertEqual(1+1, 2)

if __name__ == '__main__':
    unittest.main()
-----------------------------------------

Running it normally is ok:

$ python unicodetest.py 
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK


But with "--verbose" it breaks:

$ python unicodetest.py --verbose
Traceback (most recent call last):
  File "unicodetest.py", line 10, in <module>
    unittest.main()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 817, in __init__
    self.runTests()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 861, in runTests
    result = testRunner.run(self.test)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 753, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 300, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 266, in run
    result.startTest(self)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 693, in startTest
    self.stream.write(self.getDescription(test))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)


Found with Python 2.6 on Mac OS X 10.6.4
msg121197 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-11-14 15:47
Is this a duplicate of #1293741?  That issue was closed as out of date, but I'm not 100% convinced that was the correct closure.  What do you think?

Does it still happen with 2.7?  (2.6 is in security fix only mode.)
msg121206 - Author: Johannes Ammon (jammon) Date: 2010-11-14 18:04
Same behaviour with 2.7
msg121263 - Author: Jeffrey Finkelstein (jfinkels) * Date: 2010-11-16 04:57
I am not having this problem on Ubuntu 10.10 with the most recent Python 2.7:

<terminal interaction>
$ ./python unicodetest.py --verbose
test_unicode_docstring (__main__.UnicodeTest)
täst - docstring with unicode character ... ok

----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
$ ./python unicodetest.py
test_unicode_docstring (__main__.UnicodeTest)
täst - docstring with unicode character ... ok

----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
</terminal interaction>
msg121264 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-11-16 04:59
Great, thank you for the update!  Closing.
msg121276 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-11-16 09:37
I read and closed too fast; Johannes still has the bug on OS X.  Can someone turn his example script into a patch adding a unit test?
msg121279 - Author: Michael Foord (michael.foord) * (Python committer) Date: 2010-11-16 10:49
The issue is with a non-ASCII character in a *Unicode* docstring. Python has to encode the string to write it to the terminal; the encoding step is implicit and, with the ASCII codec, it fails.

The problem doesn't happen with Python 3 unless you run on an ASCII terminal.
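
The implicit encode can be reproduced without a terminal. The following is a minimal sketch, assuming the stream behaves like an explicit ASCII encode with the strict error handler; the helper name encode_for_stream is made up for illustration and is not part of unittest.

```python
def encode_for_stream(text, encoding="ascii", errors="strict"):
    """Hypothetical helper: mimic the encode a text stream performs on write."""
    return text.encode(encoding, errors)


docstring = u"t\xe4st - docstring with unicode character"

try:
    encode_for_stream(docstring)
    outcome = "encoded"
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded
    outcome = "failed at position %d" % exc.start

print(outcome)
```

The reported position matches the "position 1" in the original traceback, since the unencodable character is the second one in the docstring.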
msg121288 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-11-16 13:19
Johannes, can you paste the output of the locale command?
msg121294 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-11-16 14:10
In Python 3, sys.stderr uses the 'backslashreplace' error handler. With the C locale, sys.stderr uses the ASCII encoding, so the Unicode character é is printed as \xe9.

In Python 2, sys.stderr.errors is strict by default.

It works if you specify the error handler:

$ ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
é
$ PYTHONIOENCODING=ascii:backslashreplace ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
\xe9

But with ASCII encoding, and the default error handler (strict), it fails:

$ PYTHONIOENCODING=ascii ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
$ LANG= ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

Changing the default error handler in a minor release is not a good idea, but we can emulate the backslashreplace error handler. distutils.log does that in Python 3:

class Log:

    def __init__(self, threshold=WARN):
        self.threshold = threshold

    def _log(self, level, msg, args):
        if level not in (DEBUG, INFO, WARN, ERROR, FATAL):
            raise ValueError('%s wrong log level' % str(level))

        if level >= self.threshold:
            if args:
                msg = msg % args
            if level in (WARN, ERROR, FATAL):
                stream = sys.stderr
            else:
                stream = sys.stdout
            if stream.errors == 'strict':
                # emulate backslashreplace error handler
                encoding = stream.encoding
                msg = msg.encode(encoding, "backslashreplace").decode(encoding)
            stream.write('%s\n' % msg)
            stream.flush()
    (...)

_WritelnDecorator() of unittest.runner should maybe use the same code.
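
Applied to a single write call, the same idea might look like the sketch below. This is not the actual unittest code: write_escaped is a hypothetical helper, and an in-memory io.TextIOWrapper configured for strict ASCII stands in for sys.stderr.

```python
import io


def write_escaped(stream, msg):
    """Hypothetical helper: write msg, escaping what the stream cannot encode."""
    if getattr(stream, "errors", None) == "strict" and stream.encoding:
        # emulate the backslashreplace error handler, as distutils.log does
        encoding = stream.encoding
        msg = msg.encode(encoding, "backslashreplace").decode(encoding)
    stream.write(msg)


# Stand-in for a terminal stream that only accepts ASCII.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii", errors="strict", newline="\n")

write_escaped(stream, u"t\xe4st ... ok\n")
stream.flush()
written = raw.getvalue()
```

Text that the stream can already encode passes through the round trip unchanged, so the escape only shows up for characters the encoding lacks.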
msg123036 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-02 02:10
See also #10601: "sys.displayhook: use backslashreplace error handler if repr(value) is not encodable to sys.stdout".
msg171439 - Author: Michael Foord (michael.foord) * (Python committer) Date: 2012-09-28 11:32
So on OS X (Python 2.7 only) the following still fails:

PYTHONIOENCODING=ascii ./python.exe unicodetest.py --verbose
msg221950 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-30 08:58
Does this need following up, can it be closed as "won't fix" as it only affects 2.7, or what?
msg221951 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-30 09:05
> Does this need following up, can it be closed as "won't fix" as it only affects 2.7, or what?

IMO we should fix this issue. I proposed a fix in msg121294.
msg222233 - Author: Michael Foord (michael.foord) * (Python committer) Date: 2014-07-03 22:32
So the proposed fix encodes with the backslashreplace error handler and then decodes the result again, so that the encode done by the stream succeeds. That seems like a good fix.
msg340220 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-04-14 17:49
This is still an issue with the latest 2.7. I went ahead and created a PR based on Victor's suggestion in msg121294. I am not sure of the correct way to test this; I used cStringIO.StringIO as the stream for a test case with a unicode description, along with setting the default encoding to 'ascii'. I tested the original report to make sure the patch fixes the error.

$ PYTHONIOENCODING=ascii ./python.exe ../backups/bpo10417.py --verbose
test_unicode_docstring (__main__.UnicodeTest)
t\xe4st - docstring with unicode character ... ok

----------------------------------------------------------------------
Ran 1 test in 0.004s

OK
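
One possible shape for such a test is sketched below. It is written for Python 3 and uses io.TextIOWrapper over io.BytesIO instead of cStringIO.StringIO, so it is not the test from the PR. EscapingStream and UnicodeDocstringTest are hypothetical names, and the wrapper is an assumption about how the escaping could be hooked in front of the runner's stream.

```python
import io
import unittest


class EscapingStream(object):
    """Hypothetical wrapper: escape unencodable characters before writing."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, msg):
        if self.stream.errors == "strict":
            # emulate backslashreplace, then hand text back to the stream
            encoding = self.stream.encoding
            msg = msg.encode(encoding, "backslashreplace").decode(encoding)
        self.stream.write(msg)

    def flush(self):
        self.stream.flush()


class UnicodeDocstringTest(unittest.TestCase):
    def test_unicode_docstring(self):
        u"""t\xe4st - docstring with unicode character"""
        self.assertEqual(1 + 1, 2)


# A strict ASCII text stream backed by bytes, standing in for the terminal.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii", errors="strict",
                                newline="\n")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(UnicodeDocstringTest)
runner = unittest.TextTestRunner(stream=EscapingStream(ascii_stream),
                                 verbosity=2)
result = runner.run(suite)
ascii_stream.flush()
output = raw.getvalue()
```

Checking the captured bytes for the escaped docstring, rather than only checking that no exception was raised, also confirms that the description was actually written.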
msg340225 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-04-14 18:51
Test that your fix does not break the case of 8-bit non-ascii docstring.
msg340382 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-04-17 08:26
> Test that your fix does not break the case of 8-bit non-ascii docstring.

Do you mean double escaping of the backslash as in t\xe4st changed to t\\xe4st due to this PR? 

➜  cpython git:(bpo10417) $ cat /tmp/foo.py
# -*- coding: utf-8 -*-
import unittest

class UnicodeTest(unittest.TestCase):
    def test_unicode_docstring(self):
        u"""docstring with unicode character. t\xe4st"""
        self.assertEqual(1+1, 2)

if __name__ == '__main__':
    unittest.main()

# ASCII encoding

➜  cpython git:(bpo10417) $ PYTHONIOENCODING=ascii ./python.exe /tmp/foo.py --verbose
test_unicode_docstring (__main__.UnicodeTest)
docstring with unicode character. t\xe4st ... ok

----------------------------------------------------------------------
Ran 1 test in 0.019s

OK

# utf-8 encoding

➜  cpython git:(bpo10417) $ ./python.exe /tmp/foo.py --verbose
test_unicode_docstring (__main__.UnicodeTest)
docstring with unicode character. täst ... ok

----------------------------------------------------------------------
Ran 1 test in 0.004s

OK
msg340443 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-04-17 20:30
I mean that '\xe4'.encode(encoding, 'backslashreplace') will fail.
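
In Python 2, calling .encode() on an 8-bit str first decodes it implicitly with the default ASCII codec, which raises UnicodeDecodeError for a byte such as 0xe4. One possible guard, sketched below, is to escape only text strings and pass byte strings through untouched. The helper name escape_if_text is hypothetical, and this is not necessarily what the PR does.

```python
def escape_if_text(msg, encoding="ascii"):
    """Hypothetical helper: escape text only, leave 8-bit strings alone."""
    if isinstance(msg, bytes):
        # bytes is an alias of str on Python 2.7, so 8-bit docstrings land here
        return msg
    return msg.encode(encoding, "backslashreplace").decode(encoding)


# The implicit decode that Python 2 performs before encoding a byte string,
# made explicit so that the failure is visible on any Python version.
try:
    b"t\xe4st".decode("ascii")
    implicit_decode = "ok"
except UnicodeDecodeError:
    implicit_decode = "fails"

text_result = escape_if_text(u"t\xe4st")
bytes_result = escape_if_text(b"t\xe4st")
```

How the untouched bytes are then written is left to the stream; the sketch only shows that the escaping step itself no longer raises for 8-bit input.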
History
Date User Action Args
2019-04-17 20:30:21 serhiy.storchaka set messages: + msg340443
2019-04-17 08:26:30 xtreak set messages: + msg340382
2019-04-15 09:50:55 vstinner set nosy: - vstinner
    title: unittest triggers UnicodeEncodeError with non-ASCII character in the docstring of the test function -> [2.7] unittest triggers UnicodeEncodeError with non-ASCII character in the docstring of the test function
2019-04-14 18:51:34 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg340225
2019-04-14 17:49:53 xtreak set nosy: + xtreak
    messages: + msg340220
    stage: patch review -> test needed
2019-04-14 17:44:20 xtreak set keywords: + patch
    stage: test needed -> patch review
    pull_requests: + pull_request12754
2019-03-16 00:12:16 BreamoreBoy set nosy: - BreamoreBoy
2014-07-03 22:32:19 michael.foord set messages: + msg222233
2014-06-30 09:05:44 vstinner set messages: + msg221951
2014-06-30 08:58:04 BreamoreBoy set nosy: + BreamoreBoy
    messages: + msg221950
2012-09-28 11:32:00 michael.foord set messages: + msg171439
2010-12-02 02:10:28 vstinner set messages: + msg123036
2010-11-16 14:10:04 vstinner set nosy: + vstinner
    messages: + msg121294
2010-11-16 13:19:58 eric.araujo set messages: + msg121288
2010-11-16 10:49:48 michael.foord set assignee: michael.foord
    messages: + msg121279
2010-11-16 09:37:58 eric.araujo set status: closed -> open
    resolution: out of date ->
    messages: + msg121276
    stage: resolved -> test needed
2010-11-16 04:59:51 eric.araujo set status: open -> closed
    resolution: out of date
    messages: + msg121264
    stage: resolved
2010-11-16 04:57:45 jfinkels set nosy: + jfinkels
    messages: + msg121263
2010-11-15 01:19:49 eric.araujo set nosy: + eric.araujo
    versions: - Python 2.6
2010-11-14 18:04:53 jammon set messages: + msg121206
    versions: + Python 2.7
2010-11-14 15:47:51 r.david.murray set nosy: + r.david.murray
    messages: + msg121197
2010-11-14 13:13:53 ezio.melotti set nosy: + ezio.melotti, michael.foord
2010-11-14 13:04:50 jammon create