msg121193 - (view) |
Author: Johannes Ammon (jammon) |
Date: 2010-11-14 13:04 |
When there is a non-ASCII character in the docstring of a test function, unittest raises a UnicodeEncodeError when called with "--verbose".
I have this file unicodetest.py:
-----------------------------------------
# -*- coding: utf-8 -*-
import unittest

class UnicodeTest(unittest.TestCase):
    def test_unicode_docstring(self):
        u"""täst - docstring with unicode character"""
        self.assertEqual(1+1, 2)

if __name__ == '__main__':
    unittest.main()
-----------------------------------------
Running it normally is ok:
$ python unicodetest.py
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
But with "--verbose" it breaks:
$ python unicodetest.py --verbose
Traceback (most recent call last):
File "unicodetest.py", line 10, in <module>
unittest.main()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 817, in __init__
self.runTests()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 861, in runTests
result = testRunner.run(self.test)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 753, in run
test(result)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
return self.run(*args, **kwds)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
test(result)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
return self.run(*args, **kwds)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
test(result)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 300, in __call__
return self.run(*args, **kwds)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 266, in run
result.startTest(self)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 693, in startTest
self.stream.write(self.getDescription(test))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)
Found with Python 2.6 on MacOS X 10.6.4
|
msg121197 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-11-14 15:47 |
Is this a duplicate of #1293741? That issue was closed as out of date, but I'm not 100% convinced that was the correct closure. What do you think?
Does it still happen with 2.7? (2.6 is in security fix only mode.)
|
msg121206 - (view) |
Author: Johannes Ammon (jammon) |
Date: 2010-11-14 18:04 |
Same behaviour with 2.7
|
msg121263 - (view) |
Author: Jeffrey Finkelstein (jfinkels) * |
Date: 2010-11-16 04:57 |
I am not having this problem on Ubuntu 10.10 with the most recent Python 2.7:
<terminal interaction>
$ ./python unicodetest.py --verbose
test_unicode_docstring (__main__.UnicodeTest)
täst - docstring with unicode character ... ok
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
$ ./python unicodetest.py
test_unicode_docstring (__main__.UnicodeTest)
täst - docstring with unicode character ... ok
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
</terminal interaction>
|
msg121264 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2010-11-16 04:59 |
Great, thank you for the update! Closing.
|
msg121276 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2010-11-16 09:37 |
I read and closed too fast; Johannes still has the bug on OS X. Can someone turn his example script into a patch adding a unit test?
|
msg121279 - (view) |
Author: Michael Foord (michael.foord) * |
Date: 2010-11-16 10:49 |
The issue is with a non-ascii character in a *Unicode* docstring. Python has to encode the string to write it to the terminal; the encode is implicit and so fails.
The problem doesn't happen with Python 3 unless you run on an ascii terminal.
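A minimal sketch of the failure mode described above, written for Python 3 so it can be run today. It only illustrates the codec behaviour (a 'strict' encode to ASCII raising on a non-ASCII character); it is not the unittest code path itself, and the variable names are made up for illustration.

```python
# Text containing one non-ASCII character (U+00E4), as in the reported docstring.
text = "t\u00e4st - docstring with unicode character"

# A UTF-8 terminal can represent the character, so encoding succeeds.
utf8_bytes = text.encode("utf-8")

# An ASCII terminal cannot; the default 'strict' error handler raises.
try:
    text.encode("ascii")
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

print(failed)
print(message)
```

In Python 2 the same encode happens implicitly inside stream.write() when it is handed a unicode object, which is why the traceback ends in unittest rather than in user code.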
|
msg121288 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2010-11-16 13:19 |
Johannes, can you paste the output of the locale command?
|
msg121294 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2010-11-16 14:10 |
In Python 3, sys.stderr uses the 'backslashreplace' error handler. With C locale, sys.stderr uses the ASCII encoding and so the é unicode character is printed as \xe9.
In Python 2, sys.stderr.errors is strict by default.
It works if you specify the error handler:
$ ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
é
$ PYTHONIOENCODING=ascii:backslashreplace ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
\xe9
But with ASCII encoding, and the default error handler (strict), it fails:
$ PYTHONIOENCODING=ascii ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
$ LANG= ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
Changing the default error handler in a minor release is not a good idea, but we can emulate the backslashreplace error handler. distutils.log does that in Python 3:
class Log:

    def __init__(self, threshold=WARN):
        self.threshold = threshold

    def _log(self, level, msg, args):
        if level not in (DEBUG, INFO, WARN, ERROR, FATAL):
            raise ValueError('%s wrong log level' % str(level))

        if level >= self.threshold:
            if args:
                msg = msg % args
            if level in (WARN, ERROR, FATAL):
                stream = sys.stderr
            else:
                stream = sys.stdout
            if stream.errors == 'strict':
                # emulate backslashreplace error handler
                encoding = stream.encoding
                msg = msg.encode(encoding, "backslashreplace").decode(encoding)
            stream.write('%s\n' % msg)
            stream.flush()
(...)
_WritelnDecorator() of unittest.runner should maybe use the same code.
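As a rough sketch of what that could look like, here is a small stream wrapper that applies the same encode/decode round trip before writing. It is written for Python 3 and is only an illustration: BackslashReplaceWriter is a hypothetical name, not the actual _WritelnDecorator, and the ASCII stream is simulated with io.TextIOWrapper.

```python
import io


class BackslashReplaceWriter:
    """Hypothetical wrapper; NOT the real unittest.runner._WritelnDecorator."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, text):
        encoding = getattr(self.stream, "encoding", None)
        errors = getattr(self.stream, "errors", None)
        if encoding and errors == "strict":
            # Emulate the backslashreplace error handler: unencodable
            # characters become \xNN escapes, then decode back to text so
            # the stream's own strict encode cannot fail.
            text = text.encode(encoding, "backslashreplace").decode(encoding)
        self.stream.write(text)

    def writeln(self, text=""):
        self.write(text + "\n")


# Simulate a strict ASCII terminal on top of an in-memory byte buffer.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii", errors="strict", newline="\n")

writer = BackslashReplaceWriter(ascii_stream)
writer.writeln("t\u00e4st - docstring with unicode character ... ok")
ascii_stream.flush()

output = raw.getvalue()
print(output)
```

Streams that already use a non-strict error handler, or that have no encoding attribute, are written to unchanged.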
|
msg123036 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2010-12-02 02:10 |
See also #10601: "sys.displayhook: use backslashreplace error handler if repr(value) is not encodable to sys.stdout".
|
msg171439 - (view) |
Author: Michael Foord (michael.foord) * |
Date: 2012-09-28 11:32 |
So on OS X (Python 2.7 only) the following still fails:
PYTHONIOENCODING=ascii ./python.exe unicodetest.py --verbose
|
msg221950 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2014-06-30 08:58 |
Does this need following up, can it be closed as "won't fix" as it only affects 2.7, or what?
|
msg221951 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2014-06-30 09:05 |
> Does this need following up, can it be closed as "won't fix" as it only affects 2.7, or what?
IMO we should fix this issue. I proposed a fix in msg121294.
|
msg222233 - (view) |
Author: Michael Foord (michael.foord) * |
Date: 2014-07-03 22:32 |
So the proposed fix does the backslashreplace for errors and then re-decodes, allowing the encode in the stream to work. That seems like a good fix.
|
msg340220 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2019-04-14 17:49 |
This is still an issue with the latest 2.7. I went ahead and created a PR based on Victor's suggestion in msg121294. I am not sure of the correct way to test this; I used cStringIO.StringIO as the stream for a test case with a unicode description, along with setting the default encoding to 'ascii'. I tested the original report to make sure the patch fixes the error.
$ PYTHONIOENCODING=ascii ./python.exe ../backups/bpo10417.py --verbose
test_unicode_docstring (__main__.UnicodeTest)
t\xe4st - docstring with unicode character ... ok
----------------------------------------------------------------------
Ran 1 test in 0.004s
OK
|
msg340225 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-04-14 18:51 |
Test that your fix does not break the case of 8-bit non-ascii docstring.
|
msg340382 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2019-04-17 08:26 |
> Test that your fix does not break the case of 8-bit non-ascii docstring.
Do you mean double escaping of the backslash as in t\xe4st changed to t\\xe4st due to this PR?
➜ cpython git:(bpo10417) $ cat /tmp/foo.py
# -*- coding: utf-8 -*-
import unittest

class UnicodeTest(unittest.TestCase):
    def test_unicode_docstring(self):
        u"""docstring with unicode character. t\xe4st"""
        self.assertEqual(1+1, 2)

if __name__ == '__main__':
    unittest.main()
# ASCII encoding
➜ cpython git:(bpo10417) $ PYTHONIOENCODING=ascii ./python.exe /tmp/foo.py --verbose
test_unicode_docstring (__main__.UnicodeTest)
docstring with unicode character. t\xe4st ... ok
----------------------------------------------------------------------
Ran 1 test in 0.019s
OK
# utf-8 encoding
➜ cpython git:(bpo10417) $ ./python.exe /tmp/foo.py --verbose
test_unicode_docstring (__main__.UnicodeTest)
docstring with unicode character. täst ... ok
----------------------------------------------------------------------
Ran 1 test in 0.004s
OK
|
msg340443 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-04-17 20:30 |
I mean that '\xe4'.encode(encoding, 'backslashreplace') will fail.
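To illustrate the concern: in Python 2, calling .encode() on a byte string first decodes it implicitly with the default (ASCII) codec, so a non-ASCII byte raises UnicodeDecodeError before the backslashreplace handler is ever consulted. The sketch below mimics that behaviour in Python 3; py2_style_encode is a hypothetical helper written for this illustration, not a real API.

```python
def py2_style_encode(value, encoding, errors="strict"):
    """Hypothetical helper mimicking Python 2's str.encode() semantics."""
    if isinstance(value, bytes):
        # Python 2 implicitly decodes 8-bit strings with the default
        # (ASCII) codec before encoding; the error handler passed to
        # encode() does not apply to this step.
        value = value.decode("ascii")
    return value.encode(encoding, errors)


# 8-bit non-ASCII docstring: the implicit decode fails.
try:
    py2_style_encode(b"\xe4", "ascii", "backslashreplace")
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

# Unicode docstring: the backslashreplace handler works as intended.
unicode_result = py2_style_encode("\u00e4", "ascii", "backslashreplace")

print(error_name)
print(unicode_result)
```

One possible guard, stated as an assumption rather than as what the PR does, would be to apply the round trip only when the description is a unicode object and to pass 8-bit strings through untouched.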
|
msg367351 - (view) |
Author: Zachary Ware (zach.ware) * |
Date: 2020-04-27 02:05 |
With 2.7 now EOL, I'm closing the issue.
|
|
Date | User | Action | Args |
2022-04-11 14:57:08 | admin | set | github: 54626 |
2020-04-27 02:05:54 | zach.ware | set | status: open -> closed; nosy: + zach.ware; messages: + msg367351; resolution: out of date; stage: test needed -> resolved |
2019-04-17 20:30:21 | serhiy.storchaka | set | messages: + msg340443 |
2019-04-17 08:26:30 | xtreak | set | messages: + msg340382 |
2019-04-15 09:50:55 | vstinner | set | nosy: - vstinner; title: unittest triggers UnicodeEncodeError with non-ASCII character in the docstring of the test function -> [2.7] unittest triggers UnicodeEncodeError with non-ASCII character in the docstring of the test function |
2019-04-14 18:51:34 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg340225 |
2019-04-14 17:49:53 | xtreak | set | nosy: + xtreak; messages: + msg340220; stage: patch review -> test needed |
2019-04-14 17:44:20 | xtreak | set | keywords: + patch; stage: test needed -> patch review; pull_requests: + pull_request12754 |
2019-03-16 00:12:16 | BreamoreBoy | set | nosy: - BreamoreBoy |
2014-07-03 22:32:19 | michael.foord | set | messages: + msg222233 |
2014-06-30 09:05:44 | vstinner | set | messages: + msg221951 |
2014-06-30 08:58:04 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg221950 |
2012-09-28 11:32:00 | michael.foord | set | messages: + msg171439 |
2010-12-02 02:10:28 | vstinner | set | messages: + msg123036 |
2010-11-16 14:10:04 | vstinner | set | nosy: + vstinner; messages: + msg121294 |
2010-11-16 13:19:58 | eric.araujo | set | messages: + msg121288 |
2010-11-16 10:49:48 | michael.foord | set | assignee: michael.foord; messages: + msg121279 |
2010-11-16 09:37:58 | eric.araujo | set | status: closed -> open; resolution: out of date -> (no value); messages: + msg121276; stage: resolved -> test needed |
2010-11-16 04:59:51 | eric.araujo | set | status: open -> closed; resolution: out of date; messages: + msg121264; stage: resolved |
2010-11-16 04:57:45 | jfinkels | set | nosy: + jfinkels; messages: + msg121263 |
2010-11-15 01:19:49 | eric.araujo | set | nosy: + eric.araujo; versions: - Python 2.6 |
2010-11-14 18:04:53 | jammon | set | messages: + msg121206; versions: + Python 2.7 |
2010-11-14 15:47:51 | r.david.murray | set | nosy: + r.david.murray; messages: + msg121197 |
2010-11-14 13:13:53 | ezio.melotti | set | nosy: + ezio.melotti, michael.foord |
2010-11-14 13:04:50 | jammon | create | |