classification
Title: <unprintable AssertionError object> message in unittest tracebacks
Type: behavior
Stage:
Components: Library (Lib), Unicode
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: michael.foord
Nosy List: amaury.forgeotdarc, ezio.melotti, gregory.p.smith, gthb, haypo, michael.foord
Priority: normal
Keywords: patch

Created on 2010-04-05 10:56 by michael.foord, last changed 2011-12-23 15:45 by michael.foord. This issue is now closed.

Files
File name Uploaded Description
unittest2-issue-8313.patch gthb, 2010-05-04 13:58 Patch: more useful standard assertion messages on unicode comparisons
traceback_unicode.patch haypo, 2010-05-04 15:17
Messages (13)
msg102368 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2010-04-05 10:56
>>> import unittest
>>> class Foo(unittest.TestCase):
...   def test_fffd(self): self.assertEqual(u'\ufffd', u'\ufffd\ufffd')
...
>>> unittest.main(exit=False)
F
======================================================================
FAIL: test_fffd (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<stdin>", line 2, in test_fffd
AssertionError: <unprintable AssertionError object>

----------------------------------------------------------------------
Ran 1 test in 0.001s


The problem with creating unicode tracebacks is that they could fail when being output on terminals not capable of showing the full range of unicode characters (the default terminal on Windows is CP1252).

This can already happen with Unicode messages that aren't part of the traceback.

Detecting the 'unprintable' message before calling into traceback and replacing it with the repr of the unicode is one possibility.
msg104937 - (view) Author: Gunnlaugur Thor Briem (gthb) Date: 2010-05-04 13:58
Replacing the message with its repr seems to me at least strongly preferable to the current “hide it all” behavior. :)

Better, msg.encode('ascii', 'backslashreplace') does what repr does with unencodable characters, but does not add the quotes, so the behavior is only different when it needs to be.
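
As a rough illustration of that point (a sketch, not the attached patch): the sample string below is made up, and because `encode()` returns bytes on Python 3, the result is decoded back to text there.

```python
# Sample message; the text itself is only an illustration.
msg = u"caf\xe9 \ufffd"

# backslashreplace escapes only what ASCII cannot represent and adds no quotes.
escaped = msg.encode("ascii", "backslashreplace")
if not isinstance(escaped, str):
    # Python 3 returns bytes here; Python 2 already returns a str.
    escaped = escaped.decode("ascii")

# A pure-ASCII message passes through unchanged.
plain = u"1 != 2".encode("ascii", "backslashreplace")
if not isinstance(plain, str):
    plain = plain.decode("ascii")

print(escaped)
print(plain)
```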

Better still, 'ascii' need not be hardcoded. I'm attaching a patch that sets the encoding from an environment variable, defaulting to 'ascii', and encodes the message with 'backslashreplace'. This makes unicode string equality assertions much more useful for me.

The encoding could also be configurable by some clean hook for test runners to use. unit2 could have a command-line parameter, and TextTestRunner could use stream.encoding if not None (or PYTHONIOENCODING on Python 3).
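
One way the lookup order suggested here might be sketched. The variable name `UNITTEST_MESSAGE_ENCODING` and both helper functions are hypothetical; the thread does not say which environment variable the patch reads, and the stream check is only the idea floated above, not something the patch is known to do.

```python
import os

# Hypothetical variable name, chosen only for this sketch.
ENV_VAR = "UNITTEST_MESSAGE_ENCODING"


def pick_encoding(stream=None, environ=os.environ):
    """Prefer the stream's encoding, then the environment, then ASCII."""
    stream_encoding = getattr(stream, "encoding", None)
    return stream_encoding or environ.get(ENV_VAR) or "ascii"


def encode_message(msg, encoding):
    """Encode msg, escaping whatever the encoding cannot represent."""
    return msg.encode(encoding, "backslashreplace")


# With Latin-1, U+00E9 is encoded directly while U+FFFD is escaped.
print(encode_message(u"\xe9\ufffd", pick_encoding(None, {ENV_VAR: "latin-1"})))
```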

Ideally messages should not be forced to be 8-bit strings by the failure exception class, but I suppose that's a bigger change than you would want to make.

The downside of using backslashreplace (or repr, for that matter) is that it does not preserve lengths, so the diff markers can get misaligned. I find that an acceptable tradeoff, but 'replace' is another option that preserves lengths, at least more often.
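
The length difference is easy to check with a small sketch (the one-character input is just an example):

```python
# One character in, different lengths out depending on the error handler.
expected = u"\ufffd"

lengths = {}
for handler in ("backslashreplace", "replace"):
    encoded = expected.encode("ascii", handler)
    lengths[handler] = len(encoded)

# backslashreplace writes the six-character escape \ufffd;
# replace writes a single '?', so diff markers stay aligned.
print(len(expected), lengths)
```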
msg104940 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2010-05-04 14:10
Sounds like a good solution - I'll look at this, thanks.
msg104943 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-05-04 14:31
Very recently, issue8533 changed regrtest.py to use 'backslashreplace' when printing errors. This issue seems very similar.
msg104946 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-04 15:01
The example raises an AssertionError(u'\n- \ufffd+ \ufffd\ufffd'), which traceback.format_exception() converts to a string. That function fails in _some_str(), on the str(value) call. You can reproduce the error with:

>>> str(AssertionError(u"\xe9"))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

> The problem with creating unicode tracebacks is that they could 
> fail when being output on terminals not capable of showing 
> the full range of unicode characters (the default terminal 
> on Windows is CP1252).

The problem is not related to the terminal encoding: str(value) uses Python's default encoding (ASCII by default). Python 3 is not affected because str(AssertionError("\xe9")) doesn't raise any error: it returns "\xe9".
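
A small sketch that runs on both versions and shows where the failure happens. The helper name `message_or_error` is made up for the illustration; nothing is written to a terminal before the conversion is attempted.

```python
def message_or_error(exc):
    """Return str(exc), or a note describing why str() failed."""
    try:
        return str(exc)
    except UnicodeEncodeError as err:
        # Python 2 path: the implicit ASCII encoding inside str() fails.
        return "str() failed: %s" % err.reason


# Python 3 returns the message unchanged; Python 2 takes the except branch.
result = message_or_error(AssertionError(u"\xe9"))
```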
msg104947 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-04 15:04
> Very recently, issue8533 changed regrtest.py to use
> 'backslashreplace' when printing errors. This issue seems 
> very similar

Issue #8533 is not directly related because in this issue the error occurs before writing the traceback to the terminal.
msg104949 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-04 15:17
The attached patch fixes the _some_str() function of the traceback module: it encodes a unicode exception message to ASCII using the backslashreplace error handler. ASCII is not the best choice, but str(unicode(...)) also uses ASCII (the default encoding), and we don't know the terminal encoding in traceback. We cannot do better here in Python 2 (without breaking a lot of APIs...).

The right fix is to use Python 3, which formats a traceback to unicode (unicode characters of the error message are kept unchanged). The encoding and error handler are chosen only at the end, when writing the output to the terminal, which is the right thing to do.
msg104989 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-05 00:32
> The downside of using backslashreplace (or repr, for that matter) is
> that it does not preserve lengths, so the diff markers can get
> misaligned. I find that an acceptable tradeoff, but 'replace' is
> another option that preserves lengths, at least more often.

'replace' loses important information: if the test is about the content of the unicode string, we will be unable to see the data that caused the error.

Result of the first example with my patch (backslashreplace):
======================================================================
FAIL: test_fffd (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "x.py", line 3, in test_fffd
    def test_fffd(self): self.assertEqual(u'\ufffd', u'\ufffd\ufffd')
AssertionError:
- \ufffd+ \ufffd\ufffd

Result of the first example with 'replace' error handler:
======================================================================
FAIL: test_fffd (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "x.py", line 3, in test_fffd
    def test_fffd(self): self.assertEqual(u'\ufffd', u'\ufffd\ufffd')
AssertionError:
- ?+ ??

(but this example is irrelevant because U+FFFD is the Unicode replacement character :-D)

If nobody complains about my patch, I will commit it to Python trunk (only).

You can still reimplement the fail() method to encode the message using a more relevant encoding and/or error handler.
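
A minimal sketch of such an override, assuming the failing assertion reaches fail() (the multi-line string comparison used for unicode strings does). `escape_non_ascii`, `EscapingTestCase` and `run_demo` are hypothetical names for this illustration, and ASCII with backslashreplace is just one possible choice of encoding and handler.

```python
import unittest


def escape_non_ascii(msg):
    """Return msg with non-ASCII characters written as backslash escapes."""
    escaped = msg.encode("ascii", "backslashreplace")
    # Python 2 gives a str here; Python 3 gives bytes that must be decoded.
    return escaped if isinstance(escaped, str) else escaped.decode("ascii")


class EscapingTestCase(unittest.TestCase):
    """Hypothetical base class; the name does not come from unittest."""

    def fail(self, msg=None):
        if msg is not None:
            msg = escape_non_ascii(msg)
        super(EscapingTestCase, self).fail(msg)


def run_demo():
    """Run the failing comparison from the report and return the result."""

    class Demo(EscapingTestCase):
        def test_fffd(self):
            self.assertEqual(u"\ufffd", u"\ufffd\ufffd")

    result = unittest.TestResult()
    Demo("test_fffd").run(result)
    return result


result = run_demo()
print(result.failures[0][1])
```

Test cases that should get the escaped messages would derive from the base class instead of unittest.TestCase directly.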
msg105011 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2010-05-05 10:19
I would prefer to try str(...) first and only attempt to convert to unicode and do the backslash replace if the str(...) call fails.
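
The order proposed here could look roughly like the sketch below. It is an illustration of the idea, not the committed code: `some_str` and `Unprintable` are stand-in names, and on Python 3 the middle branch is effectively never reached because str() does not fail on non-ASCII messages.

```python
try:
    text_type = unicode        # Python 2
except NameError:
    text_type = str            # Python 3


def some_str(value):
    """Try str() first, then an escaped unicode form, then a placeholder."""
    try:
        return str(value)
    except Exception:
        pass
    try:
        # Fallback for Python 2: convert to unicode, then escape to ASCII.
        return text_type(value).encode("ascii", "backslashreplace")
    except Exception:
        pass
    return "<unprintable %s object>" % type(value).__name__


class Unprintable(object):
    """Object whose string conversion always fails, to show the last resort."""

    def __str__(self):
        raise ValueError("no string form")


print(some_str(AssertionError("plain message")))
print(some_str(Unprintable()))
```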
msg105022 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-05 12:46
Committed: r80777 (trunk) and r80779 (2.6); blocked: r80778 (py3k).

Open a new issue if you would like to use something better than ASCII+backslashreplace in unittest (using runner stream encoding?).
msg150127 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011-12-23 02:32
http://pypi.python.org/pypi/unittest2 says 

"There are several places in unittest2 (and unittest) that call str(...) on exceptions to get the exception message. This can fail if the exception was created with non-ascii unicode. This is rare and I won't address it unless it is actually reported as a problem for someone."

It is a problem for us now that we've re-rooted all our TestCases on top of unittest2 at work. :)

The solution I'm leaning towards is monkey-patching the new traceback._some_str implementation in at unittest2 import time.
msg150128 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011-12-23 02:32
We're on Python 2.6; otherwise this would be a moot point. But you might want to include something like that in a new unittest2 backport release.
msg150172 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2011-12-23 15:45
traceback patch looks good. Thanks for the unittest2 patch as well.
History
Date User Action Args
2011-12-23 15:45:17 michael.foord set messages: + msg150172
2011-12-23 02:32:50 gregory.p.smith set messages: + msg150128
2011-12-23 02:32:04 gregory.p.smith set nosy: + gregory.p.smith; messages: + msg150127
2010-05-05 12:46:29 haypo set status: open -> closed; resolution: fixed; messages: + msg105022
2010-05-05 10:19:56 michael.foord set messages: + msg105011
2010-05-05 10:06:18 michael.foord set messages: - msg105009
2010-05-05 10:00:40 michael.foord set messages: + msg105009
2010-05-05 00:32:21 haypo set messages: + msg104989
2010-05-04 15:17:05 haypo set files: + traceback_unicode.patch; messages: + msg104949
2010-05-04 15:04:09 haypo set messages: + msg104947
2010-05-04 15:01:48 haypo set nosy: amaury.forgeotdarc, haypo, ezio.melotti, michael.foord, gthb; messages: + msg104946; components: + Unicode; versions: - Python 3.2
2010-05-04 14:31:03 amaury.forgeotdarc set nosy: + amaury.forgeotdarc, haypo; messages: + msg104943
2010-05-04 14:10:48 michael.foord set messages: + msg104940
2010-05-04 13:58:50 gthb set files: + unittest2-issue-8313.patch; nosy: + gthb; messages: + msg104937; keywords: + patch
2010-04-05 10:56:28 michael.foord create