Issue1293741
Created on 2005-09-17 11:41 by ogrisel, last changed 2009-10-21 15:08 by babilen.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| test_iso-8859-15.py | ogrisel, 2005-09-17 11:41 | sample code that shows the problem |
| issue1293741.py | luciano, 2009-01-11 14:48 | Short script demonstrating encoding problem on doctest output |
| doctest.unicode.patch | christoph, 2009-06-30 15:04 | Patch for properly encoding Unicode strings |
| doctest.unicode-2.patch | christoph, 2009-07-01 20:24 | Patch extending to DocTestCase.runTest() |
| unicode_bug.py | babilen, 2009-10-21 15:02 | The script mentioned in the report |
| unicode_bug_literals.py | babilen, 2009-10-21 15:08 | Doctest with unicode_literals |
| Messages (10) | |||
|---|---|---|---|
| msg26296 - (view) | Author: GRISEL (ogrisel) | Date: 2005-09-17 11:41 | |
The doctest module fails when the expected result
string has non-ASCII characters, even if the # -*-
coding: XXX -*- line is properly set.
The enclosed code sample produces the following error:
Traceback (most recent call last):
File "test_iso-8859-15.py", line 41, in ?
_test()
File "test_iso-8859-15.py", line 26, in _test
tried, failed = runner.run(t)
File "/usr/lib/python2.4/doctest.py", line 1376, in run
return self.__run(test, compileflags, out)
File "/usr/lib/python2.4/doctest.py", line 1259, in __run
if check(example.want, got, self.optionflags):
File "/usr/lib/python2.4/doctest.py", line 1475, in
check_output
if got == want:
UnicodeDecodeError: 'ascii' codec can't decode byte
0xe9 in position 8: ordinal not in range(128)
|
|||
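The traceback above is consistent with Python 2 silently decoding a byte string as ASCII when `got == want` compares a byte string with a unicode string. The following is only a sketch in Python 3 syntax that makes that hidden step explicit; the sample bytes are hypothetical (they are not taken from test_iso-8859-15.py) and were chosen so that byte 0xe9 sits at position 8, as in the report.

```python
# Python 3 sketch of the implicit step Python 2 appears to take in `got == want`.
# The sample bytes are hypothetical, not taken from test_iso-8859-15.py.
want = b"unicode \xe9t\xe9"      # ISO-8859-15 bytes, as they would sit in the source file

try:
    want.decode("ascii")         # the implicit conversion, written out explicitly
except UnicodeDecodeError as exc:
    error = exc

print(error.reason)              # ordinal not in range(128)
print(error.start)               # index of the first byte that is not ASCII

# Decoding with the encoding named in the coding line succeeds.
text = want.decode("iso-8859-15")
print(text)
```

The point of the sketch is that the coding line only tells the compiler how to read the file; it does not change the default codec used for later implicit conversions.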
| msg26297 - (view) | Author: Tim Peters (tim_one) | Date: 2005-09-17 17:42 | |
Please try the patch at http://www.python.org/sf/1080727 and report back on whether it solves your problem (attaching comments to the patch report would be most useful). |
|||
| msg26298 - (view) | Author: GRISEL (ogrisel) | Date: 2005-09-18 10:25 | |
Unfortunately that patch does not fix my problem. The patch at bug #1080727 fixes the problem for doctests written in external reST files (testfile and DocFileTest functions). My problem is related to internal docstring encoding (testmod for instance).
However, Bjorn Tillenius says:
"""
If one writes doctests within documentation strings of classes and functions, it's possible to use non-ASCII characters since one can specify the encoding used in the source file.
"""
So according to him, docstrings' doctests with non-ASCII characters should work by default, and maybe my system setup is somewhat broken. Could somebody please confirm or refute this by running the attached sample script on their system?
My system config:
LANG=fr_FR@euro (on linux)
python 2.4.1 with:
sys.getdefaultencoding() == 'ascii'
and
locale.getpreferredencoding() == 'ISO-8859-15'
$ file test_iso-8859-15.py
test_iso-8859-15.py: ISO-8859 English text |
|||
| msg26299 - (view) | Author: Bjorn Tillenius (bjoti) | Date: 2006-02-16 11:41 | |
I'm quite sure that you can use non-ASCII characters in your doctest, given that it's a unicode string. So if you make your docstring a unicode string, it should work. That is:
u"""Docstring containing non-ASCII characters.
...
"""
|
|||
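As a rough illustration of the suggestion above: in Python 3 every string literal is already a unicode string, so a doctest containing non-ASCII text behaves like the u"""...""" docstring proposed for Python 2. This is a sketch under that assumption; the test name `non_ascii` and the sample text are made up for the example.

```python
import doctest

# Docstring-style text with non-ASCII characters in both the example
# and the expected output.
DOC = """
>>> print('déjà vu')
déjà vu
"""

parser = doctest.DocTestParser()
# name, filename and lineno are placeholders; only the string matters here.
test = parser.get_doctest(DOC, {}, "non_ascii", None, 0)

messages = []                                  # collects any failure reports
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test, out=messages.append)

print(results.failed, results.attempted)
```

Because both sides of the comparison are text, no implicit ASCII conversion takes place and the example passes.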
| msg26300 - (view) | Author: Tim Peters (tim_one) | Date: 2006-04-24 01:21 | |
Unassigned myself -- don't know enough about encodings. |
|||
| msg26301 - (view) | Author: akaihola (akaihola) | Date: 2007-05-09 08:19 | |
I made some tests with Python 2.5 on an Ubuntu Edgy system with a UTF-8 terminal. Here's the basic test, which works correctly:
# -*- encoding: utf-8 -*-
__doc__ = u"""
>>> print u'ä'
ä
""" ; import doctest ; doctest.testmod()
If I start to vary the "ä" (a with umlaut) characters in "print u'ä'" (the test) and the "ä" below it (expected result), I get a UnicodeEncodeError whenever doctest tries to print a message about non-matching test output.
Here's a summary of my results in the format of
test | expected result | success/failure
Note that \u00e4 is unicode for the "ä" character.
ä | ä | success
\u00e4 | ä | success
ä | \u00e4 | success
\u00e4 | \u00e4 | success
ä | x | fails to display message
x | ä | fails to display message
\u00e4 | x | fails to display message
x | \u00e4 | fails to display message
Conclusion: test running and output checking work correctly, but there's a problem displaying messages about non-matching output whenever either the expected output or the output produced by the test contains any extended characters. The doctest documentation doesn't give any hint on work-arounds. |
|||
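The "fails to display message" rows above can be reproduced in miniature without doctest. The sketch below assumes the failure comes from writing unicode text to a stream that can only encode ASCII; the `ascii_stream` object is a hypothetical stand-in for such a terminal, not something doctest itself creates.

```python
import io

# Hypothetical stand-in for an output stream limited to ASCII.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

# Roughly the shape of a doctest mismatch report for the row "\u00e4 | x".
message = "Expected:\n    x\nGot:\n    \u00e4\n"

try:
    ascii_stream.write(message)
    ascii_stream.flush()
except UnicodeEncodeError as exc:
    failure = exc

print(failure.encoding)                           # ascii
print(ascii(failure.object[failure.start:failure.end]))
```

The comparison of expected and actual output never gets this far into trouble; it is only the reporting step that needs an encoding able to represent the character.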
| msg79597 - (view) | Author: Luciano Ramalho (luciano) | Date: 2009-01-11 14:48 | |
I have confirmed everything that akaihola reports in Python 2.4, 2.5 and 2.6, but the problem is not limited to non-matching test output. It also happens with doctests with zero failures when the module is run with the -v command-line switch, or testmod is called with verbose=True. The attached file shows a work-around: handle the UnicodeEncodeError thrown by testmod, and display the "object" attribute of the exception to see exactly where the problem is. |
|||
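The work-around described above relies on the `object`, `start` and `end` attributes that every UnicodeEncodeError carries. The attached issue1293741.py is not reproduced in this thread, so the following is only a guess at the idea in Python 3 syntax; the helper name `describe_encode_error` and the sample report text are invented for the illustration.

```python
def describe_encode_error(exc):
    """Return the codec name, the offending characters and some nearby context."""
    text = exc.object                                   # the full string being encoded
    context = text[max(0, exc.start - 20):exc.end + 20]
    return exc.encoding, text[exc.start:exc.end], context

# Stand-in for the verbose output that testmod would try to write.
report = "Trying:\n    print('\u00e4')\nExpecting:\n    \u00e4\nok\n"

try:
    report.encode("ascii")          # plays the role of the failing write
except UnicodeEncodeError as exc:
    encoding, bad, context = describe_encode_error(exc)

print(encoding)
print(ascii(bad))
print(ascii(context))
```

Wrapping the real `doctest.testmod()` call in the same try/except shows which part of the report triggered the error, which is what the message above recommends.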
| msg89928 - (view) | Author: Christoph Burgmer (christoph) | Date: 2009-06-30 15:04 | |
See attached patch which works for error reporting and verbose output. |
|||
| msg89996 - (view) | Author: Christoph Burgmer (christoph) | Date: 2009-07-01 20:24 | |
My last patch only changed the encoding used in DocTestRunner.run(). This new patch will apply the same to DocTestCase.runTest(). |
|||
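Neither patch is quoted in this thread, but a later traceback shows the patched code passing `out=lambda x: new.write(x.encode(output_encoding))`. A minimal sketch of that idea in Python 3 syntax follows; the choice of UTF-8 for `output_encoding` is an assumption, since how the patch picks the encoding is not shown here.

```python
import doctest
import io

# A deliberately failing example so the runner has a report to emit.
# The \u00e4 escape is resolved when this string is built, so the
# doctest source really contains the character itself.
DOC = """
>>> print('\u00e4')
x
"""

output_encoding = "utf-8"      # assumed; the patch's actual choice is not shown
buffer = io.BytesIO()

test = doctest.DocTestParser().get_doctest(DOC, {}, "failing", None, 0)
runner = doctest.DocTestRunner(verbose=False)

# `out` receives each chunk of the report as text; encode before writing.
results = runner.run(test, out=lambda s: buffer.write(s.encode(output_encoding)))

report = buffer.getvalue()
print(results.failed)
print(report.decode(output_encoding))
```

Encoding at the single point where the report leaves the runner keeps the comparison logic untouched, which seems to be the intent of applying the same change in both DocTestRunner.run() and DocTestCase.runTest().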
| msg94311 - (view) | Author: Wolodja Wentland (babilen) | Date: 2009-10-21 15:02 | |
Here is some more information.
--- snip ---
Normal behaviour
================
$ locale
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=POSIX
LC_TIME=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8
LC_MONETARY=de_DE.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=de_DE.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=de_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_ALL=
$ python2.6
Python 2.6.3 (r263:75183, Oct 6 2009, 17:19:56)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print '缺陷'
缺陷
>>> print u'缺陷'
缺陷
>>> '缺陷'
'\xe7\xbc\xba\xe9\x99\xb7'
>>> u'缺陷'
u'\u7f3a\u9677'
>>> '缺陷'.decode('utf8')
u'\u7f3a\u9677'
>>> u'\u7f3a\u9677'
u'\u7f3a\u9677'
>>>
$ cat unicode_bug.py
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

def print_string():
    """
    >>> print '缺陷'
    缺陷
    """
    pass

def print_unicode():
    """
    >>> print u'缺陷'
    缺陷
    """
    pass

def string_repr():
    """
    >>> '缺陷'
    '\xe7\xbc\xba\xe9\x99\xb7'
    """
    pass

def unicode_repr():
    """
    >>> u'缺陷'
    u'\u7f3a\u9677'
    """
    pass

def decode():
    """
    >>> '缺陷'.decode('utf8')
    u'\u7f3a\u9677'
    """
    pass

def unicode_escape_repr():
    """
    >>> u'\u7f3a\u9677'
    u'\u7f3a\u9677'
    """
    pass

if __name__ == "__main__":
    import doctest
    doctest.testmod()
$ python2.5 unicode_bug.py
/usr/lib/python2.5/doctest.py:1460: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
if got == want:
/usr/lib/python2.5/doctest.py:1480: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
if got == want:
Traceback (most recent call last):
File "unicode_bug.py", line 48, in <module>
doctest.testmod()
File "/usr/lib/python2.5/doctest.py", line 1815, in testmod
runner.run(test)
File "/usr/lib/python2.5/doctest.py", line 1361, in run
return self.__run(test, compileflags, out)
File "/usr/lib/python2.5/doctest.py", line 1277, in __run
self.report_failure(out, test, example, got)
File "/usr/lib/python2.5/doctest.py", line 1141, in report_failure
self._checker.output_difference(example, got, self.optionflags))
File "/usr/lib/python2.5/doctest.py", line 1565, in output_difference
return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14:
ordinal not in range(128)
$ python2.6 unicode_bug.py
/usr/local/lib/python2.6/doctest.py:1475: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
if got == want:
/usr/local/lib/python2.6/doctest.py:1495: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
if got == want:
Traceback (most recent call last):
File "unicode_bug.py", line 48, in <module>
doctest.testmod()
File "/usr/local/lib/python2.6/doctest.py", line 1830, in testmod
runner.run(test)
File "/usr/local/lib/python2.6/doctest.py", line 1374, in run
return self.__run(test, compileflags, out)
File "/usr/local/lib/python2.6/doctest.py", line 1290, in __run
self.report_failure(out, test, example, got)
File "/usr/local/lib/python2.6/doctest.py", line 1154, in report_failure
self._checker.output_difference(example, got, self.optionflags))
File "/usr/local/lib/python2.6/doctest.py", line 1580, in
output_difference
return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14:
ordinal not in range(128)
$ nosetests -V
nosetests version 0.11.1
$ nosetests --with-doctest -v unicode_bug.py
Doctest: unicode_bug.decode ... ok
Doctest: unicode_bug.print_string ... ok
Doctest: unicode_bug.print_unicode ...
/usr/local/lib/python2.6/doctest.py:1475: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
if got == want:
/usr/local/lib/python2.6/doctest.py:1495: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
if got == want:
ERROR
Doctest: unicode_bug.string_repr ... FAIL
Doctest: unicode_bug.unicode_escape_repr ... ok
Doctest: unicode_bug.unicode_repr ... FAIL
======================================================================
ERROR: Doctest: unicode_bug.print_unicode
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2140, in runTest
test, out=new.write, clear_globs=False)
File "/usr/local/lib/python2.6/doctest.py", line 1374, in run
return self.__run(test, compileflags, out)
File "/usr/local/lib/python2.6/doctest.py", line 1290, in __run
self.report_failure(out, test, example, got)
File "/usr/local/lib/python2.6/doctest.py", line 1154, in report_failure
self._checker.output_difference(example, got, self.optionflags))
File "/usr/local/lib/python2.6/doctest.py", line 1580, in
output_difference
return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14:
ordinal not in range(128)
======================================================================
FAIL: Doctest: unicode_bug.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug.string_repr
File "/home/babilen/test/unicode_bug.py", line 18, in string_repr
----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug.py", line 20, in
unicode_bug.string_repr
Failed example:
'缺陷'
Expected:
'缺陷'
Got:
'\xe7\xbc\xba\xe9\x99\xb7'
======================================================================
FAIL: Doctest: unicode_bug.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug.unicode_repr
File ".../unicode_bug.py", line 25, in unicode_repr
----------------------------------------------------------------------
File ".../unicode_bug.py", line 27, in unicode_bug.unicode_repr
Failed example:
u'缺陷'
Expected:
u'\u7f3a\u9677'
Got:
u'\xe7\xbc\xba\xe9\x99\xb7'
----------------------------------------------------------------------
Ran 6 tests in 0.011s
FAILED (errors=1, failures=2)
--- snip ---
unicode_literals
================
$ python2.6
Python 2.6.3 (r263:75183, Oct 6 2009, 17:19:56)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import unicode_literals
>>> print '缺陷'
缺陷
>>> print u'缺陷'
缺陷
>>> '缺陷'
u'\u7f3a\u9677'
>>> u'缺陷'
u'\u7f3a\u9677'
>>> '缺陷'.decode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-1: ordinal not in range(128)
>>> u'\u7f3a\u9677'
u'\u7f3a\u9677'
$ cat unicode_bug_literals.py
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

from __future__ import unicode_literals

def print_string():
    """
    >>> print '缺陷'
    缺陷
    """
    pass

def print_unicode():
    """
    >>> print u'缺陷'
    缺陷
    """
    pass

def string_repr():
    """
    >>> '缺陷'
    u'\u7f3a\u9677'
    """
    pass

def unicode_repr():
    """
    >>> u'缺陷'
    u'\u7f3a\u9677'
    """
    pass

def unicode_escape_repr():
    """
    >>> u'\u7f3a\u9677'
    u'\u7f3a\u9677'
    """
    pass

if __name__ == "__main__":
    import doctest
    doctest.testmod()
$ python2.6 unicode_bug_literals.py
Traceback (most recent call last):
File "unicode_bug_literals.py", line 43, in <module>
doctest.testmod()
File "/usr/local/lib/python2.6/doctest.py", line 1830, in testmod
runner.run(test)
File "/usr/local/lib/python2.6/doctest.py", line 1374, in run
return self.__run(test, compileflags, out)
File "/usr/local/lib/python2.6/doctest.py", line 1290, in __run
self.report_failure(out, test, example, got)
File "/usr/local/lib/python2.6/doctest.py", line 1154, in report_failure
self._checker.output_difference(example, got, self.optionflags))
UnicodeEncodeError: 'ascii' codec can't encode characters in position
157-158: ordinal not in range(128)
$ nosetests --with-doctest -v unicode_bug_literals.py
Doctest: unicode_bug_literals.print_string ... ok
Doctest: unicode_bug_literals.print_unicode ... ok
Doctest: unicode_bug_literals.string_repr ... FAIL
Doctest: unicode_bug_literals.unicode_escape_repr ... FAIL
Doctest: unicode_bug_literals.unicode_repr ... FAIL
======================================================================
FAIL: Doctest: unicode_bug_literals.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: <unprintable AssertionError object>
======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_escape_repr
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: <unprintable AssertionError object>
======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: <unprintable AssertionError object>
----------------------------------------------------------------------
Ran 5 tests in 0.011s
FAILED (failures=3)
--- snip ---
With doctest.unicode-2.patch
============================
$ nosetests --with-doctest -v unicode_bug.py
Doctest: unicode_bug.decode ... ok
Doctest: unicode_bug.print_string ... ok
Doctest: unicode_bug.print_unicode ...
/usr/local/lib/python2.6/doctest.py:1480: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
if got == want:
/usr/local/lib/python2.6/doctest.py:1500: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
if got == want:
ERROR
Doctest: unicode_bug.string_repr ... ERROR
Doctest: unicode_bug.unicode_escape_repr ... ok
Doctest: unicode_bug.unicode_repr ... ERROR
======================================================================
ERROR: Doctest: unicode_bug.print_unicode
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2150, in runTest
clear_globs=False)
File "/usr/local/lib/python2.6/doctest.py", line 1379, in run
return self.__run(test, compileflags, out)
File "/usr/local/lib/python2.6/doctest.py", line 1291, in __run
self.report_failure(out, test, example, got)
File "/usr/local/lib/python2.6/doctest.py", line 1155, in report_failure
self._checker.output_difference(example, got, self.optionflags))
File "/usr/local/lib/python2.6/doctest.py", line 1585, in
output_difference
return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14:
ordinal not in range(128)
======================================================================
ERROR: Doctest: unicode_bug.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2150, in runTest
clear_globs=False)
File "/usr/local/lib/python2.6/doctest.py", line 1379, in run
return self.__run(test, compileflags, out)
File "/usr/local/lib/python2.6/doctest.py", line 1291, in __run
self.report_failure(out, test, example, got)
File "/usr/local/lib/python2.6/doctest.py", line 1155, in report_failure
self._checker.output_difference(example, got, self.optionflags))
File "/usr/local/lib/python2.6/doctest.py", line 2149, in <lambda>
test, out=lambda x: new.write(x.encode(output_encoding)),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position
170: ordinal not in range(128)
======================================================================
ERROR: Doctest: unicode_bug.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2150, in runTest
clear_globs=False)
File "/usr/local/lib/python2.6/doctest.py", line 1379, in run
return self.__run(test, compileflags, out)
File "/usr/local/lib/python2.6/doctest.py", line 1291, in __run
self.report_failure(out, test, example, got)
File "/usr/local/lib/python2.6/doctest.py", line 1155, in report_failure
self._checker.output_difference(example, got, self.optionflags))
File "/usr/local/lib/python2.6/doctest.py", line 2149, in <lambda>
test, out=lambda x: new.write(x.encode(output_encoding)),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position
172: ordinal not in range(128)
----------------------------------------------------------------------
Ran 6 tests in 0.010s
FAILED (errors=3)
$ nosetests --with-doctest -v unicode_bug_literals.py
Doctest: unicode_bug_literals.print_string ... ok
Doctest: unicode_bug_literals.print_unicode ... ok
Doctest: unicode_bug_literals.string_repr ... FAIL
Doctest: unicode_bug_literals.unicode_escape_repr ... FAIL
Doctest: unicode_bug_literals.unicode_repr ... FAIL
======================================================================
FAIL: Doctest: unicode_bug_literals.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2155, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug_literals.string_repr
File "/home/babilen/test/unicode_bug_literals.py", line 20, in string_repr
----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug_literals.py", line 22, in
unicode_bug_literals.string_repr
Failed example:
'缺陷'
Expected:
u'缺陷'
Got:
u'\u7f3a\u9677'
======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_escape_repr
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2155, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for
unicode_bug_literals.unicode_escape_repr
File "/home/babilen/test/unicode_bug_literals.py", line 34, in
unicode_escape_repr
----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug_literals.py", line 36, in
unicode_bug_literals.unicode_escape_repr
Failed example:
u'缺陷'
Expected:
u'缺陷'
Got:
u'\u7f3a\u9677'
======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.6/doctest.py", line 2155, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug_literals.unicode_repr
File "/home/babilen/test/unicode_bug_literals.py", line 27, in
unicode_repr
----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug_literals.py", line 29, in
unicode_bug_literals.unicode_repr
Failed example:
u'缺陷'
Expected:
u'缺陷'
Got:
u'\u7f3a\u9677'
----------------------------------------------------------------------
Ran 5 tests in 0.009s
FAILED (failures=3)
--- snip ---
If you need further information do not hesitate to contact me.
with kind regards
Wolodja Wentland
|
|||
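The two representations that keep appearing in the transcripts above are related in a simple way, which a few lines of Python 3 can show. The reading of the `u'\xe7\xbc\xba\xe9\x99\xb7'` result as UTF-8 bytes interpreted one byte per code point (as Latin-1 would do) is an assumption offered as a likely explanation, not something the report states.

```python
text = "\u7f3a\u9677"                  # the two characters used in unicode_bug.py

# The byte-string repr seen in the interactive session is the UTF-8 encoding.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                      # b'\xe7\xbc\xba\xe9\x99\xb7'

# Reading the same six bytes one byte per code point gives six characters,
# matching the u'\xe7\xbc\xba\xe9\x99\xb7' reported for unicode_repr.
mojibake = utf8_bytes.decode("latin-1")
print(ascii(mojibake))                 # '\xe7\xbc\xba\xe9\x99\xb7'

# Decoding as UTF-8 restores the two original code points.
print(ascii(utf8_bytes.decode("utf-8")))   # '\u7f3a\u9677'
```

If that reading is right, the unicode_repr failure would mean the example source reached the compiler as raw bytes without the file's coding declaration, while the interactive session used the terminal's UTF-8 encoding.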
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2009-10-21 15:08:41 | babilen | set | files: + unicode_bug_literals.py |
| 2009-10-21 15:02:17 | babilen | set | files: + unicode_bug.py nosy: + babilen messages: + msg94311 |
| 2009-07-01 20:24:08 | christoph | set | files: + doctest.unicode-2.patch messages: + msg89996 |
| 2009-06-30 15:04:05 | christoph | set | files: + doctest.unicode.patch nosy: + christoph messages: + msg89928 keywords: + patch |
| 2009-01-27 14:07:27 | rbp | set | nosy: + rbp |
| 2009-01-11 14:48:21 | luciano | set | files: + issue1293741.py nosy: + luciano title: doctest runner cannot handle non-ascii characters -> doctest runner cannot handle non-ascii characters messages: + msg79597 versions: + Python 2.6, Python 2.5 |
| 2005-09-17 11:41:22 | ogrisel | create | |