Issue 1293741: doctest runner cannot handle non-ascii characters

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/42376

classification

Title:	doctest runner cannot handle non-ascii characters
Type:	enhancement	Stage:
Components:	Extension Modules	Versions:	Python 2.7

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	akaihola, babilen, bjoti, christoph, eric.araujo, luciano, ogrisel, rbp, terry.reedy, tim.peters
Priority:	normal	Keywords:	patch

Created on 2005-09-17 11:41 by ogrisel, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
test_iso-8859-15.py	ogrisel, 2005-09-17 11:41	sample code that shows the problem
issue1293741.py	luciano, 2009-01-11 14:48	Short script demonstrating encoding problem on doctest output
doctest.unicode.patch	christoph, 2009-06-30 15:04	Patch for properly encoding Unicode strings
doctest.unicode-2.patch	christoph, 2009-07-01 20:24	Patch extending to DocTestCase.runTest()
unicode_bug.py	babilen, 2009-10-21 15:02	The script mentioned in the report
unicode_bug_literals.py	babilen, 2009-10-21 15:08	Doctest with unicode_literals

Messages (11)
msg26296 - (view)	Author: GRISEL (ogrisel)	Date: 2005-09-17 11:41
The doctest module fails when the expected result string has non-ASCII characters, even if the # -*- coding: XXX -*- line is properly set. The enclosed code sample produces the following error:

Traceback (most recent call last):
  File "test_iso-8859-15.py", line 41, in ?
    _test()
  File "test_iso-8859-15.py", line 26, in _test
    tried, failed = runner.run(t)
  File "/usr/lib/python2.4/doctest.py", line 1376, in run
    return self.__run(test, compileflags, out)
  File "/usr/lib/python2.4/doctest.py", line 1259, in __run
    if check(example.want, got, self.optionflags):
  File "/usr/lib/python2.4/doctest.py", line 1475, in check_output
    if got == want:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 8: ordinal not in range(128)
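The error at the bottom of this traceback comes from an implicit ASCII decode of a byte string that contains byte 0xe9 (é in ISO-8859-15). The following is a minimal sketch of that underlying failure, written for Python 3, where the decode has to be spelled out (in Python 2 the `got == want` comparison triggered it implicitly). The helper name and the sample text are assumptions; the text was chosen only so that the offending byte lands at position 8, as in the report.

```python
def ascii_decode_error(raw):
    """Return the UnicodeDecodeError message, or None if raw is pure ASCII."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Hypothetical expected-output text, not taken from the attached file.
want_bytes = "Bonjour \u00e9t\u00e9".encode("iso-8859-15")

message = ascii_decode_error(want_bytes)
print(message)
```

Pure-ASCII input passes through the same helper without error, which is why the problem only shows up once a docstring contains characters outside the ASCII range.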
msg26297 - (view)	Author: Tim Peters (tim.peters) *	Date: 2005-09-17 17:42
Logged In: YES user_id=31435 Please try the patch at http://www.python.org/sf/1080727 and report back on whether it solves your problem (attaching comments to the patch report would be most useful).
msg26298 - (view)	Author: GRISEL (ogrisel)	Date: 2005-09-18 10:25
Logged In: YES user_id=795041

Unfortunately, that patch does not fix my problem. The patch at bug #1080727 fixes the problem for doctests written in external reST files (the testfile and DocFileTest functions). My problem is related to the encoding of internal docstrings (testmod, for instance).

However, Bjorn Tillenius says:

"""
If one writes doctests within documentation strings of classes and functions, it's possible to use non-ASCII characters since one can specify the encoding used in the source file.
"""

So according to him, docstring doctests with non-ASCII characters should work by default, and my system setup may be somewhat broken. Could somebody please confirm or refute this by running the attached sample script on their system?

My system config: LANG=fr_FR@euro (on Linux), Python 2.4.1 with:

sys.getdefaultencoding() == 'ascii'
locale.getpreferredencoding() == 'ISO-8859-15'

$ file test_iso-8859-15.py
test_iso-8859-15.py: ISO-8859 English text
msg26299 - (view)	Author: Bjorn Tillenius (bjoti)	Date: 2006-02-16 11:41
Logged In: YES user_id=1032069 I'm quite sure that you can use non-ASCII characters in your doctest, given that it's a unicode string. So if you make your docstring a unicode string, it should work. That is: u"""Docstring containing non-ASCII characters. ... """
msg26300 - (view)	Author: Tim Peters (tim.peters) *	Date: 2006-04-24 01:21
Logged In: YES user_id=31435 Unassigned myself -- don't know enough about encodings.
msg26301 - (view)	Author: akaihola (akaihola)	Date: 2007-05-09 08:19
I made some tests with Python 2.5 on an Ubuntu Edgy system with a UTF-8 terminal.

Here's the basic test, which works correctly:

  # -*- encoding: utf-8 -*-
  __doc__ = u"""
  >>> print u'ä'
  ä
  """ ; import doctest ; doctest.testmod()

If I start to vary the "ä" (a with umlaut) characters in "print u'ä'" (the test) and the "ä" below it (the expected result), I get a UnicodeEncodeError whenever doctest tries to print a message about non-matching test output.

Here's a summary of my results in the format
test | expected result | success/failure

Note that \u00e4 is the Unicode escape for the "ä" character.

ä | ä | success
\u00e4 | ä | success
ä | \u00e4 | success
\u00e4 | \u00e4 | success
ä | x | fails to display message
x | ä | fails to display message
\u00e4 | x | fails to display message
x | \u00e4 | fails to display message

Conclusion: test running and output checking work correctly, but displaying messages about non-matching output fails whenever either the expected output or the output produced by the test contains any extended characters. The doctest documentation doesn't give any hint about work-arounds.
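The "fails to display message" rows are consistent with a failure report being written to a stream that can only encode ASCII. A minimal sketch of that situation in Python 3 follows; the `write_report` helper and the report text are assumptions made for illustration, and the in-memory stream stands in for a terminal.

```python
import io


def write_report(text, encoding):
    """Write text through a text stream with the given encoding; return the bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    stream.write(text)
    stream.flush()
    stream.detach()  # keep the underlying buffer open after the wrapper goes away
    return raw.getvalue()


# Hypothetical failure report in the shape doctest uses.
report = "Expected:\n    x\nGot:\n    \u00e4\n"

# A UTF-8 stream accepts the report.
utf8_bytes = write_report(report, "utf-8")

# An ASCII-only stream raises as soon as the non-ASCII character is written.
try:
    write_report(report, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(ascii_failed)
```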
msg79597 - (view)	Author: Luciano Ramalho (luciano)	Date: 2009-01-11 14:48
I have confirmed everything that akaihola reports in Python 2.4, 2.5 and 2.6, but the problem is not limited to non-matching test output. It also happens with doctests with zero failures when the module is run with the -v command-line switch, or testmod is called with verbose=True. The attached file shows a work-around: handle the UnicodeEncodeError thrown by testmod, and display the "object" attribute of the exception to see exactly where the problem is.
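The work-around described here relies on the attributes that every UnicodeEncodeError carries. The sketch below is not the attached issue1293741.py; it only illustrates, in Python 3 and with a hypothetical helper name, how `object`, `start` and `end` pinpoint the text that could not be encoded.

```python
def describe_encode_failure(text, encoding="ascii"):
    """Return details about the first unencodable run in text, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return {
            "object": exc.object,                      # the full string being encoded
            "start": exc.start,                        # index of the first bad character
            "end": exc.end,                            # index just past the bad run
            "bad": exc.object[exc.start:exc.end],      # the offending characters
        }
    return None


text = "Got:\n    \u00e4\n"
info = describe_encode_failure(text)
print(ascii(info))
```

Wrapping the call to testmod in a similar try/except would expose the same attributes for the report that doctest was trying to print.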
msg89928 - (view)	Author: Christoph Burgmer (christoph)	Date: 2009-06-30 15:04
See attached patch which works for error reporting and verbose output.
msg89996 - (view)	Author: Christoph Burgmer (christoph)	Date: 2009-07-01 20:24
My last patch only changed the encoding used in DocTestRunner.run(). This new patch will apply the same to DocTestCase.runTest().
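The attached patches are not reproduced on this page. As a rough illustration of the general idea of encoding report text before it is written, here is a Python 3 sketch that passes an encoding writer as the `out` argument of `DocTestRunner.run()`. The `make_encoding_writer` helper, the `backslashreplace` error handler and the failing example are all assumptions, not the contents of the patch.

```python
import doctest
import io


def make_encoding_writer(buffer, output_encoding, errors="backslashreplace"):
    """Return an out(text) callable that encodes text before writing it."""
    def write(text):
        buffer.write(text.encode(output_encoding, errors))
    return write


# A deliberately failing doctest whose actual output is non-ASCII.
FAILING_DOC = """
>>> print('ä')
x
"""

test = doctest.DocTestParser().get_doctest(
    FAILING_DOC, globs={}, name="failing_example", filename=None, lineno=0)

buffer = io.BytesIO()
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test, out=make_encoding_writer(buffer, "ascii"))

report = buffer.getvalue()
print(report.decode("ascii"))
```

With `backslashreplace`, characters the target encoding cannot represent are written as escapes instead of raising, so the failure report is always displayed, at the cost of showing `\xe4` rather than `ä`.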
msg94311 - (view)	Author: Wolodja Wentland (babilen)	Date: 2009-10-21 15:02
Here is some more information.

--- snip ---

Normal behaviour
================

$ locale
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=POSIX
LC_TIME=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8
LC_MONETARY=de_DE.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=de_DE.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=de_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_ALL=

$ python2.6
Python 2.6.3 (r263:75183, Oct 6 2009, 17:19:56)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print '缺陷'
缺陷
>>> print u'缺陷'
缺陷
>>> '缺陷'
'\xe7\xbc\xba\xe9\x99\xb7'
>>> u'缺陷'
u'\u7f3a\u9677'
>>> '缺陷'.decode('utf8')
u'\u7f3a\u9677'
>>> u'\u7f3a\u9677'
u'\u7f3a\u9677'
>>>

$ cat unicode_bug.py
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

def print_string():
    """
    >>> print '缺陷'
    缺陷
    """
    pass

def print_unicode():
    """
    >>> print u'缺陷'
    缺陷
    """
    pass

def string_repr():
    """
    >>> '缺陷'
    '\xe7\xbc\xba\xe9\x99\xb7'
    """
    pass

def unicode_repr():
    """
    >>> u'缺陷'
    u'\u7f3a\u9677'
    """
    pass

def decode():
    """
    >>> '缺陷'.decode('utf8')
    u'\u7f3a\u9677'
    """
    pass

def unicode_escape_repr():
    """
    >>> u'\u7f3a\u9677'
    u'\u7f3a\u9677'
    """
    pass

if __name__ == "__main__":
    import doctest
    doctest.testmod()

$ python2.5 unicode_bug.py
/usr/lib/python2.5/doctest.py:1460: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
/usr/lib/python2.5/doctest.py:1480: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
Traceback (most recent call last):
  File "unicode_bug.py", line 48, in <module>
    doctest.testmod()
  File "/usr/lib/python2.5/doctest.py", line 1815, in testmod
    runner.run(test)
  File "/usr/lib/python2.5/doctest.py", line 1361, in run
    return self.__run(test, compileflags, out)
  File "/usr/lib/python2.5/doctest.py", line 1277, in __run
    self.report_failure(out, test, example, got)
  File "/usr/lib/python2.5/doctest.py", line 1141, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/lib/python2.5/doctest.py", line 1565, in output_difference
    return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14: ordinal not in range(128)

$ python2.6 unicode_bug.py
/usr/local/lib/python2.6/doctest.py:1475: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
/usr/local/lib/python2.6/doctest.py:1495: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
Traceback (most recent call last):
  File "unicode_bug.py", line 48, in <module>
    doctest.testmod()
  File "/usr/local/lib/python2.6/doctest.py", line 1830, in testmod
    runner.run(test)
  File "/usr/local/lib/python2.6/doctest.py", line 1374, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1290, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1154, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/local/lib/python2.6/doctest.py", line 1580, in output_difference
    return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14: ordinal not in range(128)

$ nosetests -V
nosetests version 0.11.1

$ nosetests --with-doctest -v unicode_bug.py
Doctest: unicode_bug.decode ... ok
Doctest: unicode_bug.print_string ... ok
Doctest: unicode_bug.print_unicode ... /usr/local/lib/python2.6/doctest.py:1475: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
/usr/local/lib/python2.6/doctest.py:1495: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
ERROR
Doctest: unicode_bug.string_repr ... FAIL
Doctest: unicode_bug.unicode_escape_repr ... ok
Doctest: unicode_bug.unicode_repr ... FAIL

======================================================================
ERROR: Doctest: unicode_bug.print_unicode
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2140, in runTest
    test, out=new.write, clear_globs=False)
  File "/usr/local/lib/python2.6/doctest.py", line 1374, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1290, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1154, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/local/lib/python2.6/doctest.py", line 1580, in output_difference
    return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14: ordinal not in range(128)

======================================================================
FAIL: Doctest: unicode_bug.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug.string_repr
  File "/home/babilen/test/unicode_bug.py", line 18, in string_repr

----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug.py", line 20, in unicode_bug.string_repr
Failed example:
    '缺陷'
Expected:
    '缺陷'
Got:
    '\xe7\xbc\xba\xe9\x99\xb7'

======================================================================
FAIL: Doctest: unicode_bug.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug.unicode_repr
  File ".../unicode_bug.py", line 25, in unicode_repr

----------------------------------------------------------------------
File ".../unicode_bug.py", line 27, in unicode_bug.unicode_repr
Failed example:
    u'缺陷'
Expected:
    u'\u7f3a\u9677'
Got:
    u'\xe7\xbc\xba\xe9\x99\xb7'

----------------------------------------------------------------------
Ran 6 tests in 0.011s

FAILED (errors=1, failures=2)

--- snip ---

unicode_literals
================

$ python2.6
Python 2.6.3 (r263:75183, Oct 6 2009, 17:19:56)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import unicode_literals
>>> print '缺陷'
缺陷
>>> print u'缺陷'
缺陷
>>> '缺陷'
u'\u7f3a\u9677'
>>> u'缺陷'
u'\u7f3a\u9677'
>>> '缺陷'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> u'\u7f3a\u9677'
u'\u7f3a\u9677'

$ cat unicode_bug_literals.py
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

from __future__ import unicode_literals

def print_string():
    """
    >>> print '缺陷'
    缺陷
    """
    pass

def print_unicode():
    """
    >>> print u'缺陷'
    缺陷
    """
    pass

def string_repr():
    """
    >>> '缺陷'
    u'\u7f3a\u9677'
    """
    pass

def unicode_repr():
    """
    >>> u'缺陷'
    u'\u7f3a\u9677'
    """
    pass

def unicode_escape_repr():
    """
    >>> u'\u7f3a\u9677'
    u'\u7f3a\u9677'
    """
    pass

if __name__ == "__main__":
    import doctest
    doctest.testmod()

$ python2.6 unicode_bug_literals.py
Traceback (most recent call last):
  File "unicode_bug_literals.py", line 43, in <module>
    doctest.testmod()
  File "/usr/local/lib/python2.6/doctest.py", line 1830, in testmod
    runner.run(test)
  File "/usr/local/lib/python2.6/doctest.py", line 1374, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1290, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1154, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 157-158: ordinal not in range(128)

$ nosetests --with-doctest -v unicode_bug_literals.py
Doctest: unicode_bug_literals.print_string ... ok
Doctest: unicode_bug_literals.print_unicode ... ok
Doctest: unicode_bug_literals.string_repr ... FAIL
Doctest: unicode_bug_literals.unicode_escape_repr ... FAIL
Doctest: unicode_bug_literals.unicode_repr ... FAIL

======================================================================
FAIL: Doctest: unicode_bug_literals.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: <unprintable AssertionError object>

======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_escape_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: <unprintable AssertionError object>

======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: <unprintable AssertionError object>

----------------------------------------------------------------------
Ran 5 tests in 0.011s

FAILED (failures=3)

--- snip ---

With doctest.unicode-2.patch
============================

$ nosetests --with-doctest -v unicode_bug.py
Doctest: unicode_bug.decode ... ok
Doctest: unicode_bug.print_string ... ok
Doctest: unicode_bug.print_unicode ... /usr/local/lib/python2.6/doctest.py:1480: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
/usr/local/lib/python2.6/doctest.py:1500: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
ERROR
Doctest: unicode_bug.string_repr ... ERROR
Doctest: unicode_bug.unicode_escape_repr ... ok
Doctest: unicode_bug.unicode_repr ... ERROR

======================================================================
ERROR: Doctest: unicode_bug.print_unicode
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2150, in runTest
    clear_globs=False)
  File "/usr/local/lib/python2.6/doctest.py", line 1379, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1291, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1155, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/local/lib/python2.6/doctest.py", line 1585, in output_difference
    return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14: ordinal not in range(128)

======================================================================
ERROR: Doctest: unicode_bug.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2150, in runTest
    clear_globs=False)
  File "/usr/local/lib/python2.6/doctest.py", line 1379, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1291, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1155, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/local/lib/python2.6/doctest.py", line 2149, in <lambda>
    test, out=lambda x: new.write(x.encode(output_encoding)),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 170: ordinal not in range(128)

======================================================================
ERROR: Doctest: unicode_bug.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2150, in runTest
    clear_globs=False)
  File "/usr/local/lib/python2.6/doctest.py", line 1379, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1291, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1155, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/local/lib/python2.6/doctest.py", line 2149, in <lambda>
    test, out=lambda x: new.write(x.encode(output_encoding)),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 172: ordinal not in range(128)

----------------------------------------------------------------------
Ran 6 tests in 0.010s

FAILED (errors=3)

$ nosetests --with-doctest -v unicode_bug_literals.py
Doctest: unicode_bug_literals.print_string ... ok
Doctest: unicode_bug_literals.print_unicode ... ok
Doctest: unicode_bug_literals.string_repr ... FAIL
Doctest: unicode_bug_literals.unicode_escape_repr ... FAIL
Doctest: unicode_bug_literals.unicode_repr ... FAIL

======================================================================
FAIL: Doctest: unicode_bug_literals.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2155, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug_literals.string_repr
  File "/home/babilen/test/unicode_bug_literals.py", line 20, in string_repr

----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug_literals.py", line 22, in unicode_bug_literals.string_repr
Failed example:
    '缺陷'
Expected:
    u'缺陷'
Got:
    u'\u7f3a\u9677'

======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_escape_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2155, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug_literals.unicode_escape_repr
  File "/home/babilen/test/unicode_bug_literals.py", line 34, in unicode_escape_repr

----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug_literals.py", line 36, in unicode_bug_literals.unicode_escape_repr
Failed example:
    u'缺陷'
Expected:
    u'缺陷'
Got:
    u'\u7f3a\u9677'

======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2155, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug_literals.unicode_repr
  File "/home/babilen/test/unicode_bug_literals.py", line 27, in unicode_repr

----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug_literals.py", line 29, in unicode_bug_literals.unicode_repr
Failed example:
    u'缺陷'
Expected:
    u'缺陷'
Got:
    u'\u7f3a\u9677'

----------------------------------------------------------------------
Ran 5 tests in 0.009s

FAILED (failures=3)

--- snip ---

If you need further information do not hesitate to contact me.

with kind regards

Wolodja Wentland
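Two byte sequences recur throughout the output above. The Python 3 sketch below checks how they relate to the two characters 缺陷. The reading that the `Got: u'\xe7\xbc\xba\xe9\x99\xb7'` result comes from the example's UTF-8 bytes being interpreted one byte per character is an assumption, not something the report states; the code only shows that such an interpretation yields exactly that value.

```python
text = "\u7f3a\u9677"  # the two characters 缺陷

# UTF-8 encoding: the byte string shown by the Python 2 repr of '缺陷'.
utf8_bytes = text.encode("utf-8")

# Reading those six bytes as Latin-1 yields six separate code points,
# matching the u'\xe7\xbc\xba\xe9\x99\xb7' value in the failed unicode_repr test.
misread = utf8_bytes.decode("latin-1")

# Reading them as UTF-8 recovers the original two characters.
roundtrip = utf8_bytes.decode("utf-8")

print(ascii(utf8_bytes), ascii(misread), ascii(roundtrip))
```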
msg112724 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-08-03 23:53
OP: "The doctest module fails when the expected result string has non-ascii characters even if the # -*- coding: XXX -*- line is properly set."

I believe the claim in msg70907 of #2811 is correct: the file encoding only affects the conversion of unicode literals to unicode objects. It does not affect the conversion of byte literals to byte string objects, nor does it affect the later interpretation of byte strings by testmod. As msg26299 also says, make the docstring a unicode string, not a byte string, for the encoding cookie to take effect. So the original bug claim is invalid.

That aside, the issue was fixed in 3.0 by making text be unicode. Seriously, issues like this were part of the motivation for 3.0.

That aside, test modules are not revised in bugfix releases without severe reason.

Closing for all these reasons: invalid, out-of-date, fixed; take one's pick.
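As a rough check of the statement that Python 3 resolves this by making text unicode, the sketch below runs a docstring-style doctest containing non-ASCII characters through the standard doctest API under Python 3. The test name and the examples are made up for illustration.

```python
import doctest

# In Python 3 every str literal, including a docstring, is unicode text.
DOC = """
>>> print('ä')
ä
>>> '缺陷'
'缺陷'
"""

test = doctest.DocTestParser().get_doctest(
    DOC, globs={}, name="unicode_docstring", filename=None, lineno=0)

runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)
```

Both examples are compared as text, so no implicit ASCII conversion takes place during output checking.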

History
Date	User	Action	Args
2022-04-11 14:56:13	admin	set	github: 42376
2010-08-03 23:53:47	terry.reedy	set	status: open -> closed type: enhancement versions: + Python 2.7, - Python 2.6, Python 2.5, Python 2.4 nosy: + terry.reedy messages: + msg112724 resolution: out of date
2009-11-28 00:00:12	eric.araujo	set	nosy: + eric.araujo
2009-10-21 15:08:41	babilen	set	files: + unicode_bug_literals.py
2009-10-21 15:02:17	babilen	set	files: + unicode_bug.py nosy: + babilen messages: + msg94311
2009-07-01 20:24:08	christoph	set	files: + doctest.unicode-2.patch messages: + msg89996
2009-06-30 15:04:05	christoph	set	files: + doctest.unicode.patch nosy: + christoph messages: + msg89928 keywords: + patch
2009-01-27 14:07:27	rbp	set	nosy: + rbp
2009-01-11 14:48:21	luciano	set	files: + issue1293741.py nosy: + luciano title: doctest runner cannot handle non-ascii characters -> doctest runner cannot handle non-ascii characters messages: + msg79597 versions: + Python 2.6, Python 2.5
2005-09-17 11:41:22	ogrisel	create