classification
Title: doctest runner cannot handle non-ascii characters
Type:
Stage:
Components: Extension Modules
Versions: Python 2.6, Python 2.5, Python 2.4
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: akaihola, babilen, bjoti, christoph, luciano, ogrisel, rbp, tim_one (8)
Priority: normal
Keywords: patch

Created on 2005-09-17 11:41 by ogrisel, last changed 2009-10-21 15:08 by babilen.

Files
File name Uploaded Description
test_iso-8859-15.py ogrisel, 2005-09-17 11:41 sample code that shows the problem
issue1293741.py luciano, 2009-01-11 14:48 Short script demonstrating encoding problem on doctest output
doctest.unicode.patch christoph, 2009-06-30 15:04 Patch for properly encoding Unicode strings
doctest.unicode-2.patch christoph, 2009-07-01 20:24 Patch extending to DocTestCase.runTest()
unicode_bug.py babilen, 2009-10-21 15:02 The script mentioned in the report
unicode_bug_literals.py babilen, 2009-10-21 15:08 Doctest with unicode_literals
Messages (10)
msg26296 - Author: GRISEL (ogrisel) Date: 2005-09-17 11:41
The doctest module fails when the expected result
string contains non-ASCII characters, even if the # -*-
coding: XXX -*- line is properly set.

The enclosed code sample produces the following error:

Traceback (most recent call last):
  File "test_iso-8859-15.py", line 41, in ?
    _test()
  File "test_iso-8859-15.py", line 26, in _test
    tried, failed = runner.run(t)
  File "/usr/lib/python2.4/doctest.py", line 1376, in run
    return self.__run(test, compileflags, out)
  File "/usr/lib/python2.4/doctest.py", line 1259, in __run
    if check(example.want, got, self.optionflags):
  File "/usr/lib/python2.4/doctest.py", line 1475, in check_output
    if got == want:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 8: ordinal not in range(128)
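
The traceback above is consistent with Python 2.4 comparing a byte string against a unicode string by first decoding the bytes as ASCII. The sketch below spells out that implicit step so that it also runs on Python 3. The helper name `implicit_ascii_equal` and the sample strings are illustrative assumptions and are not taken from the attached script.

```python
# -*- coding: utf-8 -*-
def implicit_ascii_equal(got, want):
    """Compare two values roughly the way Python 2.4 compared str with unicode.

    Any byte string is decoded as ASCII first; that implicit coercion is
    what raises UnicodeDecodeError for bytes >= 0x80.
    """
    if isinstance(got, bytes):
        got = got.decode("ascii")
    if isinstance(want, bytes):
        want = want.decode("ascii")
    return got == want


if __name__ == "__main__":
    # Hypothetical expected output, encoded as it would be in an
    # ISO-8859-15 source file (the e-acute becomes the single byte 0xe9).
    latin9_bytes = u"r\u00e9sultat".encode("iso-8859-15")
    try:
        implicit_ascii_equal(latin9_bytes, u"r\u00e9sultat")
    except UnicodeDecodeError as exc:
        print("comparison failed: %s" % exc)
```

In this sketch, decoding the bytes with the encoding declared in the source file before comparing would avoid the error.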

msg26297 - Author: Tim Peters (tim_one) Date: 2005-09-17 17:42
Please try the patch at

http://www.python.org/sf/1080727

and report back on whether it solves your problem (attaching 
comments to the patch report would be most useful).
msg26298 - Author: GRISEL (ogrisel) Date: 2005-09-18 10:25
Unfortunately, that patch does not fix my problem. The patch
at bug #1080727 fixes the problem for doctests written in
external reST files (the testfile and DocFileTest functions). My
problem is related to the encoding of docstrings inside a module
(testmod, for instance). However, Bjorn Tillenius says:
"""
If one writes doctests within documentation strings of classes and
functions, it's possible to use non-ASCII characters since one can
specify the encoding used in the source file.
"""
So according to him, doctests in docstrings with non-ASCII
characters should work by default, and maybe my system setup
is somewhat broken. Could somebody please confirm or refute
this by running the attached sample script on their system?

My system config:
LANG=fr_FR@euro (on linux)
python 2.4.1 with:  sys.getdefaultencoding() == 'ascii' 
and locale.getpreferredencoding() == 'ISO-8859-15'
$ file test_iso-8859-15.py
test_iso-8859-15.py: ISO-8859 English text
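
For anyone reproducing this, the configuration values listed above can be collected with a short script. This is only a convenience sketch: the function name `encoding_report` is made up here, and the extra `stdout` entry is an addition that the original report does not mention.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that matter when doctest compares and prints text."""
    return {
        "default": sys.getdefaultencoding(),
        "preferred": locale.getpreferredencoding(),
        # Some replaced or redirected streams have no encoding attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print("%-9s %s" % (name, value))
```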
msg26299 - Author: Bjorn Tillenius (bjoti) Date: 2006-02-16 11:41
I'm quite sure that you can use non-ASCII characters in 
your doctest, given that it's a unicode string. So if you 
make your docstring a unicode string, it should work. That 
is:

u"""Docstring containing non-ASCII characters.
...
"""
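
A minimal way to try this suggestion without creating a module is to feed a unicode docstring straight to the doctest machinery. The sketch below is written for Python 3, where every docstring is already unicode; `SAMPLE_DOC` and `run_doc` are hypothetical names used only for illustration.

```python
import doctest

# Hypothetical docstring. \u00e4 is a-umlaut, written as an escape so this
# file stays pure ASCII; the unicode literal turns it into the real character.
SAMPLE_DOC = u"""
>>> print(u'\u00e4')
\u00e4
"""


def run_doc(doc, name="sample"):
    """Parse `doc` as a doctest and run it, returning (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(doc, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    report = []  # failure messages are collected here instead of being printed
    results = runner.run(test, out=report.append)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_doc(SAMPLE_DOC))
```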
msg26300 - Author: Tim Peters (tim_one) Date: 2006-04-24 01:21
Unassigned myself -- don't know enough about encodings.
msg26301 - Author: akaihola (akaihola) Date: 2007-05-09 08:19
I ran some tests with Python 2.5 on an Ubuntu Edgy system with a UTF-8 terminal. Here's the basic test, which works correctly:

# -*- encoding: utf-8 -*-
__doc__ = u"""
>>> print u'ä'
ä
""" ; import doctest ; doctest.testmod()

If I start to vary the "ä" (a with umlaut) characters in "print u'ä'" (the test) and the "ä" below it (expected result), I get a UnicodeEncodeError whenever doctest tries to print a message about non-matching test output.

Here's a summary of my results. Note that \u00e4 is the Unicode escape for the "ä" character.

test   | expected result | outcome
ä      | ä               | success
\u00e4 | ä               | success
ä      | \u00e4          | success
\u00e4 | \u00e4          | success
ä      | x               | fails to display message
x      | ä               | fails to display message
\u00e4 | x               | fails to display message
x      | \u00e4          | fails to display message

Conclusion: test running and output checking do work correctly, but there's a problem displaying messages about non-matching output whenever either the expected output or the output produced by the test contain any extended characters.

The doctest documentation doesn't give any hint on work-arounds.
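
The split described in the conclusion above (checking works, displaying does not) can be sketched by calling the checker directly and then encoding its failure message the way an ASCII-only output stream would. This runs on Python 3; `check_and_describe` and `can_display` are hypothetical helper names, while `OutputChecker`, `check_output`, `output_difference` and `Example` are part of the doctest module.

```python
import doctest


def check_and_describe(source, want, got):
    """Return (matched, message) for one example, as the runner would see it."""
    checker = doctest.OutputChecker()
    if checker.check_output(want, got, 0):
        return True, u""
    example = doctest.Example(source, want)
    return False, checker.output_difference(example, got, 0)


def can_display(message, encoding="ascii"):
    """Tell whether `message` survives encoding for an output stream."""
    try:
        message.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    matched, message = check_and_describe(u"print(u'x')\n", u"\u00e4\n", u"x\n")
    print(matched, can_display(message), can_display(message, "utf-8"))
```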
msg79597 - Author: Luciano Ramalho (luciano) Date: 2009-01-11 14:48
I have confirmed everything that akaihola reports in Python 2.4, 2.5 and
2.6, but the problem is not limited to non-matching test output. It also
happens with doctests with zero failures when the module is run with the
-v command-line switch, or testmod is called with verbose=True.

The attached file shows a work-around: handle the UnicodeEncodeError
thrown by testmod, and display the "object" attribute of the exception
to see exactly where the problem is.
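
The attached script is not reproduced here, so the following is only a sketch of the kind of work-around described, under the assumption that the error is caught around the call and its attributes are inspected. `describe_encode_error` and `run_and_report` are hypothetical names; `object`, `start`, `end` and `encoding` are standard attributes of UnicodeEncodeError.

```python
def describe_encode_error(exc, context=20):
    """Summarize where a UnicodeEncodeError happened, using its attributes."""
    start = max(0, exc.start - context)
    return {
        "encoding": exc.encoding,
        "position": (exc.start, exc.end),
        "offending": exc.object[exc.start:exc.end],
        "context": exc.object[start:exc.end + context],
    }


def run_and_report(func):
    """Call func(); return (result, None), or (None, summary) on encode errors."""
    try:
        return func(), None
    except UnicodeEncodeError as exc:
        return None, describe_encode_error(exc)


if __name__ == "__main__":
    # Stand-in for the failing call; in the reported scenario this
    # would be doctest.testmod.
    print(run_and_report(lambda: u"Expected: \u00e4".encode("ascii")))
```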
msg89928 - Author: Christoph Burgmer (christoph) Date: 2009-06-30 15:04
See attached patch which works for error reporting and verbose output.
msg89996 - Author: Christoph Burgmer (christoph) Date: 2009-07-01 20:24
My last patch only changed the encoding used in DocTestRunner.run().
This new patch will apply the same to DocTestCase.runTest().
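
The patch contents are not shown in this thread, so the following is not the patch itself. It is a rough sketch of the general idea of encoding report text before it reaches the output stream, written so that it runs on Python 3. The name `make_encoding_writer` and the choice of the `backslashreplace` error handler are assumptions made for illustration.

```python
import io


def make_encoding_writer(stream, encoding, errors="backslashreplace"):
    """Build an `out` callable that encodes text before writing it to a byte stream.

    Byte strings are passed through untouched; unicode text is encoded with
    `encoding`, and unencodable characters are replaced by backslash escapes
    instead of raising UnicodeEncodeError.
    """
    def out(text):
        if isinstance(text, bytes):
            stream.write(text)
        else:
            stream.write(text.encode(encoding, errors))
    return out


if __name__ == "__main__":
    buf = io.BytesIO()
    out = make_encoding_writer(buf, "ascii")
    out(u"Got: \u7f3a\u9677\n")
    print(buf.getvalue())
```

A callable built this way could be passed as the `out` argument of `DocTestRunner.run()` when the target stream accepts bytes.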
msg94311 - Author: Wolodja Wentland (babilen) Date: 2009-10-21 15:02
Here is some more information.

--- snip ---

Normal behaviour
================

$ locale
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=POSIX
LC_TIME=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8
LC_MONETARY=de_DE.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=de_DE.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=de_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_ALL=
$ python2.6
Python 2.6.3 (r263:75183, Oct  6 2009, 17:19:56) 
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print '缺陷'
缺陷
>>> print u'缺陷'
缺陷
>>> '缺陷'
'\xe7\xbc\xba\xe9\x99\xb7'
>>> u'缺陷'
u'\u7f3a\u9677'
>>> '缺陷'.decode('utf8')
u'\u7f3a\u9677'
>>> u'\u7f3a\u9677'
u'\u7f3a\u9677'
>>> 
$ cat unicode_bug.py 
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

def print_string():
    """
    >>> print '缺陷'
    缺陷
    """
    pass

def print_unicode():
    """
    >>> print u'缺陷'
    缺陷
    """
    pass

def string_repr():
    """
    >>> '缺陷'
    '\xe7\xbc\xba\xe9\x99\xb7'
    """
    pass

def unicode_repr():
    """
    >>> u'缺陷'
    u'\u7f3a\u9677'
    """
    pass

def decode():
    """
    >>> '缺陷'.decode('utf8')
    u'\u7f3a\u9677'
    """
    pass

def unicode_escape_repr():
    """
    >>> u'\u7f3a\u9677'
    u'\u7f3a\u9677'
    """
    pass

if __name__ == "__main__":
    import doctest
    doctest.testmod()

$ python2.5 unicode_bug.py 
/usr/lib/python2.5/doctest.py:1460: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
  if got == want:
/usr/lib/python2.5/doctest.py:1480: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
  if got == want:
Traceback (most recent call last):
  File "unicode_bug.py", line 48, in <module>
    doctest.testmod()
  File "/usr/lib/python2.5/doctest.py", line 1815, in testmod
    runner.run(test)
  File "/usr/lib/python2.5/doctest.py", line 1361, in run
    return self.__run(test, compileflags, out)
  File "/usr/lib/python2.5/doctest.py", line 1277, in __run
    self.report_failure(out, test, example, got)
  File "/usr/lib/python2.5/doctest.py", line 1141, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/lib/python2.5/doctest.py", line 1565, in output_difference
    return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14:
ordinal not in range(128)

$ python2.6 unicode_bug.py 
/usr/local/lib/python2.6/doctest.py:1475: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
  if got == want:
/usr/local/lib/python2.6/doctest.py:1495: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
  if got == want:
Traceback (most recent call last):
  File "unicode_bug.py", line 48, in <module>
    doctest.testmod()
  File "/usr/local/lib/python2.6/doctest.py", line 1830, in testmod
    runner.run(test)
  File "/usr/local/lib/python2.6/doctest.py", line 1374, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1290, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1154, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/local/lib/python2.6/doctest.py", line 1580, in
output_difference
    return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14:
ordinal not in range(128)

$ nosetests -V
nosetests version 0.11.1

$ nosetests --with-doctest -v unicode_bug.py 
Doctest: unicode_bug.decode ... ok
Doctest: unicode_bug.print_string ... ok
Doctest: unicode_bug.print_unicode ...
/usr/local/lib/python2.6/doctest.py:1475: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
  if got == want:
/usr/local/lib/python2.6/doctest.py:1495: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
  if got == want:
ERROR
Doctest: unicode_bug.string_repr ... FAIL
Doctest: unicode_bug.unicode_escape_repr ... ok
Doctest: unicode_bug.unicode_repr ... FAIL

======================================================================
ERROR: Doctest: unicode_bug.print_unicode
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2140, in runTest
    test, out=new.write, clear_globs=False)
  File "/usr/local/lib/python2.6/doctest.py", line 1374, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1290, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1154, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/local/lib/python2.6/doctest.py", line 1580, in
output_difference
    return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14:
ordinal not in range(128)

======================================================================
FAIL: Doctest: unicode_bug.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug.string_repr
  File "/home/babilen/test/unicode_bug.py", line 18, in string_repr

----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug.py", line 20, in
unicode_bug.string_repr
Failed example:
    '缺陷'
Expected:
    '缺陷'
Got:
    '\xe7\xbc\xba\xe9\x99\xb7'


======================================================================
FAIL: Doctest: unicode_bug.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug.unicode_repr
  File ".../unicode_bug.py", line 25, in unicode_repr

----------------------------------------------------------------------
File ".../unicode_bug.py", line 27, in unicode_bug.unicode_repr
Failed example:
    u'缺陷'
Expected:
    u'\u7f3a\u9677'
Got:
    u'\xe7\xbc\xba\xe9\x99\xb7'


----------------------------------------------------------------------
Ran 6 tests in 0.011s

FAILED (errors=1, failures=2)

--- snip ---
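
One possible reading of the string_repr failure above, offered as an assumption because the report does not diagnose it: the docstring is not a raw string, so the backslash escapes in the expected output are interpreted when the module is compiled, and doctest never sees the literal backslashes. The sketch below (Python 3, with hypothetical function names) shows the difference between a plain and a raw docstring.

```python
def plain_docstring():
    """
    >>> u'\u7f3a\u9677'
    u'\u7f3a\u9677'
    """


def raw_docstring():
    r"""
    >>> u'\u7f3a\u9677'
    u'\u7f3a\u9677'
    """


def has_literal_backslash(func):
    """Tell whether the docstring that doctest would parse still contains a backslash."""
    return "\\" in func.__doc__


if __name__ == "__main__":
    print(has_literal_backslash(plain_docstring))
    print(has_literal_backslash(raw_docstring))
```

If this reading is right, marking the docstrings as raw (r""" ... """) would keep the expected repr text as it was typed.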

unicode_literals
================

$ python2.6
Python 2.6.3 (r263:75183, Oct  6 2009, 17:19:56) 
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import unicode_literals
>>> print '缺陷'
缺陷
>>> print u'缺陷'
缺陷
>>> '缺陷'
u'\u7f3a\u9677'
>>> u'缺陷'
u'\u7f3a\u9677'
>>> '缺陷'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-1: ordinal not in range(128)
>>> u'\u7f3a\u9677'
u'\u7f3a\u9677'

$ cat unicode_bug_literals.py
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

from __future__ import unicode_literals

def print_string():
    """
    >>> print '缺陷'
    缺陷
    """
    pass

def print_unicode():
    """
    >>> print u'缺陷'
    缺陷
    """
    pass

def string_repr():
    """
    >>> '缺陷'
    u'\u7f3a\u9677'
    """
    pass

def unicode_repr():
    """
    >>> u'缺陷'
    u'\u7f3a\u9677'
    """
    pass

def unicode_escape_repr():
    """
    >>> u'\u7f3a\u9677'
    u'\u7f3a\u9677'
    """
    pass

if __name__ == "__main__":
    import doctest
    doctest.testmod()

$ python2.6 unicode_bug_literals.py
Traceback (most recent call last):
  File "unicode_bug_literals.py", line 43, in <module>
    doctest.testmod()
  File "/usr/local/lib/python2.6/doctest.py", line 1830, in testmod
    runner.run(test)
  File "/usr/local/lib/python2.6/doctest.py", line 1374, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1290, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1154, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
UnicodeEncodeError: 'ascii' codec can't encode characters in position
157-158: ordinal not in range(128)

$ nosetests --with-doctest -v unicode_bug_literals.py 
Doctest: unicode_bug_literals.print_string ... ok
Doctest: unicode_bug_literals.print_unicode ... ok
Doctest: unicode_bug_literals.string_repr ... FAIL
Doctest: unicode_bug_literals.unicode_escape_repr ... FAIL
Doctest: unicode_bug_literals.unicode_repr ... FAIL

======================================================================
FAIL: Doctest: unicode_bug_literals.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: <unprintable AssertionError object>

======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_escape_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: <unprintable AssertionError object>

======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2145, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: <unprintable AssertionError object>

----------------------------------------------------------------------
Ran 5 tests in 0.011s

FAILED (failures=3)

--- snip ---

With doctest.unicode-2.patch
============================

$ nosetests --with-doctest -v unicode_bug.py 
Doctest: unicode_bug.decode ... ok
Doctest: unicode_bug.print_string ... ok
Doctest: unicode_bug.print_unicode ...
/usr/local/lib/python2.6/doctest.py:1480: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
  if got == want:
/usr/local/lib/python2.6/doctest.py:1500: UnicodeWarning: Unicode equal
comparison failed to convert both arguments to Unicode - interpreting
them as being unequal
  if got == want:
ERROR
Doctest: unicode_bug.string_repr ... ERROR
Doctest: unicode_bug.unicode_escape_repr ... ok
Doctest: unicode_bug.unicode_repr ... ERROR

======================================================================
ERROR: Doctest: unicode_bug.print_unicode
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2150, in runTest
    clear_globs=False)
  File "/usr/local/lib/python2.6/doctest.py", line 1379, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1291, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1155, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/local/lib/python2.6/doctest.py", line 1585, in
output_difference
    return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 14:
ordinal not in range(128)

======================================================================
ERROR: Doctest: unicode_bug.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2150, in runTest
    clear_globs=False)
  File "/usr/local/lib/python2.6/doctest.py", line 1379, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1291, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1155, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/local/lib/python2.6/doctest.py", line 2149, in <lambda>
    test, out=lambda x: new.write(x.encode(output_encoding)),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position
170: ordinal not in range(128)

======================================================================
ERROR: Doctest: unicode_bug.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2150, in runTest
    clear_globs=False)
  File "/usr/local/lib/python2.6/doctest.py", line 1379, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.6/doctest.py", line 1291, in __run
    self.report_failure(out, test, example, got)
  File "/usr/local/lib/python2.6/doctest.py", line 1155, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "/usr/local/lib/python2.6/doctest.py", line 2149, in <lambda>
    test, out=lambda x: new.write(x.encode(output_encoding)),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position
172: ordinal not in range(128)

----------------------------------------------------------------------
Ran 6 tests in 0.010s

FAILED (errors=3)

$ nosetests --with-doctest -v unicode_bug_literals.py 
Doctest: unicode_bug_literals.print_string ... ok
Doctest: unicode_bug_literals.print_unicode ... ok
Doctest: unicode_bug_literals.string_repr ... FAIL
Doctest: unicode_bug_literals.unicode_escape_repr ... FAIL
Doctest: unicode_bug_literals.unicode_repr ... FAIL

======================================================================
FAIL: Doctest: unicode_bug_literals.string_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2155, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug_literals.string_repr
  File "/home/babilen/test/unicode_bug_literals.py", line 20, in string_repr

----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug_literals.py", line 22, in
unicode_bug_literals.string_repr
Failed example:
    '缺陷'
Expected:
    u'缺陷'
Got:
    u'\u7f3a\u9677'


======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_escape_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2155, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for
unicode_bug_literals.unicode_escape_repr
  File "/home/babilen/test/unicode_bug_literals.py", line 34, in
unicode_escape_repr

----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug_literals.py", line 36, in
unicode_bug_literals.unicode_escape_repr
Failed example:
    u'缺陷'
Expected:
    u'缺陷'
Got:
    u'\u7f3a\u9677'


======================================================================
FAIL: Doctest: unicode_bug_literals.unicode_repr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/doctest.py", line 2155, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for unicode_bug_literals.unicode_repr
  File "/home/babilen/test/unicode_bug_literals.py", line 27, in
unicode_repr

----------------------------------------------------------------------
File "/home/babilen/test/unicode_bug_literals.py", line 29, in
unicode_bug_literals.unicode_repr
Failed example:
    u'缺陷'
Expected:
    u'缺陷'
Got:
    u'\u7f3a\u9677'


----------------------------------------------------------------------
Ran 5 tests in 0.009s

FAILED (failures=3)

--- snip ---

If you need further information do not hesitate to contact me.

with kind regards

    Wolodja Wentland
History
Date User Action Args
2009-10-21 15:08:41 babilen set files: + unicode_bug_literals.py
2009-10-21 15:02:17 babilen set files: + unicode_bug.py; nosy: + babilen; messages: + msg94311
2009-07-01 20:24:08 christoph set files: + doctest.unicode-2.patch; messages: + msg89996
2009-06-30 15:04:05 christoph set files: + doctest.unicode.patch; nosy: + christoph; messages: + msg89928; keywords: + patch
2009-01-27 14:07:27 rbp set nosy: + rbp
2009-01-11 14:48:21 luciano set files: + issue1293741.py; nosy: + luciano; title: doctest runner cannot handle non-ascii characters -> doctest runner cannot handle non-ascii characters; messages: + msg79597; versions: + Python 2.6, Python 2.5
2005-09-17 11:41:22 ogrisel create