Author kmtracey
Recipients kmtracey, neves
Date 2008-08-08.16:14:21
Message-id <1218212066.25.0.0863479351419.issue2811@psf.upfronthosting.co.za>
In-reply-to
Content
I believe the problem is in your test file, not in doctest.  The enclosing
docstring is not specified as a unicode literal, so the file's
encoding declaration ultimately has no effect on it.  At least that is
how I read the documentation regarding the effect of the encoding
declaration ("The encoding is used for all lexical analysis, in
particular to find the end of a string, and to interpret the contents of
Unicode literals. String literals are converted to Unicode for
syntactical analysis, then converted back to their original encoding
before interpretation starts.")
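
To make the distinction concrete, here is a small sketch of the difference
between the two kinds of literal.  It is written for Python 3 and only
mimics the Python 2 behaviour: `text` stands in for a unicode literal and
`raw` for the bytes a plain string literal would hold in a UTF-8 file.
Both names are mine, chosen for illustration.

```python
# Python 3 sketch that mimics the Python 2 distinction described above.
text = '\u00e1'             # 'á' as one code point, like a unicode literal
raw = text.encode('utf-8')  # the bytes a plain str literal keeps in a UTF-8 file

print(len(text))  # one character
print(len(raw))   # two bytes

# If those bytes are later read without knowing the file encoding
# (assumed here to be treated as Latin-1), the result is two
# characters rather than the original 'á'.
misread = raw.decode('latin-1')
print(misread == text)
```

If this reading is right, an example taken from a plain (byte string)
docstring no longer carries the information needed to turn the bytes back
into the intended character.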

If you change the test file so that the string enclosing the test is a
unicode literal then the test passes:

user@gutsy:~/tmp$ cat test_iso-8859-15.py
# -*- coding: utf-8 -*-

import doctest

def normalize(s):
    u"""
    >>> normalize(u'á')
    u'b'
    """
    return s.translate({ord(u'á'): u'b'})
    
doctest.testmod()
print 'without doctest ===>>>', normalize(u'á')

user@gutsy:~/tmp$ python test_iso-8859-15.py
without doctest ===>>> b

-----

There is a problem with this, though: doctest now will be unable to
correctly report errors when there are output mismatches involving
unicode strings with non-ASCII chars.  For example if you add an 'x' to
the front of your unicode literal to be normalized you'll get this when
you try to run it:

user@gutsy:~/tmp$ python test_iso-8859-15.py
Traceback (most recent call last):
  File "test_iso-8859-15.py", line 12, in <module>
    doctest.testmod()
  File "/usr/lib/python2.5/doctest.py", line 1799, in testmod
    runner.run(test)
  File "/usr/lib/python2.5/doctest.py", line 1345, in run
    return self.__run(test, compileflags, out)
  File "/usr/lib/python2.5/doctest.py", line 1261, in __run
    self.report_failure(out, test, example, got)
  File "/usr/lib/python2.5/doctest.py", line 1125, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in
position 149: ordinal not in range(128)
user@gutsy:~/tmp$ 

This issue is reported in #1293741, but there is no fix or guidance
offered there on how to work around the problem.
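
The last line of the traceback can be reproduced in isolation.  The sketch
below is Python 3, is not doctest's own code, and is not a fix for
#1293741; the `message` text is a made-up stand-in for a failure report
that contains a non-ASCII character.  It only shows why an implicit ASCII
encode fails and what an escaped, ASCII-safe rendering could look like.

```python
# Hypothetical failure report containing a non-ASCII character.
message = "Failed example:\n    normalize(u'x\u00e1')\n"

try:
    # Comparable to the implicit ASCII encode that fails in the traceback.
    message.encode('ascii')
except UnicodeEncodeError as exc:
    print(exc.reason)

# Escaping instead of failing gives a report that is pure ASCII.
safe = message.encode('ascii', 'backslashreplace').decode('ascii')
print(safe)
```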

I'd appreciate feedback on whether what I've said here is correct.  I'm
currently trying to diagnose/fix problems with use of unicode literals
in some tests and this is as far as I've got. That is, I think I need to
be specifying the enclosing strings as unicode literals, but then I run
into #1293741.  If the conclusion I've reached is correct, then trying
to figure out a fix for that problem should be where I focus my efforts.
If, however, I shouldn't be specifying the enclosing string as a
unicode literal, then attempting to fix the problem as described here
would perhaps be more useful, though I do not know how the doctest code
could determine the file's encoding declaration.
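
On that last question, one possible approach is to read the coding
declaration from the source file directly.  The sketch below applies the
pattern given in PEP 263 to the first two lines of a file's bytes.  The
helper name `declared_encoding` and the `default` parameter are
hypothetical, and nothing here reflects what doctest actually does.

```python
import re

# Pattern from PEP 263 for a coding declaration inside a comment line.
CODING_RE = re.compile(br'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')

def declared_encoding(source_bytes, default='ascii'):
    """Return the encoding declared in the first two lines, else default.

    Hypothetical helper; the 'ascii' default is an assumption matching
    the Python 2 default source encoding.
    """
    for line in source_bytes.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode('ascii')
    return default

sample = b"# -*- coding: utf-8 -*-\n\nimport doctest\n"
print(declared_encoding(sample))
```

A caller would still need the path of the module being tested in order to
read those bytes, which is a separate problem from parsing the declaration.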