Issue3955
Created on 2008-09-24 12:37 by mark, last changed 2009-07-01 20:25 by christoph.
| File name | Uploaded | Description |
| test.py | christoph, 2009-06-30 15:03 | Test case revealing Unicode literal weakness |

msg73710 | Author: Mark Summerfield (mark) | Date: 2008-09-24 12:37

# This program works fine with Python 2.5 and 2.6:
def f():
    """
    >>> f()
    'xyz'
    """
    return "xyz"
if __name__ == "__main__":
    import doctest
    doctest.testmod()
But if you put the statement "from __future__ import unicode_literals"
at the start then it fails:
File "/tmp/test.py", line 5, in __main__.f
Failed example:
    f()
Expected:
    'xyz'
Got:
    u'xyz'
I don't know if it is a bug or a feature but I didn't see any mention of
it in the bugs or docs so thought I'd mention it.

msg73728 | Author: Georg Brandl (georg.brandl) | Date: 2008-09-24 16:29

It certainly isn't a feature. I don't immediately see how to fix it,
though. unicode_literals doesn't change the repr() of unicode objects
(it obviously can't, since that change would not be module-local).
Let's try to get a comment from Uncle Timmy...
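
A workaround on the test author's side, which nobody in this thread proposed, is to compare printed output rather than the repr. This is a minimal sketch under that assumption: print() shows the text without quotes or a u prefix, so the docstring below should pass with or without "from __future__ import unicode_literals" at the top of the module. It does not fix doctest itself.

```python
import doctest


def f():
    """
    >>> print(f())
    xyz
    """
    return "xyz"


if __name__ == "__main__":
    doctest.testmod()
```

The trade-off is that the doctest no longer checks the type of the return value, only its printed text.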

msg89874 | Author: Christoph Burgmer (christoph) | Date: 2009-06-29 19:19

OutputChecker.check_output() seems to be responsible for comparing
'example.want' and 'got' literals and this is obviously done literally.
So as "u'1'" is different to "'1'" this is reflected in the result.
This gets more complicated with literals like "[u'1', u'2']" I believe.
So, eval() could be used for testing for equality:
>>> repr(['1', '2']) == repr([u'1', u'2'])
False
but
>>> eval(repr(['1', '2'])) == eval(repr([u'1', u'2']))
True
doctests are already compiled and executed, but evaluating the doctest
code's result is probably a security issue, so a method doing the
inverse of repr() could be used, one that only works on variables;
something like Pickle, but without its own protocol.
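
One candidate for such an "inverse of repr()" is ast.literal_eval(), which has been in the standard library since Python 2.6 and only accepts literals (strings, numbers, tuples, lists, dicts and so on). The sketch below is not a patch from this issue: LiteralOutputChecker is a hypothetical subclass of doctest.OutputChecker that tries the normal textual comparison first and falls back to comparing the parsed literals.

```python
import ast
import doctest


class LiteralOutputChecker(doctest.OutputChecker):
    """Hypothetical checker: fall back to comparing parsed literals."""

    def check_output(self, want, got, optionflags):
        # Standard behaviour first: textual comparison honouring optionflags.
        if doctest.OutputChecker.check_output(self, want, got, optionflags):
            return True
        # Fallback: parse both sides as literals and compare the values.
        # literal_eval() rejects anything that is not a plain literal,
        # so no arbitrary code from the docstring gets executed here.
        try:
            return ast.literal_eval(want.strip()) == ast.literal_eval(got.strip())
        except (ValueError, SyntaxError, TypeError):
            return False


checker = LiteralOutputChecker()
print(checker.check_output("['1', '2']\n", "[u'1', u'2']\n", 0))
```

An instance can be handed to doctest.DocTestRunner through its checker argument. Output that is not a literal, such as the repr of an arbitrary object, still falls through to a plain failure.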

msg89927 | Author: Christoph Burgmer (christoph) | Date: 2009-06-30 15:03

This problem seems more severe as the appended test case shows.
That gives me:
Expected:
    u'ī'
Got:
    u'\u012b'
Both literals are the same.
Unicode literals in docstrings are not treated like other escaped
characters:
>>> repr(r'\n')
"'\\\\n'"
>>> repr('\n')
"'\\n'"
but:
>>> repr(ur'\u012b')
"u'\\u012b'"
>>> repr(u'\u012b')
"u'\\u012b'"
So there is no work around in the docstring's reference itself.
I file this here, even though the problems are not strictly equal. I do
believe though that there is or should be a common solution to these
issues. Both results need to be interpreted on a more abstract scale.
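
As a rough illustration of comparing "on a more abstract scale", the two texts can be parsed back into values before they are compared. This sketch assumes Python 3.3 or later, where the u prefix is accepted and "\u012b" in a normal string literal is the character ī. The helper name same_literal is made up for this example.

```python
import ast


def same_literal(want, got):
    """Hypothetical helper: compare two repr() texts by their parsed value."""
    return ast.literal_eval(want) == ast.literal_eval(got)


want = "u'\u012b'"   # the text u'ī', as written in the docstring
got = r"u'\u012b'"   # the text u'\u012b', backslash included, as repr() printed it

print(want == got)              # textual comparison, as doctest does it
print(same_literal(want, got))  # comparison of the parsed values
```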

msg89997 | Author: Christoph Burgmer (christoph) | Date: 2009-07-01 20:25

JFTR: To reproduce the results of my last comment, you need to apply the
patch posted in http://bugs.python.org/issue1293741

| Date | User | Action | Args |
| 2009-07-01 20:25:04 | christoph | set | messages: + msg89997 |
| 2009-06-30 15:03:19 | christoph | set | files: + test.py; messages: + msg89927 |
| 2009-06-29 19:19:40 | christoph | set | nosy: + christoph; messages: + msg89874 |
| 2008-09-24 16:29:09 | georg.brandl | set | assignee: tim_one; messages: + msg73728; nosy: + georg.brandl, tim_one |
| 2008-09-24 12:37:20 | mark | create | |