__repr__ returning unicode doesn't work when called implicitly #50126
Setup (Debian Sid, gnome-terminal with pl_PL.UTF8 locale):
Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45)
Let's create a class:
>>> class T(object):
...     def __repr__(self): return u'あみご'
...
Does its repr() work?
>>> T().__repr__()
u'\u3042\u307f\u3054'
>>> print T().__repr__()
あみご
But when it is called implicitly, it doesn't work?!
>>> T()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print T()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
Encoding:
>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'
Workaround for now:
>>> class T(object):
...     def __repr__(self): return u'あみご'.encode('utf-8')
... |
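In Python 2, both implicit paths (the interactive echo and print) convert the unicode returned by __repr__() to an 8-bit str using the default encoding, which is ASCII. The workaround instead returns bytes that are already UTF-8 encoded. Below is a minimal Python 3 sketch of the two conversions, with bytes standing in for Python 2's str. It is an illustration, not the interpreter's actual code path:

```python
# Python 3 sketch: bytes plays the role of Python 2's 8-bit str.
text = "\u3042\u307f\u3054"  # あみご, the unicode value T.__repr__ returns

# Implicit path (emulated): encode with the ASCII default encoding, strict errors.
ascii_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = str(exc)
print(ascii_error)

# Workaround: __repr__ already returns UTF-8 bytes, so nothing is re-encoded
# and a UTF-8 terminal displays them as the original characters.
workaround = text.encode("utf-8")
print(workaround)
```

The error text matches the traceback in the report. This suggests that the ASCII default encoding, rather than the terminal's UTF-8 setting, is what breaks.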
This worked in 2.4 and stopped working in 2.5. It's not a problem in 3.x. (2.5 is in security-fix-only mode, so I'm removing it from versions). |
I think it's not an implicit vs. explicit call problem but rather a repr vs. str one. IIRC, in 2.x __str__ is allowed to return a unicode object, and str will convert it to a str. To do that, it uses the default encoding, which is ASCII in 2.5+, so your example cannot work. Ideas for work-arounds:
|
I think that this issue is a duplicate of bpo-4947, which has been fixed in Python 2.7.1. Can you retry with Python 2.7.2 (or 2.7.1)? |
Debian SID. No, it wasn't.
Python 2.7.2+ (default, Aug 16 2011, 09:23:59)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class T(object):
...     def __repr__(self): return u'あみご'
...
>>> T().__repr__()
u'\u3042\u307f\u3054'
>>> print T().__repr__()
あみご
>>> T()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print T()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8' |
Oh ok, gotcha: repr() always returns a str string. If obj.__repr__() returns a
$ ./python -S
Python 2.7.2+ (2.7:85a12278de69, Sep 2 2011, 00:21:57)
[GCC 4.6.0 20110603 (Red Hat 4.6.0-10)] on linux2
>>> import sys
>>> sys.setdefaultencoding('ISO-8859-1')
>>> class A(object):
...     def __repr__(self): return u'\xe9'
...
>>> repr(A())
'\xe9'
Don't do that at home! Changing the default encoding is not a good idea. I don't think that repr(obj) can be changed to return Unicode if |
A __repr__() that returns unicode can, in CPython 2.7, be used in "%s" % x or in u"%s" % x (both expressions then return a unicode without doing any encoding), but it cannot be used anywhere else, e.g. in "%r" % x or in repr(x). See also the PyPy issue https://bugs.pypy.org/issue857 . |
In Python 3, ascii() uses the backslashreplace error handler:
>>> class T:
...     def __repr__(self):
...         return '\u20ac\udcff'
...
>>> print(ascii(T()))
\u20ac\udcff
I think using the backslashreplace error handler in repr() in Python 2.7 is a good solution. Here is a patch. |
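The proposal amounts to encoding the unicode result with the backslashreplace error handler instead of the strict one. The Python 3 sketch below compares the two handlers on the T class from the session. It also checks that the escaped bytes spell the same text that ascii() prints. Strict ASCII encoding stands in for the Python 2.7 conversion discussed here, and the variable names are illustrative only:

```python
class T:
    def __repr__(self):
        return "\u20ac\udcff"  # euro sign followed by a lone surrogate


text = repr(T())

# Strict ASCII encoding: fails on the first non-ASCII character.
try:
    text.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# backslashreplace: each unencodable character becomes a \uXXXX escape.
escaped = text.encode("ascii", "backslashreplace")

print(strict_ok)   # False
print(escaped)     # b'\\u20ac\\udcff'
print(ascii(T()))  # \u20ac\udcff
```

With this handler the conversion can no longer fail, because every character is representable as an ASCII escape.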
This change is going to break backward compatibility. I don't think |
How can it break backward compatibility? Currently repr() just raises UnicodeEncodeError:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
With the patch, it always returns an 8-bit string. Since repr() is mostly used for debugging, the second alternative looks more helpful. |
It depends on sys.getdefaultencoding(), which can be modified in the site module (or in a PYTHONSTARTUP script) using sys.setdefaultencoding(). It should not be possible to change the default encoding, and this was fixed in Python 3. |
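To see why the outcome depends on the default encoding, here is a Python 3 sketch built on a hypothetical helper, convert_repr_result() (not a CPython function). It mimics the unicode-to-str step with a configurable encoding, passed explicitly here instead of being read from sys.getdefaultencoding(). It reproduces the '\xe9' result of the ISO-8859-1 session above and the failure under ASCII:

```python
def convert_repr_result(result, default_encoding="ascii"):
    """Mimic Python 2's conversion of a unicode __repr__ result to an 8-bit string.

    Hypothetical helper for illustration; bytes stands in for Python 2's str.
    """
    if isinstance(result, bytes):
        return result  # already an 8-bit string: returned unchanged
    return result.encode(default_encoding)  # strict error handling


latin1 = convert_repr_result("\xe9", "iso-8859-1")
print(latin1)  # b'\xe9'

try:
    convert_repr_result("\xe9")  # ASCII default encoding
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii:", exc)
```

The same __repr__() therefore succeeds or fails depending on process-wide state that any module can change. That is one reason the thread discourages relying on it.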
@serhiy: it would certainly break a program that calls repr() and catches the UnicodeEncodeError to do something else, like encode the data differently. |
__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own. The CPython internals try to convert any non-str object to a str object, but this is only done to ensure that PyObject_Repr() always returns a str object. I'd suggest closing this as won't fix. |
Agreed, it's time to upgrade to Python 3! |
Of course. Every repr() call that succeeds without the patch will give the same result with the patch. However, the patch lets you see objects that were not repr-able before. repr() itself is used when formatting error messages, so it is desirable to extend its applicability as far as possible.
Why would it break? You would only want to encode the data differently because repr() does not work; with the proposed patch, that simply won't be needed.
PyObject_Repr() contains code that converts unicode to str and raises an exception if the __repr__() result is neither str nor unicode. A unicode __repr__() result is expected, even if it is not recommended. |
Serhiy Storchaka wrote:
True, but the code is not intended to support non-ASCII Unicode, |
@serhiy: it's a behavior change and as such not an option for a micro release. For example, the following legal code would behave differently: it would compute s = '\\u1234' instead of s = 'UTF8:\xe1\x88\xb4'.
I think I agree that a working repr() is generally better, but in this case it should point out to the programmer that __repr__() should return something sensible instead, and that they should avoid the trick above... |
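This hypothetical Python 3 emulation illustrates the compatibility concern. py2_repr(), repr_with_fallback() and the class U are invented names, not CPython APIs, and bytes stands in for Python 2's str. py2_repr() models the PyObject_Repr() behavior described in the thread. A unicode result is encoded with the ASCII default encoding, and any other non-string result raises TypeError. A legal caller that catches UnicodeEncodeError and falls back to UTF-8 sees its result change from 'UTF8:\xe1\x88\xb4' to '\\u1234' once the conversion switches to backslashreplace:

```python
def py2_repr(obj, patched=False):
    """Emulate Python 2's repr() conversion step (hypothetical helper).

    A str (Python 2 unicode) result is encoded with the ASCII default
    encoding; patched=True models the proposed backslashreplace behaviour.
    """
    result = type(obj).__repr__(obj)
    if isinstance(result, bytes):
        return result
    if isinstance(result, str):
        errors = "backslashreplace" if patched else "strict"
        return result.encode("ascii", errors)
    raise TypeError("__repr__ returned non-string (type %s)" % type(result).__name__)


class U:
    def __repr__(self):
        return "\u1234"


def repr_with_fallback(obj, patched=False):
    """Caller that retries with UTF-8 when the repr cannot be encoded."""
    try:
        return py2_repr(obj, patched)
    except UnicodeEncodeError:
        return b"UTF8:" + type(obj).__repr__(obj).encode("utf-8")


before = repr_with_fallback(U())
after = repr_with_fallback(U(), patched=True)
print(before)  # b'UTF8:\xe1\x88\xb4'
print(after)   # b'\\u1234'
```

The except branch becomes dead code under the patch. The program keeps running but silently produces a different string, which is why the change is treated as unsuitable for a micro release.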
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.