classification
Title: __repr__ returning unicode doesn't work when called implicitly
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Nam.Nguyen, arigo, eric.araujo, ezio.melotti, haypo, lemburg, liori, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2009-04-29 11:58 by liori, last changed 2014-07-19 00:56 by berker.peksag. This issue is now closed.

Files
File name Uploaded Description
unicode_repr.patch serhiy.storchaka, 2013-08-23 13:31
Messages (17)
msg86798 - (view) Author: Tomasz Melcer (liori) Date: 2009-04-29 11:58
Invitation... (Debian Sid, gnome-terminal with pl_PL.UTF8 locales)

Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Let's create some class...

>>> class T(object):
...     def __repr__(self): return u'あみご'
... 

Does its repr() work?

>>> T().__repr__()
u'\u3042\u307f\u3054'
>>> print T().__repr__()
あみご

But when it is implicitly called, it doesn't?!

>>> T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-2: ordinal not in range(128)
>>> print T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-2: ordinal not in range(128)


Encoding:

>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'

Workaround for now:

>>> class T(object):
...     def __repr__(self): return u'あみご'.encode('utf-8')
...
msg86799 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-04-29 12:50
This worked in 2.4 and stopped working in 2.5.

It's not a problem in 3.x.

(2.5 is in security-fix-only mode, so I'm removing it from versions).
msg143541 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-09-05 16:10
I think it’s not an implicit vs. explicit call problem, rather repr vs. str.

IIRC, in 2.x it is allowed that __str__ returns a unicode object, and str will convert it to a str.  To do that, it will use the default encoding, which is ASCII in 2.5+, so your example cannot work.

Ideas for work-arounds:
- write a displayhook (http://docs.python.org/dev/library/sys#sys.displayhook) that converts unicode objects using sys.stdout.encoding
- for 2.6+, test if setting PYTHONIOENCODING changes something
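
A minimal sketch of the displayhook idea, not a tested fix for this issue. The names `unicode_displayhook` and `TEXT_TYPE` are made up for illustration, and the fallback to UTF-8 when `sys.stdout.encoding` is unset is an assumption. Unlike the default hook, this one does not store the value in `_`.

```python
import sys

TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def unicode_displayhook(value):
    """Print the result of __repr__ without going through repr()."""
    if value is None:
        return
    # Call __repr__ directly: on Python 2, repr() would encode a unicode
    # result with the default (ASCII) encoding and raise.
    text = type(value).__repr__(value)
    if sys.version_info[0] == 2 and isinstance(text, TEXT_TYPE):
        # Assumption: fall back to UTF-8 if stdout reports no encoding.
        encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
        text = text.encode(encoding, "backslashreplace")
    sys.stdout.write(text + "\n")


# Hypothetical installation step, e.g. from a PYTHONSTARTUP file:
# sys.displayhook = unicode_displayhook
```

This only affects what the interactive prompt echoes; `print T()` and `repr(T())` still go through the normal conversion.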
msg143550 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-09-05 16:38
I think that this issue is a duplicate of #4947 which has been fixed in Python 2.7.1. Can you retry with Python 2.7.2 (or 2.7.1)?
msg143553 - (view) Author: Tomasz Melcer (liori) Date: 2011-09-05 17:27
Debian SID. No, it wasn't.

Python 2.7.2+ (default, Aug 16 2011, 09:23:59) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class T(object):
...     def __repr__(self): return u'あみご'
... 
>>> T().__repr__()
u'\u3042\u307f\u3054'
>>> print T().__repr__()
あみご
>>> T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'
msg143559 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-09-05 20:07
> Debian SID. No, it wasn't.

Oh ok, gotcha: repr() always returns a str string. If obj.__repr__() returns a 
Unicode string, the string is encoded to the default encoding. By default, the 
default encoding is ASCII.

$ ./python -S 
Python 2.7.2+ (2.7:85a12278de69, Sep  2 2011, 00:21:57) 
[GCC 4.6.0 20110603 (Red Hat 4.6.0-10)] on linux2
>>> import sys
>>> sys.setdefaultencoding('ISO-8859-1')
>>> class A(object):
...  def __repr__(self): return u'\xe9'
... 
>>> repr(A())
'\xe9'

Don't do that at home! Changing the default encoding is not a good idea.

I don't think that repr(obj) can be changed to return Unicode if 
obj.__repr__() returns Unicode. It is too late to change such thing in Python 
2.
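
The conversion described above can be approximated in pure Python. This is only a sketch: the real logic lives in C inside the interpreter, and `emulate_py2_repr` and `TEXT_TYPE` are hypothetical names. The default encoding is passed in as a parameter so that nothing global has to be changed.

```python
TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def emulate_py2_repr(obj, default_encoding="ascii"):
    """Approximate how Python 2 repr() treats a unicode __repr__ result."""
    result = type(obj).__repr__(obj)
    if isinstance(result, TEXT_TYPE):
        # Strict error handling: any unencodable character raises.
        result = result.encode(default_encoding)
    return result


class A(object):
    def __repr__(self):
        return u"\xe9"


try:
    emulate_py2_repr(A())  # ASCII cannot encode U+00E9
except UnicodeEncodeError as exc:
    print("ascii failed: %s" % exc.reason)

# With a Latin-1 "default encoding" the same object converts to one byte.
latin1_bytes = emulate_py2_repr(A(), "ISO-8859-1")
```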
msg143632 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2011-09-06 16:56
A __repr__() that returns unicode can, in CPython 2.7, be used in  "%s" % x  or in  u"%s" % x  --- both expressions then return a unicode without doing any encoding --- but it cannot be used anywhere else, e.g. in  "%r" % x  or in  repr(x).  See also the PyPy issue https://bugs.pypy.org/issue857 .
msg195967 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-23 13:31
In Python 3 ascii() uses the backslashreplace error handler.

>>> class T:
...     def __repr__(self):
...         return '\u20ac\udcff'
... 
>>> print(ascii(T()))
\u20ac\udcff

I think using the backslashreplace error handler in repr() in Python 2.7 is a good solution. Here is a patch.
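
For illustration, this is what the backslashreplace error handler does when encoding to ASCII. It is not the content of unicode_repr.patch, only a sketch of the handler's effect; `ascii_safe` is a hypothetical helper name.

```python
def ascii_safe(text):
    """Encode text to ASCII bytes, escaping characters that do not fit."""
    return text.encode("ascii", "backslashreplace")


euro = ascii_safe(u"\u20ac")   # non-ASCII character becomes an escape sequence
plain = ascii_safe(u"abc")     # ASCII-only text passes through unchanged
```

The result is always an ASCII byte string, so the encoding step itself can no longer raise UnicodeEncodeError.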
msg195970 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-08-23 13:43
This change is going to break backward compatibility. I don't think
that it can be done in Python 2.7.x, and there is no Python 2.8 (PEP
404).
msg195985 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-23 17:15
How can it break backward compatibility? Currently repr() just raises UnicodeEncodeError.

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

With the patch it always returns an 8-bit string. Since repr() is usually used for debugging, the second alternative looks more helpful.
msg195986 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-08-23 17:22
> How can it break backward compatibility? Currently repr() just raises UnicodeEncodeError.

It depends on sys.getdefaultencoding() which can be modified in the site module (or in a PYTHONSTARTUP script) using sys.setdefaultencoding(). It should not be possible to change the default encoding, and it was fixed in Python 3.
msg195993 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2013-08-23 17:42
@Serhiy: it would certainly break a program that tries to call the repr() and catches the UnicodeEncodeError to do something else, like encode the data differently.
msg195996 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-08-23 18:15
.__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own. The CPython internals try to convert any non-str object to a str object, but this is only done to assure that PyObject_Repr() always returns a str object.

I'd suggest closing this as won't fix.
msg196003 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-08-23 18:57
> I'd suggest closing this as won't fix.

Agreed, it's time to upgrade to Python 3!
msg196008 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-23 19:12
> It depends on sys.getdefaultencoding() which can be modified in the site module (or in a PYTHONSTARTUP script) using sys.setdefaultencoding().

Of course. Every repr() call that succeeds without the patch will give the same result with the patch. However, the patch allows you to see objects which were not repr-able before. repr() itself is used in the formatting of error messages, so it is desirable to extend its applicability as far as possible.

> @Serhiy: it would certainly break a program that tries to call the repr() and catches the UnicodeEncodeError to do something else, like encode the data differently.

Why would it break? You want to encode the data differently only because repr() does not work; with the proposed patch this will simply not be needed.

> .__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own.

PyObject_Repr() contains code which converts unicode to str and raises an exception if the __repr__() result is neither str nor unicode. A unicode __repr__() is expected even if it is not recommended.
msg196014 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-08-23 19:22
Serhiy Storchaka wrote:
>> .__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own.
> 
> PyObject_Repr() contains code which converts unicode to str and raises an exception if the __repr__() result is neither str nor unicode. A unicode __repr__() is expected even if it is not recommended.

True, but the code is not intended to support non-ASCII Unicode,
otherwise we would have taken care to introduce support for this
much earlier in the 2.x series.
msg196065 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2013-08-24 06:40
@Serhiy: it's a behavior change and as such not an option for a micro release.  For example, the following legal code would behave differently: it would compute s = '\\u1234' instead of s = 'UTF8:\xe1\x88\xb4'.

    try:
        s = repr(x)
    except UnicodeEncodeError:
        s = 'UTF8:' + x.value.encode('utf-8')

I think I agree that a working repr() is generally better, but in this case it should point to the programmer that they should rather have __repr__() return something sensible and avoid the trick above...
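
One possible reading of "return something sensible" is a __repr__() that escapes non-ASCII text itself, so the caller never needs the try/except trick. The class `Word` and its `value` attribute below are hypothetical; this is a sketch, not a recommendation made in this thread.

```python
class Word(object):
    """Hypothetical wrapper around a unicode value."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # Escape everything outside ASCII so the result is always safe.
        escaped = self.value.encode("ascii", "backslashreplace")
        if not isinstance(escaped, str):
            # Python 3: encode() returned bytes, but __repr__ must return str.
            escaped = escaped.decode("ascii")
        return "Word(%s)" % escaped


shown = repr(Word(u"\u1234"))
```

On Python 2 the encoded value is already a str, so __repr__() returns a plain byte string and repr() has nothing left to convert.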
History
Date User Action Args
2014-07-19 00:56:31 berker.peksag set resolution: fixed -> wont fix; stage: patch review -> resolved
2013-08-24 06:40:30 arigo set messages: + msg196065
2013-08-23 19:22:50 lemburg set messages: + msg196014
2013-08-23 19:12:23 serhiy.storchaka set messages: + msg196008
2013-08-23 18:57:45 haypo set status: open -> closed; resolution: fixed; messages: + msg196003
2013-08-23 18:15:07 lemburg set messages: + msg195996
2013-08-23 17:42:50 arigo set messages: + msg195993
2013-08-23 17:22:54 haypo set messages: + msg195986
2013-08-23 17:15:44 serhiy.storchaka set messages: + msg195985
2013-08-23 13:43:43 haypo set messages: + msg195970
2013-08-23 13:31:13 serhiy.storchaka set files: + unicode_repr.patch; nosy: + serhiy.storchaka; messages: + msg195967; keywords: + patch; stage: test needed -> patch review
2011-09-06 16:56:40 arigo set nosy: + arigo; messages: + msg143632
2011-09-05 20:07:15 haypo set messages: + msg143559
2011-09-05 17:27:42 liori set messages: + msg143553
2011-09-05 16:38:42 haypo set nosy: + haypo; messages: + msg143550
2011-09-05 16:10:46 eric.araujo set nosy: + eric.araujo, lemburg; messages: + msg143541; versions: - Python 2.6
2011-09-03 11:55:23 Nam.Nguyen set nosy: + Nam.Nguyen
2009-04-29 12:50:14 r.david.murray set priority: normal; components: + Interpreter Core, - Extension Modules; versions: + Python 2.6, Python 2.7, - Python 2.5; nosy: + r.david.murray; messages: + msg86799; stage: test needed
2009-04-29 12:01:37 ezio.melotti set nosy: + ezio.melotti
2009-04-29 11:58:31 liori create