classification
Title: __repr__ returning unicode doesn't work when called implicitly
Type: behavior
Stage: test needed
Components: Interpreter Core
Versions: Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Nam.Nguyen, arigo, eric.araujo, ezio.melotti, haypo, lemburg, liori, r.david.murray
Priority: normal
Keywords:

Created on 2009-04-29 11:58 by liori, last changed 2011-09-06 16:56 by arigo.

Messages (7)
msg86798 - Author: Tomasz Melcer (liori) Date: 2009-04-29 11:58
Invitation... (Debian Sid, gnome-terminal with pl_PL.UTF8 locales)

Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Let's create some class...

>>> class T(object):
...     def __repr__(self): return u'あみご'
... 

Does its repr() work?

>>> T().__repr__()
u'\u3042\u307f\u3054'
>>> print T().__repr__()
あみご

But when it is implicitly called, it doesn't?!

>>> T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)


Encoding:

>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'

Workaround for now:

>>> class T(object):
...     def __repr__(self): return u'あみご'.encode('utf-8')
...
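
The workaround above hard-codes the encoding inside each __repr__. A minimal sketch of the same idea as a reusable decorator follows; the name native_repr and its encoding parameter are hypothetical, not part of any library, and UTF-8 is only an assumed default. On Python 2 it encodes a unicode result to a byte string, and on Python 3 it leaves the text untouched.

```python
import functools
import sys

PY2 = sys.version_info[0] == 2


def native_repr(encoding="utf-8"):
    """Hypothetical decorator: make __repr__ return the native str type."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(self):
            result = func(self)
            # On Python 2, repr() would encode unicode with the default
            # (ASCII) codec; encode explicitly before that can happen.
            if PY2 and not isinstance(result, str):
                result = result.encode(encoding)
            return result
        return wrapper
    return decorate


class T(object):
    @native_repr()
    def __repr__(self):
        # Escapes are used so the snippet needs no source-encoding line.
        return u'\u3042\u307f\u3054'
```

With this in place, repr(T()) and print(T()) no longer go through the implicit ASCII encoding step, at the cost of fixing the output encoding regardless of the terminal.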
msg86799 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-04-29 12:50
This worked in 2.4 and stopped working in 2.5.

It's not a problem in 3.x.

(2.5 is in security-fix-only mode, so I'm removing it from versions).
msg143541 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-09-05 16:10
I think it’s not an implicit vs. explicit call problem, rather repr vs. str.

IIRC, in 2.x it is allowed that __str__ returns a unicode object, and str will convert it to a str.  To do that, it will use the default encoding, which is ASCII in 2.5+, so your example cannot work.
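
The conversion step described above can be imitated by hand. This is only a sketch of the encoding step, not of the interpreter's actual code path; the helper name to_byte_string is made up for illustration.

```python
text = u'\u3042\u307f\u3054'  # the string returned by __repr__ in the report


def to_byte_string(text, encoding):
    """Imitate the unicode -> byte string conversion with a given codec."""
    return text.encode(encoding)


# With the ASCII codec (the default encoding) the conversion fails.
try:
    to_byte_string(text, 'ascii')
except UnicodeEncodeError as exc:
    failure = str(exc)
else:
    failure = None

# With UTF-8 (the terminal encoding in the report) it succeeds.
utf8_bytes = to_byte_string(text, 'utf-8')
```

Here failure holds the same kind of "'ascii' codec can't encode characters" message as the traceback in the report, which suggests that the terminal encoding is never consulted in this step.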

Ideas for work-arounds:
- write a displayhook (http://docs.python.org/dev/library/sys#sys.displayhook) that converts unicode objects using sys.stdout.encoding
- for 2.6+, test if setting PYTHONIOENCODING changes something
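
The first idea could look roughly like the sketch below. The names render_repr and unicode_displayhook are hypothetical, and falling back to UTF-8 when sys.stdout has no encoding is an assumption. The hook calls __repr__ through the type instead of using repr(), so the implicit ASCII encoding is bypassed on Python 2; on Python 3 it simply writes the text.

```python
import sys

try:
    import __builtin__ as builtins  # Python 2
except ImportError:
    import builtins  # Python 3

PY2 = sys.version_info[0] == 2


def render_repr(value, encoding):
    """Return the repr of value as a native str (hypothetical helper)."""
    # Bypass repr(), which would encode a unicode result with ASCII.
    text = type(value).__repr__(value)
    if PY2 and not isinstance(text, str):
        # Unencodable characters become backslash escapes, not errors.
        text = text.encode(encoding, "backslashreplace")
    return text


def unicode_displayhook(value):
    """Hypothetical replacement for sys.displayhook."""
    if value is None:
        return
    # Assumption: fall back to UTF-8 when stdout reports no encoding.
    encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    sys.stdout.write(render_repr(value, encoding) + "\n")
    builtins._ = value  # keep the interactive "_" convention


# To activate at the interactive prompt:
# sys.displayhook = unicode_displayhook
```

This only affects values echoed at the interactive prompt. It does not change print T(), and an object nested in a container such as [T()] would still go through the ordinary repr() of its elements.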
msg143550 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-09-05 16:38
I think that this issue is a duplicate of #4947 which has been fixed in Python 2.7.1. Can you retry with Python 2.7.2 (or 2.7.1)?
msg143553 - Author: Tomasz Melcer (liori) Date: 2011-09-05 17:27
Debian SID. No, it wasn't.

Python 2.7.2+ (default, Aug 16 2011, 09:23:59) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class T(object):
...     def __repr__(self): return u'あみご'
... 
>>> T().__repr__()
u'\u3042\u307f\u3054'
>>> print T().__repr__()
あみご
>>> T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'
msg143559 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-09-05 20:07
> Debian SID. No, it wasn't.

Oh ok, gotcha: repr() always returns a str string. If obj.__repr__() returns a 
Unicode string, the string is encoded to the default encoding. By default, the 
default encoding is ASCII.

$ ./python -S 
Python 2.7.2+ (2.7:85a12278de69, Sep  2 2011, 00:21:57) 
[GCC 4.6.0 20110603 (Red Hat 4.6.0-10)] on linux2
>>> import sys
>>> sys.setdefaultencoding('ISO-8859-1')
>>> class A(object):
...  def __repr__(self): return u'\xe9'
... 
>>> repr(A())
'\xe9'

Don't do that at home! Changing the default encoding is not a good idea.

I don't think that repr(obj) can be changed to return Unicode if 
obj.__repr__() returns Unicode. It is too late to change such thing in Python 
2.
msg143632 - Author: Armin Rigo (arigo) * (Python committer) Date: 2011-09-06 16:56
A __repr__() that returns unicode can, in CPython 2.7, be used in  "%s" % x  or in  u"%s" % x  --- both expressions then return a unicode without doing any encoding --- but it cannot be used anywhere else, e.g. in  "%r" % x  or in  repr(x).  See also the PyPy issue https://bugs.pypy.org/issue857 .
History
Date | User | Action | Args
2011-09-06 16:56:40 | arigo | set | nosy: + arigo; messages: + msg143632
2011-09-05 20:07:15 | haypo | set | messages: + msg143559
2011-09-05 17:27:42 | liori | set | messages: + msg143553
2011-09-05 16:38:42 | haypo | set | nosy: + haypo; messages: + msg143550
2011-09-05 16:10:46 | eric.araujo | set | nosy: + eric.araujo, lemburg; messages: + msg143541; versions: - Python 2.6
2011-09-03 11:55:23 | Nam.Nguyen | set | nosy: + Nam.Nguyen
2009-04-29 12:50:14 | r.david.murray | set | priority: normal; components: + Interpreter Core, - Extension Modules; versions: + Python 2.6, Python 2.7, - Python 2.5; nosy: + r.david.murray; messages: + msg86799; stage: test needed
2009-04-29 12:01:37 | ezio.melotti | set | nosy: + ezio.melotti
2009-04-29 11:58:31 | liori | create |