__repr__ returning unicode doesn't work when called implicitly #50126

liori · 2009-04-29T11:58:32Z

BPO	5876
Nosy	@malemburg, @arigo, @vstinner, @ezio-melotti, @merwok, @bitdancer, @postmasters, @serhiy-storchaka
Files	unicode_repr.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2013-08-23.18:57:45.077>
created_at = <Date 2009-04-29.11:58:31.926>
labels = ['interpreter-core', 'type-bug']
title = "__repr__ returning unicode doesn't work when called implicitly"
updated_at = <Date 2014-07-19.00:56:31.365>
user = 'https://bugs.python.org/liori'

bugs.python.org fields:

activity = <Date 2014-07-19.00:56:31.365>
actor = 'berker.peksag'
assignee = 'none'
closed = True
closed_date = <Date 2013-08-23.18:57:45.077>
closer = 'vstinner'
components = ['Interpreter Core']
creation = <Date 2009-04-29.11:58:31.926>
creator = 'liori'
dependencies = []
files = ['31439']
hgrepos = []
issue_num = 5876
keywords = ['patch']
message_count = 17.0
messages = ['86798', '86799', '143541', '143550', '143553', '143559', '143632', '195967', '195970', '195985', '195986', '195993', '195996', '196003', '196008', '196014', '196065']
nosy_count = 9.0
nosy_names = ['lemburg', 'arigo', 'vstinner', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'liori', 'Nam.Nguyen', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue5876'
versions = ['Python 2.7']

liori · 2009-04-29T11:58:30Z

Invitation... (Debian Sid, gnome-terminal with pl_PL.UTF8 locales)

Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Let's create some class...

>>> class T(object):
...     def __repr__(self): return u'あみご'
...

Does its repr() work?

>>> T().__repr__()
u'\u3042\u307f\u3054'
>>> print T().__repr__()
あみご

But when it is implicitly called, it doesn't?!

>>> T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-2: ordinal not in range(128)
>>> print T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-2: ordinal not in range(128)

Encoding:

>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'

Workaround for now:

>>> class T(object):
...     def __repr__(self): return u'あみご'.encode('utf-8')
...

bitdancer · 2009-04-29T12:50:14Z

This worked in 2.4 and stopped working in 2.5.

It's not a problem in 3.x.

(2.5 is in security-fix-only mode, so I'm removing it from versions).

merwok · 2011-09-05T16:10:46Z

I think it’s not an implicit vs. explicit call problem, rather repr vs. str.

IIRC, in 2.x it is allowed that __str__ returns a unicode object, and str will convert it to a str. To do that, it will use the default encoding, which is ASCII in 2.5+, so your example cannot work.

Ideas for work-arounds:

write a displayhook (http://docs.python.org/dev/library/sys#sys.displayhook) that converts unicode objects using sys.stdout.encoding
for 2.6+, test if setting PYTHONIOENCODING changes something
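
The displayhook idea could look roughly like the sketch below. It is only an illustration: `render_repr` and `unicode_displayhook` are hypothetical names, and falling back to backslash escapes for characters the terminal encoding cannot represent is an assumption, not something proposed in this thread. The hook calls `__repr__` through the type so that the result is not coerced to str by repr().

```python
import sys

try:
    import __builtin__ as builtins  # Python 2
except ImportError:
    import builtins  # Python 3

TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def render_repr(value, encoding):
    """Return the repr of value as bytes, bypassing repr()'s coercion."""
    text = type(value).__repr__(value)  # raw result, not coerced to str
    if isinstance(text, TEXT_TYPE):
        # Characters the encoding cannot represent become \uXXXX escapes.
        return text.encode(encoding, 'backslashreplace')
    return text  # already a byte string (Python 2 str)


def unicode_displayhook(value):
    """Hypothetical replacement for sys.displayhook."""
    if value is None:
        return
    builtins._ = None
    encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
    data = render_repr(value, encoding) + b'\n'
    # Python 3 text streams expose the underlying byte stream as .buffer.
    getattr(sys.stdout, 'buffer', sys.stdout).write(data)
    builtins._ = value


# To activate in an interactive session:
# sys.displayhook = unicode_displayhook
```

This only affects what the interactive prompt echoes; `print T()` and `repr(T())` still go through the normal coercion.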

vstinner · 2011-09-05T16:38:42Z

I think that this issue is a duplicate of bpo-4947 which has been fixed in Python 2.7.1. Can you retry with Python 2.7.2 (or 2.7.1)?

liori · 2011-09-05T17:27:42Z

Debian SID. No, it wasn't.

Python 2.7.2+ (default, Aug 16 2011, 09:23:59) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class T(object):
...     def __repr__(self): return u'あみご'
... 
>>> T().__repr__()
u'\u3042\u307f\u3054'
>>> print T().__repr__()
あみご
>>> T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'

vstinner · 2011-09-05T20:07:15Z

> Debian SID. No, it wasn't.

Oh ok, gotcha: repr() always returns a str string. If obj.__repr__() returns a
Unicode string, the string is encoded to the default encoding. By default, the
default encoding is ASCII.

$ ./python -S 
Python 2.7.2+ (2.7:85a12278de69, Sep  2 2011, 00:21:57) 
[GCC 4.6.0 20110603 (Red Hat 4.6.0-10)] on linux2
>>> import sys
>>> sys.setdefaultencoding('ISO-8859-1')
>>> class A(object):
...  def __repr__(self): return u'\xe9'
... 
>>> repr(A())
'\xe9'

Don't do that at home! Changing the default encoding is not a good idea.

I don't think that repr(obj) can be changed to return Unicode if
obj.__repr__() returns Unicode. It is too late to change such a thing in
Python 2.
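
The coercion described in this comment can be approximated in pure Python. This is a rough sketch, not the actual C code of PyObject_Repr(); `emulated_py2_repr` is a hypothetical name, and the default encoding is passed as a parameter instead of being read from sys.getdefaultencoding().

```python
TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def emulated_py2_repr(obj, default_encoding='ascii'):
    """Approximate what repr() does with the result of __repr__() in 2.x."""
    result = type(obj).__repr__(obj)
    if isinstance(result, bytes):
        return result  # already a Python 2 str: returned unchanged
    if isinstance(result, TEXT_TYPE):
        # Strict encoding: raises UnicodeEncodeError on unencodable text.
        return result.encode(default_encoding)
    raise TypeError('__repr__ returned non-string (type %s)'
                    % type(result).__name__)


class A(object):
    def __repr__(self):
        return u'\xe9'


print(emulated_py2_repr(A(), 'ISO-8859-1') == b'\xe9')  # True
try:
    emulated_py2_repr(A())  # default 'ascii' cannot encode u'\xe9'
except UnicodeEncodeError as exc:
    print(exc)
```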

arigo · 2011-09-06T16:56:41Z

A __repr__() that returns unicode can, in CPython 2.7, be used in "%s" % x or in u"%s" % x (both expressions then return a unicode without doing any encoding), but it cannot be used anywhere else, e.g. in "%r" % x or in repr(x). See also the PyPy issue https://bugs.pypy.org/issue857 .

serhiy-storchaka · 2013-08-23T13:31:13Z

In Python 3 ascii() uses the backslashreplace error handler.

>>> class T:
...     def __repr__(self):
...         return '\u20ac\udcff'
... 
>>> print(ascii(T()))
\u20ac\udcff

I think using the backslashreplace error handler in repr() in Python 2.7 is a good solution. Here is a patch.
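
For reference, the difference between the strict and backslashreplace error handlers can be seen on the string from the example above. This snippet only illustrates the codec behaviour; it is not taken from unicode_repr.patch.

```python
text = u'\u20ac\udcff'  # the value returned by T.__repr__ above

try:
    text.encode('ascii')  # strict handler: what repr() effectively does now
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# backslashreplace turns unencodable characters into escape sequences.
escaped = text.encode('ascii', 'backslashreplace')

print(strict_ok)                     # False
print(escaped == b'\\u20ac\\udcff')  # True
```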

vstinner · 2013-08-23T13:43:44Z

This change is going to break backward compatibility. I don't think
that it can be done in Python 2.7.x, and there is no Python 2.8 (PEP-404).

serhiy-storchaka · 2013-08-23T17:15:44Z

How can it break backward compatibility? Currently repr() just raises UnicodeEncodeError.

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

With the patch it always returns an 8-bit string. Since repr() is usually used for debugging, the second alternative looks more helpful.

vstinner · 2013-08-23T17:22:55Z

> How can it break backward compatibility? Currently repr() just raises UnicodeEncodeError.

It depends on sys.getdefaultencoding(), which can be modified in the site module (or in a PYTHONSTARTUP script) using sys.setdefaultencoding(). It should not be possible to change the default encoding, and this was fixed in Python 3.

arigo · 2013-08-23T17:42:51Z

@serhiy: it would certainly break a program that tries to call the repr() and catches the UnicodeEncodeError to do something else, like encode the data differently.

malemburg · 2013-08-23T18:15:08Z

.__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own. The CPython internals try to convert any non-str object to a str object, but this is only done to assure that PyObject_Repr() always returns a str object.

I'd suggest closing this as won't fix.

vstinner · 2013-08-23T18:57:45Z

> I'd suggest closing this as won't fix.

Agreed, it's time to upgrade to Python 3!

serhiy-storchaka · 2013-08-23T19:12:24Z

> It depends on sys.getdefaultencoding() which can be modified in the site module (or in a PYTHONSTARTUP script) using sys.setdefaultencoding().

Of course. Every repr() call that succeeds without the patch will give the same result with the patch. However, the patch allows you to see objects which were not repr-able before. repr() itself is used in the formatting of error messages, so it is desirable to extend its applicability as far as possible.

> @serhiy: it would certainly break a program that tries to call the repr() and catches the UnicodeEncodeError to do something else, like encode the data differently.

Why would it break? You want to encode the data differently only because of the non-working repr(); with the proposed patch this will simply not be needed.

> .__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own.

PyObject_Repr() contains code which converts unicode to str and raises an exception if the __repr__() result is neither str nor unicode. A unicode __repr__() is expected even if it is not recommended.

malemburg · 2013-08-23T19:22:50Z

Serhiy Storchaka wrote:

> .__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own.

> PyObject_Repr() contains code which converts unicode to str and raises an exception if the __repr__() result is neither str nor unicode. A unicode __repr__() is expected even if it is not recommended.

True, but the code is not intended to support non-ASCII Unicode,
otherwise we would have taken care to introduce support for this
much earlier in the 2.x series.

arigo · 2013-08-24T06:40:30Z

@serhiy: it's a behavior change and as such not an option for a micro release. For example, the following legal code would behave differently: it would compute s = '\\u1234' instead of s = 'UTF8:\xe1\x88\xb4'.

try:
    s = repr(x)
except UnicodeEncodeError:
    s = 'UTF8:' + x.value.encode('utf-8')

I think I agree that a working repr() is generally better, but in this case it should point out to the programmer that they should rather have __repr__() return something sensible and avoid the trick above...
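
The behavior change can be reproduced without patching the interpreter. In the sketch below, `X`, `strict_repr`, `patched_repr` and `describe` are hypothetical stand-ins: the first function mimics the current strict ASCII coercion, the second mimics the proposed backslashreplace behaviour, and `describe` is the try/except snippet above with the repr implementation passed in.

```python
class X(object):
    """Hypothetical object whose __repr__ returns non-ASCII text."""
    value = u'\u1234'

    def __repr__(self):
        return self.value


def strict_repr(x):
    # Stand-in for the current 2.7 behaviour: strict ASCII coercion.
    return type(x).__repr__(x).encode('ascii')


def patched_repr(x):
    # Stand-in for the proposed behaviour: escape instead of raising.
    return type(x).__repr__(x).encode('ascii', 'backslashreplace')


def describe(x, repr_func):
    # The try/except snippet, parameterised over the repr implementation.
    try:
        return repr_func(x)
    except UnicodeEncodeError:
        return b'UTF8:' + x.value.encode('utf-8')


print(describe(X(), strict_repr) == b'UTF8:\xe1\x88\xb4')  # True
print(describe(X(), patched_repr) == b'\\u1234')           # True
```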

liori mannequin added the extension-modules (C modules in the Modules dir) and type-bug (An unexpected behavior, bug, or error) labels Apr 29, 2009

bitdancer added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed extension-modules (C modules in the Modules dir) labels Apr 29, 2009

vstinner closed this as completed Aug 23, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022
