Issue 5876: __repr__ returning unicode doesn't work when called implicitly

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/50126

classification

Title:	__repr__ returning unicode doesn't work when called implicitly
Type:	behavior	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 2.7

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	Nam.Nguyen, arigo, eric.araujo, ezio.melotti, lemburg, liori, r.david.murray, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2009-04-29 11:58 by liori, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
unicode_repr.patch	serhiy.storchaka, 2013-08-23 13:31		review

Messages (17)
msg86798 - (view)	Author: Tomasz Melcer (liori)	Date: 2009-04-29 11:58
Invitation... (Debian Sid, gnome-terminal with pl_PL.UTF8 locales)

Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Let's create some class...

>>> class T(object):
...     def __repr__(self): return u'あみご'
...

Does its repr() work?

>>> T().__repr__()
u'\u3042\u307f\u3054'
>>> print T().__repr__()
あみご

But when it is implicitly called, it doesn't?!

>>> T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Encoding:

>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'

Workaround for now:

>>> class T(object):
...     def __repr__(self): return u'あみご'.encode('utf-8')
...
msg86799 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2009-04-29 12:50
This worked in 2.4 and stopped working in 2.5. It's not a problem in 3.x. (2.5 is in security-fix-only mode, so I'm removing it from versions).
msg143541 - (view)	Author: Éric Araujo (eric.araujo) *	Date: 2011-09-05 16:10
I think it’s not an implicit vs. explicit call problem, but rather repr vs. str. IIRC, in 2.x it is allowed that __str__ returns a unicode object, and str will convert it to a str. To do that, it will use the default encoding, which is ASCII in 2.5+, so your example cannot work.

Ideas for work-arounds:
- write a displayhook (http://docs.python.org/dev/library/sys#sys.displayhook) that converts unicode objects using sys.stdout.encoding
- for 2.6+, test if setting PYTHONIOENCODING changes something
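The displayhook idea above might look roughly like the following. This is a minimal sketch, not code from the thread: the names make_displayhook and T are hypothetical, and it is written for Python 3 so that it can be run today, with the output modelled as a binary stream (an assumption standing in for a byte-oriented terminal). It calls the __repr__ method directly instead of going through repr(), then encodes the result itself.

```python
import io


def make_displayhook(stream, encoding):
    """Build a displayhook that writes encoded __repr__ output to a binary stream.

    Hypothetical helper for illustration; `stream` must accept bytes.
    """
    def displayhook(value):
        if value is None:
            # The interactive prompt prints nothing for None.
            return
        # Call the method directly so no implicit conversion step is involved.
        text = type(value).__repr__(value)
        # Characters the target encoding cannot represent become backslash escapes.
        stream.write(text.encode(encoding, "backslashreplace") + b"\n")
    return displayhook


class T(object):
    def __repr__(self):
        return u"あみご"


# An in-memory stream stands in for a UTF-8 terminal.
utf8_out = io.BytesIO()
make_displayhook(utf8_out, "utf-8")(T())

# The same hook with an ASCII-only target falls back to escapes.
ascii_out = io.BytesIO()
make_displayhook(ascii_out, "ascii")(T())
```

To try it interactively, the returned function would be assigned to sys.displayhook. The built-in hook also stores the last value in `_`, which this sketch leaves out.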
msg143550 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-09-05 16:38
I think that this issue is a duplicate of #4947 which has been fixed in Python 2.7.1. Can you retry with Python 2.7.2 (or 2.7.1)?
msg143553 - (view)	Author: Tomasz Melcer (liori)	Date: 2011-09-05 17:27
Debian SID. No, it wasn't.

Python 2.7.2+ (default, Aug 16 2011, 09:23:59)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class T(object):
...     def __repr__(self): return u'あみご'
...
>>> T().__repr__()
u'\u3042\u307f\u3054'
>>> print T().__repr__()
あみご
>>> T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print T()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'
msg143559 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-09-05 20:07
> Debian SID. No, it wasn't.

Oh ok, gotcha: repr() always returns a str string. If obj.__repr__() returns a Unicode string, the string is encoded to the default encoding. By default, the default encoding is ASCII.

$ ./python -S
Python 2.7.2+ (2.7:85a12278de69, Sep 2 2011, 00:21:57)
[GCC 4.6.0 20110603 (Red Hat 4.6.0-10)] on linux2
>>> import sys
>>> sys.setdefaultencoding('ISO-8859-1')
>>> class A(object):
...     def __repr__(self): return u'\xe9'
...
>>> repr(A())
'\xe9'

Don't do that at home! Changing the default encoding is not a good idea.

I don't think that repr(obj) can be changed to return Unicode if obj.__repr__() returns Unicode. It is too late to change such a thing in Python 2.
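The conversion described above can be modelled in a few lines. The following is only an illustration of the described behavior, written in Python 3, where the 2.x str/unicode pair corresponds to bytes/str. py2_style_repr is a hypothetical name, not an interpreter function, and its default_encoding parameter is an assumption standing in for sys.getdefaultencoding().

```python
def py2_style_repr(obj, default_encoding="ascii"):
    """Model of the described 2.x behavior: a text __repr__ result is
    encoded with the default encoding, using the strict error handler."""
    result = type(obj).__repr__(obj)
    if isinstance(result, bytes):
        # Already a byte string: returned unchanged.
        return result
    if isinstance(result, str):
        # Text: encoded implicitly, which raises for unencodable characters.
        return result.encode(default_encoding)
    raise TypeError("__repr__ returned non-string")


class A(object):
    def __repr__(self):
        return "\xe9"


# With a Latin-1 default encoding the conversion succeeds.
latin1_result = py2_style_repr(A(), "ISO-8859-1")

# With the ASCII default it fails, as in the original report.
try:
    py2_style_repr(A())
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```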
msg143632 - (view)	Author: Armin Rigo (arigo) *	Date: 2011-09-06 16:56
A __repr__() that returns unicode can, in CPython 2.7, be used in "%s" % x or in u"%s" % x (both expressions then return a unicode without doing any encoding), but it cannot be used anywhere else, e.g. in "%r" % x or in repr(x). See also the PyPy issue https://bugs.pypy.org/issue857 .
msg195967 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-08-23 13:31
In Python 3 ascii() uses the backslashreplace error handler.

>>> class T:
...     def __repr__(self):
...         return '\u20ac\udcff'
...
>>> print(ascii(T()))
\u20ac\udcff

I think using the backslashreplace error handler in repr() in Python 2.7 is a good solution. Here is a patch.
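For reference, this is how the backslashreplace error handler behaves when encoding to ASCII. The snippet runs on Python 3 and only demonstrates the error handler. It is not the content of unicode_repr.patch.

```python
text = "\u20ac\udcff"

# Strict encoding (the default) refuses non-ASCII characters.
try:
    text.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# backslashreplace substitutes escape sequences instead of raising.
escaped = text.encode("ascii", "backslashreplace")

# ascii() gives the same escaped form for a plain string, plus quotes.
quoted = ascii(text)
```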
msg195970 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-08-23 13:43
This change is going to break backward compatibility. I don't think that it can be done in Python 2.7.x, and there is no Python 2.8 (PEP 404).
msg195985 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-08-23 17:15
How can it break backward compatibility? Currently repr() just raises UnicodeEncodeError.

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

With the patch it always returns an 8-bit string. Since repr() is usually used for debugging, the second alternative looks more helpful.
msg195986 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-08-23 17:22
> How can it break backward compatibility? Currently repr() just raises UnicodeEncodeError.

It depends on sys.getdefaultencoding(), which can be modified in the site module (or in a PYTHONSTARTUP script) using sys.setdefaultencoding(). It should not be possible to change the default encoding, and this was fixed in Python 3.
msg195993 - (view)	Author: Armin Rigo (arigo) *	Date: 2013-08-23 17:42
@Serhiy: it would certainly break a program that tries to call the repr() and catches the UnicodeEncodeError to do something else, like encode the data differently.
msg195996 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2013-08-23 18:15
.__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own. The CPython internals try to convert any non-str object to a str object, but this is only done to assure that PyObject_Repr() always returns a str object. I'd suggest closing this as won't fix.
msg196003 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-08-23 18:57
> I'd suggest closing this as won't fix.

Agreed, it's time to upgrade to Python 3!
msg196008 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-08-23 19:12
> It depends on sys.getdefaultencoding() which can be modified in the site module (or in a PYTHONSTARTUP script) using sys.setdefaultencoding().

Of course. Every repr() call that succeeds without the patch will return the same result with the patch. However, the patch allows you to see objects which were not repr-able before. repr() itself is used in the formatting of error messages, so it is desirable to extend its applicability as far as possible.

> @Serhiy: it would certainly break a program that tries to call the repr() and catches the UnicodeEncodeError to do something else, like encode the data differently.

Why would it break? You want to encode the data differently only because repr() does not work; with the proposed patch this will simply not be needed.

> .__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own.

PyObject_Repr() contains code which converts unicode to str and raises an exception if the __repr__() result is not str or unicode. A unicode __repr__() is expected even if it is not recommended.
msg196014 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2013-08-23 19:22
Serhiy Storchaka wrote:
>> .__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own.
>
> PyObject_Repr() contains code which converts unicode to str and raises an exception if the __repr__() result is not str or unicode. A unicode __repr__() is expected even if it is not recommended.

True, but the code is not intended to support non-ASCII Unicode, otherwise we would have taken care to introduce support for this much earlier in the 2.x series.
msg196065 - (view)	Author: Armin Rigo (arigo) *	Date: 2013-08-24 06:40
@Serhiy: it's a behavior change and as such not an option for a micro release. For example, the following legal code would behave differently: it would compute s = '\\u1234' instead of s = 'UTF8:\xe1\x88\xb4'.

try:
    s = repr(x)
except UnicodeEncodeError:
    s = 'UTF8:' + x.value.encode('utf-8')

I think I agree that a working repr() is generally better, but in this case it should point out to the programmer that they should rather have __repr__() return something sensible and avoid the trick above...
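The behavior change described in this last message can be reproduced with a small model. This is a sketch under stated assumptions: it runs on Python 3 with bytes standing in for the 2.x str type, and X, repr_current, repr_patched and describe are hypothetical names that imitate the existing and the proposed conversion rather than the real PyObject_Repr() code.

```python
class X(object):
    value = "\u1234"

    def __repr__(self):
        return self.value


def repr_current(obj):
    # Model of the existing behavior: strict ASCII encoding, may raise.
    return type(obj).__repr__(obj).encode("ascii")


def repr_patched(obj):
    # Model of the proposed behavior: escape instead of raising.
    return type(obj).__repr__(obj).encode("ascii", "backslashreplace")


def describe(obj, repr_func):
    # The caller's pattern: fall back to UTF-8 when repr fails.
    try:
        return repr_func(obj)
    except UnicodeEncodeError:
        return b"UTF8:" + obj.value.encode("utf-8")


before = describe(X(), repr_current)  # fallback branch is taken
after = describe(X(), repr_patched)   # fallback branch is never reached
```

The caller's code is unchanged in both runs, yet the value it ends up with differs, which is the compatibility concern raised against changing this in a micro release.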

History
Date	User	Action	Args
2022-04-11 14:56:48	admin	set	github: 50126
2014-07-19 00:56:31	berker.peksag	set	resolution: fixed -> wont fix stage: patch review -> resolved
2013-08-24 06:40:30	arigo	set	messages: + msg196065
2013-08-23 19:22:50	lemburg	set	messages: + msg196014
2013-08-23 19:12:23	serhiy.storchaka	set	messages: + msg196008
2013-08-23 18:57:45	vstinner	set	status: open -> closed resolution: fixed messages: + msg196003
2013-08-23 18:15:07	lemburg	set	messages: + msg195996
2013-08-23 17:42:50	arigo	set	messages: + msg195993
2013-08-23 17:22:54	vstinner	set	messages: + msg195986
2013-08-23 17:15:44	serhiy.storchaka	set	messages: + msg195985
2013-08-23 13:43:43	vstinner	set	messages: + msg195970
2013-08-23 13:31:13	serhiy.storchaka	set	files: + unicode_repr.patch nosy: + serhiy.storchaka messages: + msg195967 keywords: + patch stage: test needed -> patch review
2011-09-06 16:56:40	arigo	set	nosy: + arigo messages: + msg143632
2011-09-05 20:07:15	vstinner	set	messages: + msg143559
2011-09-05 17:27:42	liori	set	messages: + msg143553
2011-09-05 16:38:42	vstinner	set	nosy: + vstinner messages: + msg143550
2011-09-05 16:10:46	eric.araujo	set	nosy: + eric.araujo, lemburg messages: + msg143541 versions: - Python 2.6
2011-09-03 11:55:23	Nam.Nguyen	set	nosy: + Nam.Nguyen
2009-04-29 12:50:14	r.david.murray	set	priority: normal components: + Interpreter Core, - Extension Modules versions: + Python 2.6, Python 2.7, - Python 2.5 nosy: + r.david.murray messages: + msg86799 stage: test needed
2009-04-29 12:01:37	ezio.melotti	set	nosy: + ezio.melotti
2009-04-29 11:58:31	liori	create