__repr__ returning unicode doesn't work when called implicitly #50126

Closed
liori mannequin opened this issue Apr 29, 2009 · 17 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@liori
Mannequin

liori mannequin commented Apr 29, 2009

BPO 5876
Nosy @malemburg, @arigo, @vstinner, @ezio-melotti, @merwok, @bitdancer, @postmasters, @serhiy-storchaka
Files
  • unicode_repr.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2013-08-23.18:57:45.077>
    created_at = <Date 2009-04-29.11:58:31.926>
    labels = ['interpreter-core', 'type-bug']
    title = "__repr__ returning unicode doesn't work when called implicitly"
    updated_at = <Date 2014-07-19.00:56:31.365>
    user = 'https://bugs.python.org/liori'

    bugs.python.org fields:

    activity = <Date 2014-07-19.00:56:31.365>
    actor = 'berker.peksag'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-08-23.18:57:45.077>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2009-04-29.11:58:31.926>
    creator = 'liori'
    dependencies = []
    files = ['31439']
    hgrepos = []
    issue_num = 5876
    keywords = ['patch']
    message_count = 17.0
    messages = ['86798', '86799', '143541', '143550', '143553', '143559', '143632', '195967', '195970', '195985', '195986', '195993', '195996', '196003', '196008', '196014', '196065']
    nosy_count = 9.0
    nosy_names = ['lemburg', 'arigo', 'vstinner', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'liori', 'Nam.Nguyen', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue5876'
    versions = ['Python 2.7']

    @liori
    Mannequin Author

    liori mannequin commented Apr 29, 2009

    Invitation... (Debian Sid, gnome-terminal with pl_PL.UTF8 locales)

    Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45)
    [GCC 4.3.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.

    Let's create a class...

    >>> class T(object):
    ...     def __repr__(self): return u'あみご'
    ... 

    Does its repr() work?

    >>> T().__repr__()
    u'\u3042\u307f\u3054'
    >>> print T().__repr__()
    あみご

    But when it is called implicitly, it doesn't?!

    >>> T()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
    >>> print T()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

    Encoding:

    >>> import sys
    >>> sys.stdin.encoding
    'UTF-8'
    >>> sys.stdout.encoding
    'UTF-8'

    Workaround for now:

    >>> class T(object):
    ...     def __repr__(self): return u'あみご'.encode('utf-8')
    ...

    @liori liori mannequin added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Apr 29, 2009
    @bitdancer
    Member

    This worked in 2.4 and stopped working in 2.5.

    It's not a problem in 3.x.

    (2.5 is in security-fix-only mode, so I'm removing it from versions).

    @bitdancer bitdancer added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed extension-modules C modules in the Modules dir labels Apr 29, 2009
    @merwok
    Member

    merwok commented Sep 5, 2011

    I think it’s not an implicit vs. explicit call problem, rather repr vs. str.

    IIRC, in 2.x it is allowed that __str__ returns a unicode object, and str will convert it to a str. To do that, it will use the default encoding, which is ASCII in 2.5+, so your example cannot work.
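
The failure can be reproduced without Python 2 by performing the conversion by hand. This is a minimal sketch in Python 3 syntax. It assumes that the implicit step amounts to encoding the returned text with the ASCII codec; the variable names are illustrative only.

```python
# Python 3 sketch: emulate the implicit unicode -> str conversion of
# Python 2 by encoding explicitly with the ASCII default encoding.
text = 'あみご'  # the text returned by __repr__ in the report

try:
    text.encode('ascii')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)  # same kind of error as in the report

# Encoding explicitly with UTF-8, as in the reporter's workaround, succeeds.
encoded = text.encode('utf-8')
print(encoded)
```

The explicit UTF-8 encoding corresponds to the reporter's workaround: the method hands back an already encoded byte string, so no implicit ASCII step is needed.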

    Ideas for work-arounds:

    @vstinner
    Member

    vstinner commented Sep 5, 2011

    I think that this issue is a duplicate of bpo-4947 which has been fixed in Python 2.7.1. Can you retry with Python 2.7.2 (or 2.7.1)?

    @liori
    Mannequin Author

    liori mannequin commented Sep 5, 2011

    Debian SID. No, it wasn't.

    Python 2.7.2+ (default, Aug 16 2011, 09:23:59) 
    [GCC 4.6.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> class T(object):
    ...     def __repr__(self): return u'あみご'
    ... 
    >>> T().__repr__()
    u'\u3042\u307f\u3054'
    >>> print T().__repr__()
    あみご
    >>> T()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
    >>> print T()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
    >>> import sys
    >>> sys.stdin.encoding
    'UTF-8'
    >>> sys.stdout.encoding
    'UTF-8'

    @vstinner
    Member

    vstinner commented Sep 5, 2011

    > Debian SID. No, it wasn't.

    Oh ok, gotcha: repr() always returns a str string. If obj.__repr__() returns a
    Unicode string, the string is encoded to the default encoding. By default, the
    default encoding is ASCII.

    $ ./python -S 
    Python 2.7.2+ (2.7:85a12278de69, Sep  2 2011, 00:21:57) 
    [GCC 4.6.0 20110603 (Red Hat 4.6.0-10)] on linux2
    >>> import sys
    >>> sys.setdefaultencoding('ISO-8859-1')
    >>> class A(object):
    ...  def __repr__(self): return u'\xe9'
    ... 
    >>> repr(A())
    '\xe9'

    Don't do that at home! Changing the default encoding is not a good idea.

    I don't think that repr(obj) can be changed to return Unicode if
    obj.__repr__() returns Unicode. It is too late to change such a thing in
    Python 2.

    @arigo
    Mannequin

    arigo mannequin commented Sep 6, 2011

    A __repr__() that returns unicode can, in CPython 2.7, be used in "%s" % x or in u"%s" % x (both expressions then return a unicode without doing any encoding), but it cannot be used anywhere else, e.g. in "%r" % x or in repr(x). See also the PyPy issue https://bugs.pypy.org/issue857 .

    @serhiy-storchaka
    Member

    In Python 3 ascii() uses the backslashreplace error handler.

    >>> class T:
    ...     def __repr__(self):
    ...         return '\u20ac\udcff'
    ... 
    >>> print(ascii(T()))
    \u20ac\udcff

    I think using the backslashreplace error handler in repr() in Python 2.7 is a good solution. Here is a patch.
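
For readers unfamiliar with the handler, the difference between strict encoding and backslashreplace can be sketched as follows. This is Python 3 code illustrating the codec behaviour only. It is not the attached patch, and the variable names are made up for the example.

```python
# Python 3 sketch of the two error-handling strategies discussed here.
text = '\u20ac'  # EURO SIGN, not representable in ASCII

# Strict handling (the default) raises, as repr() does in the report.
try:
    text.encode('ascii')
    strict_result = 'ok'
except UnicodeEncodeError:
    strict_result = 'error'

# backslashreplace substitutes an escape sequence instead of raising.
escaped = text.encode('ascii', 'backslashreplace')
print(strict_result)
print(escaped)

# ascii() applies the same kind of escaping to the repr of an object.
print(ascii(text))
```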

    @vstinner
    Member

    This change is going to break backward compatibility. I don't think
    that it can be done in Python 2.7.x, and there is no Python 2.8 (PEP-404).

    @serhiy-storchaka
    Member

    How can it break backward compatibility? Currently repr() just raises UnicodeEncodeError.

    UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

    With the patch it always returns an 8-bit string. Since repr() is usually used for debugging, the second alternative looks more helpful.

    @vstinner
    Member

    > How can it break backward compatibility? Currently repr() just raises UnicodeEncodeError.

    It depends on sys.getdefaultencoding(), which can be modified in the site module (or in a PYTHONSTARTUP script) using sys.setdefaultencoding(). It should not be possible to change the default encoding, and this was fixed in Python 3.

    @arigo
    Mannequin

    arigo mannequin commented Aug 23, 2013

    @serhiy: it would certainly break a program that tries to call the repr() and catches the UnicodeEncodeError to do something else, like encode the data differently.

    @malemburg
    Member

    .__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own. The CPython internals try to convert any non-str object to a str object, but this is only done to assure that PyObject_Repr() always returns a str object.

    I'd suggest closing this as won't fix.

    @vstinner
    Member

    > I'd suggest closing this as won't fix.

    Agreed, it's time to upgrade to Python 3!

    @serhiy-storchaka
    Member

    > It depends on sys.getdefaultencoding() which can be modified in the site module (or in a PYTHONSTARTUP script) using sys.setdefaultencoding().

    Of course. Every repr() call that succeeds without the patch will behave the same with it. However, the patch lets you see objects that were not repr-able before. repr() itself is used in the formatting of error messages, so it is desirable to extend its applicability as far as possible.

    > @serhiy: it would certainly break a program that tries to call the repr() and catches the UnicodeEncodeError to do something else, like encode the data differently.

    Why would it break? You want to encode the data differently only because repr() does not work; with the proposed patch this would simply not be needed.

    > .__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own.

    PyObject_Repr() contains code that converts unicode to str and raises an exception if the __repr__() result is neither str nor unicode. A unicode __repr__() is expected even if it is not recommended.
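
The conversion rule described in this comment can be modelled roughly as below. This is a hypothetical Python 3 stand-in, not the CPython source: bytes plays the role of the 2.x str type, str plays the role of 2.x unicode, and `model_repr` and its `default_encoding` parameter are names invented for the sketch.

```python
# Python 3 model of the Python 2 behaviour described in this thread.
# bytes stands in for the 2.x str type and str for the 2.x unicode type.
def model_repr(obj, default_encoding='ascii'):
    """Hypothetical stand-in for repr() in 2.x; not the CPython code."""
    result = type(obj).__repr__(obj)
    if isinstance(result, bytes):
        return result  # already an 8-bit string
    if isinstance(result, str):
        # Text is encoded with the default encoding; this may raise
        # UnicodeEncodeError for characters the codec cannot represent.
        return result.encode(default_encoding)
    raise TypeError('__repr__ returned non-string')


class A(object):
    def __repr__(self):
        return '\xe9'


# With an ISO-8859-1 default encoding the conversion succeeds.
latin1 = model_repr(A(), default_encoding='iso-8859-1')
print(latin1)
```

Calling `model_repr(A())` with the ASCII default raises UnicodeEncodeError, which is the situation reported in this issue.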

    @malemburg
    Member

    Serhiy Storchaka wrote:

    > .__repr__() is not really allowed to return Unicode objects in Python 2.x. If you do this, you're on your own.

    > PyObject_Repr() contains code that converts unicode to str and raises an exception if the __repr__() result is neither str nor unicode. A unicode __repr__() is expected even if it is not recommended.

    True, but the code is not intended to support non-ASCII Unicode,
    otherwise we would have taken care to introduce support for this
    much earlier in the 2.x series.

    @arigo
    Mannequin

    arigo mannequin commented Aug 24, 2013

    @serhiy: it's a behavior change and as such not an option for a micro release. For example, the following legal code would behave differently: it would compute s = '\\u1234' instead of s = 'UTF8:\xe1\x88\xb4'.

    try:
        s = repr(x)
    except UnicodeEncodeError:
        s = 'UTF8:' + x.value.encode('utf-8')
    

    I think I agree that a working repr() is generally better, but in this case it should point out to the programmer that they should rather have __repr__() return something sensible and avoid the trick above...
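
The behavior change in the try/except example can be simulated in Python 3. In this sketch bytes stands in for the 2.x str type, and `repr_current`, `repr_patched` and `describe` are hypothetical helpers. The patched variant assumes that the proposed patch is equivalent to encoding with backslashreplace.

```python
# Python 3 simulation; bytes stands in for the 2.x str type.
class X(object):
    value = '\u1234'


def repr_current(x):
    # Strict ASCII encoding: raises for non-ASCII text.
    return x.value.encode('ascii')


def repr_patched(x):
    # Assumed behaviour with the proposed patch: escape instead of raising.
    return x.value.encode('ascii', 'backslashreplace')


def describe(x, repr_func):
    # Same shape as the try/except fallback quoted in the comment.
    try:
        return repr_func(x)
    except UnicodeEncodeError:
        return b'UTF8:' + x.value.encode('utf-8')


before = describe(X(), repr_current)
after = describe(X(), repr_patched)
print(before)
print(after)
```

The same calling code yields a different value once repr() stops raising, which is why the change was judged unsuitable for a micro release.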

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022