Title: 2.7.1 unicode subclasses not calling __str__() for print statement
Type: behavior
Stage: resolved
Components: Unicode
Versions: Python 2.7
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, nagle, opstad, r.david.murray
Priority: normal
Keywords:

Created on 2011-04-21 18:14 by opstad, last changed 2011-12-16 08:28 by amaury.forgeotdarc. This issue is now closed.

Messages (7)
msg134231 - Author: Dave Opstad (opstad) Date: 2011-04-21 18:14
Python 2.7.1 doesn't appear to do the usual implicit call to str() for subclasses of unicode. In the following snippet, I would have expected print myTest and print str(myTest) to behave the same:

>>> class Test(unicode):
...   def __str__(self):
...     print "In __str__"
...     return (u"*** " + self + u" ***").encode('utf-8')
...   def __unicode__(self):
...     print "In __unicode__"
...     return u"*** " + self + u" ***"
>>> myTest = Test(u"abc")
>>> print myTest
abc
>>> print str(myTest)
In __str__
*** abc ***
>>> print unicode(myTest)
In __unicode__
*** abc ***
msg134233 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2011-04-21 18:28
You subclassed unicode.  So print printed the value of your unicode object, which didn't need coercion.
msg134235 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2011-04-21 18:37
For the record, this isn't as simple as I made it sound.  See, for example, issue 9196.
msg134238 - Author: Dave Opstad (opstad) Date: 2011-04-21 19:22
I guess I was confused by the inconsistency with Python 3, which *does* call the __str__ method, even though, again, no coercion is needed:

Python 3.2 (r32:88452, Feb 20 2011, 10:19:59) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(str):
...   def __str__(self):
...     print("In __str__")
...     return "*** " + self + " ***"
>>> x = X("abcde")
>>> print(x)
In __str__
*** abcde ***
>>> print(str(x))
In __str__
*** abcde ***
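
The Python 3 behavior shown in this transcript can be checked without reading console output. The following is a minimal sketch in Python 3 syntax; the call counter `calls` and the helper `printed` are illustrative additions that are not part of the original report:

```python
import io


class X(str):
    """str subclass that counts every call to its __str__ override."""
    calls = 0

    def __str__(self):
        X.calls += 1
        # "*** " + self goes through str.__add__ and yields a plain str,
        # so building the result does not call __str__ again.
        return "*** " + self + " ***"


def printed(obj):
    """Return exactly what print(obj) writes, captured in memory."""
    buf = io.StringIO()
    print(obj, file=buf)
    return buf.getvalue()


x = X("abcde")
explicit = str(x)      # explicit conversion
implicit = printed(x)  # implicit conversion performed by print()
```

Under Python 3 both paths go through the override, so `X.calls` ends at 2 while `x == "abcde"` still compares against the underlying value.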
msg134251 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2011-04-21 20:22
Well, it's possible I'm wrong and you've found a bug.  There are numerous differences between 2 and 3 in both string handling and special method handling, though, so it may be hard to pin down.  If you poke around a bit more and still think it is a bug, please reopen.
msg149569 - Author: John Nagle (nagle) Date: 2011-12-15 19:26
This has nothing to do with Python 3.  There's a difference in __str__ handling between Python 2.6 and Python 2.7.2.  It's enough to crash BeautifulSoup:

[Thread-8] Unexpected EXCEPTION while processing page "": global name '__str__' is not defined
[Thread-8] Traceback (most recent call last):
[Thread-8]   File "C:\projects\sitetruth\", line 646, in prettify
[Thread-8]     return self.__str__(encoding, True)
[Thread-8]   File "C:\projects\sitetruth\", line 621, in __str__
[Thread-8]     contents = self.renderContents(encoding, prettyPrint, indentContents)
[Thread-8]   File "C:\projects\sitetruth\", line 656, in renderContents
[Thread-8]     text = c.__str__(encoding)
[Thread-8]   File "C:\projects\sitetruth\", line 415, in __str__
[Thread-8]     return "<!--%s-->" % NavigableString.__str__(self, encoding)
[Thread-8]   File "C:\projects\sitetruth\", line 393, in __unicode__
[Thread-8]     return __str__(self, None)
[Thread-8] NameError: global name '__str__' is not defined

The class method that's failing is simply

class NavigableString(unicode, PageElement):
    def __unicode__(self):
        return __str__(self, None)   #### EXCEPTION RAISED HERE ####

    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        if encoding:
            return self.encode(encoding)
        else:
            return self

Using __str__ in the global namespace is probably wrong, and in a later version of BeautifulSoup, that code is changed to

    def __unicode__(self):
        return str(self).decode(DEFAULT_OUTPUT_ENCODING)

which seems to work.  However, it is a real change from 2.6 to 2.7 that breaks code.
msg149599 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2011-12-16 08:28
> However, it is a real change from 2.6 to 2.7 that breaks code.

John, this issue is not the same as the one above.  The difference between Python 2.6 and Python 2.7.2 you mention only applies to % formatting.
The change is clearly documented in
"""It’s now possible for a subclass of the built-in unicode type to override the __unicode__() method."""

This is clearly a bug in the application.  There are many ways to break compatibility with bogus code...
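
The NameError in the traceback comes from how Python resolves a bare name inside a method: it searches the local, enclosing-function, global, and builtin scopes, but never the class body. A call such as `__str__(self, None)` therefore fails as soon as the method containing it is actually invoked. The following is a minimal sketch in Python 3 syntax; the names `Broken`, `Fixed`, `describe`, and `outcome` are hypothetical and are not BeautifulSoup code:

```python
class Broken(str):
    def __str__(self):
        return "<" + str.__str__(self) + ">"

    def describe(self):
        # Bare name: resolved through local, global, and builtin scopes only,
        # so the __str__ method defined just above is not found.
        return __str__(self)


class Fixed(str):
    def __str__(self):
        return "<" + str.__str__(self) + ">"

    def describe(self):
        # Attribute lookup on the instance finds the method.
        return self.__str__()


def outcome(obj):
    """Return describe()'s result, or the name of the exception it raised."""
    try:
        return obj.describe()
    except NameError as exc:
        return type(exc).__name__


broken_result = outcome(Broken("abc"))
fixed_result = outcome(Fixed("abc"))
```

Defining `Broken` succeeds; the error only surfaces when `describe` runs. That would be consistent with a faulty `__unicode__` going unnoticed until Python 2.7 began calling it.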
Date                 User                Action  Args
2011-12-16 08:28:02  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg149599
2011-12-15 19:26:52  nagle               set     nosy: + nagle; messages: + msg149569
2011-04-21 20:22:22  r.david.murray      set     messages: + msg134251
2011-04-21 19:22:49  opstad              set     messages: + msg134238
2011-04-21 18:37:35  r.david.murray      set     messages: + msg134235
2011-04-21 18:28:24  r.david.murray      set     status: open -> closed; nosy: + r.david.murray; messages: + msg134233; resolution: not a bug; stage: resolved
2011-04-21 18:14:26  opstad              create