classification
Title: IDLE hangs while printing instance of Unicode subclass
Type: behavior Stage: resolved
Components: IDLE Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: ezio.melotti, kbk, mjpieters, ned.deily, python-dev, roger.serwy, serhiy.storchaka, terry.reedy, tim.peters
Priority: normal Keywords: patch

Created on 2013-11-03 04:29 by tim.peters, last changed 2015-03-04 15:00 by mjpieters. This issue is now closed.

Files
File name Uploaded
idle_print_unicode_subclass.patch serhiy.storchaka, 2013-11-03 08:19
idle_write_string_subclass-2.7.patch serhiy.storchaka, 2013-11-04 08:40
idle_write_string_subclass-3.x.patch serhiy.storchaka, 2013-11-04 08:50
Messages (17)
msg201991 - Author: Tim Peters (tim.peters) (Python committer) Date: 2013-11-03 04:29
This showed up on StackOverflow:

http://stackoverflow.com/questions/19749757/print-is-blocking-forever-when-printing-unicode-subclass-instance-from-idle

They were using 32-bit Python 2.7.5 on Windows 7; I reproduced using the same Python on Windows Vista.  To reproduce, open IDLE, and enter

>>> class Foo(unicode):
        pass
>>> foo = Foo('bar')
>>> print foo

IDLE hangs then, and Ctrl+C is ignored.  Stranger, these variants do *not* hang:

>>> foo
>>> print str(foo)
>>> print repr(foo)

Those all work as expected.  Cute :-)

And none of these hang in a DOS-box session.
msg202003 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2013-11-03 07:27
Win 7, console 2.7.5+, 32-bit, compiled Aug 24, does not have the problem. Idle started with 'import idlelib.idle' does, but only for 'print foo', as Tim reported. When I close the hung process with [X], there is no error message in the console. Installed 64-bit 2.7.5 fails with 'print foo' also. I actually used F and f instead of Foo and foo, so it is not name specific. A subclass of str works fine.

Current 3.4a4 Idle works fine. The SO OP also reported that there is no problem if the class is imported from another file.

We need a test on something other than Windows, preferably both mac and linux.
msg202004 - Author: Ned Deily (ned.deily) (Python committer) Date: 2013-11-03 07:45
It's reproducible on OS X as well with a 32-bit Python 2.7.5 and a 64-bit Python 2.7.6rc1.  However, the example works OK if I start IDLE with no subprocess (-n).
msg202005 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-11-03 08:19
This patch fixes symptoms.
msg202070 - Author: Ned Deily (ned.deily) (Python committer) Date: 2013-11-04 00:21
LGTM
msg202072 - Author: Tim Peters (tim.peters) (Python committer) Date: 2013-11-04 00:28
Do we have a theory for _why_ IDLE goes nuts?  I'd like to know whether the patch is fixing the real problem, or just happens to work in this particular test case ;-)
msg202093 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2013-11-04 06:55
I am curious too, so I traced through the call chain.

In PyShell.py
1343: PseudoOutputFile.write(s) calls: self.shell.write(s, self.tags)
914: shell is an instance of PyShell and self.tags is 'stdout', 'stderr', or 'console'.
1291: PyShell.write(s,tags) calls:
 OutputWindow.write(self, s, tags, "iomark")
 (where 'iomark' must have been defined elsewhere, and the 'gravity' calls should not matter)

In OutputWindow.py
46: OutputWindow(EditorWindow).write(s,tags,mark='insert') calls: self.text.insert(mark, s, tags)
after trying to encode s if isinstance(s, str). It follows with:
        self.text.see(mark)
        self.text.update()
but if the insert succeeds, these should not care about the source of the inserted chars.

In EditorWindow.py
187: self.text = MultiCallCreator(Text)(text_frame, **text_options)
In MultiCall.py,
304: MultiCallCreator wraps a tk widget in a MultiCall instance that adds event methods but otherwise passes calls to the tk widget.

So PseudoOutputFile.write(s) becomes tk.Text().insert('iomark', s, 'stdout'),
which becomes (lib-tk/Tkinter.py, 3050)
  self.tk.call((self._w, 'insert', 'iomark', s) + args)

Tk handles either Latin-1 bytes or BMP unicode. It seems fine with a unicode subclass:
>>> import Tkinter as tk
>>> t = tk.Text()
>>> class F(unicode): pass

>>> f = F('foo')
>>> t.insert('1.0', u'abc', 'stdout') # 'iomark' is not defined
>>> t.insert('1.0', f, 'stdout')
>>> t.get('1.0', 'end')
u'abcfoo\n'

I remain puzzled.
msg202096 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-11-04 08:40
I suppose this is related to pickling.

I was puzzled why it works with bytearray subclasses. I have now found that print() implicitly converts str and bytearray subclass instances to str and leaves unicode subclass instances as is. You can reproduce this bug for str and bytearray subclasses if you use sys.stdout.write() instead of print().

Here is a patch for 2.7 which fixes the issue for str and bytearray subclasses too. 3.x needs a patch too.

>>> class U(unicode): pass

>>> class S(str): pass

>>> class BA(bytearray): pass

>>> import sys
>>> sys.stdout.write(u'\u20ac')
€
>>> sys.stdout.write('\xe2\x82\xac')
€
>>> sys.stdout.write(bytearray('\xe2\x82\xac'))
€
>>> sys.stdout.write(U(u'\u20ac'))
€
>>> sys.stdout.write(S('\xe2\x82\xac'))
€
>>> sys.stdout.write(BA('\xe2\x82\xac'))
€
msg202098 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-11-04 08:50
And here is a patch for 3.x. Without it the following code hangs.

>>> class S(str): pass

>>> import sys
>>> sys.stdout.write('\u20ac')
€1
>>> sys.stdout.write(S('\u20ac'))
€1
msg202102 - Author: Ned Deily (ned.deily) (Python committer) Date: 2013-11-04 09:53
Pickling for the RPC protocol between the GUI process and the interpreter subprocess, which would explain why there is no problem when running idle -n (no subprocess)?
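
A minimal sketch of why pickling could matter here, written for Python 3 and not taken from IDLE's RPC code. The thread does not spell out the exact failure mode, so this only illustrates one general property of pickle: a pickled subclass instance refers to its class by module and name, and the receiving side must be able to find that class. The module name `sender_main` and the variable names are hypothetical stand-ins for the subprocess's `__main__` and the GUI process.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the user's __main__ module in the subprocess.
sender_main = types.ModuleType("sender_main")
sys.modules["sender_main"] = sender_main


class S(str):
    """A str subclass, as if typed at the IDLE prompt."""


# Pin the class to the stand-in module so pickle can locate it when dumping.
S.__module__ = "sender_main"
S.__qualname__ = "S"
sender_main.S = S

# The payload stores a reference to sender_main.S, not the class itself.
payload = pickle.dumps(S("bar"))

# Simulate the receiving process, which never defined S.
del sys.modules["sender_main"]
try:
    pickle.loads(payload)
    receiver_failed = False
except (ImportError, AttributeError):
    receiver_failed = True

# An exact str carries no reference to user code and round-trips anywhere.
restored = pickle.loads(pickle.dumps(str.__str__(S("bar"))))

print(receiver_failed, type(restored).__name__, restored)
```

This would also be consistent with the earlier report that the problem does not occur when the class is imported from another file, since such a class can be imported by name on both sides.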
msg205732 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-12-09 19:29
> Pickling for the RPC protocol between the GUI process and the interpreter subprocess, which would explain why there is no problem when running idle -n (no subprocess)?

Yes, it is.

If there are no objections I'll commit these patches.
msg205740 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2013-12-09 21:10
> [2.7] print() implicitly converts str and bytearray subclasses to str and leaves unicode subclasses as is.

This strikes me as possibly a bug in print, but even if that were changed, there is still the issue of sys.stdout.write and pickle. While the patch is a great improvement, it changes the behavior of sys.stdout.write(s), which on the console acts as if it calls str.__str__(s) rather than str(s), that is, s.__str__().

---
class S(str):
    def __str__(self):
        return 'S: ' + str.__str__(self)

s = S('foo')
print(s, str(s), str.__str__(s))

import sys
sys.stdout.write(s)
---
S: foo S: foo foo
foo

on the console (hang after first line on Idle)

I am testing the patch with str(s) changed to str.__str__(s).
msg205741 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2013-12-09 21:29
Confirmed that the revised patch for 3.3 fixes the hang and matches the console interpreter output.
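
The normalization discussed in the last two messages can be sketched as follows for Python 3. This is an illustration under assumptions, not the committed patch: `PlainStrWriter` and its `send` callback are hypothetical names standing in for a file-like object that forwards text to another process. The point it shows is that `str.__str__(s)` yields an exact `str` holding the raw characters, while `str(s)` would go through any `__str__` override on the subclass.

```python
class PlainStrWriter:
    """Hypothetical file-like object that forwards text through a callback."""

    def __init__(self, send):
        self.send = send  # callable that receives the normalized text

    def write(self, s):
        if not isinstance(s, str):
            raise TypeError("must be str, not " + type(s).__name__)
        if type(s) is not str:
            # Copy the raw characters into an exact str, bypassing any
            # __str__ override defined on the subclass.
            s = str.__str__(s)
        self.send(s)
        return len(s)


class S(str):
    def __str__(self):
        return "S: " + str.__str__(self)


received = []
writer = PlainStrWriter(received.append)
count = writer.write(S("foo"))

print(received, count)
```

Here the callback receives the plain characters of the string, which matches the console behaviour shown above for sys.stdout.write(s).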
msg205775 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-12-10 07:59
Good suggestion, Terry. And for unicode in 2.7 we can use unicode.__getslice__(s, None, None) (because there is no unicode.__unicode__).
msg205776 - Author: Roundup Robot (python-dev) Date: 2013-12-10 08:07
New changeset df9596ca838c by Serhiy Storchaka in branch '2.7':
Issue #19481: print() of unicode, str or bytearray subclass instance in IDLE
http://hg.python.org/cpython/rev/df9596ca838c

New changeset d462b2bf875b by Serhiy Storchaka in branch '3.3':
Issue #19481: print() of string subclass instance in IDLE no more hangs.
http://hg.python.org/cpython/rev/d462b2bf875b

New changeset 1d68ea8148ce by Serhiy Storchaka in branch 'default':
Issue #19481: print() of string subclass instance in IDLE no more hangs.
http://hg.python.org/cpython/rev/1d68ea8148ce
msg237180 - Author: Martijn Pieters (mjpieters) Date: 2015-03-04 14:54
This change causes printing of BeautifulSoup NavigableString objects to fail; the code could never have worked, because `unicode.__getslice__` insists on being passed integers, not None.

To reproduce, create a new file in IDLE and paste in:

from bs4 import BeautifulSoup
html_doc = """<title>The Dormouse's story</title>""" 
soup = BeautifulSoup(html_doc)
print soup.title.string

Then pick *Run Module* to see:

Traceback (most recent call last):
  File "/private/tmp/test.py", line 4, in <module>
    print soup.title.string
  File "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/idlelib/PyShell.py", line 1353, in write
    s = unicode.__getslice__(s, None, None)
TypeError: an integer is required

The same error can be induced with:

    unicode.__getslice__(u'', None, None)

while specifying a start and end index (0 and len(s)) should fix this.
msg237182 - Author: Martijn Pieters (mjpieters) Date: 2015-03-04 15:00
Created a new issue: http://bugs.python.org/issue23583
History
Date User Action Args
2015-03-04 15:00:12 mjpieters set messages: + msg237182
2015-03-04 14:59:58 ezio.melotti set nosy: + ezio.melotti
2015-03-04 14:54:38 mjpieters set nosy: + mjpieters; messages: + msg237180
2013-12-10 08:34:56 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2013-12-10 08:07:28 python-dev set nosy: + python-dev; messages: + msg205776
2013-12-10 07:59:36 serhiy.storchaka set messages: + msg205775
2013-12-09 21:29:50 terry.reedy set messages: + msg205741
2013-12-09 21:10:40 terry.reedy set messages: + msg205740
2013-12-09 19:29:09 serhiy.storchaka set assignee: serhiy.storchaka; messages: + msg205732
2013-11-04 09:53:59 ned.deily set messages: + msg202102
2013-11-04 08:50:34 serhiy.storchaka set files: + idle_write_string_subclass-3.x.patch; messages: + msg202098
2013-11-04 08:40:45 serhiy.storchaka set files: + idle_write_string_subclass-2.7.patch; messages: + msg202096; versions: + Python 3.3, Python 3.4
2013-11-04 06:55:59 terry.reedy set messages: + msg202093
2013-11-04 00:28:28 tim.peters set messages: + msg202072
2013-11-04 00:21:43 ned.deily set messages: + msg202070
2013-11-03 08:19:01 serhiy.storchaka set files: + idle_print_unicode_subclass.patch; keywords: + patch; messages: + msg202005; stage: patch review
2013-11-03 07:45:40 ned.deily set messages: + msg202004
2013-11-03 07:27:15 terry.reedy set nosy: + ned.deily; messages: + msg202003
2013-11-03 06:43:44 serhiy.storchaka set nosy: + terry.reedy, kbk, roger.serwy, serhiy.storchaka; type: behavior
2013-11-03 04:29:16 tim.peters create