classification
Title: Idle shell crash on printing non-BMP unicode character
Type: crash
Stage:
Components: IDLE, Tkinter, Unicode
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: asvetlov
Nosy List: asvetlov, ezio.melotti, haypo, loewis, ned.deily, python-dev, roger.serwy, terry.reedy, vbr
Priority: normal
Keywords: patch

Created on 2012-03-05 12:39 by vbr, last changed 2012-03-31 12:15 by asvetlov. This issue is now closed.

Files
File name Uploaded Description
unicodeerror.diff loewis, 2012-03-06 18:53
rpc_marshal_exception.patch roger.serwy, 2012-03-11 21:44
unicodeerror_rev1.diff roger.serwy, 2012-03-12 01:39
issue14200.patch roger.serwy, 2012-03-14 22:08
issue14200_rev1.patch roger.serwy, 2012-03-14 22:36
Messages (31)
msg154944 - (view) Author: Vlastimil Brom (vbr) Date: 2012-03-05 12:39
Hi,
while testing python 3.3a1 a bit, especially the new string handling of non-BMP characters, I noticed a problem in Idle in this regard:

Python 3.3.0a1 (default, Mar  4 2012, 17:27:59) [MSC v.1500 32 bit (Intel)] on win32 ... 
[using Win XP SP3 Czech]

>>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
>>> len(got_ahsa)
1
>>> got_ahsa.encode("unicode-escape")
b'\\U00010330'
>>> got_ahsa

[crash - idle shell window closes immediately without any visible error message or traceback]


I realised later, that tkinter probably won't be able to print wide-unicode characters anyway (according to 
http://bugs.python.org/issue12342 ), but Idle should probably just print the exception introduced there, e.g.
ValueError: character U+10330 is above the range (U+0000-U+FFFF) allowed by Tcl

Regards
        vbr
msg154961 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-05 16:33
Hi Vlastimil,

Can you repeat your test case while running IDLE from the command prompt and report the error you see?

    python -m idlelib.idle

IDLE closes suddenly on Windows because IDLE uses pythonw.exe which has no stdout or stderr. When Tkinter encounters an error and tries to write to stderr, an error is raised in the Tkinter eventloop and the eventloop terminates.
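
A minimal sketch of one way to guard against this, assuming that the standard streams show up as `None` when no console is attached (the Python documentation describes this for apps started with pythonw). The helper name `ensure_std_streams` is hypothetical and is not part of IDLE:

```python
import os
import sys

def ensure_std_streams():
    """Point missing standard streams at os.devnull so writes cannot fail."""
    for name in ("stdout", "stderr"):
        if getattr(sys, name) is None:
            # Assumption: a missing stream is represented as None.
            setattr(sys, name, open(os.devnull, "w"))
```

Calling such a helper before entering the Tk event loop would turn error output into a no-op instead of a second exception. It hides the traceback, so it is only a stopgap.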
msg154965 - (view) Author: Vlastimil Brom (vbr) Date: 2012-03-05 17:44
Hi,
thanks for the pointer, after invoking idle using python.exe, I don't see the crash mentioned in the report:

Python 3.3.0a1 (default, Mar  4 2012, 17:27:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
>>> len(got_ahsa)
1
>>> got_ahsa.encode("unicode-escape")
b'\\U00010330'
>>> got_ahsa

>>> print(got_ahsa)

>>> 


I just get an empty line as the "answer", but no crash.

The console indeed contains the traceback with the error I expected

   vbr

============

Microsoft Windows XP [Verze 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Python33>python.exe -m idlelib.idle
*** Internal Error: rpc.py:SocketIO.localcall()

 Object: stdout
 Method: <bound method PseudoFile.write of <idlelib.PyShell.PseudoFile object at
 0x01CDDB50>>
 Args: ("'\U00010330'",)

Traceback (most recent call last):
  File "C:\Python33\lib\idlelib\rpc.py", line 188, in localcall
    ret = method(*args, **kwargs)
  File "C:\Python33\lib\idlelib\PyShell.py", line 1244, in write
    self.shell.write(s, self.tags)
  File "C:\Python33\lib\idlelib\PyShell.py", line 1226, in write
    OutputWindow.write(self, s, tags, "iomark")
  File "C:\Python33\lib\idlelib\OutputWindow.py", line 40, in write
    self.text.insert(mark, s, tags)
  File "C:\Python33\lib\idlelib\Percolator.py", line 25, in insert
    self.top.insert(index, chars, tags)
  File "C:\Python33\lib\idlelib\ColorDelegator.py", line 80, in insert
    self.delegate.insert(index, chars, tags)
  File "C:\Python33\lib\idlelib\PyShell.py", line 322, in insert
    UndoDelegator.insert(self, index, chars, tags)
  File "C:\Python33\lib\idlelib\UndoDelegator.py", line 81, in insert
    self.addcmd(InsertCommand(index, chars, tags))
  File "C:\Python33\lib\idlelib\UndoDelegator.py", line 116, in addcmd
    cmd.do(self.delegate)
  File "C:\Python33\lib\idlelib\UndoDelegator.py", line 219, in do
    text.insert(self.index1, self.chars, self.tags)
  File "C:\Python33\lib\idlelib\ColorDelegator.py", line 80, in insert
    self.delegate.insert(index, chars, tags)
  File "C:\Python33\lib\idlelib\WidgetRedirector.py", line 104, in __call__
    return self.tk_call(self.orig_and_operation + args)
ValueError: character U+10330 is above the range (U+0000-U+FFFF) allowed by Tcl
msg154967 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-03-05 18:00
On 3.2.2, Win7, the length is 2 and printing in Idle prints a square, as it usually does for chars it cannot print. I presume Tk recognizes surrogate pairs. Printing to the screen should not raise an exception, so the square would be better. Even better would be to do what the 3.2 and 3.3 Command Prompt Interpreters do, which is to print an evaluable representation:

>>> c
'\U00010330'

I assume that this string is produced by python.exe rather than Windows. If so, neither of the two pythonw processes is currently doing the same conversion. My understanding is that the user pythonw process uses idlelib.rpc.RPCproxy objects to ship i/o calls to the idle pythonw process.

I presume we could find the idle process window .write methods and change lines like
        self.text.insert(mark, s, tags)
to
        try:
            self.text.insert(mark, s, tags)
        except SomeTkError:
            self.text.insert(mark, expand(s), tags)
But it seems to me that the expansion should really be done in C in _tkinter, where the internal .kind attribute of strings is available. 
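
A rough sketch of what such a fallback could look like in pure Python. `expand` and `safe_insert` are hypothetical names, and the sketch assumes the error raised for non-BMP text is the `ValueError` shown in the traceback above:

```python
def expand(s):
    """Replace each non-BMP character with a backslash-U escape sequence."""
    return "".join(
        c if ord(c) <= 0xFFFF else "\\U%08x" % ord(c)
        for c in s
    )

def safe_insert(insert, mark, s, tags):
    """Call insert(mark, s, tags); on ValueError retry with escaped text."""
    try:
        insert(mark, s, tags)
    except ValueError:
        # Assumption: ValueError here means Tcl rejected a non-BMP character.
        insert(mark, expand(s), tags)
```

Here `insert` stands in for a bound method such as `self.text.insert`. BMP-only strings pass through unchanged.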

---
There is also an input crash. On 3.2, I tried to cut the square char and paste it into "ord('')" (both shell and edit window) to see what unicode char it is and IDLE fades away as you describe. That puzzles me, as I am normally able to paste BMP chars into idle without problem. In any case, I presume the problem is not idle-specific and would best be handled in _tkinter. Or does the crash happen in Windows or tcl/tk code before _tkinter ever sees the input?

When I paste the same into the 3.2 or 3.3 interpreter, it is converted to ascii '?'. I presume this is done by Windows Command Prompt before sending anything to python.
msg154996 - (view) Author: Vlastimil Brom (vbr) Date: 2012-03-06 00:39
I'd like to add some further observations to the mentioned issue;
it seems, that the crash is indeed not specific to idle.
In a sample tkinter app, where I just display e.g. chr(66352) in an Entry widget, I also get the same immediate crash via pythonw.exe and the previously mentioned "proper" ValueError without a crash with python.exe.

I also tried to explicitly display a surrogate pair (these were used automatically up to python 3.2); they can be used in tkinter in 3.3, but there are limitations and discrepancies:

>>> 
>>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
>>> def wide_char_to_surrog_pair(char):
    code_point = ord(char)
    if code_point <= 0xFFFF:
        return char
    else:
        high_surr = (code_point - 0x10000) // 0x400 + 0xD800
        low_surr = (code_point - 0x10000) % 0x400 + 0xDC00
        return chr(high_surr)+chr(low_surr)

>>> ahsa_surrog = wide_char_to_surrog_pair(got_ahsa)
>>> print(ahsa_surrog)
𐌰
>>> repr(ahsa_surrog)
"'_ud800\x00udf30'"
>>> ahsa_surrog
'Pud800 udf30'

[the space in the middle of the last item might be \x00, as it terminates the clipboard content, the rest is copied separately]

The printed square corresponds to the given character and can be used in other programs etc. (Whereas in py 3.2 the same value was used for repr and a direct "display" of the string in the interpreter, there are three different formats in py 3.3.)

I also noticed that a surrogate pair is no longer supported as input for unicodedata.name(...):
 
>>> import unicodedata
>>> unicodedata.name(ahsa_surrog)
Traceback (most recent call last):
  File "<pyshell#60>", line 1, in <module>
    unicodedata.name(ahsa_surrog)
TypeError: need a single Unicode character as parameter
>>> 

(in 3.2 and probably others it returns the expected 'GOTHIC LETTER AHSA')

(For my part, I would think that keeping somewhat liberal (but still unambiguous) input possibilities for unicodedata wouldn't hurt.) Also, if tkinter is not going to support wide unicode natively any time soon, output conversion using surrogates, which other programs also understand, seems the most usable option in this regard.

Hopefully, this is somehow relevant for the original issue -
I am somehow not sure, whether some parts would be better posted as separate issues, or whether this is the planned and expected behaviour anyway.

regards,
   vbr
msg155004 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-03-06 07:31
Vlastimil: you are mixing issues. Some of your observations are actually correct behaviour; please don't clutter the report with that, but report each separate behavior in a separate report. In Python 3.3, surrogate pairs do *not* substitute for the actual character, since the internal representation is not UTF-16 anymore.

Also, when you run a Tkinter app in IDLE: while you get a "proper" traceback output, your conclusion that python.exe does not "crash" is incorrect: it crashes just in the very same way that IDLE crashes. Except when run inside IDLE, it is a subprocess that "crashes" (i.e. terminates with a traceback output), not IDLE itself.
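
The surrogate point can be illustrated with a short sketch, assuming Python 3.3 or later, where a string holds code points rather than UTF-16 units (the variable names are illustrative only):

```python
ch = "\N{GOTHIC LETTER AHSA}"        # one code point, U+10330
pair = chr(0xD800) + chr(0xDF30)     # two lone surrogate code points

# The two strings are different objects of different length.
same = (ch == pair)
lengths = (len(ch), len(pair))

# UTF-16 is only an encoding of the character, not its in-memory form.
utf16 = ch.encode("utf-16-be")
```

With these definitions, `same` is `False`, `lengths` is `(1, 2)`, and `utf16` is the four bytes `D8 00 DF 30`, i.e. the surrogate pair appears only once the character is encoded.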
msg155009 - (view) Author: Vlastimil Brom (vbr) Date: 2012-03-06 09:46
Sorry for mixing the different problems, these were somehow things I noticed "at once" in the new python version, but I should have noticed the different domains myself.
I still might not understand the term "crash" properly - I just meant to distinguish between a single appropriate exception on an invalid operation (while the app is staying alive and works on next valid input) - as is the case with calling through python.exe, and - on the other hand - the immediate termination on encountering the invalid input, which happens with pythonw.exe.

Now I see, that with pythonw a tk app terminates with the first exception (in general) in py 3.3 and also 3.2 (as opposed to py 2.7, where it just swallows the exception and stays alive, as one would probably expect).

Should this be reported in a separate issue, or is this what remains relevant in *this* report? (Sorry for the confusion.)

vbr
msg155032 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-03-06 18:53
That pythonw suddenly closes is a separate issue: if pythonw attempts to write to stderr, it crashes. To get your example to "run" in pythonw.exe,
try

pythonw.exe Lib\idlelib\idle.py 2> out.txt

I think the behavior of pythonw terminating when it can't write to stderr is actually correct: an exception is raised on attempting to write to stderr, which then can't be printed (because there is no stderr).

So the real fault here is the traceback that python.exe reports.

To fix this, I think rpc.py should learn to marshal exceptions back to the subprocess. Then the initial sys.stdout.write should raise a UnicodeError (which it currently doesn't, either). This would get into the displayhook, which would then use sys_displayhook_unencodable to backslash-escape the unsupported character.

I'll attach a patch that at least makes the exception UnicodeEncodeError.
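
The idea of marshalling an exception back to the caller can be sketched as follows. This is a simplified illustration, not the wire format of idlelib's rpc.py; `serve_call` and `decode_response` are hypothetical names, and `pickle` stands in for whatever serialization the real code uses:

```python
import pickle

def serve_call(method, *args):
    """Server side: run the call, then pickle the result or the exception."""
    try:
        return pickle.dumps(("OK", method(*args)))
    except Exception as exc:
        return pickle.dumps(("EXCEPTION", exc))

def decode_response(payload):
    """Client side: return the result, or re-raise the marshalled exception."""
    how, what = pickle.loads(payload)
    if how == "EXCEPTION":
        raise what
    return what
```

With this shape, a failing `write` on the server side surfaces as an ordinary exception in the process that made the call, where the normal handling (such as the displayhook fallback) can deal with it.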
msg155410 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-11 21:44
Attached is a patch to have the rpc marshal exceptions. When used with Martin's patch, IDLE returns 

>>> '\U00010330'
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    '\U00010330'
ValueError: character U+10330 is above the range (U+0000-U+FFFF) allowed by Tcl


Martin: I disagree with the approach of raising a UnicodeEncodeError if IDLE can't render the output of a user's program, especially when the program would otherwise run without error if run from outside of IDLE.

Would replacing these characters with "?" and documenting this limitation in IDLE's docs be an acceptable solution?
msg155421 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-11 23:29
I made a mistake in msg155410. The results in the message are WITHOUT "unicodeerror.diff" applied. When it is applied, the IDLE shell gives:


>>> '\U00010330'
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    '\U00010330'
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk
Traceback (most recent call last):
** IDLE Internal Exception: 
  File "idlelib/run.py", line 98, in main
    ret = method(*args, **kwargs)
  File "idlelib/run.py", line 305, in runcode
    print_exception()
  File "idlelib/run.py", line 168, in print_exception
    print(line, end='', file=efile)
  File "idlelib/rpc.py", line 599, in __call__
    value = self.sockio.remotecall(self.oid, self.name, args, kwargs)
  File "idlelib/rpc.py", line 214, in remotecall
    return self.asyncreturn(seq)
  File "idlelib/rpc.py", line 245, in asyncreturn
    return self.decoderesponse(response)
  File "idlelib/rpc.py", line 265, in decoderesponse
    raise what
ValueError: max() arg is an empty sequence

I will need to rework the rpc_marshal_exception patch.
msg155426 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-03-12 01:25
> Martin: I disagree with the approach of raising a UnicodeEncodeError
> if IDLE can't render the output of a user's program, especially when
> the program would otherwise run without error if ran from outside of
> IDLE.

This is really an independent issue, and I'd appreciate if people would
treat it as such. *This* issue is about IDLE crashing, not about how
Tkinter deals with non-BMP characters.

So if the RPC exception marshalling works, and can resolve this issue,
I'll be ready to commit this and close this issue. Opening another issue
dealing with the more general Tk problem would be fine with me.

I don't *quite* understand what you are proposing. If it is that
Tkinter always replaces non-BMP characters in string objects with
question marks, then I'm opposed. Tkinter can't know whether the
replacement is an acceptable loss or not; errors should never pass
silently.

If you are suggesting that IDLE's write function should write
a question mark instead of raising an exception: perhaps, but
a) I'd rather use REPLACEMENT CHARACTER instead of QUESTION MARK
b) I'd really try to find out first whether Tcl unknowingly
    supports UTF-16, at least for rendering.
msg155428 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-12 01:39
Having had some time to work on it, the bug is in the unicodeerror.diff patch. If the string is empty then max(s) will raise a ValueError. This is easy to trigger by generating an exception at the python prompt, like "1/0". 

Attached is a revised version of Martin's patch.
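
A minimal sketch of an empty-safe check, assuming the patch tests for non-BMP characters with something of the form `max(s) > '\uffff'` (the exact expression in the patch is not shown here, and `has_non_bmp` is a hypothetical name):

```python
def has_non_bmp(s):
    """Return True if s contains a character above U+FFFF."""
    # max() raises ValueError on an empty sequence, so test emptiness first.
    return bool(s) and max(s) > "\uffff"
```

Because of the short-circuit, `max` is never called on an empty string, which is the case hit when the shell writes an empty chunk while printing a traceback.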
msg155429 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-12 01:44
Martin, I got your message after I submitted the last one.

This issue does involve IDLE crashing, but it's not crashing due to non-BMP characters. That is a side-effect of a bigger issue with pythonw.exe. See Issue13582 for more information.

IDLE's shell output has a gross deficiency due to Tkinter's inability to handle Unicode properly. Why penalize a program for running in IDLE just because IDLE can't write something to the text widget? This is precisely what your approach is doing - making IDLE an even more restricted environment than it needs to be.
msg155789 - (view) Author: Roundup Robot (python-dev) Date: 2012-03-14 20:46
New changeset c06b94c5c609 by Andrew Svetlov in branch 'default':
Issue #14200: Idle shell crash on printing non-BMP unicode character.
http://hg.python.org/cpython/rev/c06b94c5c609
msg155794 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-03-14 21:02
The patch escapes every non-ASCII char, while it would be better to escape only non-BMP ones.

Will be done after #14304
msg155805 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-14 21:56
Andrew, please reopen this issue. Your committed patch does not work if IDLE is not using the subprocess.

    >>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
    >>> got_ahsa
    Traceback (most recent call last):
      File "<pyshell#1>", line 1, in <module>
        got_ahsa
      File "idlelib/PyShell.py", line 1255, in write
        return self.shell.write(s, self.tags)
      File "idlelib/PyShell.py", line 1233, in write
        'Non-BMP character not supported in Tk')
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk

However, it does work when IDLE uses a subprocess.
msg155807 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-14 22:08
Attached is a patch that undoes Andrew's and fixes the issue in a simple manner. The tcl_unicode_range.patch from Issue12342 has already been applied, so catching ValueError within IDLE is all that is now needed.
msg155813 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-14 22:36
Attached is a better implementation of the patch. The Percolator which ultimately handles writing to the Text widget should intercept the ValueError due to non-BMP characters. The issue14200_rev1.patch fixes this issue and Issue13153.
msg155817 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-03-14 23:39
Roger, you are missing the difference between calling print() and evaluating an expression in Python's interactive mode.
While the latter should be unicode-escaped, the former should raise an error: we need to behave the same way a console Python interactive session does.

For the rest, I like your simplification. And IDLE should definitely work in both subprocess and embedded modes; thank you for that point.

I'll make the final (I hope) patch a bit later.
msg155844 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-15 02:25
Andrew, I do admit that I have a lot to learn about Unicode support in Python, for instance with its error-handling and its corner cases.

On Windows Vista, I do see that print() behaves differently than evaluating the expression. An exception is raised for:
   print('\N{GOTHIC LETTER AHSA}')

On Linux, I see the character print as ? in xterm and as a '?' when evaluated. In gnome-terminal (Ubuntu Mono font) it prints as a box containing the code point in hex. No exception is raised.

I do see your point. The patch I provided always substitutes the unsupported character with its full expansion. Returning to a point earlier raised by Martin, using REPLACEMENT CHARACTER instead would be better. It would make the behavior of IDLE more consistent with xterm and gnome-terminal, although it would cause IDLE to hide errors if the program ran from a Windows console instead of IDLE. 

Given that Windows and Linux (Ubuntu) behave differently, I'd rather let IDLE mimic the behavior of a Linux console than a Windows console.
msg155851 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-03-15 04:20
I consulted with Martin at the PyCon sprint and he suggested the solution I'm following: to split `print` and the REPL (read-eval-print loop).

Output passed to the print() function is encoded with sys.stdout.encoding.

UTF encodings were invented to support any character.
Linux is usually set up to use the utf-8 encoding by default (see the LANG environment variable). There are no encoding issues with that.

xterm (an old terminal), which you use, cannot print non-BMP characters and replaces them with question marks.
Modern gnome-terminal prints those symbols very well.

Let's return to non-UTF terminal encodings.
If a character cannot be encoded, Python raises UnicodeEncodeError.
Here is an example:

andrew@tiktaalik ~/p/cpython> bash -c "LANG=C; ./python"
Python 3.3.0a1+ (qbase qtip tip tk:c3ce8a8e6c9c+, Mar 14 2012, 15:54:55) 
[GCC 4.6.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '\U00010340'
'\U00010340'
>>> print('\U00010340')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\U00010340' in position 0: ordinal not in range(128)
>>> 

As you can see, I have switched LANG to the C locale (an alias for ASCII).

The evaluated expression is printed with unicode escaping, but the `print` call raises an error.
This happens because Python's REPL calls sys.displayhook.
See http://docs.python.org/dev/library/sys.html#sys.displayhook for details.
That code escapes unicode if the terminal doesn't support it.

The same applies to Windows, OS X and any other platform.
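
The difference between the two paths can be sketched as below. This loosely follows the pseudo-code in the sys.displayhook documentation; `display` is a hypothetical stand-in, and an in-memory ASCII stream plays the role of the terminal:

```python
import io

def display(value, stream):
    """Write repr(value), backslash-escaping what the stream cannot encode."""
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        raw = text.encode(stream.encoding, "backslashreplace")
        stream.write(raw.decode(stream.encoding))
    stream.write("\n")

# An in-memory stand-in for an ASCII-only terminal.
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ascii", newline="\n")
display("\U00010340", ascii_out)
ascii_out.flush()
```

After this runs, the underlying buffer holds the escaped form `'\U00010340'`, whereas `print("\U00010340", file=ascii_out)` on the same stream raises UnicodeEncodeError, because `print` has no such fallback.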
msg155898 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-03-15 16:06
> On Windows Vista, I do see that print() behaves differently than
> evaluating the expression. An exception is raised for:
> print('\N{GOTHIC LETTER AHSA}')

As is for most other characters not supported in your OEM code
page, e.g. (likely) '\N{GREEK SMALL LETTER ALPHA}'

> On Linux, I see the character print as ? in xterm and as a '?' when
> evaluated. In gnome-terminal (Ubuntu Mono font) it prints as a box
> containing the code point in hex. No exception is raised.

That's because your terminal output encoding is UTF-8. If you change
your locale to C, or any other locale that doesn't cover full Unicode
(e.g. de_DE.ISO-8859-1, if supported on your Linux installation),
you get the same behavior on Linux as you do on Windows.

> Given that Windows and Linux (Ubuntu) behave differently

That's not a given, see above.
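
A small sketch of why the outcome depends on the output encoding rather than on the operating system. `can_encode` is a hypothetical helper; the encodings listed are just examples:

```python
def can_encode(text, encoding):
    """Return True if text can be encoded strictly with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

results = {
    enc: can_encode("\N{GOTHIC LETTER AHSA}", enc)
    for enc in ("utf-8", "iso-8859-1", "ascii")
}
```

Only the UTF-8 entry comes out `True`. A terminal whose encoding is ISO-8859-1 or ASCII gives print() the same UnicodeEncodeError on any platform.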
msg155922 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-15 18:04
I stand corrected. Thank you for the information.

The behavior of the console depends on its locale. IDLE has no facility for changing the locale of the PyShell window. Should this option be included somewhere?
msg155927 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-03-15 18:23
I think that doesn't make sense.
msg155930 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-15 18:37
The Tkinter Text widget is the output for the IDLE shell and it has the limitation imposed by Tcl/Tk of not handling non-BMP unicode characters. 

Is the following reasonable: The IDLE shell console has a locale of "non-BMP utf8"?

If so, would it be reasonable to add a menu item to switch locales for the shell? This amounts to adding some extra code to OutputWindow's write() to raise encoding errors if the string contains unsupported characters, and possibly replacing characters to work around Tcl/Tk's non-BMP limitation.
msg155931 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-03-15 18:38
> The behavior of the console depends on its locale. IDLE has no
> facility for changing the locale of the PyShell window. Should this
> option be included somewhere?

It may be remotely desirable to be able to set the terminal encoding
in IDLE for debugging purposes. But it's unrelated to the issue at
hand.
msg155933 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-03-15 18:50
> Is the following reasonable: The IDLE shell console has a locale of
> "non-BMP utf8"?

[BMP utf8]
That's indeed the approach that Andrew and I were discussing. 
Unfortunately, there is no codec for it yet. We were discussing
adding a "utf8bom" encoding to Python. This is a medium-sized
project, though (and again out of scope for this issue).

> If so, would it be reasonable to add a menu item to switch locales
> for the shell? This amounts to adding some extra code to
> OutputWindow's write() to raise encoding errors if the string
> contains unsupported characters, and possibly replacing characters to
> work around Tcl/Tk's non-BMP limitation.

Please open a separate issue for this.
msg155943 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-15 19:23
Martin, you are right. I created a separate issue #14326.

Let me know what I can do to help.
msg156744 - (view) Author: Roundup Robot (python-dev) Date: 2012-03-25 08:44
New changeset 89878808f4ce by Andrew Svetlov in branch 'default':
Issue #14200 — now displayhook for IDLE works in non-subprocess mode as well as subprecess.
http://hg.python.org/cpython/rev/89878808f4ce
msg156767 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-03-25 20:07
After experiments with non-BMP characters I figured out:
- Non-BMP symbols are processed differently by different Tk text widgets (Entry, Text etc.). For example, Entry can display a non-BMP character with spaces after the glyph, while Text reduces the symbol to BMP. Editing is also weird.
- It looks like the Tk event loop passes non-BMP input directly to tkinter as is.

Obviously Tk does not support non-BMP chars by spec, but it does not strictly reject them either. Details are implementation-specific and depend not only on the Tcl/Tk version but also on the concrete widget class.

After that my position is:
- Implement a utf8-bmp codec.
- A first implementation of utf8-bmp can be done in pure Python using the utf-8 codec plus checks. This is simple enough, although it may degrade performance. That doesn't matter if the codec is used only for converting relatively short strings in Tk widgets.
- Use it in the _tkinter AsObj/FromObj functions with 'replace' mode.
- My approach is slightly incompatible in the dark corner of non-BMP chars (not supported, but silently passed to the low-level platform API with weird transformations on the way). I think this is not a problem at all.
- With the utf8-bmp codec, IDLE can still use 'strict' mode in its .write function (for `print` and displayhook, I mean) to keep the current behavior, or use escaping for displayhook and 'replace' for regular `print`. In the implementation of #14326 we can use a directly specified encoding for `print` as well.

I experimented on an Ubuntu box, but I'm pretty sure the same result can be reproduced on OS X and Windows as well. Also, we need Tk to be cross-platform, so replacing non-BMP characters is not bad, and it is a good solution until Tcl/Tk processes non-BMP natively.
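
A pure-Python sketch of the "utf-8 codec plus checks" idea, written as a plain function rather than a registered codec. The name `encode_utf8_bmp` and the codec label "utf8-bmp" are hypothetical; no such codec exists in Python:

```python
def encode_utf8_bmp(text, errors="strict"):
    """Encode text as UTF-8, allowing only BMP characters."""
    out = []
    for pos, ch in enumerate(text):
        if ord(ch) <= 0xFFFF:
            out.append(ch)
        elif errors == "replace":
            out.append("\N{REPLACEMENT CHARACTER}")
        elif errors == "strict":
            raise UnicodeEncodeError(
                "utf8-bmp", text, pos, pos + 1,
                "non-BMP character not supported",
            )
        else:
            raise ValueError("unsupported error handler: %r" % errors)
    return "".join(out).encode("utf-8")
```

In 'replace' mode each non-BMP character becomes U+FFFD, while 'strict' mode reports the offending position, which matches the two behaviors discussed for `print` and for the Tk widgets. The per-character loop is the performance cost mentioned above.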
msg157182 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-03-31 12:15
Closing again. Now IDLE works fine in both subprocess and in-process mode.

Future support of non-BMP can be continued after implementing a codec for that: #14304

For now I'd like to close this as «good enough for now».
At least IDLE doesn't crash on printing anything.
History
Date User Action Args
2012-03-31 12:15:42 asvetlov set status: open -> closed; resolution: fixed; messages: + msg157182
2012-03-25 20:07:10 asvetlov set messages: + msg156767
2012-03-25 08:44:41 python-dev set messages: + msg156744
2012-03-15 19:23:24 roger.serwy set messages: + msg155943
2012-03-15 18:50:29 loewis set messages: + msg155933
2012-03-15 18:38:17 loewis set messages: + msg155931
2012-03-15 18:37:56 roger.serwy set messages: + msg155930
2012-03-15 18:23:23 asvetlov set messages: + msg155927
2012-03-15 18:04:59 roger.serwy set messages: + msg155922
2012-03-15 16:06:26 loewis set messages: + msg155898
2012-03-15 04:20:02 asvetlov set messages: + msg155851
2012-03-15 02:25:21 roger.serwy set messages: + msg155844
2012-03-14 23:39:18 asvetlov set assignee: asvetlov; resolution: fixed -> (no value); messages: + msg155817
2012-03-14 22:36:59 roger.serwy set status: closed -> open; files: + issue14200_rev1.patch; messages: + msg155813
2012-03-14 22:08:47 roger.serwy set files: + issue14200.patch; messages: + msg155807
2012-03-14 21:56:37 roger.serwy set messages: + msg155805
2012-03-14 21:48:11 asvetlov link issue12342 superseder
2012-03-14 21:03:01 asvetlov set status: open -> closed
2012-03-14 21:02:49 asvetlov set resolution: fixed; messages: + msg155794
2012-03-14 20:46:18 python-dev set messages: + msg155789
2012-03-13 21:32:57 asvetlov set nosy: + asvetlov
2012-03-12 01:44:41 roger.serwy set messages: + msg155429
2012-03-12 01:39:22 roger.serwy set files: + unicodeerror_rev1.diff; messages: + msg155428
2012-03-12 01:25:41 loewis set messages: + msg155426
2012-03-11 23:29:37 roger.serwy set messages: + msg155421
2012-03-11 21:44:05 roger.serwy set files: + rpc_marshal_exception.patch; messages: + msg155410
2012-03-06 18:53:24 loewis set files: + unicodeerror.diff; keywords: + patch; messages: + msg155032
2012-03-06 09:46:39 vbr set messages: + msg155009
2012-03-06 07:45:36 loewis set messages: - msg155006
2012-03-06 07:45:25 loewis set messages: - msg155005
2012-03-06 07:43:57 loewis set status: closed -> open; resolution: fixed -> (no value); messages: + msg155006
2012-03-06 07:42:50 loewis set status: open -> closed; resolution: fixed
2012-03-06 07:42:24 python-dev set nosy: + python-dev; messages: + msg155005
2012-03-06 07:31:56 loewis set messages: + msg155004
2012-03-06 00:39:10 vbr set messages: + msg154996
2012-03-05 18:00:15 terry.reedy set messages: + msg154967
2012-03-05 17:44:12 vbr set messages: + msg154965
2012-03-05 16:33:28 roger.serwy set messages: + msg154961
2012-03-05 15:20:44 terry.reedy set nosy: + roger.serwy
2012-03-05 12:58:05 ezio.melotti set nosy: + loewis, terry.reedy, haypo, ned.deily; type: crash
2012-03-05 12:39:35 vbr create