classification
Title: IDLE shows traceback when printing non-BMP character
Type: behavior Stage: resolved
Components: IDLE Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: THRlWiTi, belopolsky, martin.panter, serhiy.storchaka, terry.reedy
Priority: normal Keywords:

Created on 2014-10-27 16:18 by belopolsky, last changed 2019-10-08 12:05 by serhiy.storchaka. This issue is now closed.

Messages (8)
msg230078 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-10-27 16:18
>>> print("\N{ROCKET}")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    print("\N{ROCKET}")
  File "idlelib/PyShell.py", line 1352, in write
    return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001f680' in position 0: Non-BMP character not supported in Tk

Shouldn't IDLE replace non-encodable characters with "\uFFFD"?

I think

>>> "\N{ROCKET}"
�

is user-friendlier than the traceback.

See also #14304.
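
A minimal sketch of the replacement idea proposed above. The helper name `replace_non_bmp` is hypothetical and not part of idlelib; it only illustrates substituting U+FFFD for every character outside the Basic Multilingual Plane before the text would reach Tk.

```python
def replace_non_bmp(text, replacement="\ufffd"):
    """Return text with every code point above U+FFFF replaced.

    Hypothetical helper for illustration; not the IDLE implementation.
    """
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


sample = "a\N{ROCKET}b"
cleaned = replace_non_bmp(sample)
# Every remaining character now fits in the BMP.
print(all(ord(ch) <= 0xFFFF for ch in cleaned))
```

The trade-off is that the original code point is lost, which is the objection raised later in this thread.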
msg230416 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-11-01 00:36
I think Idle should consistently display astral chars with their \U escape.  It sometimes does, just not always.

>>> s='\U0001f680'
>>> s
'\U0001f680'
>>> str(s)
'\U0001f680'
>>> repr(s)
"'\U0001f680'"
>>> print(s) # gives error above.
>>> print(str(s))  #ditto

I thought that implicit print of expression and overt print of the same expression were supposed to be the same.

#21084 is also about this general issue.
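
As a rough illustration of the "\U escape" alternative, the sketch below rewrites only astral characters and leaves BMP text untouched. The function name `escape_astral` is an assumption made for this example, not something taken from IDLE.

```python
def escape_astral(text):
    """Replace each code point above U+FFFF with its \\UXXXXXXXX escape.

    Hypothetical helper for illustration; not the IDLE implementation.
    """
    return "".join(
        ch if ord(ch) <= 0xFFFF else "\\U%08x" % ord(ch) for ch in text
    )


print(escape_astral("go \U0001f680 now"))
```

Unlike a replacement character, the escape keeps enough information to recover the original code point.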
msg340675 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-04-22 19:05
On my puzzlement above: repr(s) is a string of 3 characters -- s bracketed by quote characters.  print(repr(s)) fails.  I am not sure how s gets expanded to the full escape in IDLE.  ascii(s) expands all non-ascii and adds extra quotes.  Need to check Shell code.
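
The difference between repr() and ascii() described above can be checked directly. This small example only computes the strings and their lengths; it does not depend on what a given shell can display.

```python
s = '\U0001f680'

r = repr(s)   # quote + the rocket character itself + quote
a = ascii(s)  # quote + ten-character \U escape + quote

print(len(r))  # 3
print(len(a))  # 12
print(a.isascii())
```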

In the python REPL, astral chars are not expanded to escape sequences.

>>> s='\U0001f603'
>>> s
'😃'  # Windows REPL shows two replacement boxes instead of 😃


#36698 is about astral chars in exception messages.

>>> raise Exception(s)

results in the Exception traceback, 3 UnicodeDecode tracebacks, and a restart.
msg340820 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2019-04-25 00:55
I haven’t looked at the code, but I suspect Idle implements a custom “sys.displayhook”:

>>> help(sys.displayhook)
Help on function displayhook in module idlelib.rpc:

displayhook(value)
    Override standard display hook to use non-locale encoding

>>> sys.displayhook('\N{ROCKET}')
'\U0001f680'
>>> sys.__displayhook__('\N{ROCKET}')
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    sys.__displayhook__('\N{ROCKET}')
  File "/usr/lib/python3.5/idlelib/PyShell.py", line 1344, in write
    return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk
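
The behavior above is consistent with a display hook that first tries to write repr(value) and falls back to an ASCII form with backslash escapes when the stream rejects it. The following is a sketch under that assumption, not a copy of the idlelib code. `BmpOnlyStream` and `sketch_displayhook` are hypothetical names, and the stream is passed in explicitly so the example runs outside IDLE.

```python
import io


class BmpOnlyStream(io.StringIO):
    """Hypothetical stand-in for a Tk-backed stream that rejects non-BMP text."""

    def write(self, s):
        for i, ch in enumerate(s):
            if ord(ch) > 0xFFFF:
                raise UnicodeEncodeError(
                    "UCS-2", s, i, i + 1,
                    "Non-BMP character not supported in Tk")
        return super().write(s)


def sketch_displayhook(value, stream):
    """Write repr(value), escaping what the stream cannot accept."""
    if value is None:
        return
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Fall back to pure ASCII; non-ASCII becomes \x, \u or \U escapes.
        text = text.encode("ascii", "backslashreplace").decode("ascii")
        stream.write(text)
    stream.write("\n")


out = BmpOnlyStream()
sketch_displayhook('\N{ROCKET}', out)
print(out.getvalue(), end="")
```

A plain print() does not go through the display hook, which would explain why it hits the stream's write() directly and raises.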
msg353926 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-04 11:46
Fixed by PR 16545 (see issue13153).
msg353931 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-04 11:59
It was fixed for all valid Unicode characters, but you can still get an error when printing a surrogate character to stderr on Linux:

>>> import sys
>>> print('\ud800', file=sys.stderr)
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    print('\ud800', file=sys.stderr)
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

In the Python REPL you get an escape sequence instead.

>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
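
One likely explanation for the difference is the error handler attached to the stream: the regular interpreter's sys.stderr uses 'backslashreplace', while a stream using 'strict' raises. The sketch below reproduces both outcomes with an in-memory stream; `encode_via_stream` is a hypothetical helper written for this example.

```python
import io


def encode_via_stream(text, errors):
    """Write text through a UTF-8 TextIOWrapper and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="utf-8", errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# A lone surrogate is escaped instead of raising.
print(encode_via_stream('\ud800', 'backslashreplace'))

# With the strict handler the same write fails.
try:
    encode_via_stream('\ud800', 'strict')
except UnicodeEncodeError as exc:
    print("strict handler raised:", exc.reason)
```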
msg353963 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-10-04 17:55
Printing the unquoted escape representation rather than a replacement char is a bit strange and not what I expect from the python docs.  I could see it as a bug.  In any case, on Windows, it is the Python REPL that raises, but only for sys.stdout.

>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
>>> print('\ud800', file=sys.stdout)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

whereas in IDLE on Windows the surrogate is displayed as a box with diagonal lines ([X] compressed into one char) in both cases.  When copied and pasted into Firefox, the pasted surrogate shows as a square box containing mini D 8 0 0 chars.
>>> print('\ud800', file=sys.stdout)
�
>>> print('\ud800', file=sys.stderr)
�

I consider it the proper thing for tk to put the undisplayable codepoint, rather than a replacement character, into the editor buffer (however tcl encodes it), so that IDLE can retrieve it without loss of information. IDLE can then potentially identify the character to the user.
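
As a rough idea of what identifying the character to the user could look like, the sketch below labels a single code point once it has been retrieved intact. The function `describe_char` is hypothetical; nothing here is taken from IDLE.

```python
import unicodedata


def describe_char(ch):
    """Return a short label for one character (hypothetical helper)."""
    code = f"U+{ord(ch):04X}"
    if 0xD800 <= ord(ch) <= 0xDFFF:
        # Lone surrogates have no Unicode name.
        return f"{code} (lone surrogate)"
    return f"{code} {unicodedata.name(ch, '<unnamed>')}"


for ch in 'a\ud800b':
    print(describe_char(ch))
```

This only works if the buffer preserves the code point, which is the point made above about not substituting a replacement character.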
===

An oddity though.  With

>>> import tkinter as tk
>>> r = tk.Tk()
>>> t = tk.Text(r)
>>> t.pack()
>>> t.insert('insert', 'a\ud800b')

the box is an empty square, not crossed.  But when I copy-paste 'a�b' into the font sample (Serhiy, making this editable was a great idea), it is crossed for every font I tried, even for Courier, which is what is being used in text t.
msg354193 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-08 12:05
With PR 16583 it is now completely fixed; that is, it can only fail in cases where the regular interactive interpreter fails too.
History
Date User Action Args
2019-10-08 12:05:45 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg354193; stage: needs patch -> resolved
2019-10-04 17:55:38 terry.reedy set messages: + msg353963; stage: needs patch
2019-10-04 11:59:57 serhiy.storchaka set status: closed -> open; resolution: fixed -> (no value); messages: + msg353931; stage: resolved -> (no value)
2019-10-04 11:46:09 serhiy.storchaka set status: open -> closed; nosy: + serhiy.storchaka; messages: + msg353926; resolution: fixed; stage: needs patch -> resolved
2019-04-25 00:55:41 martin.panter set nosy: + martin.panter; messages: + msg340820
2019-04-22 19:09:13 terry.reedy link issue36698 superseder
2019-04-22 19:05:27 terry.reedy set messages: + msg340675; versions: + Python 3.8, - Python 3.6
2017-06-19 19:06:18 terry.reedy set assignee: terry.reedy; components: + IDLE, - Library (Lib); versions: + Python 3.6, Python 3.7, - Python 2.7, Python 3.4, Python 3.5
2015-12-06 13:00:03 THRlWiTi set nosy: + THRlWiTi
2014-11-01 00:36:20 terry.reedy set versions: + Python 2.7, Python 3.4, Python 3.5; nosy: + terry.reedy; messages: + msg230416; stage: needs patch
2014-10-27 16:18:24 belopolsky create