IDLE shows traceback when printing non-BMP character #66931

Closed
abalkin opened this issue Oct 27, 2014 · 8 comments
@abalkin

abalkin commented Oct 27, 2014

BPO 22742
Nosy @terryjreedy, @abalkin, @vadmium, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/terryjreedy'
closed_at = <Date 2019-10-08.12:05:45.249>
created_at = <Date 2014-10-27.16:18:24.835>
labels = ['3.8', 'expert-IDLE', 'type-bug', '3.7']
title = 'IDLE shows traceback when printing non-BMP character'
updated_at = <Date 2019-10-08.12:05:45.248>
user = 'https://github.com/abalkin'

bugs.python.org fields:

activity = <Date 2019-10-08.12:05:45.248>
actor = 'serhiy.storchaka'
assignee = 'terry.reedy'
closed = True
closed_date = <Date 2019-10-08.12:05:45.249>
closer = 'serhiy.storchaka'
components = ['IDLE']
creation = <Date 2014-10-27.16:18:24.835>
creator = 'belopolsky'
dependencies = []
files = []
hgrepos = []
issue_num = 22742
keywords = []
message_count = 8.0
messages = ['230078', '230416', '340675', '340820', '353926', '353931', '353963', '354193']
nosy_count = 5.0
nosy_names = ['terry.reedy', 'belopolsky', 'THRlWiTi', 'martin.panter', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue22742'
versions = ['Python 3.7', 'Python 3.8']

@abalkin

abalkin commented Oct 27, 2014

>>> print("\N{ROCKET}")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    print("\N{ROCKET}")
  File "idlelib/PyShell.py", line 1352, in write
    return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001f680' in position 0: Non-BMP character not supported in Tk

Shouldn't IDLE replace non-encodable characters with "\uFFFD"?

I think

>>> "\N{ROCKET}"

is user-friendlier than the traceback.

See also bpo-14304.
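
The replacement idea above could be sketched roughly as follows. This is only an illustration, not IDLE's code: `replace_non_bmp` is a hypothetical helper name, and it assumes the display layer cannot handle any code point above U+FFFF.

```python
def replace_non_bmp(text, replacement="\ufffd"):
    """Return text with every code point above U+FFFF swapped for replacement.

    Hypothetical helper for illustration; not part of idlelib.
    """
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


# BMP characters pass through unchanged; the rocket becomes U+FFFD.
print(replace_non_bmp("launch \N{ROCKET}!"))
```

The cost of this approach is that the original character cannot be recovered from what is shown.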

@abalkin abalkin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Oct 27, 2014
@terryjreedy

I think Idle should consistently display astral chars with their \U escape. It sometimes does, just not always.

>>> s='\U0001f680'
>>> s
'\U0001f680'
>>> str(s)
'\U0001f680'
>>> repr(s)
"'\U0001f680'"
>>> print(s) # gives error above.
>>> print(str(s))  #ditto

I thought that implicit print of expression and overt print of the same expression were supposed to be the same.

bpo-21084 is also about this general issue.
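
A minimal sketch of what "display astral chars with their \U escape" could mean for printed text, assuming only code points above U+FFFF need escaping. `escape_astral` is a hypothetical name, not an idlelib function.

```python
def escape_astral(text):
    """Return text with each astral (non-BMP) character written as \\UXXXXXXXX.

    Hypothetical helper for illustration; BMP characters are left untouched.
    """
    return "".join(
        ch if ord(ch) <= 0xFFFF else "\\U{:08x}".format(ord(ch))
        for ch in text
    )


print(escape_astral("rocket: \N{ROCKET}"))  # rocket: \U0001f680
```

Unlike a replacement character, the escape keeps the information about which character was printed.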

@terryjreedy terryjreedy added topic-IDLE 3.7 (EOL) end of life and removed stdlib Python modules in the Lib dir labels Jun 19, 2017
@terryjreedy terryjreedy self-assigned this Jun 19, 2017
@terryjreedy

On my puzzlement above: repr(s) is a string of 3 characters -- s bracketed by quote characters. print(repr(s)) fails. I am not sure how s gets expanded to the full escape in IDLE. ascii(s) expands all non-ascii and adds extra quotes. Need to check Shell code.
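
The difference between `repr` and `ascii` described here can be checked without displaying the character at all, for example by comparing lengths:

```python
s = "\U0001f680"

r = repr(s)   # the character itself between two quote characters
a = ascii(s)  # every non-ASCII character expanded to an escape, plus quotes

print(len(r))  # 3
print(len(a))  # 12
print(a)       # '\U0001f680'
```

Only `ascii(s)` is safe to write to a stream that cannot encode the astral character, since its result is pure ASCII.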

In the python REPL, astral chars are not expanded to escape sequences.

>>> s='\U0001f603'
>>> s
'😃'  # Windows REPL shows two replacement boxes instead of 😃

bpo-36698 is about astral chars in exceptions messages.

>>> raise Exception(s)

results in the Exception traceback, three UnicodeDecodeError tracebacks, and a restart.

@terryjreedy terryjreedy added the 3.8 only security fixes label Apr 22, 2019
@vadmium

vadmium commented Apr 25, 2019

I haven’t looked at the code, but I suspect Idle implements a custom “sys.displayhook”:

>>> help(sys.displayhook)
Help on function displayhook in module idlelib.rpc:
displayhook(value)
    Override standard display hook to use non-locale encoding
>>> sys.displayhook('\N{ROCKET}')
'\U0001f680'
>>> sys.__displayhook__('\N{ROCKET}')
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    sys.__displayhook__('\N{ROCKET}')
  File "/usr/lib/python3.5/idlelib/PyShell.py", line 1344, in write
    return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk
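
For illustration, a display hook with the behaviour seen above (echo the repr, fall back to ASCII escapes when the stream rejects it) might look roughly like this. It is a sketch under assumptions, not the actual idlelib.rpc code: `fallback_displayhook`, its `stream` parameter, and `BMPOnlyStream` are hypothetical names, and unlike the standard hook it does not set `builtins._`.

```python
import io
import sys


def fallback_displayhook(value, stream=None):
    """Write repr(value); retry with backslash escapes on UnicodeEncodeError."""
    if value is None:
        return
    out = sys.stdout if stream is None else stream
    text = repr(value)
    try:
        out.write(text)
    except UnicodeEncodeError:
        # Re-encode with escapes so the text is pure ASCII, then retry.
        text = text.encode("ascii", "backslashreplace").decode("ascii")
        out.write(text)
    out.write("\n")


class BMPOnlyStream(io.StringIO):
    """Stand-in for a stream that, like old Tk, rejects non-BMP characters."""

    def write(self, s):
        for i, ch in enumerate(s):
            if ord(ch) > 0xFFFF:
                raise UnicodeEncodeError(
                    "UCS-2", s, i, i + 1, "Non-BMP character not supported"
                )
        return super().write(s)


buf = BMPOnlyStream()
fallback_displayhook("\N{ROCKET}", stream=buf)
print(buf.getvalue(), end="")  # '\U0001f680'
```

A hook like this would explain why echoing an expression works while `print()`, which writes the string directly, raises.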

@serhiy-storchaka

Fixed by PR 16545 (see bpo-13153).

@serhiy-storchaka

It was fixed for all valid Unicode characters. You can still get an error when printing a surrogate character to stderr on Linux:

>>> import sys
>>> print('\ud800', file=sys.stderr)
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    print('\ud800', file=sys.stderr)
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

In the Python REPL you get an escape sequence instead.

>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
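
The escaped output is consistent with CPython's documented default of the 'backslashreplace' error handler for sys.stderr. The sketch below reproduces both behaviours with in-memory streams; `encode_via_stream` is a hypothetical helper, and the two handlers are chosen here only to contrast a stderr-like stream with a strict one.

```python
import io


def encode_via_stream(text, errors):
    """Write text through a UTF-8 text wrapper and return the resulting bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="utf-8", errors=errors)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep the wrapper from closing raw on cleanup
    return data


# stderr-like stream: the lone surrogate is written as an escape.
print(encode_via_stream("\ud800", "backslashreplace"))  # b'\\ud800'

# strict stream: the same write raises.
try:
    encode_via_stream("\ud800", "strict")
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed
```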

@terryjreedy

Printing the unquoted escape representation rather than a replacement char is a bit strange and not what I expect from the python docs. I could see it as a bug. In any case, on Windows, it is the Python REPL that raises, but only for sys.stdout.

>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
>>> print('\ud800', file=sys.stdout)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

whereas in IDLE on Windows the surrogate is displayed as a box with diagonal lines ([X] compressed into one char) in both cases. When copied and pasted into Firefox, the pasted surrogate shows as a square box containing mini D 8 0 0 chars.

>>> print('\ud800', file=sys.stdout)
�
>>> print('\ud800', file=sys.stderr)
�

I think the proper thing for tk to do is to put the undisplayable code point itself, rather than a replacement character, into the editor buffer (however tcl encodes it), so that IDLE can retrieve it without loss of information. IDLE can then potentially identify the character to the user.
===

An oddity though. With

>>> import tkinter as tk
>>> r = tk.Tk()
>>> t = tk.Text(r)
>>> t.pack()
>>> t.insert('insert', 'a\ud800b')

the box is an empty square, not crossed. But when I copy-paste 'a�b' into the font sample (Serhiy, making this editable was a great idea), it is crossed for every font I tried, even for Courier, which is what is being used in text t.

@serhiy-storchaka

And with PR 16583 it is now completely fixed, i.e. it can only fail in cases where the regular interactive interpreter fails too.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022