IDLE shows traceback when printing non-BMP character #66931
>>> print("\N{ROCKET}")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    print("\N{ROCKET}")
  File "idlelib/PyShell.py", line 1352, in write
    return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001f680' in position 0: Non-BMP character not supported in Tk

Shouldn't IDLE replace non-encodable characters with "\uFFFD"? I think

>>> "\N{ROCKET}"
�

is user-friendlier than the traceback. See also bpo-14304.
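The replacement proposed above could be applied as a filter on text before it reaches the Tk widget. This is a minimal sketch, not IDLE's actual code: `replace_non_bmp` is a hypothetical helper, and it assumes (as the traceback suggests) that the Tk in use accepts only BMP code points, U+0000 to U+FFFF.

```python
import re

# Any code point above U+FFFF, i.e. outside the Basic Multilingual Plane.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def replace_non_bmp(text: str, replacement: str = "\uFFFD") -> str:
    """Hypothetical helper: swap each non-BMP character for U+FFFD.

    Assumes the display widget (a UCS-2 limited Tk) cannot render
    characters above U+FFFF; BMP text passes through unchanged.
    """
    return _NON_BMP.sub(replacement, text)


preview = replace_non_bmp("launch \N{ROCKET}!")  # == "launch \uFFFD!"
```

The trade-off of this design is that every astral character collapses to the same �, so the original code point cannot be recovered from the displayed text.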
I think Idle should consistently display astral chars with their \U escape. It sometimes does, just not always.

>>> s='\U0001f680'
>>> s
'\U0001f680'
>>> str(s)
'\U0001f680'
>>> repr(s)
"'\U0001f680'"
>>> print(s)  # gives error above.
>>> print(str(s))  # ditto

I thought that the implicit print of an expression and an explicit print of the same expression were supposed to be the same. bpo-21084 is also about this general issue.
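A minimal sketch of the consistent behavior proposed above: rewrite each astral character as its `\U` escape before display, and leave everything else alone. `escape_non_bmp` is a hypothetical helper for illustration, not an idlelib function.

```python
import re

# Code points above U+FFFF (outside the BMP), which a UCS-2 Tk cannot show.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def escape_non_bmp(text: str) -> str:
    """Hypothetical helper: replace each astral char with a \\Uxxxxxxxx escape.

    BMP characters are left as-is, so only what Tk cannot show is escaped.
    """
    return _NON_BMP.sub(lambda m: "\\U%08x" % ord(m.group()), text)
```

For comparison, `text.encode("ascii", "backslashreplace").decode("ascii")` produces the same `\U0001f680` for the rocket, but it also escapes displayable BMP characters (é becomes `\xe9`), which is more than the shell needs.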
On my puzzlement above, repr(s) is a string of 3 characters: s bracketed by quote characters. print(repr(s)) fails. I am not sure how s gets expanded to the full escape in IDLE. ascii(s) expands all non-ASCII characters and adds the surrounding quotes. Need to check the Shell code. In the Python REPL, astral chars are not expanded to escape sequences:

>>> s='\U0001f603'
>>> s
'😃'

(The Windows REPL shows two replacement boxes instead of 😃.) bpo-36698 is about astral chars in exception messages.
results in the Exception traceback, 3 Unicodedecode tracebacks, and a restart.
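The string facts in the comment above can be checked in a plain CPython interpreter: `repr` keeps a printable astral character as one code point (so the result has 3 characters), while `ascii` escapes it. The UTF-16 check is offered only as a likely, unconfirmed explanation for the two boxes in the Windows console: an astral character needs a surrogate pair of two 16-bit code units.

```python
s = "\U0001f603"  # SMILING FACE WITH OPEN MOUTH, outside the BMP

r = repr(s)
a = ascii(s)
utf16_units = len(s.encode("utf-16-le")) // 2  # number of 16-bit code units

# repr: quote + the character itself + quote, so 3 characters
assert len(r) == 3 and r[1] == s
# ascii: every non-ASCII char becomes an escape, inside the same quotes
assert a == "'\\U0001f603'"
# One code point, but a surrogate pair (2 units) in UTF-16
assert len(s) == 1 and utf16_units == 2
```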
I haven’t looked at the code, but I suspect Idle implements a custom “sys.displayhook”:

>>> help(sys.displayhook)
Help on function displayhook in module idlelib.rpc:

displayhook(value)
    Override standard display hook to use non-locale encoding

>>> sys.displayhook('\N{ROCKET}')
'\U0001f680'
>>> sys.__displayhook__('\N{ROCKET}')
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    sys.__displayhook__('\N{ROCKET}')
  File "/usr/lib/python3.5/idlelib/PyShell.py", line 1344, in write
    return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk
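That the custom hook succeeds where `sys.__displayhook__` fails is consistent with a display hook that catches `UnicodeEncodeError` and retries with backslash escapes, as in the pseudo-code in the `sys.displayhook` documentation. The sketch below is modeled on that pseudo-code, not copied from idlelib. `bmp_only_write` and `make_displayhook` are hypothetical: the former stands in for the Tk-backed shell writer and rejects non-BMP characters the way the traceback shows.

```python
import builtins


def bmp_only_write(text: str, out: list) -> None:
    """Hypothetical stand-in for the Tk shell writer: rejects non-BMP chars."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise UnicodeEncodeError("UCS-2", text, i, i + 1,
                                     "Non-BMP character not supported in Tk")
    out.append(text)


def make_displayhook(write):
    """Build a display hook that escapes what `write` cannot encode."""
    def displayhook(value):
        if value is None:
            return
        builtins._ = None  # as the standard hook does, clear _ while printing
        text = repr(value)
        try:
            write(text)
        except UnicodeEncodeError:
            # Retry with every non-ASCII char as a backslash escape,
            # so U+1F680 is shown as \U0001f680 instead of raising.
            write(text.encode("ascii", "backslashreplace").decode("ascii"))
        write("\n")
        builtins._ = value
    return displayhook


shell_output = []
hook = make_displayhook(lambda t: bmp_only_write(t, shell_output))
hook("\N{ROCKET}")  # shell_output now holds the escaped repr and a newline
```

A plain `print()` call writes straight to the shell stream with no such retry, which would explain why the expression form and the explicit `print()` behave differently.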
It was fixed for all valid Unicode characters, but you can still get an error when printing a surrogate character to stderr on Linux:

>>> import sys
>>> print('\ud800', file=sys.stderr)
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    print('\ud800', file=sys.stderr)
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

In the Python REPL you get an escape sequence:

>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
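The REPL difference comes down to the error handler on each stream. CPython documents `sys.stderr` as using the 'backslashreplace' handler, while `sys.stdout` normally uses 'strict', and a lone surrogate such as U+D800 cannot be encoded as UTF-8 at all. A minimal sketch with an in-memory stream; `encode_via_stream` is a hypothetical helper, while `io.TextIOWrapper` and its `errors` parameter are standard.

```python
import io


def encode_via_stream(text: str, errors: str) -> bytes:
    """Hypothetical helper: write text through a UTF-8 text stream, return the bytes."""
    raw = io.BytesIO()
    # newline="\n" disables newline translation, so output is platform-independent.
    stream = io.TextIOWrapper(raw, encoding="utf-8", errors=errors, newline="\n")
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # release the wrapper without closing `raw`
    return data


# Like stderr: the lone surrogate is written as a visible escape.
print(encode_via_stream("\ud800\n", "backslashreplace"))  # b'\\ud800\n'

# Like stdout with 'strict': encoding fails.
try:
    encode_via_stream("\ud800\n", "strict")
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed
```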
Printing the unquoted escape representation rather than a replacement char is a bit strange and not what I expect from the Python docs. I could see it as a bug. In any case, on Windows, it is the Python REPL that raises, but only for sys.stdout:

>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
>>> print('\ud800', file=sys.stdout)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

whereas in IDLE on Windows the surrogate is displayed as a box with diagonal lines ([X] compressed into one char) in both cases. When copied and pasted into Firefox, the pasted surrogate shows as a square box containing mini D 8 0 0 chars.

>>> print('\ud800', file=sys.stdout)
�
>>> print('\ud800', file=sys.stderr)
�

I consider putting the undisplayable codepoint, rather than a replacement character, into the editor buffer (however Tcl encodes it), so that IDLE can retrieve it without loss of information, to be the proper thing for Tk to do. IDLE can then potentially identify the character to the user. An oddity, though. With
the box is an empty square, not crossed. But when I copy-paste 'a�b' into the font sample (Serhiy, making this editable was a great idea), it is crossed for every font I tried, even for Courier, which is what is being used in text t.
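If the undisplayable code point survives in the buffer unchanged, IDLE could look it up and tell the user what it is. A minimal sketch using the standard `unicodedata` module; `describe_char` is a hypothetical helper, not part of idlelib, and its output format is an arbitrary choice.

```python
import unicodedata


def describe_char(ch: str) -> str:
    """Hypothetical helper: 'U+XXXX NAME', or the category for unnamed code points."""
    if len(ch) != 1:
        raise ValueError("expected a single code point")
    code = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch, None)
    if name is not None:
        return f"{code} {name}"
    # Surrogates ('Cs'), control characters, private-use and unassigned
    # code points have no name here, so report the general category instead.
    return f"{code} <{unicodedata.category(ch)}>"


for ch in ("\N{ROCKET}", "\ud800", "\ufffd"):
    print(describe_char(ch))
```

With this format the rocket is reported as `U+1F680 ROCKET` and the lone surrogate as `U+D800 <Cs>`, information that is lost once the character has been replaced by �.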
And with PR 16583 it is now completely fixed. That is, it can fail only in cases where the regular interactive interpreter fails too.