Message353963
Printing the unquoted escape representation rather than a replacement char is a bit strange and not what I would expect from the Python docs. I could see it as a bug. In any case, on Windows, it is the Python REPL that raises, but only for sys.stdout.
>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
>>> print('\ud800', file=sys.stdout)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
whereas on Windows the surrogate is displayed as a box with diagonal lines (an [X] compressed into one character) in both cases. When copied and pasted into Firefox, the pasted surrogate shows as a square box containing miniature D 8 0 0 characters.
>>> print('\ud800', file=sys.stdout)
�
>>> print('\ud800', file=sys.stderr)
�
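For reference, the \ud800 output on sys.stderr in the first session matches the documented 'backslashreplace' error handler that stderr uses. Here is a minimal sketch of how the 'utf-8' codec treats a lone surrogate under a few standard error handlers. It calls str.encode directly rather than inspecting the streams, and the names try_encode and results are made up for illustration.

```python
# Sketch: encode a lone surrogate with the 'utf-8' codec under several
# standard error handlers. `try_encode` and `results` are illustrative names.
lone = '\ud800'


def try_encode(text, errors):
    """Return the encoded bytes, or the exception raised while encoding."""
    try:
        return text.encode('utf-8', errors=errors)
    except UnicodeEncodeError as exc:
        return exc


handlers = ('strict', 'backslashreplace', 'replace', 'surrogatepass')
results = {name: try_encode(lone, name) for name in handlers}

for name, result in results.items():
    # 'strict' yields the UnicodeEncodeError seen in the traceback above;
    # 'backslashreplace' yields the literal escape text.
    print(f'{name:17}', repr(result))
```

Only 'strict' fails. 'backslashreplace' produces the escape text, 'replace' substitutes a question mark, and 'surrogatepass' emits the three-byte form.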
I consider it the proper thing for tk to put the undisplayable codepoint, rather than a replacement character, into the editor buffer (however tcl encodes it), so that IDLE can retrieve it without loss of information. IDLE can then potentially identify the character to the user.
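As a rough illustration of what "identify the character to the user" could mean, here is a sketch using only the standard unicodedata module. The function name describe_char and the wording of its labels are hypothetical. This is not anything IDLE currently does.

```python
import unicodedata


def describe_char(ch):
    """Return a short human-readable label for a single character.

    Hypothetical helper, not part of IDLE or tkinter.
    """
    code = ord(ch)
    # General category 'Cs' marks surrogate code points (U+D800..U+DFFF).
    if unicodedata.category(ch) == 'Cs':
        kind = 'high' if code < 0xDC00 else 'low'
        return f'U+{code:04X} ({kind} surrogate, no character by itself)'
    # Surrogates and some other code points have no name, so pass a default.
    name = unicodedata.name(ch, '<unnamed>')
    return f'U+{code:04X} {name}'


for ch in 'a\ud800b':
    print(describe_char(ch))
```

This only works if the buffer hands back the original codepoint. If tk had stored a replacement character instead, the function would report U+FFFD and the information would be gone.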
===
An oddity though. With
>>> import tkinter as tk
>>> r = tk.Tk()
>>> t = tk.Text(r)
>>> t.pack()
>>> t.insert('insert', 'a\ud800b')
the box is an empty square, not crossed. But when I copy-paste 'a�b' into the font sample (Serhiy, making this editable was a great idea), it is crossed for every font I tried, even for Courier, which is the font being used in text t.
Date | User | Action | Args
2019-10-04 17:55:38 | terry.reedy | set | recipients: + terry.reedy, belopolsky, THRlWiTi, martin.panter, serhiy.storchaka
2019-10-04 17:55:38 | terry.reedy | set | messageid: <1570211738.79.0.664655820922.issue22742@roundup.psfhosted.org>
2019-10-04 17:55:38 | terry.reedy | link | issue22742 messages
2019-10-04 17:55:38 | terry.reedy | create |