classification
Title: Non-bmp (astral) unicode characters confuse the editor
Type: behavior Stage:
Components: IDLE Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: dmaxime, terry.reedy
Priority: normal Keywords:

Created on 2019-12-23 21:11 by dmaxime, last changed 2019-12-23 22:22 by terry.reedy.

Files
File name Uploaded Description
python3ide bug.mp4 dmaxime, 2019-12-23 21:11
Messages (2)
msg358836 - Author: dmaxime (dmaxime) Date: 2019-12-23 21:11
>>> b'\xf0\x9f\x98\x86'.decode('utf8')
'😆'
>>> '😆'.encode('utf8')
b'\xf0\x9f\x98\x86'

Now, if you write '😆'.encode(), then move the cursor between the brackets and type "'utf8'", you get the result below, even though the cursor stays between the brackets the whole time:

>>> '😆'.encode()''8ftu
SyntaxError: invalid syntax
>>>

I've attached a video that shows this behavior.
Thanks for your attention. Cheers.
msg358838 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-12-23 22:22
I am aware of this.  Before the recent (Oct 4) fix for #13153, pasting an astral character into an edit line or window crashed IDLE.  After the fix, the character appears (and printing such chars also works reliably).  But astral chars confuse the tk text widget, which cannot properly handle them. (I believe that they are stored as 2 surrogate chars, displayed as one.)
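
To illustrate the "2 surrogate chars" point: a minimal sketch, assuming the widget stores text as UTF-16 code units (which is the belief stated above, not something confirmed here). The helper name `utf16_surrogates` is made up for this example and is not part of IDLE or tkinter.

```python
def utf16_surrogates(ch):
    """Return the (high, low) UTF-16 surrogate pair for one non-BMP character.

    Hypothetical helper for illustration only.
    """
    cp = ord(ch)
    if cp <= 0xFFFF:
        raise ValueError("character is in the BMP; it needs no surrogates")
    offset = cp - 0x10000             # 20-bit value split across the pair
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = utf16_surrogates('\U0001f606')
print(hex(high), hex(low))  # 0xd83d 0xde06

# Python counts one character, while UTF-16 needs two code units.
print(len('\U0001f606'), len('\U0001f606'.encode('utf-16-le')) // 2)  # 1 2
```

If the widget counts two units where Python counts one character, index arithmetic on that line would be off by one per astral character, which would be consistent with the misplaced typing in the report.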

I don't want to immediately replace such chars with escape sequences.

>>> hex(ord('😆'))
'0x1f606'
>>> '\U0001f606'
'😆'

The effect is limited to the line containing the odd char, and once entered, strange cursor placement does not matter too much.  But we probably should document the situation and add an option to escape or unescape such chars.
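
A rough sketch of what an escape/unescape option might do, under the assumption that it works on plain strings. The function names `escape_astral` and `unescape_astral` are hypothetical and do not exist in IDLE.

```python
import re


def escape_astral(text):
    """Replace each non-BMP character with its \\UXXXXXXXX escape.

    Hypothetical helper for illustration only.
    """
    return ''.join(
        '\\U%08x' % ord(ch) if ord(ch) > 0xFFFF else ch
        for ch in text
    )


# Matches a literal backslash, 'U', and exactly eight hex digits.
_ASTRAL_ESCAPE = re.compile(r'\\U([0-9a-fA-F]{8})')


def unescape_astral(text):
    """Turn \\UXXXXXXXX escapes for astral code points back into characters."""
    def repl(match):
        cp = int(match.group(1), 16)
        # Convert only valid astral code points; leave anything else as typed.
        if 0x10000 <= cp <= 0x10FFFF:
            return chr(cp)
        return match.group(0)
    return _ASTRAL_ESCAPE.sub(repl, text)


line = "'\U0001f606'.encode('utf8')"
escaped = escape_astral(line)
print(escaped)  # '\U0001f606'.encode('utf8')  (backslash form, ASCII only)
print(unescape_astral(escaped) == line)  # True
```

This is deliberately naive: it does not distinguish string literals from other text, and it would also convert an escape that follows a doubled backslash, so a real option would need to be more careful.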
History
Date User Action Args
2019-12-23 22:22:38 terry.reedy set title: Some characters confuse the editor -> Non-bmp (astral) unicode characters confuse the editor
messages: + msg358838
versions: + Python 3.9, - Python 3.8
2019-12-23 21:11:54 dmaxime create