This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Tkinter/IDLE: literal astral char discombobulates text editing
Type: behavior Stage:
Components: IDLE, Tkinter Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: erlendaasland, serhiy.storchaka, shreyanavigyan, steven.daprano, taleinat, terry.reedy
Priority: normal Keywords:

Created on 2021-05-23 15:36 by shreyanavigyan, last changed 2022-04-11 14:59 by admin.

Messages (15)
msg394201 - Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-05-23 15:36
In IDLE, suppose I want to print 😀. I can write it as print(b'\xf0\x9f\x98\x80'.decode()). I can't print it in the console, so I tried it in IDLE. It worked. Hurray! Hold on, though. Then I wondered how I could print 😀 directly. So I used the command 'print("😀")' in IDLE and saw some weird behavior. Actually, to be accurate, many weird behaviors!

1. If I type 'print(😀))' and want to delete the extra ')', it just gets stuck. I can't backspace or delete anything I've typed, but I can type more.

2. If we move the text cursor (the typing cursor, not the mouse pointer) over the output using the arrow keys, we see the characters moving and occasionally dancing.

3. (Here comes the laughing part) If we type in 'print(😀)' and then move our typing cursor just between the emoji and the ')' and start typing we would see we're typing in the opposite way. So if we move our cursor and intend to type 'hello' we would actually end up typing 'olleh'.

I'm not sure but these look like bugs to me.
msg394206 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-05-23 16:09
The smiley emoji 😀 is U+1F600 which is outside of the Unicode Basic Multilingual Plane (BMP). IDLE's underlying graphical toolkit, Tcl/Tk, has problems with Unicode characters outside of the BMP, so this may not be fixable by us.

If all you want is to print the emoji, the best way is one of these:

    print('\U0001F600')

    print('\N{GRINNING FACE}')


or use another editor or IDE. Python itself has no problems here; it is just the IDLE editor.
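A quick check in plain Python (no Tk involved) confirms that the spellings used so far in this thread all denote the same single character, and that its code point lies above U+FFFF, the last code point of the BMP. This is only an illustration of the point above, not part of any IDLE fix:

```python
# The spellings used in this thread denote the same single character.
from_bytes = b'\xf0\x9f\x98\x80'.decode()   # UTF-8 bytes; UTF-8 is the default codec
from_escape = '\U0001F600'
from_name = '\N{GRINNING FACE}'

assert from_bytes == from_escape == from_name
assert len(from_escape) == 1                # Python sees exactly one character

# The BMP ends at U+FFFF, so anything above it is an "astral" character.
code_point = ord(from_escape)
print(hex(code_point), code_point > 0xFFFF)   # prints: 0x1f600 True
```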

I'm not a Tcl/Tk expert, but it looks like they may be working towards fixing this:

https://core.tcl-lang.org/tips/doc/trunk/tip/542.md
msg394207 - Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-05-23 16:16
It's not a Python problem at all. This was occurring in IDLE only. And yeah, I know how to print the character. I was just confused about why IDLE behaves like that. I am happy to learn that a fix is coming soon.
msg394214 - Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-23 18:28
> The smiley emoji 😀 is U+1F600 which is outside of the Unicode Basic Multilingual Plane (BMP).

Correct, and this is documented:
https://docs.python.org/3/library/idle.html#user-output-in-shell

Suggesting to close this as not-a-bug.
msg394215 - Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-05-23 18:32
What about closing this as third party? (Tcl/Tk is a dependency but still it's a third party right?)
msg394216 - Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-05-23 18:33
Also what's up with this open-pending issue?
msg394217 - Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-23 18:44
> What about closing this as third party? (Tcl/Tk is a dependency but still it's a third party right?)

Sure. I'll leave that for Terry or any of the other IDLE devs. We can adjust the resolution after close if needed.

> Also what's up with this open-pending issue?

It seems like a bpo bug, but I've got a feeling that fixing bpo bugs will not be prioritised (see PEP 581 and PEP 588).
msg394230 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-05-24 01:31
This is fundamentally a tk text widget issue.  Prior to fall 2019, attempts to insert an astral (non-BMP) char into a tkinter widget raised a unicode decode exception.  Then Serhiy fixed tkinter so that such chars could be displayed, either as the actual chars or as a replacement box for unavailable chars.  There were 2 remaining problems:

1. The presence of a literal astral char in an editable widget, especially a Text widget, makes editing past the char weird.  It can be demonstrated with, for instance, the following.

import tkinter as tk
r = tk.Tk()
t = tk.Text(r); t.pack()
# One line holding three literal astral chars; now try editing past them.
t.insert('insert', "a😀b'a😀b'a😀b\n")

We decided that being able to output such chars to the view-only part of Shell more than made up for confusion such as Shreyan experienced.

Re-experimenting now, it appears that the behavior is worse in IDLE.  In particular, backspace delete does not work (nothing happens).  IDLE intercepts <backspace> in order to invoke its smart backspace routine.  I will try to find out why nothing happens.  Whatever the result, the behavior should be better documented.

Tcl/Tk 8.7a3 was released in Nov 2019, so I have no idea how soon 8.7 will arrive.  Maybe the devs found enough bugs to work on, especially on Mac.  And then 8.6.11 was needed to make tcl/tk work on the new Apple hardware and OS version.

2. On some Linux systems with some fonts with some XWindows, attempts to display some colored chars (such as particular emoji) in color cause an XWindows error ('too complex').  This is a Linux-XWindows-font bug.
msg394232 - Author: Tal Einat (taleinat) * (Python committer) Date: 2021-05-24 04:51
This is at least partly a tcl/tk issue.

Using Terry's last example, the Text widget reports that the length of the line is 14 (t.index('1.end') -> '1.14'), despite it only including 11 characters. It appears that each emoji character adds an extra character.

Minimal reproducer:

>>> t.delete('1.0', 'end')
>>> t.insert('1.0', 'a')
>>> t.index('1.end')
'1.1'
>>> t.delete('1.0', 'end')
>>> t.insert('1.0', '😀')
>>> t.index('1.end')
'1.2'


The same happens when using tcl/tk directly, so it's not a tkinter issue:

$ wish
% tk::text .t -width 40 -height 5  -wrap none -yscrollcommand ".ys set" -xscrollcommand ".xs set"
.t
% ttk::scrollbar .ys -orient vertical -command ".t yview"
.ys
% ttk::scrollbar .xs -orient horizontal -command ".t xview"
.xs
% .t insert end "a"
% grid .t -column 0 -row 0 -sticky nwes
% grid .ys -column 1 -row 0 -sticky ns
% grid columnconfigure . 0 -weight 1
% grid rowconfigure . 0 -weight 1
% .t delete 1.0 end  
% .t insert end "😀"  
% .t index 1.end
1.2
% .t get 1.0
😀
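The '1.14' reported for Terry's 11-character line equals that line's length in UTF-16 code units, which is consistent with Tcl 8.6 counting each astral char as two index positions (a surrogate pair). That is an inference from the numbers in this thread, not something checked against Tcl's source; the two counts can be compared in Python like this:

```python
line = "a😀b'a😀b'a😀b"   # Terry's example line, without the trailing newline

python_len = len(line)                             # code points, as Python counts them
utf16_units = len(line.encode('utf-16-le')) // 2   # 2 bytes per UTF-16 code unit

print(python_len, utf16_units)   # prints: 11 14
```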
msg394236 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-05-24 07:40
It is partially an IDLE issue. The code expects that indices in a Python string correspond to indices in the Tcl string, but this is not true for astral characters, which are encoded as 2 (or maybe even 4) characters in Tcl.

To fix it, we need to translate between Python and Tcl indices every time we pass indices from one language to the other. It is virtually impossible to do in general (in Tkinter code) because there are tons of methods which return or accept indices. It can be fixed in IDLE specifically, but it is still a lot of work. I'll try to fix at least some code (backspace and highlighting).

And every Tkinter application which works with string indices and lengths and can support astral characters should fix it separately. It would help if the helper functions for converting between indices, once written for IDLE, were exposed as a public API in Tkinter.
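A minimal sketch of the kind of conversion helper described above, assuming (as the observations in this thread suggest) that Tcl counts every astral character as two index positions. The names `py_to_tcl_offset` and `tcl_to_py_offset` are hypothetical: they are not part of Tkinter or IDLE, and they work on column offsets within a single line, not on full 'line.column' Text indices.

```python
def py_to_tcl_offset(line: str, py_offset: int) -> int:
    """Convert a Python column offset in `line` to a Tcl column offset.

    Hypothetical helper: assumes each astral (non-BMP) character occupies
    two Tcl positions, as a UTF-16 surrogate pair would.
    """
    astral_before = sum(1 for ch in line[:py_offset] if ord(ch) > 0xFFFF)
    return py_offset + astral_before


def tcl_to_py_offset(line: str, tcl_offset: int) -> int:
    """Inverse of py_to_tcl_offset (same assumption).

    A Tcl offset that falls between the two halves of an astral character
    maps to the start of that character; offsets past the end clamp to len(line).
    """
    tcl_pos = 0
    for py_pos, ch in enumerate(line):
        width = 2 if ord(ch) > 0xFFFF else 1
        if tcl_pos + width > tcl_offset:
            return py_pos
        tcl_pos += width
    return len(line)


line = "a😀b'a😀b'a😀b"
print(py_to_tcl_offset(line, len(line)))   # 14, matching the '1.14' Tal observed
print(tcl_to_py_offset(line, 2))           # 1: Tcl offset 2 is inside the first emoji
```

Keeping both directions in small pure functions makes them easy to test without a display; wiring them into IDLE's backspace or highlighting code would still mean finding every place an index crosses the Python/Tcl boundary.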

On the other hand, the problem will be gone in Tcl/Tk 8.7 or 9.0, so we can just wait several years.
msg394238 - Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-24 07:49
It seems to me that the most reasonable thing to do is to enhance the docs, as Terry suggested.
msg394239 - Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-05-24 08:01
The first and third behaviors occur only in IDLE. I believe the third one is a cause of the first one.

The second behavior of dancing is a Tcl/Tk problem.
msg394247 - Author: Tal Einat (taleinat) * (Python committer) Date: 2021-05-24 11:49
> It is partially an IDLE issue. The code expects that indices in a Python string correspond to indices in the Tcl string, but this is not true for astral characters, which are encoded as 2 (or maybe even 4) characters in Tcl.

It's not just that: Tk's Text widget gets the indexing within the line itself wrong. In the string from Terry's example, which has 11 characters in a line including three smiley emojis, the characters can be fetched using t.get('1.1'), t.get('1.2'), etc. through t.get('1.11'). t.get('1.12') returns '\n' since it is at or after the end of the line. So, as far as indexing is concerned, each of those emoji characters is treated as a single character.
msg394387 - Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-05-25 18:46
I executed the code Tal provided, and yes, Serhiy and Tal, you're both right. It seems the backspace problem is related to Tcl/Tk's indexing. 1.1 and 1.2 both refer to 😀. Also, the dancing I described explains why this is happening. Since Tcl/Tk only handles the BMP, it treats 😀 as two characters (a surrogate pair) rather than one. Now when we move the typing cursor from right to left using the arrow keys, the 😀 gets split into ??. Notice there are two ?s. So it's all coming together now.
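The two '?' fit the surrogate-pair picture: in UTF-16, U+1F600 is stored as a high surrogate followed by a low surrogate, and neither half is a displayable character on its own. The sketch below only computes the UTF-16 encoding in Python; it does not call Tcl, so it illustrates the assumption rather than proving how Tk stores the text internally.

```python
emoji = '\U0001F600'

# Encode to UTF-16 (big-endian, no BOM) and split into 16-bit code units.
data = emoji.encode('utf-16-be')
units = [int.from_bytes(data[i:i + 2], 'big') for i in range(0, len(data), 2)]
print([hex(u) for u in units])   # prints: ['0xd83d', '0xde00']

# The same pair from the standard surrogate formula.
offset = ord(emoji) - 0x10000
high = 0xD800 + (offset >> 10)
low = 0xDC00 + (offset & 0x3FF)
assert units == [high, low]
```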
msg394389 - Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-05-25 18:53
Sorry, 1.0 and 1.1 refer to 😀, not 1.1 and 1.2.
History
Date                 User              Action  Args
2022-04-11 14:59:46  admin             set     github: 88383
2021-05-25 18:53:05  shreyanavigyan    set     messages: + msg394389
2021-05-25 18:46:47  shreyanavigyan    set     messages: + msg394387
                                               components: + Tkinter
2021-05-24 11:49:42  taleinat          set     messages: + msg394247
2021-05-24 08:01:23  shreyanavigyan    set     messages: + msg394239
2021-05-24 07:49:23  erlendaasland     set     messages: + msg394238
2021-05-24 07:40:45  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg394236
2021-05-24 04:51:27  taleinat          set     messages: + msg394232
2021-05-24 01:31:30  terry.reedy       set     nosy: + taleinat
                                               messages: + msg394230
                                               title: [IDLE] Weird behaviour in IDLE when printing non-BMP unicode characters -> Tkinter/IDLE: literal astral char discombobulates text editing
2021-05-23 18:44:22  erlendaasland     set     status: pending -> open
                                               messages: + msg394217
2021-05-23 18:33:21  shreyanavigyan    set     status: open -> pending
                                               messages: + msg394216
2021-05-23 18:32:57  shreyanavigyan    set     status: pending -> open
                                               messages: + msg394215
2021-05-23 18:30:25  erlendaasland     set     status: open -> pending
2021-05-23 18:30:12  erlendaasland     set     status: pending -> open
                                               title: [IDLE] Weird behaviour in IDLE while dealing with non-ASCII characters -> [IDLE] Weird behaviour in IDLE when printing non-BMP unicode characters
2021-05-23 18:28:58  erlendaasland     set     status: open -> pending
                                               nosy: + erlendaasland
                                               messages: + msg394214
2021-05-23 16:16:32  shreyanavigyan    set     messages: + msg394207
2021-05-23 16:09:40  steven.daprano    set     nosy: + steven.daprano
                                               messages: + msg394206
2021-05-23 15:36:54  shreyanavigyan    set     versions: + Python 3.9, Python 3.10, Python 3.11
2021-05-23 15:36:23  shreyanavigyan    create