msg394201 - (view) |
Author: Shreyan Avigyan (shreyanavigyan) * |
Date: 2021-05-23 15:36 |
Suppose I want to print 😀 in IDLE. I can write print(b'\xf0\x9f\x98\x80'.decode()). I couldn't print it in the console, so I tried it in IDLE, and it worked. Hurray! Hold on, though. Then I wondered how I could print 😀 directly, so I typed 'print("😀")' in IDLE and saw some weird behavior. Several weird behaviors, to be accurate!
1. If I type 'print(😀))' and try to delete the extra ')', it is stuck: I can't backspace or delete anything I've typed, but I can type more.
2. If we move the cursor (the typing cursor, not the mouse pointer) over the output using the arrow keys, the characters move and occasionally dance.
3. (Here comes the funny part) If we type 'print(😀)', move the typing cursor between the emoji and the ')', and start typing, the text is entered in reverse. So if we intend to type 'hello', we actually end up with 'olleh'.
I'm not sure but these look like bugs to me.
|
msg394206 - (view) |
Author: Steven D'Aprano (steven.daprano) * |
Date: 2021-05-23 16:09 |
The smiley emoji 😀 is U+1F600 which is outside of the Unicode Basic Multilingual Plane (BMP). IDLE's underlying graphical toolkit, Tcl/Tk, has problems with Unicode characters outside of the BMP, so this may not be fixable by us.
If all you want is to print the emoji, the best way is one of these:
print('\U0001F600')
print('\N{GRINNING FACE}')
or use another editor or IDE. Python itself has no problems here, it is just the IDLE editor.
I'm not a Tcl/Tk expert, but it looks like they may be working towards fixing this:
https://core.tcl-lang.org/tips/doc/trunk/tip/542.md
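As a small illustration of the point above, the following sketch checks that the three spellings give the same one-character string and that its code point lies above U+FFFF, i.e. outside the BMP. It uses only the standard library; the variable name smiley is just for illustration.

```python
import unicodedata

# UTF-8 bytes from the original report, decoded with the default codec (UTF-8).
smiley = b'\xf0\x9f\x98\x80'.decode()

# All three spellings produce the same one-character string.
assert smiley == '\U0001F600' == '\N{GRINNING FACE}'
assert len(smiley) == 1

# Code points above U+FFFF lie outside the Basic Multilingual Plane.
print(hex(ord(smiley)), unicodedata.name(smiley), ord(smiley) > 0xFFFF)
```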
|
msg394207 - (view) |
Author: Shreyan Avigyan (shreyanavigyan) * |
Date: 2021-05-23 16:16 |
It's not a Python problem at all. This was occurring in IDLE only. And yeah, I know how to print the character. I was just confused about why IDLE behaves like that. I am happy to learn that a fix is coming soon.
|
msg394214 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2021-05-23 18:28 |
> The smiley emoji 😀 is U+1F600 which is outside of the Unicode Basic Multilingual Plane (BMP).
Correct, and this is documented:
https://docs.python.org/3/library/idle.html#user-output-in-shell
Suggesting to close this as not-a-bug.
|
msg394215 - (view) |
Author: Shreyan Avigyan (shreyanavigyan) * |
Date: 2021-05-23 18:32 |
What about closing this as third party? (Tcl/Tk is a dependency but still it's a third party right?)
|
msg394216 - (view) |
Author: Shreyan Avigyan (shreyanavigyan) * |
Date: 2021-05-23 18:33 |
Also what's up with this open-pending issue?
|
msg394217 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2021-05-23 18:44 |
> What about closing this as third party? (Tcl/Tk is a dependency but still it's a third party right?)
Sure. I'll leave that for Terry or any of the other IDLE devs. We can adjust the resolution after close if needed.
> Also what's up with this open-pending issue?
It seems like a bpo bug, but I've got a feeling that fixing bpo bugs will not be prioritised (see PEP 581 and PEP 588).
|
msg394230 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2021-05-24 01:31 |
This is fundamentally a tk text widget issue. Prior to fall 2019, attempts to insert astral (non-BMP) chars in tkinter widgets raised a unicode decode exception. Then Serhiy fixed tkinter so that such chars could be displayed, either as the actual chars or as a replacement box for unavailable chars. There were 2 remaining problems:
1. The presence of a literal astral char in an editable widget, especially a Text widget, makes editing past the char weird. It can be demonstrated with, for instance, the following.
import tkinter as tk
r = tk.Tk()
t = tk.Text(r); t.pack()
t.insert('insert', "a😀b'a😀b'a😀b\n")
We decided that being able to output such chars to the view-only part of Shell more than made up for confusion such as Shreyan experienced.
Re-experimenting now, it appears that the behavior is worse in IDLE. In particular, backspace delete does not work (nothing happens). IDLE intercepts <backspace> in order to invoke its smart backspace routine. I will try to find out why nothing happens. Whatever the result, the behavior should be better documented.
Tcl/Tk 8.7a3 was released in Nov 2019, so I have no idea how soon 8.7 will arrive. Maybe the devs found enough bugs to work on, especially on Mac. And then 8.6.11 was needed to have tcl/tk work on the new Apple hardware and OS version.
2. On some Linux systems with some fonts with some XWindows, attempts to display some colored chars (such as particular emoji) in color cause an XWindows error ('too complex'). This is a Linux-XWindows-font bug.
|
msg394232 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2021-05-24 04:51 |
This is at least partly a tcl/tk issue.
Using Terry's last example, the Text widget reports that the length of the line is 14 (t.index('1.end') -> '1.14'), despite it only including 11 characters. It appears that each emoji character adds an extra character.
Minimal reproducer:
>>> t.delete('1.0', 'end')
>>> t.insert('1.0', 'a')
>>> t.index('1.end')
'1.1'
>>> t.delete('1.0', 'end')
>>> t.insert('1.0', '😀')
>>> t.index('1.end')
'1.2'
The same happens when using tcl/tk directly, so it's not a tkinter issue:
$ wish
% tk::text .t -width 40 -height 5 -wrap none -yscrollcommand ".ys set" -xscrollcommand ".xs set"
.t
% ttk::scrollbar .ys -orient vertical -command ".t yview"
.ys
% ttk::scrollbar .xs -orient horizontal -command ".t xview".t
extra characters after close-quote
% ttk::scrollbar .xs -orient horizontal -command ".t xview"
.xs
% .t insert end "a"
% grid .t -column 0 -row 0 -sticky nwes
% grid .t -column 0 -row 0 -sticky nwes
% grid .ys -column 1 -row 0 -sticky ns
% grid columnconfigure . 0 -weight 1
% grid rowconfigure . 0 -weight 1
% .t delete 1.0 end
% .t insert end "😀"
% .t index 1.end
1.2
% .t get 1.0
😀
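One possible reading of these numbers, sketched below, is that the Text widget counts UTF-16 code units, so each astral character takes two index positions. That is an assumption consistent with the values reported above, not something checked against the Tk sources. The helper name utf16_units is hypothetical, and the code does not need a display.

```python
def utf16_units(s: str) -> int:
    """Number of UTF-16 code units needed for s (2 per astral character)."""
    return len(s.encode('utf-16-le')) // 2

# An 11-character line with three smileys, as discussed above (no newline).
line = "a😀b'a😀b'a😀b"

# 11 Python characters, but 14 UTF-16 code units, matching the reported '1.14'.
print(len(line), utf16_units(line))

# A single 'a' needs 1 unit and a single smiley needs 2, matching '1.1' and '1.2'.
print(utf16_units('a'), utf16_units('😀'))
```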
|
msg394236 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-05-24 07:40 |
It is partially an IDLE issue. The code expects that indices in a Python string correspond to indices in the Tcl string, but this is not true for astral characters, which are encoded as 2 (or maybe even 4) characters in Tcl.
To fix it we need to translate between Python and Tcl indices every time we pass indices from one language to the other. It is virtually impossible to do in general (in Tkinter code) because there are tons of methods which return or accept indices. It can be fixed in IDLE specifically, but it is still a lot of work. I'll try to fix at least some code (backspace and highlighting).
And every Tkinter application which works with string indices and lengths and wants to support astral characters should fix it separately. It would help if the helper functions for converting between indices in IDLE were exposed as public API in Tkinter.
On the other hand, the problem will be gone in Tcl/Tk 8.7 or 9.0. So we can just wait several years.
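A minimal sketch of what such index translation could look like for a single line, assuming each astral character occupies exactly two Tcl index positions (the 4-character case mentioned above is not handled). The function names py_to_tcl_col and tcl_to_py_col are hypothetical; this is not IDLE's or Tkinter's actual code.

```python
def py_to_tcl_col(line: str, col: int) -> int:
    """Tcl column for Python column `col`, counting 2 positions per astral char."""
    return col + sum(1 for ch in line[:col] if ord(ch) > 0xFFFF)

def tcl_to_py_col(line: str, tcl_col: int) -> int:
    """Python column for Tcl column `tcl_col`.

    A Tcl column that falls inside an astral character maps to that character.
    Columns at or past the end of the line map to len(line).
    """
    units = 0
    for py_col, ch in enumerate(line):
        width = 2 if ord(ch) > 0xFFFF else 1  # assumed width in Tcl positions
        if tcl_col < units + width:
            return py_col
        units += width
    return len(line)

sample = "a😀b"
# Python columns 0..3 map to Tcl columns 0, 1, 3, 4 under this assumption.
print([py_to_tcl_col(sample, c) for c in range(len(sample) + 1)])
```

A caller would apply the first function before passing a column to a Text widget method and the second to columns the widget returns.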
|
msg394238 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2021-05-24 07:49 |
Seems to me like the most reasonable thing to do is to enhance the docs, as Terry suggested.
|
msg394239 - (view) |
Author: Shreyan Avigyan (shreyanavigyan) * |
Date: 2021-05-24 08:01 |
The first and third behaviors occur only in IDLE. I believe the third one is a cause of the first one.
The second behavior (the dancing) is a Tcl/Tk problem.
|
msg394247 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2021-05-24 11:49 |
> It is partially an IDLE issue. The code expects that indices in Python string correspond indices in Tcl string, but this is not true in case of astral characters which are encoded as 2 (or maybe even 4) characters in Tcl.
It's not just that - Tk's Text widget gets the indexing in the line itself wrong. In the string from Terry's example, which has 11 characters in a line including three smiley emojis, the characters can be fetched using t.get('1.1'), t.get('1.2') etc. through t.get('1.11'). t.get('1.12') returns '\n' since it is at or after the end of the line. So, as far as indexing is concerned, each of those emoji characters is treated as a single character.
|
msg394387 - (view) |
Author: Shreyan Avigyan (shreyanavigyan) * |
Date: 2021-05-25 18:46 |
I executed the code Tal provided, and yes, Serhiy and Tal, you're both right. It seems the backspace problem is related to Tcl/Tk's indexing: 1.1 and 1.2 both refer to 😀. Also, the dancing I mentioned actually explains why this is happening. Since Tcl/Tk uses the BMP, it also treats 😀 as 2 bytes. Now when we move the typing cursor from right to left using the arrow keys, the 😀 gets split into ??. Notice there are two ?. So it's all coming together now.
|
msg394389 - (view) |
Author: Shreyan Avigyan (shreyanavigyan) * |
Date: 2021-05-25 18:53 |
Sorry, 1.0 and 1.1 refer to 😀 not 1.1 and 1.2
|
|
Date | User | Action | Args |
2022-04-11 14:59:46 | admin | set | github: 88383 |
2021-05-25 18:53:05 | shreyanavigyan | set | messages: + msg394389 |
2021-05-25 18:46:47 | shreyanavigyan | set | messages: + msg394387; components: + Tkinter |
2021-05-24 11:49:42 | taleinat | set | messages: + msg394247 |
2021-05-24 08:01:23 | shreyanavigyan | set | messages: + msg394239 |
2021-05-24 07:49:23 | erlendaasland | set | messages: + msg394238 |
2021-05-24 07:40:45 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg394236 |
2021-05-24 04:51:27 | taleinat | set | messages: + msg394232 |
2021-05-24 01:31:30 | terry.reedy | set | nosy: + taleinat; messages: + msg394230; title: [IDLE] Weird behaviour in IDLE when printing non-BMP unicode characters -> Tkinter/IDLE: literal astral char discombobulates text editing |
2021-05-23 18:44:22 | erlendaasland | set | status: pending -> open; messages: + msg394217 |
2021-05-23 18:33:21 | shreyanavigyan | set | status: open -> pending; messages: + msg394216 |
2021-05-23 18:32:57 | shreyanavigyan | set | status: pending -> open; messages: + msg394215 |
2021-05-23 18:30:25 | erlendaasland | set | status: open -> pending |
2021-05-23 18:30:12 | erlendaasland | set | status: pending -> open; title: [IDLE] Weird behaviour in IDLE while dealing with non-ASCII characters -> [IDLE] Weird behaviour in IDLE when printing non-BMP unicode characters |
2021-05-23 18:28:58 | erlendaasland | set | status: open -> pending; nosy: + erlendaasland; messages: + msg394214 |
2021-05-23 16:16:32 | shreyanavigyan | set | messages: + msg394207 |
2021-05-23 16:09:40 | steven.daprano | set | nosy: + steven.daprano; messages: + msg394206 |
2021-05-23 15:36:54 | shreyanavigyan | set | versions: + Python 3.9, Python 3.10, Python 3.11 |
2021-05-23 15:36:23 | shreyanavigyan | create | |