msg380715 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2020-11-10 21:22 |
As mentioned in msg380552: I get a SyntaxError with the message "'utf-8' codec can't encode characters in position 7-12: surrogates not allowed" when I paste a smiley emoji into an IDLE interactive shell and try to execute that line, for example using:
>>> print("😀")
The error is likely due to a surrogate pair being present in the UTF-8 representation of a Tcl/Tk string.
It should be possible to work around this in _tkinter.c:unicodeFromTclStringAndSize by merging surrogate pairs.
This is with:
- Python 3.10
- macOS 11 (arm64)
- Tk 8.6.10
With Tk 8.6.8 (as included in the macOS installers on python.org) printing won't work at all, as mentioned in bpo-42225.
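To illustrate what "merging surrogate pairs" means, here is a minimal Python sketch. It is not the change proposed for _tkinter.c (that code is C); the helper name merge_surrogate_pairs is hypothetical and only shows the idea at the str level.

```python
def merge_surrogate_pairs(text: str) -> str:
    """Combine UTF-16 surrogate pairs found in a str into single code points.

    Hypothetical helper for illustration only; not part of tkinter.
    """
    # "surrogatepass" lets surrogate code points through the UTF-16 encoder
    # as plain 16-bit units; the decoder then reads each high/low pair as
    # one non-BMP character.
    utf16 = text.encode("utf-16-le", "surrogatepass")
    return utf16.decode("utf-16-le", "surrogatepass")


# Two separate surrogates (U+D83D, U+DE00) become one character, U+1F600.
print(ascii(merge_surrogate_pairs("\ud83d\ude00")))
```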
|
msg380879 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2020-11-13 12:51 |
Just to be sure, what is the result of pasting and executing the following code on Tk 8.6.8 and 8.6.10?
print(ascii("😀"))
|
msg380881 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2020-11-13 12:56 |
Well, it is likely the same syntax error. Then what is printed by
print(ascii(input()))
when you paste 😀 and press Enter?
|
msg380906 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2020-11-13 16:09 |
With 8.6.8 both "hang", in that the Shell window no longer accepts input. I've checked that ``print(input())`` works when I don't use an emoji.
Interestingly enough, pasting ``print(ascii("😀"))`` into an edit window does work: I can continue editing, but the display is messed up. It looks like:
print(ascii("😀"))print(ascii("
But with the first two identifiers coloured and the two other identifiers black. Saving the file results in the expected file contents.
|
msg380907 - (view) |
Author: Yash Shete (Pixmew) * |
Date: 2020-11-13 16:28 |
Well, for me in Python 3.9.0
print("😀") prints 😀 and
print(ascii("😀")) prints '\U0001f600'
It does not raise the error "'utf-8' codec can't encode characters in position 7-12: surrogates not allowed" that you describe.
|
msg380908 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2020-11-13 16:35 |
With 8.6.10:
>>> print(ascii("😀")) raises the SyntaxError mentioned earlier
>>> print(ascii(input())) works and prints:
'\udced\udca0\udcbd\udced\udcb8\udc84'
In an editor window I don't get spurious text, but syntax colouring is a bit off: the text after the closing quote is coloured as if it were inside the string literal. That continues for the characters on the next line.
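The string printed above can be read as six bytes that were smuggled into a str one by one (each \udcXX stands for the byte 0xXX), and those bytes are a UTF-8 encoded surrogate pair. A minimal sketch of how to recover the intended character in pure Python, assuming that interpretation; the helper name recover_text is hypothetical:

```python
def recover_text(mangled: str) -> str:
    """Undo the byte-escaping and merge the resulting surrogate pair.

    Hypothetical helper for illustration only; not part of tkinter.
    """
    # Step 1: '\udcXX' escapes back to the raw bytes they stand for.
    raw = mangled.encode("utf-8", "surrogateescape")
    # Step 2: decode the bytes, allowing UTF-8 encoded surrogates.
    halves = raw.decode("utf-8", "surrogatepass")
    # Step 3: merge the high/low surrogate pair via UTF-16.
    return halves.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


mangled = '\udced\udca0\udcbd\udced\udcb8\udc84'
print(ascii(recover_text(mangled)))  # '\U0001f604'
```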
|
msg380909 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2020-11-13 16:38 |
@Pixmew: I get this error with Tk 8.6.10 on macOS 11. With Tk 8.6.8 on macOS 10.15 (from the python.org installer) I get the behaviour described in msg380906.
8.6.10 is the version of Tk we'd like to switch to for the "universal2" installers; it is the latest release in the 8.6.x branch and contains numerous bug fixes.
The "Intel" installers (the ones currently on Python.org) will continue to use Tk 8.6.8 due to build issues on macOS 10.9 with newer Tk versions.
|
msg380910 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2020-11-13 17:04 |
BTW, unicodeFromTclStringAndSize() basically undoes the special treatment of \0 in Modified UTF-8 [1]. That page says that all known implementations of MUTF-8 treat surrogate pairs the same way as CESU-8 [2], which is UTF-8 with characters outside the BMP encoded as surrogate pairs that are then converted to UTF-8.
Neither encoding is currently supported by Python.
[1] https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8
[2] https://en.wikipedia.org/wiki/CESU-8
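Since Python has no codec for either encoding, here is a rough sketch of what they produce, based on the descriptions in [1] and [2]. The function names cesu8_encode and mutf8_encode are hypothetical, and this is an illustration rather than a complete or validated codec.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: non-BMP characters become two 3-byte sequences."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split the code point into a UTF-16 surrogate pair...
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            # ...and encode each half as if it were an ordinary character.
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


def mutf8_encode(text: str) -> bytes:
    """Modified UTF-8: CESU-8 plus U+0000 written as the bytes C0 80."""
    # A 0x00 byte in the CESU-8 output can only come from U+0000.
    return cesu8_encode(text).replace(b"\x00", b"\xc0\x80")


print(cesu8_encode("\U0001f600"))  # b'\xed\xa0\xbd\xed\xb8\x80'
print("\U0001f600".encode("utf-8"))  # b'\xf0\x9f\x98\x80'
```

The first line of output is the six-byte surrogate form; the second is the four-byte form that standard UTF-8 uses for the same character.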
|
msg380917 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2020-11-13 18:27 |
Well, try copying 😀 (or other text with a color emoji) to the clipboard and running the following code:
import tkinter
root = tkinter.Tk()
print(ascii(root.clipboard_get()))
|
msg380918 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2020-11-13 18:37 |
When I assign root.clipboard_get() to "v" I get:
>>> print(ascii(v))
'\udced\udca0\udcbd\udced\udcb8\udc84'
>>> print(v)
??????
This is with Tk 8.6.10.
|
msg380919 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2020-11-13 18:38 |
You can ignore msg380917. It was written before I read msg380908. Now I have the needed information. Thank you.
|
msg380920 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2020-11-13 18:46 |
One more question: what do you see if you print '\udcf0\udc9f\udc98\udc80' in IDLE?
|
msg380924 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2020-11-13 18:50 |
> One more question: what do you see if you print '\udcf0\udc9f\udc98\udc80' in IDLE?
This prints a smiley emoji, and so does printing chr(128516).
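For reference, the test string in the question is the standard four-byte UTF-8 encoding of U+1F600 with each byte escaped as \udcXX, while chr(128516) is U+1F604. A short sketch that checks both outside of Tk (the variable names are just for illustration):

```python
escaped = '\udcf0\udc9f\udc98\udc80'

# Turn each '\udcXX' escape back into the byte 0xXX.
raw = escaped.encode("utf-8", "surrogateescape")
print(raw)  # b'\xf0\x9f\x98\x80'

# Those four bytes are ordinary UTF-8 for a single non-BMP character.
decoded = raw.decode("utf-8")
print(ascii(decoded))  # '\U0001f600'

# chr(128516) is a different, neighbouring emoji code point.
print(ascii(chr(128516)))  # '\U0001f604'
```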
|
msg380953 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2020-11-14 02:16 |
Yash, this is specifically a macOS issue. Printing astral chars in tkinter/IDLE on Windows and Linux has 'worked' (details not important) for over a year.
|
msg381019 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2020-11-15 16:17 |
New changeset a26215db11cfcf7b5f55cab9e91396761a0e0bcf by Serhiy Storchaka in branch 'master':
bpo-42318: Fix support of non-BMP characters in Tkinter on macOS (GH-23281)
https://github.com/python/cpython/commit/a26215db11cfcf7b5f55cab9e91396761a0e0bcf
|
msg383077 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2020-12-15 16:45 |
Oh, the fix has not been backported yet.
Automatic backporting does not work because of renames in the supporting test library.
|
msg383088 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2020-12-15 18:45 |
New changeset 28bf6ab61f77c69b732a211c398ac882bf3f65f4 by Serhiy Storchaka in branch '3.9':
[3.9] bpo-42318: Fix support of non-BMP characters in Tkinter on macOS (GH-23281). (GH-23784)
https://github.com/python/cpython/commit/28bf6ab61f77c69b732a211c398ac882bf3f65f4
|
msg383775 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2020-12-25 22:35 |
New changeset 4d840e428ab1a2712f219c5e4008658cbe15892e by Miss Islington (bot) in branch '3.8':
[3.8] bpo-42318: Fix support of non-BMP characters in Tkinter on macOS (GH-23281). (GH-23784) (GH-23787)
https://github.com/python/cpython/commit/4d840e428ab1a2712f219c5e4008658cbe15892e
|
Date | User | Action | Args
2022-04-11 14:59:38 | admin | set | github: 86484
2020-12-25 22:36:49 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2020-12-25 22:35:49 | serhiy.storchaka | set | messages: + msg383775
2020-12-15 18:45:10 | miss-islington | set | nosy: + miss-islington; pull_requests: + pull_request22644
2020-12-15 18:45:09 | serhiy.storchaka | set | messages: + msg383088
2020-12-15 16:55:27 | serhiy.storchaka | set | pull_requests: + pull_request22641
2020-12-15 16:45:41 | serhiy.storchaka | set | messages: + msg383077
2020-11-15 16:17:03 | serhiy.storchaka | set | messages: + msg381019
2020-11-14 11:48:37 | serhiy.storchaka | set | keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request22175
2020-11-14 02:16:10 | terry.reedy | set | nosy: + terry.reedy; messages: + msg380953
2020-11-13 18:50:14 | ronaldoussoren | set | messages: + msg380924
2020-11-13 18:46:32 | serhiy.storchaka | set | messages: + msg380920
2020-11-13 18:38:19 | serhiy.storchaka | set | messages: + msg380919
2020-11-13 18:37:15 | ronaldoussoren | set | messages: + msg380918
2020-11-13 18:27:47 | serhiy.storchaka | set | messages: + msg380917
2020-11-13 17:04:27 | ronaldoussoren | set | messages: + msg380910
2020-11-13 16:38:46 | ronaldoussoren | set | messages: + msg380909
2020-11-13 16:35:31 | ronaldoussoren | set | messages: + msg380908
2020-11-13 16:28:24 | Pixmew | set | nosy: + Pixmew; messages: + msg380907
2020-11-13 16:09:58 | ronaldoussoren | set | messages: + msg380906
2020-11-13 12:56:40 | serhiy.storchaka | set | messages: + msg380881
2020-11-13 12:51:33 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg380879; versions: + Python 3.8, Python 3.9, Python 3.10
2020-11-10 21:22:18 | ronaldoussoren | create |