Author serhiy.storchaka
Recipients JBernardo, Ramchandra Apte, Rosuav, William.Schwartz, asvetlov, ezio.melotti, loewis, ned.deily, python-dev, roger.serwy, serhiy.storchaka, terry.reedy
Date 2015-11-06.07:15:51
Message-id <1446794152.31.0.346242659495.issue13153@psf.upfronthosting.co.za>
Content
There is no Snake emoji in my font, so I use the Cat Face emoji U+1F431 🐱 (\xf0\x9f\x90\xb1 in UTF-8, \x3d\xd8\x31\xdc in UTF-16LE).
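
The byte sequences quoted above can be checked from Python itself. This is a minimal sketch that only uses the built-in str.encode(); the variable names are illustrative.

```python
# Encodings of the Cat Face emoji, U+1F431.
cat = "\U0001F431"

utf8 = cat.encode("utf-8")
utf16le = cat.encode("utf-16-le")

print(utf8.hex())     # f09f90b1 (one 4-byte UTF-8 sequence)
print(utf16le.hex())  # 3dd831dc (surrogate pair D83D DC31, little-endian)
print(len(cat))       # 1 (a single code point in Python 3.3+)
```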

Move the cursor or press Backspace. I needed to press Left 2 times to move the cursor to the beginning of the line, press Right 4 times to move it back to the end of the line, and press Backspace 4 times to remove everything. This is what "Tk doesn't support astral characters" means in practice.

Get the text programmatically.

>>> text.get('1.0', '1.end')
'ð゚ミᄆ'
>>> print(ascii(text.get('1.0', '1.end')))
'\xf0\uff9f\uff90\uffb1'

On Linux the clipboard uses UTF-8, and this symbol is represented by the 4-byte bytestring b'\xf0\x9f\x90\xb1' (that is why Tk sometimes interprets it as 4 characters). When you request the text content as Unicode, Tcl fails to decode the string from UTF-8 and falls back to Latin-1. Due to another bug, it sign-extends some of the bytes. When you programmatically insert the same string back, it is encoded to b'\xc3\xb0\xef\xbe\x9f\xef\xbe\x90\xef\xbe\xb1' and displayed as 'ð゚ミᄆ'.
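
The mangling described above can be modelled in pure Python. This is only a sketch that reproduces the observed output, not Tcl's actual code: the helper simulate_fallback is hypothetical, and the rule it applies (sign-extending bytes in the range 0x80-0xBF to 16 bits, leaving other bytes as Latin-1) is an assumption inferred from the result shown earlier.

```python
def simulate_fallback(raw: bytes) -> str:
    """Hypothetical model: decode byte by byte as Latin-1, but sign-extend
    bytes 0x80-0xBF to 16 bits (0x9f -> 0xff9f). Which bytes Tcl really
    extends is an assumption based on the observed output."""
    chars = []
    for b in raw:
        if 0x80 <= b <= 0xBF:
            chars.append(chr(0xFF00 | b))  # sign-extended to 16 bits
        else:
            chars.append(chr(b))           # plain Latin-1 fallback
    return "".join(chars)


mangled = simulate_fallback("\U0001F431".encode("utf-8"))
print(ascii(mangled))           # '\xf0\uff9f\uff90\uffb1'
print(mangled.encode("utf-8"))  # the 11 bytes sent back on re-insertion
```

Under these assumptions the model gives the same four characters that text.get() returned, and re-encoding them to UTF-8 yields the 11-byte string quoted above.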

On Windows the clipboard uses UTF-16LE and you can see different results.

The underlying graphical system can support astral characters, but Tk fails to handle them correctly.