
Author taleinat
Recipients ezio.melotti, ned.deily, serhiy.storchaka, taleinat, terry.reedy
Date 2019-09-24.12:23:06
Message-id <1569327786.81.0.341658453038.issue13153@roundup.psfhosted.org>
In-reply-to
Content
I can confirm that the crash from pasting these characters happens when trying to fetch the clipboard contents.  We can override the built-in <<Paste>> event, but then we have to get the clipboard's contents directly, and the only portable way to do that in the stdlib is via Tkinter's clipboard_get(). (For a non-stdlib solution, check out pyperclip on PyPI.)
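
A minimal sketch of such an override, assuming a tkinter Text widget. The names safe_clipboard_get and install_paste_override are hypothetical, made up for illustration; they are not IDLE or Tkinter APIs. The sketch only catches the UnicodeDecodeError described below, so an undecodable paste is dropped rather than decoded:

```python
def safe_clipboard_get(widget, default=""):
    """Return the clipboard text, or `default` if it cannot be decoded.

    `widget` is assumed to be any tkinter widget (clipboard_get() is
    available on all of them).
    """
    try:
        return widget.clipboard_get()
    except UnicodeDecodeError:
        # The failure reported in this message: the clipboard bytes
        # could not be decoded as UTF-8 by _tkinter.
        return default


def install_paste_override(text_widget):
    """Bind a replacement <<Paste>> handler on a tkinter Text widget."""
    def on_paste(event):
        text = safe_clipboard_get(text_widget)
        if text:
            # "insert" is the Text index of the insertion cursor.
            text_widget.insert("insert", text)
        # Returning "break" stops Tk's built-in paste from running too.
        return "break"

    text_widget.bind("<<Paste>>", on_paste)
    return on_paste
```

With a real widget this would be wired up as install_paste_override(text) after creating text = tkinter.Text(root). Unlike the built-in paste, this sketch does not replace the current selection, and an empty clipboard still raises tkinter.TclError, which is deliberately left unhandled here.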

clipboard_get(), which I assume calls what Tk uses internally to handle the <<Paste>> event, fails inside the C code and raises a UnicodeDecodeError. Here's a traceback from calling clipboard_get() with 🐱 in the clipboard (Windows 10, recent master branch, i.e. what will become 3.9):

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\Tal\dev\cpython\lib\tkinter\__init__.py", line 1885, in __call__
    return self.func(*args)
  File "C:\Users\Tal\dev\cpython\lib\idlelib\multicall.py", line 176, in handler
    r = l[i](event)
  File "C:\Users\Tal\dev\cpython\lib\idlelib\editor.py", line 618, in paste
    print(self.text.clipboard_get())
  File "C:\Users\Tal\dev\cpython\lib\tkinter\__init__.py", line 867, in clipboard_get
    return self.tk.call(('clipboard', 'get') + self._options(kw))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte

From a quick look, this appears to be happening in _tkinter.c, here:

static PyObject *
unicodeFromTclStringAndSize(const char *s, Py_ssize_t size)
{
    PyObject *r = PyUnicode_DecodeUTF8(s, size, NULL);
    ...

My guess is that Tk is passing the clipboard contents as-is, and we're simply not decoding it with the proper encoding (i.e. utf-16le on Windows).
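
One way to test that guess from pure Python is to compare candidate byte layouts for 🐱 against the error in the traceback. This is a sketch under an assumption that this message does not confirm: that Tcl hands back the character as a surrogate pair with each half encoded as a 3-byte UTF-8 sequence. decode_tcl_bytes is a hypothetical helper, not part of _tkinter:

```python
CAT = "\U0001F431"  # the character from the report

# Candidate 1: raw UTF-16-LE bytes, as guessed above.
utf16_bytes = CAT.encode("utf-16-le")

# Candidate 2 (assumption): each UTF-16 surrogate encoded separately
# as UTF-8, which strict UTF-8 decoders reject.
code_units = [
    int.from_bytes(utf16_bytes[i:i + 2], "little") for i in (0, 2)
]
surrogate_bytes = b"".join(
    chr(unit).encode("utf-8", "surrogatepass") for unit in code_units
)


def strict_utf8_error(raw):
    """Return the UnicodeDecodeError raised by a strict UTF-8 decode."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


def decode_tcl_bytes(raw):
    """Decode UTF-8, joining UTF-8-encoded surrogate pairs if present."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Let lone surrogates through, then round-trip via UTF-16 so
        # that a high/low pair is merged into one code point.
        with_surrogates = raw.decode("utf-8", "surrogatepass")
        as_utf16 = with_surrogates.encode("utf-16-le", "surrogatepass")
        return as_utf16.decode("utf-16-le")


if __name__ == "__main__":
    for label, raw in [("utf-16-le", utf16_bytes),
                       ("encoded surrogates", surrogate_bytes)]:
        print(label, raw, strict_utf8_error(raw))
    print(decode_tcl_bytes(surrogate_bytes) == CAT)
```

Comparing the two printed errors with the one in the traceback shows which layout fails at byte 0xed in position 0. decode_tcl_bytes still raises if the input contains an unpaired surrogate, so it is a diagnostic aid rather than a proposed fix.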

Is this something worth fixing / working around in Tkinter, e.g. by using a proper encoding depending on the platform for fetching clipboard contents? Or are we content to continue waiting for Tk to fix this?
History
Date User Action Args
2019-09-24 12:23:06 taleinat set recipients: + taleinat, terry.reedy, ned.deily, ezio.melotti, serhiy.storchaka
2019-09-24 12:23:06 taleinat set messageid: <1569327786.81.0.341658453038.issue13153@roundup.psfhosted.org>
2019-09-24 12:23:06 taleinat link issue13153 messages
2019-09-24 12:23:06 taleinat create