msg160378 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-10 22:47 |
With the text 'abc€' copied to the clipboard, on Linux, where UTF-8 is the default encoding:
Python 3.2.3 (default, Apr 12 2012, 21:55:50)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tkinter
>>> root = tkinter.Tk()
>>> root.clipboard_get()
'abcâ\x82¬'
>>> 'abc€'.encode('utf-8').decode('latin-1')
'abcâ\x82¬'
I see the same behaviour in 2.7.3 as well (it returns a unicode string u'abc\xe2\x82\xac').
If the clipboard is only accessible at a bytes level, I think clipboard_get should return a bytes object. But I can reliably copy and paste non-ascii characters between programs, so it looks like it's possible to return unicode.
|
msg160379 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-05-10 23:09 |
Even worse: I get 'abc?'. Linux, Python 3.1, 3.2, and 3.3, UTF-8 locale.
|
msg160419 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2012-05-11 16:39 |
3.3, Win 7, Idle
>>> root.clipboard_get()
'abc€'
after cut from here
|
msg160438 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-05-11 18:41 |
This issue can be reproduced by pure Tcl/Tk:
$ wish
% clipboard get
abc?
% clipboard get -type STRING
abc?
% clipboard get -type UTF8_STRING
abc€
Use `root.clipboard_get(type='UTF8_STRING')` in Python.
I don't know whether it should just be documented (UTF8_STRING is not even mentioned in the clipboard_get docstring), or do we need to change the default behavior.
|
msg160440 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-11 19:09 |
On this computer, I see this from Tcl:
$ wish
% clipboard get
abc\u20ac
But here Python's following suit:
>>> root.clipboard_get()
'abc\\u20ac'
Which is odd, because as far as I know, my two computers run the same OS (Ubuntu 12.04) in the same configuration. I briefly thought the presence of xsel might be affecting it, but uninstalling it doesn't seem to make any difference.
|
msg160441 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2012-05-11 19:24 |
As is often the case with Tcl/Tk issues, there are platform differences. On OS X, with the two native Tcl/Tk implementations (Aqua Cocoa and Aqua Carbon), the examples appear to work as is *and* type "UTF8_STRING" does not exist. The less commonly used X11 Tcl/Tk on OS X does support and require "UTF8_STRING" for the example given. So any doc change needs to be carefully worded.
|
msg160444 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-11 19:31 |
OK, after a quick bit of reading, I see why I'm confused: the clipboard actually works by requesting the text from the source program, so where you copy it from makes a difference. In my case, copying from firefox gives 'abc\\u20ac', and copying from Geany gives u'abc\xe2\x82\xac'.
However, I still think there's something that can be improved in Python. As Serhiy points out, specifying type='UTF8_STRING' makes it work properly from both programs. The Tcl documentation recommends this as the best option for "modern X11 systems"[1].
From what Ned says, we can't make UTF8_STRING the default everywhere, but is there a way to detect if we're inside X11, and use UTF8_STRING by default there?
[1] http://www.tcl.tk/man/tcl/TkCmd/clipboard.htm
|
msg160450 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2012-05-11 21:02 |
There are definitely platform differences. As I noted, the original example works fine on Windows. However:
>>> root.clipboard_get(type='STRING')
'abc€'
>>> root.clipboard_get(type='UTF8_STRING')
Traceback (most recent call last):
File "<pyshell#21>", line 1, in <module>
root.clipboard_get(type='UTF8_STRING')
File "C:\Programs\Python33\lib\tkinter\__init__.py", line 549, in clipboard_get
return self.tk.call(('clipboard', 'get') + self._options(kw))
_tkinter.TclError: CLIPBOARD selection doesn't exist or form "UTF8_STRING" not defined
Of course, on Windows I suspect that the unicode string is not copied to the clipboard as UTF-8 bytes, so if clipboard contents are tagged, there would not be such a thing. Perhaps clipboards work differently on different OSes.
>>> help(root.clipboard_get)
...
The type keyword specifies the form in which the data is
to be returned and should be an atom name such as STRING
or FILE_NAME. Type defaults to STRING.
(Actually, FILE_NAME gives the same exception as UTF8_STRING.)
|
msg160451 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2012-05-11 21:19 |
Most likely the best way to determine the windowing system is to use the "tk windowingsystem" command (http://www.tcl.tk/man/tcl8.5/TkCmd/tk.htm#M10), so something like this:
root = tkinter.Tk()
root.call(('tk', 'windowingsystem'))
As documented, the call returns 'x11' for X11-based systems, 'win32' for Windows, and 'aqua' for the native OS X implementations.
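For illustration only (this is not code from the tracker), the check could be wrapped in a small helper. The function names `windowing_system` and `is_x11` are hypothetical; `widget.tk.call` is the usual tkinter route into the Tcl interpreter, and a real call needs a working display.

```python
def windowing_system(widget):
    """Return Tk's windowing system name for the interpreter behind `widget`.

    Per the Tk documentation cited above, the result is 'x11', 'win32'
    or 'aqua'.
    """
    # Equivalent to typing `tk windowingsystem` at a wish prompt.
    return widget.tk.call('tk', 'windowingsystem')


def is_x11(widget):
    """True when the widget's Tk interpreter runs on an X11 display."""
    return windowing_system(widget) == 'x11'
```

With a display available, `is_x11(tkinter.Tk())` is the kind of call this is meant for; any tkinter widget can be passed, since every widget carries a `tk` attribute.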
|
msg160452 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-11 21:25 |
Thanks, Ned.
Does it seem like a good idea to test the windowing system like that, and default to UTF8_STRING if it's x11? So far, I've not found any case on X where STRING works but UTF8_STRING doesn't. If it seems reasonable, I'm happy to have a go at making a patch.
|
msg160456 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2012-05-11 22:02 |
A patch would be great. I don't have a strong opinion about the issue one way or another. I suppose it would simplify things for Python 3 users if the clipboard results were returned properly in the default case when no 'type' argument is passed to clipboard_get(). For Python 2, changing things seems a little more questionable but, as long as it was already returning a unicode object in that case, it sounds like a bug fix rather than a feature. Martin, Andrew: any opinions on this?
|
msg160486 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-12 17:21 |
Here's a patch that makes UTF8_STRING the default type for clipboard_get and selection_get when running in X11.
|
msg160545 - (view) |
Author: Andrew Svetlov (asvetlov) * |
Date: 2012-05-13 18:50 |
The patch looks good to me and works fine.
I think it can be applied to 2.7 as well.
There is only one problem: I don't know how to write a test for it without using external tools like xclip or ctypes bindings for the X shared library.
|
msg160548 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-13 18:55 |
Indeed, and there don't seem to be any other tests for the clipboard functionality.
|
msg160551 - (view) |
Author: Andrew Svetlov (asvetlov) * |
Date: 2012-05-13 19:04 |
You are right: there are no tests for this, as for most of tkinter.
Why not write one, if possible?
|
msg160552 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-05-13 19:13 |
I'm skeptical about the patch. In both 2.7 and 3.x, clipboard_get returns a Unicode string, yet it fails to decode it properly. So I think this is the bug that ought to be fixed (using the proper encoding).
Defaulting to UTF8_STRING is a new feature, IMO, and shouldn't be done for 2.7 (or 3.2).
|
msg160555 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2012-05-13 19:23 |
Martin, is there a way for _tkinter to know whether the result returned from Tcl/Tk is an encoded string or not in this case?
With regard to the patch, it would be better to cache the results of the first-time call to get the windowingsystem value so that we don't have to make two calls down into Tcl for each clipboard_get.
|
msg160556 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-05-13 19:29 |
On Fri, 2012-05-11 at 21:25 +0000, Thomas Kluyver wrote:
> So far, I've not found any case on X where STRING works but UTF8_STRING doesn't.
Perhaps there will be problems with the old (very old) closed source
software.
A few years ago (in Debian Sarge) even xsel did not work with the
non-ascii strings.
|
msg160557 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-13 19:33 |
But the encoding used seemingly depends on the source application - Geany (GTK 2, I think) seemingly sends UTF8 text anyway, whereas Firefox escapes the unicode character. So I don't think we can correctly decode the STRING value in all cases.
The Tk documentation describes UTF8_STRING as being the "most useful" type on modern X11.
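A small sketch of why one fixed decoding step cannot repair every STRING result. The two payloads are the ones reported in this thread for the clipboard text 'abc€'; the helper names `undo_latin1` and `undo_escape` are made up for the example.

```python
expected = 'abc\u20ac'            # the text that was copied: 'abc€'

from_geany = 'abc\xe2\x82\xac'    # UTF-8 bytes surfaced as Latin-1 characters
from_firefox = 'abc\\u20ac'       # a literal backslash-u escape sequence


def undo_latin1(text):
    """Repair UTF-8 data that was decoded as Latin-1."""
    return text.encode('latin-1').decode('utf-8')


def undo_escape(text):
    """Repair text in which non-ASCII characters arrived as \\uXXXX escapes."""
    return text.encode('ascii').decode('unicode_escape')


print(undo_latin1(from_geany) == expected)     # True
print(undo_escape(from_firefox) == expected)   # True
# The Latin-1 repair leaves the Firefox payload untouched, so it is not
# a general fix:
print(undo_latin1(from_firefox) == expected)   # False
```

Each repair works only for the application it was written for, and nothing in the returned string says which application supplied it.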
|
msg160559 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-05-13 19:38 |
> But the encoding used seemingly depends on the source application - Geany (GTK 2, I think) seemingly sends UTF8 text anyway, whereas Firefox escapes the unicode character. So I don't think we can correctly decode the STRING value in all cases.
Agreed. Opera sends 'abc?' literally.
|
msg160560 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-05-13 19:40 |
> Martin, is there a way for _tkinter to know whether the result
> returned from Tcl/Tk is an encoded string or not in this case?
Off-hand, I don't know. I suppose there is a way to do this correctly,
but one might have to dig through many layers of software to find out
what that way is.
> With regard to the patch, it would be better to cache the results of
> the first-time call to get the windowingsystem value so that we don't
> have to make two calls down into Tcl for each clipboard_get.
That also.
|
msg160561 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-05-13 19:43 |
> But the encoding used seemingly depends on the source application -
> Geany (GTK 2, I think) seemingly sends UTF8 text anyway, whereas
> Firefox escapes the unicode character. So I don't think we can
> correctly decode the STRING value in all cases.
Ah, ok. IIUC, support for UTF8_STRING would also be in the realm of
the source application, right? If so, I think we should use something
more involved where we try UTF8_STRING first, and fall back to STRING
if the application doesn't support that.
This I could also accept for 2.7, since it "shouldn't" have a potential
for breakage.
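A minimal sketch of that try-then-fall-back idea, written as a standalone helper rather than as a change to tkinter itself. The name `clipboard_text` is hypothetical. The guarded import only exists so the snippet can be loaded on a machine without Tk; with tkinter installed, `TclError` is the real exception class.

```python
try:
    from tkinter import TclError
except ImportError:        # Tk not installed: fall back to a generic base class
    TclError = Exception


def clipboard_text(widget):
    """Prefer UTF8_STRING, falling back to Tk's default type (STRING)."""
    try:
        return widget.clipboard_get(type='UTF8_STRING')
    except TclError:
        # Raised when the selection owner cannot supply UTF8_STRING, or
        # when the platform's Tk does not define that type at all.
        return widget.clipboard_get()
```

A caller that passes an explicit `type=` to `clipboard_get` is unaffected, since this helper is only a wrapper around the default case.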
|
msg160562 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2012-05-13 19:58 |
+1 to Martin's proposal
|
msg160563 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-13 19:59 |
OK, I'll produce an updated patch.
|
msg160569 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-13 20:21 |
As requested, the second version of the patch (x11-clipboard-try-utf8):
- Caches the windowing system per object. The Tk call to find the windowing system is made the first time clipboard_get or selection_get is called without specifying `type=`.
- If using UTF8_STRING throws an error, it falls back to the default call with no type specified (i.e. STRING).
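The two points above can be sketched roughly as follows. This is an illustration, not the attached patch: `ClipboardReader` is a hypothetical wrapper class, whereas the patch changes tkinter's own methods. The guarded import is only there so the snippet loads without Tk installed.

```python
try:
    from tkinter import TclError
except ImportError:        # Tk not installed: fall back to a generic base class
    TclError = Exception


class ClipboardReader:
    """Wrap a tkinter widget and prefer UTF8_STRING on X11."""

    def __init__(self, widget):
        self.widget = widget
        self._windowing_system = None      # looked up lazily, then cached

    def windowing_system(self):
        if self._windowing_system is None:
            self._windowing_system = self.widget.tk.call('tk', 'windowingsystem')
        return self._windowing_system

    def get(self, **kw):
        # Only override the default when the caller did not choose a type.
        if 'type' not in kw and self.windowing_system() == 'x11':
            try:
                return self.widget.clipboard_get(type='UTF8_STRING')
            except TclError:
                pass                       # fall through to the plain call
        return self.widget.clipboard_get(**kw)
```

Because the windowing system is stored on the wrapper, repeated calls to `get()` on the same object reach into Tcl for `tk windowingsystem` only once.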
|
msg160571 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2012-05-13 20:34 |
Not to bikeshed here, but I think it would be better to cache the windowingsystem value at the module level, since I assume an application could be calling clipboard_get on different tkinter objects and I don't think there is any possibility that the windowingsystem value could vary within one interpreter invocation.
|
msg160573 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-13 20:40 |
I'm happy to put the cache at the module level, but I'll give other people a chance to express their views before I dive into the code again.
I imagine most applications would only call clipboard_get() on one item, so it wouldn't matter. However, my own interest in this is from IPython, where we create a Tk object just to call clipboard_get() once, so a module level cache would be quicker, albeit only a tiny bit.
|
msg160576 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-05-13 20:49 |
> Not to bikeshed here, but I think it would be better to cache the windowingsystem value at the module level, since I assume an application could be calling clipboard_get on different tkinter objects and I don't think there is any possibility that the windowingsystem value could vary within one interpreter invocation.
Why is Misc.tk not a module-level variable?
|
msg160580 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-13 21:29 |
The 3rd revision of the patch has the cache at the module level. It's a bit awkward, because there's no module level function to call to retrieve it (as far as I know), so it's exposed by objects which can call Tk.
Also, Serhiy pointed out a mistake in the 2nd revision, which is now fixed ('selection' instead of 'clipboard').
|
msg160588 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2012-05-14 00:40 |
Serhiy, I don't know why Misc.Tk is not module level, but it isn't, so caching global attributes there isn't effective. However, upon further consideration, I take back my original suggestion of caching at the module level, primarily because I can think of future scenarios where different windowing systems might be supported in the same Python instance. I now think the best solution is to cache at the Tk root object level; that appears to be a simple change to Thomas's 2nd revision. Sorry about that! Here is a fourth version (one for 3.x and one for 2.7), based on the second, which includes the fix from the 3rd.
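A rough sketch of caching at the root level, under some assumptions: the function name `root_windowing_system` and the attribute name `_windowingsystem_cached` are chosen for the example, and `nametowidget('.')` is used here as the public way to reach the root window from any widget.

```python
def root_windowing_system(widget):
    """Look up the windowing system once per Tk root and reuse the result."""
    root = widget.nametowidget('.')    # '.' is the Tk path name of the root
    try:
        return root._windowingsystem_cached
    except AttributeError:
        # First request for this root: ask Tcl, then remember the answer
        # on the root object so every widget under it shares the value.
        ws = root._windowingsystem_cached = root.tk.call('tk', 'windowingsystem')
        return ws
```

Widgets that share a root share one cached value, while a second `Tk()` instance gets its own lookup, which leaves room for the mixed windowing-system scenario mentioned above.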
I started to write a simple test for the clipboard functions but then realized that there doesn't seem to be a practical way to effectively test in a machine-independent way without destroying the contents of the Tk clipboard and hence the user's desktop clipboard, not a friendly thing to do. For example, the clipboard might contain a data type not supported by the platform's Tk, like pict data on OS X. So I'm not including the test here but it did verify that the attribute was being properly cached across multiple tkinter objects.
Thanks to Thomas for the patch and to Serhiy for reviewing. By the way, Thomas, for your patch to be included, you should submit a PSF contributor agreement as described here: http://www.python.org/psf/contrib/. Once that is in place and if the patch looks good to everyone, I'll apply it.
|
msg160714 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-15 11:31 |
I've submitted the contributor agreement, though I've not yet heard anything back about it.
|
msg160716 - (view) |
Author: Thomas Kluyver (takluyver) * |
Date: 2012-05-15 11:43 |
...And mere minutes after I said I hadn't heard anything, I've got the confirmation email. :-)
|
msg160718 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-05-15 11:56 |
> ...And mere minutes after I said I hadn't heard anything, I've got the confirmation email. :-)
Congratulations!
|
msg160722 - (view) |
Author: Andrew Svetlov (asvetlov) * |
Date: 2012-05-15 12:38 |
I'm OK with the latest patch version.
|
msg160789 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-05-16 01:14 |
New changeset f70fa654f70e by Ned Deily in branch '2.7':
Issue #14777: In an X11 windowing environment, tkinter may return
http://hg.python.org/cpython/rev/f70fa654f70e
New changeset 41382250e5e1 by Ned Deily in branch '3.2':
Issue #14777: In an X11 windowing environment, tkinter may return
http://hg.python.org/cpython/rev/41382250e5e1
New changeset 97601cbf169f by Ned Deily in branch 'default':
Issue #14777: merge
http://hg.python.org/cpython/rev/97601cbf169f
|
msg160790 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2012-05-16 01:17 |
Applied for release in 2.7.4, 3.2.4 and 3.3.0. Thanks all!
|
|
Date | User | Action | Args |
2022-04-11 14:57:30 | admin | set | github: 58982 |
2012-05-16 01:17:32 | ned.deily | set | status: open -> closed; resolution: fixed; messages: + msg160790; stage: patch review -> resolved |
2012-05-16 01:14:05 | python-dev | set | nosy: + python-dev; messages: + msg160789 |
2012-05-15 12:38:32 | asvetlov | set | messages: + msg160722 |
2012-05-15 11:56:36 | serhiy.storchaka | set | messages: + msg160718 |
2012-05-15 11:43:57 | takluyver | set | messages: + msg160716 |
2012-05-15 11:31:06 | takluyver | set | messages: + msg160714 |
2012-05-14 00:40:46 | ned.deily | set | files: + x11-clipboard-try-utf8-4_27.patch |
2012-05-14 00:40:22 | ned.deily | set | files: + x11-clipboard-try-utf8-4.patch; messages: + msg160588; stage: patch review |
2012-05-13 21:29:47 | takluyver | set | files: + x11-clipboard-try-utf8-3.patch; messages: + msg160580 |
2012-05-13 20:49:51 | serhiy.storchaka | set | messages: + msg160576 |
2012-05-13 20:40:41 | takluyver | set | messages: + msg160573 |
2012-05-13 20:34:32 | ned.deily | set | messages: + msg160571 |
2012-05-13 20:21:00 | takluyver | set | files: + x11-clipboard-try-utf8.patch; messages: + msg160569 |
2012-05-13 19:59:43 | takluyver | set | messages: + msg160563 |
2012-05-13 19:58:21 | ned.deily | set | messages: + msg160562 |
2012-05-13 19:43:02 | loewis | set | messages: + msg160561 |
2012-05-13 19:40:24 | loewis | set | messages: + msg160560 |
2012-05-13 19:38:10 | serhiy.storchaka | set | messages: + msg160559 |
2012-05-13 19:33:39 | takluyver | set | messages: + msg160557 |
2012-05-13 19:29:56 | serhiy.storchaka | set | messages: + msg160556 |
2012-05-13 19:23:05 | ned.deily | set | messages: + msg160555 |
2012-05-13 19:22:51 | ned.deily | set | messages: - msg160554 |
2012-05-13 19:22:24 | ned.deily | set | messages: + msg160554 |
2012-05-13 19:13:29 | loewis | set | messages: + msg160552 |
2012-05-13 19:04:32 | asvetlov | set | messages: + msg160551 |
2012-05-13 18:55:49 | takluyver | set | messages: + msg160548 |
2012-05-13 18:50:55 | asvetlov | set | messages: + msg160545 |
2012-05-12 17:21:21 | takluyver | set | files: + x11-clipboard-utf8.patch; keywords: + patch; messages: + msg160486 |
2012-05-11 22:02:24 | ned.deily | set | nosy: + loewis, asvetlov; messages: + msg160456 |
2012-05-11 21:25:42 | takluyver | set | messages: + msg160452 |
2012-05-11 21:19:01 | ned.deily | set | messages: + msg160451 |
2012-05-11 21:02:01 | terry.reedy | set | messages: + msg160450 |
2012-05-11 19:31:32 | takluyver | set | messages: + msg160444 |
2012-05-11 19:24:14 | ned.deily | set | nosy: + ned.deily; messages: + msg160441 |
2012-05-11 19:09:53 | takluyver | set | messages: + msg160440 |
2012-05-11 18:41:49 | serhiy.storchaka | set | messages: + msg160438 |
2012-05-11 16:39:00 | terry.reedy | set | nosy: + terry.reedy; messages: + msg160419 |
2012-05-10 23:09:13 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg160379; versions: + Python 3.3 |
2012-05-10 22:47:56 | takluyver | create | |