Issue1028
Created on 2007-08-26 20:48 by kbk, last changed 2009-09-16 11:40 by amaury.forgeotdarc.
|
msg55311 - (view) |
Author: Kurt B. Kaiser (kbk) |
Date: 2007-08-26 20:48 |
|
The control-spacebar binding is used in IDLE to force open the
completions window. It's causing IDLE to exit with a utf8 decode error.
Attached are a Tkinter cut-down exhibiting the problem and a patch.
The cut-down runs ok on 2.6 but not on py3k because the latter uses
PyUnicode_FromString on all the arguments and errs out when it
encounters a byte sequence that is not valid utf-8.
Strangely, on my system, control-spacebar is sending a two-byte string,
C0E8, via the %A parameter. Control-2 does the same. Other keys with
combinations of modifier keys send one byte.
Linux trader 2.6.18-ARCH #1 SMP PREEMPT Sun Nov 19 09:14:35 CET 2006
i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
Can the problem be confirmed?
Using PyUnicode_FromUnicode on %A works because the unicode string is
copied instead of decoded, and that parameter is supposed to be unicode
in any case.
The patch fixes the problem on my system but should be reviewed,
especially whether the cast in the call to PyUnicode_FromUnicode is
suitably cross-platform.
Assigning to Neal since he's working on a lot of Unicode issues right
now. I can check it in if I get approval.
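The decode failure itself can be reproduced without Tk. A minimal sketch, assuming the two bytes are exactly the C0E8 quoted above and that a strict bytes.decode("utf-8") is a fair stand-in for what PyUnicode_FromString does in py3k. The helper name try_decode is made up for illustration and is not part of the attached cut-down:

```python
def try_decode(raw):
    """Return the decoded text, or a short error note if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return "decode failed: " + exc.reason


# Assumption: these are the two bytes reported for control-spacebar.
key_bytes = bytes.fromhex("C0E8")

print(try_decode(b"a"))        # an ordinary one-byte key decodes fine
print(try_decode(key_bytes))   # the two-byte string is rejected
```

Copying the bytes without decoding them, which is roughly what 2.x does, never reaches the failing branch.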
|
|
msg55312 - (view) |
Author: Kurt B. Kaiser (kbk) |
Date: 2007-08-26 20:53 |
|
Heh, I see we have the same damn problem SF had: when a comment is
edited, it doesn't re-wrap properly when submitted. You have to remove
the returns manually after editing.
|
|
msg55313 - (view) |
Author: Kurt B. Kaiser (kbk) |
Date: 2007-08-26 20:54 |
|
Nope, you have to make sure not to type too wide.
|
|
msg55321 - (view) |
Author: Kurt B. Kaiser (kbk) |
Date: 2007-08-26 23:18 |
|
Well, maybe someday Tk will send a multibyte unicode
character. Update the patch.
|
|
msg55325 - (view) |
Author: Neal Norwitz (nnorwitz) |
Date: 2007-08-26 23:40 |
|
I can confirm the problem and that your patch fixes the problem. Go
ahead and check it in. Thanks!
|
|
msg55330 - (view) |
Author: Kurt B. Kaiser (kbk) |
Date: 2007-08-27 01:57 |
|
OK, thanks for the review! I suppose Tk is sending a bad string.
r57540
|
|
msg75999 - (view) |
Author: Hirokazu Yamamoto (ocean-city) |
Date: 2008-11-18 04:38 |
|
Sorry, I reverted r57540 because it caused a segfault at IDLE exit. (See
issue4313.) I have reopened this issue.
|
|
msg76032 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2008-11-19 00:34 |
|
I can reproduce it here with tk8.4, using tk8.5 doesn't cause this.
|
|
msg76034 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2008-11-19 01:15 |
|
Here is a patch that doesn't use magic numbers :P
I didn't hit the problem described in issue4313 with this one, and
PythonCmd should be doing this anyway, but ideally we should move to
Tcl_CreateObjCommand.
|
|
msg76035 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2008-11-19 01:19 |
|
Removed some repeated code in the patch
|
|
msg76041 - (view) |
Author: Hirokazu Yamamoto (ocean-city) |
Date: 2008-11-19 06:01 |
|
I confirmed PythonCmd_check_for_utf.diff worked on my machine. IDLE
didn't crash.
>I can reproduce it here with tk8.4, using tk8.5 doesn't cause this.
That is, this is a bug in tk8.4 that is fixed in tk8.5, which is already
a stable release? If so, I feel Python doesn't have to work around this
bug.
# I'm a little worried about performance because Tcl_NumUtfChars() will
be called for every command string.
By the way, I cannot reproduce this bug with tk8.4.12 (on Windows). What
is your tk version? Maybe older than that?
|
|
msg76043 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2008-11-19 10:27 |
|
tk 8.4.19 here, but Windows and Linux almost surely use different
window managers (you could run gnome and others under Windows, but I'm
betting that is not the case).
Now, it is very hard to say that we shouldn't care about this bug here.
Tcl has it documented that the string arguments to Tcl_CmdProc are
encoded in normalized utf-8 since tcl 8.1, which was released almost 10
years ago. I guess we are just lucky that this was the first time the
bug was noticed.
It also says that Tcl_CreateCommand shouldn't be used anymore; instead
Tcl_CreateObjCommand should be used, as I said in the previous comment.
|
|
msg76164 - (view) |
Author: Hirokazu Yamamoto (ocean-city) |
Date: 2008-11-21 08:19 |
|
I succeeded in reproducing this issue with coLinux + UltraVNC on Win2000.
Yes, py3k reported a utf-8 error, so I tried trunk. Here is the result.
*** event.keycode: 8
*** event.state: 0
*** event.char: ''
*** event.keycode: 16
*** event.state: 4
*** event.char: '\xc0\x80'
This '\xc0\x80' seems to be used in tcl as the null byte '\0'. You can
see this magic value in the tcl source and via Google.
I think we should convert this to '\x00' on the Python side (and
shouldn't treat it as utf-16).
With py3k + adhok.patch I get this result.
*** event.keycode: 8
*** event.state: 0
*** event.char: ''
*** event.keycode: 16
*** event.state: 4
*** event.char: '\x00'
Probably Tcl_GetUnicode does this conversion internally. (I'm not sure,
because I didn't look into the source code very deeply.) And I'm not
sure why this error doesn't happen with tk8.5.
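The mapping described above can be sketched in pure Python. This assumes the only non-standard sequence in the data is Tcl's two-byte spelling of the null character; the function name tcl_utf8_to_str is hypothetical and is not part of _tkinter or of either patch:

```python
TCL_NUL = b"\xc0\x80"  # Tcl's internal two-byte form of U+0000


def tcl_utf8_to_str(raw):
    """Decode Tcl's UTF-8 variant by rewriting its null sequence first.

    The byte 0xC0 never occurs in valid UTF-8, so the replace cannot
    damage any correctly encoded character.
    """
    return raw.replace(TCL_NUL, b"\x00").decode("utf-8")


print(repr(tcl_utf8_to_str(b"\xc0\x80")))
```

Any other invalid sequence still raises UnicodeDecodeError, so this only papers over the one case seen in the output above.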
|
|
msg76165 - (view) |
Author: Hirokazu Yamamoto (ocean-city) |
Date: 2008-11-21 08:21 |
|
I made a small modification to tkintertest.py. Please use this line.
my_print("*** event.char: ", repr(event.char))
|
|
msg76173 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2008-11-21 10:36 |
|
You are missing the point on using Tcl_CreateObjCommand, I didn't mean
to just go and do s/Tcl_CreateCommand/Tcl_CreateObjCommand/ because
if you are going to convert everything to unicode then there is no point
in using Tcl_CreateObjCommand.
Also, Tcl_ObjCmdProc should use Tcl_Obj *CONST objv[] instead of Tcl_Obj
*const objv[] because Tcl may define CONST as nothing, and it uses CONST
when defining Tcl_ObjCmdProc.
|
|
msg76178 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2008-11-21 13:37 |
|
I'm sorry if it sounded as if I were bashing you, I was just pointing
out my view of the patch -- you didn't need to remove it. The patch I
submitted here can also be improved (although it "works"), but I'm
leaving it as a possible idea for someone else who might look into
this, since I can't invest much time in this right now.
|
|
msg76179 - (view) |
Author: Hirokazu Yamamoto (ocean-city) |
Date: 2008-11-21 13:40 |
|
>You are missing the point on using Tcl_CreateObjCommand, I didn't mean
>to just go and do s/Tcl_CreateCommand/Tcl_CreateObjCommand/ because
>if you are going to convert everything to unicode then there is no
>point in using Tcl_CreateObjCommand.
I'm not a tcl/tk expert, so I'm probably missing many things. :-(
Can you explain how to solve this issue by moving to Tcl_CreateObjCommand?
>Also, Tcl_ObjCmdProc should use Tcl_Obj *CONST objv[] instead of
>Tcl_Obj *const objv[] because Tcl may define CONST as nothing, and it
>uses CONST when defining Tcl_ObjCmdProc.
I created adhok.patch just for explanation. This is not a solution. I
used Tcl_CreateObjCommand + Tcl_GetUnicode to demonstrate that Tcl
converts '\xc0\x80' to a null byte. (adhok.patch contained Japanese
characters, so I'll repost it as just_for_explanation.patch.)
|
|
msg76180 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2008-11-21 14:03 |
|
> Hirokazu Yamamoto added the comment:
>
>>You are missing the point on using Tcl_CreateObjCommand, I didn't mean
>>to just go and do s/Tcl_CreateCommand/Tcl_CreateObjCommand/ because
>>if you are going to convert everything to unicode then there is no
>>point in using Tcl_CreateObjCommand.
>
> I'm not a tcl/tk expert, so I'm probably missing many things. :-(
> Can you explain how to solve this issue by moving to Tcl_CreateObjCommand?
>
By moving to Tcl_CreateObjCommand we would start using the FromObj
function present in _tkinter.c that is responsible for converting tcl
objects to python objects. Then what remains to be verified is how
compatible this would be with current tkinter code, and checking how
correct FromObj is nowadays.
|
|
msg76701 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2008-12-01 17:45 |
|
Some more clarifications about this bug:
Tcl shouldn't be giving us a UTF-8 string with a 0xC0 byte, since that
is not valid UTF-8. I'm aware that Tcl uses the sequence 0xC0 0x80 for
special purposes but it is also said that such sequences shouldn't be
passed as is when exported.
This bug doesn't affect python 2.x because it uses PyString_FromString
to convert such a value to a Python string, whereas python 3.x uses
PyUnicode_FromString, which assumes that it is receiving a valid utf-8
string; it turns out that is not always the case here.
It is indeed related to tk 8.4, but I'm not sure which versions exactly
(I hit it with tk 8.4.19).
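A quick way to spot such a string before decoding is to look for the bytes that valid utf-8 never contains. This is only a sketch: the name has_invalid_lead_byte is hypothetical, and it checks just for 0xC0 and 0xC1 (the lead bytes of overlong two-byte forms), not for every kind of malformed input:

```python
def has_invalid_lead_byte(raw):
    """True if raw contains 0xC0 or 0xC1, which valid UTF-8 never uses."""
    return any(byte in (0xC0, 0xC1) for byte in raw)


print(has_invalid_lead_byte(b"\xc0\x80"))            # Tcl's null sequence
print(has_invalid_lead_byte("é".encode("utf-8")))    # ordinary two-byte char
```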
|
|
msg76846 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2008-12-03 21:48 |
|
I've been working on a new _tkinter (named "plumage") these days
and I hit this same problem by trusting too much that nothing from tcl,
including tk and extensions, would give me this embedded null.
Looking at another bridge to Tcl (one done for Perl), I noticed that it
also chose to check for these bytes and convert them to something else,
a 0. The code that does this for Python can be found at
http://code.google.com/p/plumage/source/browse/trunk/src/utils.c#42 up
to line 76; it could/should be adapted to the _tkinter in py3k and also
for python 2.x.
|
|
msg77209 - (view) |
Author: (gumpy) |
Date: 2008-12-07 03:07 |
|
This problem exists for me on Ubuntu8.04 with both tk/tcl8.4.16 and 8.5.
|
|
msg81060 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2009-02-03 14:07 |
|
Can you tell me what:
print(tkinter.Tcl().tk.call('info', 'patchlevel'))
prints? Specifically, I'd like to know which tk 8.5.x has the problem.
|
|
msg81989 - (view) |
Author: (gumpy) |
Date: 2009-02-14 04:06 |
|
8.5.0
This is still an issue with both tk versions in the 3.0.1 python release.
|
|
msg90665 - (view) |
Author: Ezio Melotti (ezio.melotti) |
Date: 2009-07-18 09:34 |
|
More users reported this problem in #6144 and #6512.
|
|
msg90707 - (view) |
Author: Winfried Plappert (wplappert) |
Date: 2009-07-19 07:19 |
|
I have the problem described in issue6512, and here is some information.
Python version - hand compiled on Ubuntu 9.04:
Python 3.1 (r31:73572, Jul 18 2009, 11:13:40)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tkinter
>>> print(tkinter.Tcl().tk.call('info', 'patchlevel'))
8.5.6
The previous Tcl/Tk version I had initially installed was an 8.4.x
version, which fails with the Control-2 error described in detail in
issue6512.
- I decided to upgrade Tcl/Tk to version 8.5.
- So I ran "make distclean"
- copied the contents of /usr/include/tcl85 one level higher, so Python
could access the necessary tk.h and tcl.h files
- cd to my Python 3.1 source
- ./configure
- make
- sudo make install
and tested again and my test program is now happily responding to a
Control-2 keystroke.
|
|
msg91405 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2009-08-07 14:41 |
|
Attaching a patch against trunk, I believe this solves the problems
described here.
|
|
msg91408 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2009-08-07 17:35 |
|
Uhm, in the long run I believe it will be better to move to
Tcl_CreateObjCommand since it is said that commands created by it are
significantly faster than the ones created by Tcl_CreateCommand (more
information about this can be found in the tcl documentation).
I'm only writing this because, as in other places that deal with tcl
objects, more care must be taken. For instance, I have applied
issue1028.diff on the tk_and_idle_maintenance branch and found two
problems that are now patched by adjusts1.diff. It is very likely that
there are other bugs around; I'll be trying some tkinter applications to
try to find some of them, but help is very much needed. Note that there
are some tkinter tests on the tk_and_idle_maintenance branch and they
all pass, but they do not fully cover tkinter at this moment, so
improving them would be good too.
|
|
msg91420 - (view) |
Author: Guilherme Polo (gpolo) |
Date: 2009-08-08 15:43 |
|
Today I noticed the StringObj manpage (from tcl) says that the bytes
that represent a tcl object should be treated as read-only (although it
uses char *), so issue1028.diff may very well cause a segfault at
some point.
I'm attaching a new patch that fixes this and also uses
Tcl_GetStringFromObj, instead of directly accessing the bytes member of
a tcl object, so we know its string representation is not invalid.
|
|
msg92678 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) |
Date: 2009-09-16 11:40 |
|
Isn't this better implemented via a codec error handler?
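For illustration, here is roughly what that could look like at the Python level. This is a sketch under assumptions, not what any attached patch does: the handler name "tclnull" and the functions tcl_null_handler and decode_tcl are made up, and the handler only recognises Tcl's 0xC0 0x80 null sequence, re-raising for anything else.

```python
import codecs


def tcl_null_handler(exc):
    """Map Tcl's 0xC0 0x80 to U+0000; re-raise any other decode error."""
    if isinstance(exc, UnicodeDecodeError):
        if exc.object[exc.start:exc.start + 2] == b"\xc0\x80":
            # Resume after both bytes so the trailing 0x80 is not
            # reported as a second error.
            return ("\x00", exc.start + 2)
    raise exc


# "tclnull" is a hypothetical handler name chosen for this example.
codecs.register_error("tclnull", tcl_null_handler)


def decode_tcl(raw):
    """Decode bytes from Tcl, tolerating only its null sequence."""
    return raw.decode("utf-8", errors="tclnull")


print(repr(decode_tcl(b"a\xc0\x80b")))
```

The appeal of this design is that valid strings pay no extra cost: the handler only runs when the decoder has already hit an error.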
|
|
| Date | User | Action | Args |
| 2009-09-16 11:40:08 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg92678 |
| 2009-08-08 15:43:02 | gpolo | set | files: + issue1028_2.diff; messages: + msg91420 |
| 2009-08-07 17:35:08 | gpolo | set | files: + adjusts1.diff; messages: + msg91408 |
| 2009-08-07 14:41:17 | gpolo | set | files: + issue1028.diff; messages: + msg91405; versions: + Python 2.6, Python 2.7 |
| 2009-07-19 07:19:40 | wplappert | set | nosy: + wplappert; messages: + msg90707; versions: + Python 3.1 |
| 2009-07-18 09:39:51 | ezio.melotti | link | issue6512 superseder |
| 2009-07-18 09:39:35 | ezio.melotti | link | issue6144 superseder |
| 2009-07-18 09:38:41 | ezio.melotti | set | superseder: [IDLE] UnicodeDecodeError when invoking force-open-completions -> |
| 2009-07-18 09:36:32 | ezio.melotti | set | superseder: [IDLE] UnicodeDecodeError when invoking force-open-completions |
| 2009-07-18 09:34:55 | ezio.melotti | set | priority: normal; nosy: + ezio.melotti; messages: + msg90665; type: behavior |
| 2009-02-14 04:06:27 | gumpy | set | messages: + msg81989 |
| 2009-02-03 14:07:36 | gpolo | set | messages: + msg81060 |
| 2008-12-07 03:07:25 | gumpy | set | nosy: + gumpy; messages: + msg77209 |
| 2008-12-03 21:48:43 | gpolo | set | messages: + msg76846 |
| 2008-12-01 17:45:17 | gpolo | set | messages: + msg76701 |
| 2008-11-21 14:03:07 | gpolo | set | messages: + msg76180 |
| 2008-11-21 13:40:05 | ocean-city | set | files: + just_for_explanation.patch; messages: + msg76179 |
| 2008-11-21 13:37:39 | gpolo | set | messages: + msg76178 |
| 2008-11-21 13:31:02 | ocean-city | set | files: - adhok.patch |
| 2008-11-21 10:36:01 | gpolo | set | messages: + msg76173 |
| 2008-11-21 08:21:54 | ocean-city | set | messages: + msg76165 |
| 2008-11-21 08:19:11 | ocean-city | set | files: + adhok.patch; messages: + msg76164 |
| 2008-11-19 10:27:07 | gpolo | set | messages: + msg76043 |
| 2008-11-19 06:01:21 | ocean-city | set | messages: + msg76041 |
| 2008-11-19 01:19:34 | gpolo | set | files: + PythonCmd_check_for_utf.diff; messages: + msg76035 |
| 2008-11-19 01:19:15 | gpolo | set | files: - PythonCmd_check_for_utf.diff |
| 2008-11-19 01:15:36 | gpolo | set | files: + PythonCmd_check_for_utf.diff; messages: + msg76034 |
| 2008-11-19 00:34:18 | gpolo | set | nosy: + gpolo; messages: + msg76032 |
| 2008-11-18 04:38:29 | ocean-city | set | status: closed -> open; nosy: + ocean-city; resolution: accepted -> ; messages: + msg75999 |
| 2008-01-06 22:29:45 | admin | set | keywords: - py3k; versions: Python 3.0 |
| 2007-09-10 21:28:35 | loewis | link | issue1774736 superseder |
| 2007-08-27 01:57:04 | kbk | set | status: open -> closed; messages: + msg55330 |
| 2007-08-26 23:40:38 | nnorwitz | set | assignee: nnorwitz -> kbk; resolution: accepted; messages: + msg55325; nosy: + nnorwitz |
| 2007-08-26 23:18:36 | kbk | set | files: + _tkinter.c.patch; messages: + msg55321 |
| 2007-08-26 23:17:33 | kbk | set | files: - _tkinter.c.patch |
| 2007-08-26 20:54:14 | kbk | set | messages: + msg55313 |
| 2007-08-26 20:53:20 | kbk | set | files: + _tkinter.c.patch; messages: + msg55312 |
| 2007-08-26 20:48:31 | kbk | create | |