Issue13153
Created on 2011-10-11 20:01 by JBernardo, last changed 2013-02-18 11:08 by serhiy.storchaka. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| tkinter_nobmp_error.patch | serhiy.storchaka, 2012-12-20 12:57 | review | ||
| Messages (33) | |||
|---|---|---|---|
| msg145363 - (view) | Author: João Bernardo (JBernardo) | Date: 2011-10-11 20:01 | |
I was playing with some Unicode chars on Python 3.2 (x64 on Windows 7), but when I pasted a char bigger than 0xFFFF, IDLE crashed without any error message.
Example (works fine):
>>> '\U000104a2'
'𐒢'
But if I try to paste the above char, the window instantly closes.
The interpreter uses 2 bytes per char (UTF-16) and I don't know if that's causing the problem (as a side note, why doesn't the default Windows build use 4-byte chars?).
I can't check right now with my Ubuntu install (UTF-32) whether the problem persists there.
|
|||
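For reference, the character in the report lies outside the Basic Multilingual Plane, so a UTF-16 representation needs two code units (a surrogate pair). The following is a minimal sketch, assuming Python 3.3 or later (where `str` is indexed by code point on every platform); the variable names are illustrative only and do not come from IDLE or tkinter.

```python
ch = '\U000104a2'

# The code point is above 0xFFFF, i.e. outside the BMP.
code_point = ord(ch)
is_non_bmp = code_point > 0xFFFF

# UTF-16 needs two 16-bit code units (a surrogate pair) for this character.
utf16 = ch.encode('utf-16-be')
units = [int.from_bytes(utf16[i:i + 2], 'big') for i in range(0, len(utf16), 2)]

# UTF-8 needs four bytes for the same character.
utf8 = ch.encode('utf-8')

print(hex(code_point), is_non_bmp, [hex(u) for u in units], utf8)
```

The two code units are the surrogates that show up later in this thread as `\ud801\udca2`.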
| msg145366 - (view) | Author: Ned Deily (ned.deily) * | Date: 2011-10-11 21:06 | |
This is related to Issue12342. The problem is that Tcl/Tk 8.5 (and earlier) do not support Unicode code points outside the BMP range as in this example. So IDLE will be unable to display such characters but it should not crash either. |
|||
| msg145369 - (view) | Author: João Bernardo (JBernardo) | Date: 2011-10-11 21:54 | |
@Ned That looks like a slightly different case. IDLE *can* print the char after you enter the '\Uxxxxxxxx' version of it. It just doesn't let you paste those characters... |
|||
| msg145573 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2011-10-15 00:01 | |
The current Windows build uses 2-byte unicode chars because that is what Windows does. In 3.3, all builds will use a new unicode implementation that uses 1, 2, or 4 bytes per char as needed. But I suspect we will still have the paste problem unless we can somehow bypass the tk limitation. Printing a Python string to the screen does not seem to involve conversion to a tk string. Or else tk blindly copies surrogate pairs to Windows even though it cannot create them. In any case, true window-closing crashes (as opposed to an error traceback) are obnoxious bugs that we try to fix if possible. I verified this on my 64-bit Win 7 system. Thanks for the report. Feel free to look into the code if you can. |
|||
| msg145580 - (view) | Author: João Bernardo (JBernardo) | Date: 2011-10-15 04:03 | |
Just for comparison, on Python 2.7.1 (x32 on Windows 7) it's possible to paste the char (but can't use it) and a nice error is given.
>>> u'𐒢'
Unsupported characters in input
So the problem was partially solved, but something might have happened with the 3.x port...
Searching both source trees, I can see that the following block is commented out in Python 3.2 but not in Python 2.7 (maybe someone removed someone else's bug fix?), and an `assert` was added.
#--- Lines 605 to 613 of PyShell.py
        assert isinstance(source, str)
        # v-- on Python2.7 it is types.UnicodeType instead
        #if isinstance(source, str):
        #    from idlelib import IOBinding
        #    try:
        #        source = source.encode(IOBinding.encoding)
        #    except UnicodeError:
        #        self.tkconsole.resetoutput()
        #        self.write("Unsupported characters in input\n")
        #        return
I uncommented those lines, removed the `assert`, and deleted __pycache__ for fresh bytecode, but the error keeps happening.
The function `runsource()` is only called after the return key is pressed, so the bug must have been introduced in another part of the program.
I'll search further, but it's hard to do that without a traceback of the error.
(Maybe `runit()` is the problem, because it seems to build the line and call `runsource(line)`.)
------
PS: @Terry Reedy
It looks nice to have different widths for chars, but what will the impact on performance be? Will indexing still take constant time?
|
|||
| msg145581 - (view) | Author: João Bernardo (JBernardo) | Date: 2011-10-15 04:33 | |
Just to complete my monologue:
Here's the traceback from running IDLE in cmd line.
C:\Python32\Lib\idlelib>python -i idle.py
Traceback (most recent call last):
  File "idle.py", line 11, in <module>
    idlelib.PyShell.main()
  File "C:\Python32\Lib\idlelib\PyShell.py", line 1429, in main
    root.mainloop()
  File "C:\Python32\Lib\tkinter\__init__.py", line 1009, in mainloop
    self.tk.mainloop(n)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid continuation byte
Not very meaningful, but it is better than nothing... It looks like part of the traceback is missing, and this one points to tkinter.
|
|||
| msg145584 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2011-10-15 05:48 | |
[Yes, indexing will still be O(1), though I personally consider that less important than most make it out to be. Consistency across platforms and total time and space performance of typical apps should be the concern. There is ongoing work on improving the new implementation. Some operations already take less space and run faster.]
The traceback may very well be helpful. It implies that copying a supplemental char does not produce proper utf-8 encoded bytes. Or if it does, tkinter (or tk underneath it) does not recognize them. But then the problem should be the initial byte, not the continuation bytes, which are the same for all chars and which all have 10 for their two high order bits. See https://secure.wikimedia.org/wikipedia/en/wiki/Utf-8 for a fuller explanation.
Line 1009 is the definition of Misc.mainloop(). I believe self.tk represents the embedded tcl interpreter, which is a black box from Python's viewpoint. Perhaps we should wrap the call with
try:
    self.tk.mainloop(n)
except Exception as e:
    <print error message with all info attached to e before exiting>
This should catch any miscellaneous crashes which are not otherwise caught and maybe turn the crash issues into bug reports -- the same way that running from the command line did. (It will still be good to catch what we can at error sites and give better, more specific messages.) (What I am not familiar with is how the command line interpreter might turn a tcl error into a python exception and why IDLE does not.)
When I copy '𐒢' and paste into the command line interpreter or Notepad++, I get '??'. I am guessing that ?? represents a surrogate pair and that Windows separately encodes each. The result would be 'illegal' utf-8 with illegal continuation chars. An application can choose to decode the 'illegal' utf-8 -- or not. Python can when errors='surrogatepass' (or something like that) is specified.
It might be possible to access the raw undecoded bytes of the clipboard with the third party pythonwin module. I do not know if there is any way to do so with tk. I wonder if tcl is calling back to Python for decoding and whether there was a change in the default for errors or the callback specification that would explain a change from 2.7 to 3.2. Ezio, do you know anything about these speculations?
|
|||
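The "wrap the call" idea above can be sketched without tkinter. This is only an illustration under stated assumptions: `run_guarded` and `failing_mainloop` are hypothetical names, the stand-in function merely raises the same kind of `UnicodeDecodeError` as the traceback, and the real `self.tk.mainloop(n)` call is not reproduced here.

```python
import traceback


def run_guarded(mainloop, report=print):
    """Run `mainloop`; report any uncaught exception instead of dying silently.

    Returns True if the loop ended normally, False if an exception was caught.
    """
    try:
        mainloop()
    except Exception as exc:
        # Short, friendly message first, then the full details.
        report("IDLE has met an unexpected problem.")
        report("".join(traceback.format_exception(type(exc), exc, exc.__traceback__)))
        return False
    return True


def failing_mainloop():
    # Hypothetical stand-in for root.mainloop(): decoding a lone surrogate
    # as strict UTF-8 raises UnicodeDecodeError on Python 3.
    b'\xed\xa0\x81'.decode('utf-8')


messages = []
ok = run_guarded(failing_mainloop, report=messages.append)
```

Passing the reporter as a parameter keeps the sketch independent of where the message ends up (stderr, a log file, or a dialog). As noted later in the thread, a wrapper like this cannot help with segfaults.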
| msg145585 - (view) | Author: Ned Deily (ned.deily) * | Date: 2011-10-15 10:08 | |
Thanks for the additional investigation. You don't see more in the traceback because the exception is occurring in the _tkinter C glue layer. I am able to reproduce the problem on some other platforms as well (e.g. Python 3.x on OS X with Carbon Tk 8.4). More later. |
|||
| msg145605 - (view) | Author: Ezio Melotti (ezio.melotti) * | Date: 2011-10-15 20:25 | |
> Ezio, do you know anything about these speculations?
Assuming that the non-BMP character is represented with two surrogates (\ud801\udca2) and that _tkinter tries to decode them independently, the error message ("invalid continuation byte") would be correct.
Python 2 UTF-8 codec is more permissive and allows encoding/decoding of surrogates (this might also explain why it works on Python 2):
>>> u'\ud801'.encode('utf-8')
'\xed\xa0\x81'
>>> '\xed\xa0\x81'.decode('utf-8')
u'\ud801'
But on Python 3, trying to decode that results in an error:
>>> b'\xed\xa0\x81'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid continuation byte
> But then the problem should be the initial byte, not the continuation
> bytes, which are the same for all chars and which all have 10 for
> their two high order bits.
While it's true that all continuation bytes have the first two bits equal to '10', the opposite is not always true. Some start bytes have additional restrictions on the continuation bytes. For example, even if the first two bits of 0xA0 (0b10100000) are '10', the valid continuation bytes for a sequence starting with 0xED are restricted to the range 80..9F.
The fact that
>>> '\U000104a2'
'𐒢'
works is because the input is all ASCII, so the decoding doesn't fail.
> [...]
> This should catch any miscellaneous crashes which are not otherwise
> caught and maybe turn the crash issues into bug reports -- the same
> way that running from the command line did.
Having some "safety net" to catch all the unhandled exceptions seems like a good idea. This won't work in case of segfaults, but it's still better than nothing. I'm not sure what you mean by "turn them into bug reports" though.
|
|||
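The decoding behaviour described above can be checked on a current Python 3. This is a minimal sketch, assuming Python 3.4 or later (where the `surrogatepass` error handler also works with the UTF-16 codecs); the variable names are illustrative only.

```python
lone_surrogate_bytes = b'\xed\xa0\x81'  # UTF-8-style encoding of U+D801

# Strict decoding rejects the sequence, as in the traceback above.
try:
    lone_surrogate_bytes.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The 'surrogatepass' handler mimics the more permissive Python 2 codec.
high = lone_surrogate_bytes.decode('utf-8', 'surrogatepass')

# After a lead byte of 0xED only continuation bytes 0x80..0x9F are valid,
# because 0xA0..0xBF would yield a surrogate (U+D800..U+DFFF).
valid = b'\xed\x9f\xbf'.decode('utf-8')  # U+D7FF, just below the surrogates

# A surrogate pair can be recombined into the real character via UTF-16.
pair = '\ud801\udca2'
combined = pair.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
```

The last step shows why decoding the two surrogates independently fails while treating them as a pair recovers the original character.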
| msg145607 - (view) | Author: Ezio Melotti (ezio.melotti) * | Date: 2011-10-15 20:48 | |
This can also be reproduced by doing:
>>> print('\U000104a2'[0])
and then copy/pasting the lone surrogate.
The traceback is:
[...]
  File "C:\Programs\Python32\Lib\tkinter\__init__.py", line 1009, in mainloop
    self.tk.mainloop(n)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid continuation byte
|
|||
| msg145611 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2011-10-16 03:45 | |
> I'm not sure what you mean with "turn them into bug reports" though.
In about the last month, there have been, I think, 4 reports about IDLE crashing (quitting unexpectedly with no error traceback). I would consider it preferable if it quit with an error traceback that gave as much info as available, or if there is none, just said "IDLE has met an unexpected problem.", perhaps followed by something like "Please note the circumstances and make a report on the tracker if there is none already."
|
|||
| msg145616 - (view) | Author: Ezio Melotti (ezio.melotti) * | Date: 2011-10-16 07:18 | |
> I would consider it preferable if it quit
Note that if we catch the error there might be no reason for IDLE to quit (unless the error left IDLE in some invalid state).
> with an error traceback that gave as much info as available,
That might scare newbies away.
> or if there is none, just said "IDLE has met an unexpected problem.",
So this might be better for all the cases.
> perhaps followed by something like "Please note the circumstances and
> make a report on the tracker if there is none already."
The first message could offer a "Report the problem" option that links to the tracker. In theory we could also have a way to auto-fill the tracker issue, but that might lead to duplicates.
|
|||
| msg145635 - (view) | Author: Ned Deily (ned.deily) * | Date: 2011-10-16 19:40 | |
Just to be sure we're talking about the same thing here, my understanding is that the "missing traceback" issues referred to here are only an issue when IDLE is run as a stand-alone GUI program, such as can be done on Windows and with the OS X IDLE.app. In that case, the standard Python tracebacks from the interpreter written to stderr are not readily visible to the user. In the OS X IDLE.app case it does get captured in a system log. I'm not sure if that happens anywhere in the Windows cases. If IDLE is started from a terminal window or console window where stderr is displayed, this is not an issue. Further discussion about proposed improvements to IDLE diagnostics could be useful, but it is not germane to the specific bug here. It should be carried out elsewhere, possibly resulting in a feature request. |
|||
| msg155799 - (view) | Author: Ned Deily (ned.deily) * | Date: 2012-03-14 21:42 | |
Reassigning to Andrew to investigate a solution similar to the one used in Issue14200. |
|||
| msg155801 - (view) | Author: Ned Deily (ned.deily) * | Date: 2012-03-14 21:43 | |
(Oops, wrong assignment!) |
|||
| msg155810 - (view) | Author: Roger Serwy (roger.serwy) * | Date: 2012-03-14 22:21 | |
Issue13582 deals with the IDLE error feedback. |
|||
| msg155814 - (view) | Author: Roger Serwy (roger.serwy) * | Date: 2012-03-14 22:37 | |
Issue14200 has a patch to fix this problem. |
|||
| msg155857 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2012-03-15 04:46 | |
Not sure. Let me investigate the problem more deeply. |
|||
| msg177750 - (view) | Author: Chris Angelico (Rosuav) | Date: 2012-12-19 15:59 | |
I'm experiencing a similar issue. Fresh install of 3.3 today from the .msi installer on the web site, identifies itself as:
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
To reproduce: Copy and paste this character into IDLE. 𝐇
C:\Python33>.\python -m idlelib.idle
Traceback (most recent call last):
  File "C:\Python33\lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python33\lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "C:\Python33\lib\idlelib\idle.py", line 11, in <module>
    idlelib.PyShell.main()
  File "C:\Python33\lib\idlelib\PyShell.py", line 1477, in main
    root.mainloop()
  File "C:\Python33\lib\tkinter\__init__.py", line 1038, in mainloop
    self.tk.mainloop(n)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-2: invalid continuation byte
(Incidentally, there appears to be a slight difference depending on whether I copy the character in Chrome or Firefox. IDLE terminates the same way, but a Latin-1 app sees the character from Firefox as a letter, but the same thing from Chrome is two question marks (presumably the surrogates).)
|
|||
| msg177778 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2012-12-19 20:43 | |
Same on 64-bit 3.3. Changed title since 3.3 is no longer '16-bit build'. |
|||
| msg177812 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-12-20 12:57 | |
The simplest solution is to raise a TclError instead of a ValueError for non-BMP characters. This should not break any existing code, because user code should be ready to catch a TclError in any case. Here is a patch. A more complicated solution is to add ValueError to every place that catches TclError. But this would fix only IDLE; user programs would each have to be fixed separately. Alternatively, we could silently encode non-BMP characters for Tcl with UTF-16 (and decode the result back). However, this can cause some subtle errors with shifted indices. |
|||
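The "shifted indices" concern with the UTF-16 alternative can be illustrated in pure Python. This is a minimal sketch, not the code of the attached patch: `first_non_bmp` and `utf16_offset` are hypothetical helper names, and no tkinter call is involved.

```python
def first_non_bmp(text):
    """Return the index of the first character above U+FFFF, or -1 if none."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return i
    return -1


def utf16_offset(text, index):
    """Number of UTF-16 code units that precede text[index]."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:index])


s = 'a\U000104a2b'

# Python sees three characters, but a UTF-16 consumer sees four code units,
# so every index after the non-BMP character is shifted by one.
python_index_of_b = s.index('b')
utf16_index_of_b = utf16_offset(s, python_index_of_b)

print(first_non_bmp(s), python_index_of_b, utf16_index_of_b)
```

A check like `first_non_bmp` is the kind of test that lets a caller reject such input up front, which is closer in spirit to the simpler option described in the message.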
| msg179276 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-01-07 19:11 | |
Ping. |
|||
| msg182106 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-02-14 16:06 | |
Ping. |
|||
| msg182130 - (view) | Author: Ned Deily (ned.deily) * | Date: 2013-02-15 08:10 | |
LGTM. The patch does prevent the crash in IDLE which is certainly an improvement until such time as someone investigates having Tk/tkinter fully support non-BMP characters. |
|||
| msg182141 - (view) | Author: Ramchandra Apte (Ramchandra Apte) * | Date: 2013-02-15 13:21 | |
@Ned Deily Tk, at least on my system, doesn't render Unicode characters correctly, even within the BMP, but the characters are kept (cut-and-paste works correctly). What do you mean by "support"? |
|||
| msg182166 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2013-02-15 20:27 | |
The characters tk can render depends on the font you tell it to use. On my Windows IDLE, I have Options Font Face set to Lucida Sans Unicode, though I am not sure what has the widest coverage. This page https://www.microsoft.com/typography/fonts/font.aspx?FMID=1263 only mentions West Asian, but I seem to get more than that. |
|||
| msg182172 - (view) | Author: Ezio Melotti (ezio.melotti) * | Date: 2013-02-15 20:48 | |
The font used shouldn't affect the errors. Usually if a glyph is missing in the current font, either a placeholder (usually a box) is shown instead or the missing glyph is taken from another font (if possible). If you still want to do some tests, you can take a look at http://en.wikipedia.org/wiki/List_of_Unicode_fonts#Unicode_fonts |
|||
| msg182180 - (view) | Author: Ned Deily (ned.deily) * | Date: 2013-02-15 21:56 | |
Also, there are differences in behavior among the various flavors of Tk. I know of at least four main flavors in use by current Python builds: Unix X11-based Tk 8.5, Windows Tk 8.5, OS X Cocoa Tk 8.5, OS X Carbon Tk 8.4. Some third-party distributors are starting to supply Tk 8.6, in its various flavors, now that 8.6 has been released. Each flavor has various build options and features to fit in with its host o/s environment. This makes testing tkinter issues *interesting*. |
|||
| msg182181 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-02-15 22:03 | |
Tkinter is not compatible with Tcl/Tk 8.6 yet (issue16809). |
|||
| msg182184 - (view) | Author: Ned Deily (ned.deily) * | Date: 2013-02-15 22:11 | |
Serhiy, I'm aware of that; regardless, Tk 8.6 is starting to be used out in the field with tkinter. |
|||
| msg182207 - (view) | Author: Ramchandra Apte (Ramchandra Apte) * | Date: 2013-02-16 03:21 | |
I have set it to "Ubuntu", which supports the Unicode characters. Maybe Tkinter doesn't work properly with all the fonts. |
|||
| msg182305 - (view) | Author: Ned Deily (ned.deily) * | Date: 2013-02-18 08:30 | |
Serhiy, I think your patch is ready to commit and close this issue as it prevents the crash. A test would be nice if a reliable test could be devised without too much effort but it's not mandatory, IMO. Any tangential issues or more complex solutions can be pursued in other issues. |
|||
| msg182312 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-02-18 11:07 | |
New changeset bb5a8564e186 by Serhiy Storchaka in branch '2.7':
Issue #13153: Tkinter functions now raise TclError instead of ValueError when
http://hg.python.org/cpython/rev/bb5a8564e186
New changeset 9904f245c3f0 by Serhiy Storchaka in branch '3.2':
Issue #13153: Tkinter functions now raise TclError instead of ValueError when
http://hg.python.org/cpython/rev/9904f245c3f0
New changeset 38bb2a46692e by Serhiy Storchaka in branch '3.3':
Issue #13153: Tkinter functions now raise TclError instead of ValueError when
http://hg.python.org/cpython/rev/38bb2a46692e
New changeset 61993bb9ab0e by Serhiy Storchaka in branch 'default':
Issue #13153: Tkinter functions now raise TclError instead of ValueError when
http://hg.python.org/cpython/rev/61993bb9ab0e
|
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2013-02-18 11:08:48 | serhiy.storchaka | set | status: open -> closed assignee: asvetlov -> serhiy.storchaka resolution: fixed stage: commit review -> committed/rejected |
| 2013-02-18 11:07:07 | python-dev | set | nosy: + python-dev messages: + msg182312 |
| 2013-02-18 08:30:08 | ned.deily | set | messages: + msg182305 stage: test needed -> commit review |
| 2013-02-16 03:21:49 | Ramchandra Apte | set | messages: + msg182207 |
| 2013-02-15 22:11:14 | ned.deily | set | messages: + msg182184 |
| 2013-02-15 22:03:52 | serhiy.storchaka | set | messages: + msg182181 |
| 2013-02-15 21:57:00 | ned.deily | set | messages: + msg182180 |
| 2013-02-15 20:48:42 | ezio.melotti | set | messages: + msg182172 |
| 2013-02-15 20:27:08 | terry.reedy | set | messages: + msg182166 |
| 2013-02-15 13:21:08 | Ramchandra Apte | set | nosy: + Ramchandra Apte messages: + msg182141 |
| 2013-02-15 08:10:33 | ned.deily | set | messages: + msg182130 |
| 2013-02-14 16:06:34 | serhiy.storchaka | set | messages: + msg182106 |
| 2013-01-07 19:11:26 | serhiy.storchaka | set | messages: + msg179276 |
| 2012-12-20 12:57:02 | serhiy.storchaka | set | files: + tkinter_nobmp_error.patch versions: + Python 2.7 nosy: + serhiy.storchaka messages: + msg177812 keywords: + patch |
| 2012-12-19 20:43:49 | terry.reedy | set | title: IDLE crashes when pasting non-BMP unicode char on UCS-16 build -> IDLE crashes when pasting non-BMP unicode char on Py3 messages: + msg177778 versions: + Python 3.4 |
| 2012-12-19 16:01:59 | Rosuav | set | nosy: + Rosuav messages: + msg177750 |
| 2012-03-15 04:46:44 | asvetlov | set | messages: + msg155857 |
| 2012-03-14 22:37:27 | roger.serwy | set | messages: + msg155814 |
| 2012-03-14 22:21:19 | roger.serwy | set | nosy: + roger.serwy messages: + msg155810 |
| 2012-03-14 21:44:37 | ned.deily | set | nosy: - astrand |
| 2012-03-14 21:43:59 | ned.deily | set | assignee: astrand -> asvetlov messages: + msg155801 nosy: + asvetlov |
| 2012-03-14 21:42:40 | ned.deily | set | title: IDLE crash with unicode bigger than 0xFFFF -> IDLE crashes when pasting non-BMP unicode char on UCS-16 build nosy: + astrand messages: + msg155799 assignee: ned.deily -> astrand versions: + Python 3.3 |
| 2011-10-16 19:40:29 | ned.deily | set | messages: + msg145635 |
| 2011-10-16 07:18:44 | ezio.melotti | set | messages: + msg145616 |
| 2011-10-16 03:45:05 | terry.reedy | set | messages: + msg145611 |
| 2011-10-15 20:48:11 | ezio.melotti | set | messages: + msg145607 |
| 2011-10-15 20:25:20 | ezio.melotti | set | messages: + msg145605 |
| 2011-10-15 10:08:01 | ned.deily | set | messages: + msg145585 components: + Tkinter, - Unicode, Windows |
| 2011-10-15 05:48:28 | terry.reedy | set | messages: + msg145584 |
| 2011-10-15 04:33:10 | JBernardo | set | messages: + msg145581 |
| 2011-10-15 04:03:45 | JBernardo | set | messages: + msg145580 |
| 2011-10-15 00:01:59 | terry.reedy | set | type: behavior -> crash messages: + msg145573 nosy: + terry.reedy |
| 2011-10-14 23:42:20 | ezio.melotti | set | nosy: + ezio.melotti type: behavior |
| 2011-10-11 21:54:21 | JBernardo | set | messages: + msg145369 |
| 2011-10-11 21:06:15 | ned.deily | set | nosy: + ned.deily messages: + msg145366 assignee: ned.deily stage: test needed |
| 2011-10-11 20:01:32 | JBernardo | create | |
