IDLE can crash if file name contains non-BMP Unicode characters #67860
Comments
I used to be able to run IDLE. But today, when I try to launch IDLE, the icon appears in the Dock for a second and then disappears, and the application doesn't run. Moreover, when I run idle3 in Terminal, it says:
bogon:~ Kamisky$ idle3
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/bin/idle3", line 5, in <module>
main()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/PyShell.py", line 1560, in main
shell = flist.open_shell()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/PyShell.py", line 315, in open_shell
self.pyshell = PyShell(self)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/PyShell.py", line 866, in __init__
OutputWindow.__init__(self, flist, None, None)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/OutputWindow.py", line 16, in __init__
EditorWindow.__init__(self, *args)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/EditorWindow.py", line 301, in __init__
self.update_recent_files_list()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/EditorWindow.py", line 927, in update_recent_files_list
underline=0)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tkinter/__init__.py", line 2719, in add_command
self.add('command', cnf or kw)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tkinter/__init__.py", line 2710, in add
self._options(cnf, kw))
_tkinter.TclError: character U+1f393 is above the range (U+0000-U+FFFF) allowed by Tcl
Thanks. |
Try removing IDLE's recently-used file list: rm ~/.idlerc/recent-files.lst |
At least on some platforms (e.g. OS X), it is easy to create files with legitimate names containing code points above the BMP limit (<= U+FFFF) currently imposed by Tcl/Tk. For IDLE 3, I suspect _filename_to_unicode() in EditorWindow could be modified to check for such cases to prevent problems if such file names end up in recent-files.lst. That might not be sufficient: there may be other problematic places. I also was able to crash a current IDLE 2.7 just opening a file with such a name. |
The full path of a file being edited also ends up in the title bar and the Window menu. I do not know whether the title bar is displayed by tk or the OS (Windows obviously displays the title of taskbar icons), but the Window list is definitely displayed by tk. It seems to me that files need two names: the system name used to open (and re-open) a file (utf-8 bytes on *nix?) and a tk name (BMP unicode) for display in the various places.

The current limitation to BMP names is a limitation of the tcl/tk gui framework. I would like to add a workaround but do not consider its absence a bug. I am proposing on python-list the addition of some builtin means to replace non-BMP chars with \U000xxxxx escapes for display purposes. This would be generally useful for tkinter programming. The thread is "Add str.bmp() to only expand non-BMP chars, for tkinter use". If this does not happen in 3.5, I would consider a patch to add a private function to EditorWindow.py to do the same. It would be far less efficient, but fast enough for short path names.

The EditorWindow.py line numbers are slightly different from those in 3.4.3 (and 3.4.2, I believe), so I presume this is with 3.4.0 or 3.4.1. The result with 3.4.3 should be unchanged. |
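For illustration only, a private helper of the kind described above might look like the following sketch. The name `bmp_escape` and the exact escape format are assumptions made for this example; this is not the proposed `str.bmp()` builtin and not code from idlelib.

```python
import re

# Matches any single code point above the BMP (U+10000 through U+10FFFF).
_ASTRAL = re.compile("[\U00010000-\U0010FFFF]")

def bmp_escape(text):
    """Return text with each non-BMP character replaced by a \\UXXXXXXXX escape.

    Hypothetical helper, intended for display strings only; the original
    string should still be used to open the file.
    """
    return _ASTRAL.sub(lambda m: "\\U%08x" % ord(m.group()), text)

# Example: the character from the reported TclError (U+1F393).
print(bmp_escape("notes\U0001f393.py"))
```

Because only the display string is altered, the real path can still be kept separately for opening and saving.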
Thinking more, there are two issues here. One is the fact that Idle stops when fed a filename with astral chars. This *is* a bug and should be fixed in all versions, even if that fix is to display a message box saying that Idle cannot work properly with such files. The second is the one I addressed in the first message, the inability to work properly with such files. Fixing that would obviate the need to display a message, but might be more work, especially on 2.7. Kamisky, if you launch Idle and then try to open the file, do you see the name in the Open dialog? I presume that if you do, and select it, Idle would stop just as it did in your report. |
_filename_to_unicode returns unicode names as is. In 3.x, are filenames ever not unicode? They come from either a file dialog (via tk, hence unicode), or sys.argv. I do not know about the latter, which is possibly OS-dependent. This function is used in three places in idlelib, all within EditorWindow:
The _f2u output is not saved to disk or used to open files; it is display only. So replacing astral chars with either \Unnnnn escapes or the BMP box char should be fine.
2&3. The titles are display only. WindowList.py displays the long name for editor windows, but the callback is a wakeup function tied to the Window itself. |
Along the lines of the changes proposed by Terry and others in the comments, this is the patch I'm submitting. The function _filename_to_unicode() plays the major part. It can receive two types of filename, 'str' and 'bytes'. When it is a str, all the astral chars (characters outside the BMP) are replaced by the '�' (diamond question mark) character through a regex substitution in the return statement. If the received filename is bytes (b'..') in some encoding, it is first decoded into str (as before), and before returning, all the out-of-range Unicode characters are replaced by the '�' (diamond question mark) character. The effect on behavior is: 1. IDLE is able to correctly display the filename in the title bar and in the file open dialog. This is my first patch, so please review it; if any errors are found I will correct them and upload again :) |
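As a rough sketch of the approach described in this comment (not the submitted patch itself), the substitution could be written as below. The function name `filename_for_display`, the `encoding` parameter, and the use of `errors="replace"` when decoding are assumptions for this example; the actual decoding logic in EditorWindow.py may differ.

```python
import re
import sys

# Any code point outside the BMP, i.e. above U+FFFF.
_ASTRAL = re.compile("[\U00010000-\U0010FFFF]")

def filename_for_display(filename, encoding=None):
    """Return a str that is safe to hand to Tk for display.

    bytes input is decoded first (assumed here: the filesystem encoding
    unless another one is given); every astral character is then replaced
    by U+FFFD, the "diamond question mark".
    """
    if isinstance(filename, bytes):
        encoding = encoding or sys.getfilesystemencoding()
        filename = filename.decode(encoding, errors="replace")
    return _ASTRAL.sub("\ufffd", filename)

print(filename_for_display("astral\U0001d53c.py"))
```

The result is meant for menus and title bars only; the unmodified name is still what gets passed to `open()`.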
Correction: This patch fixes the problem of IDLE not opening when the recent files list has filenames with characters outside the BMP, and the crashing of IDLE when using filenames with astral characters. The added benefit is that you can create a file with such chars, save it, and access it again from the Recent Files list. What this patch doesn't fix is:
Exception in Tkinter callback
Traceback (most recent call last):
File "/home/sanad/devpy/pessoc/cpython/Lib/tkinter/__init__.py", line 1549, in __call__
return self.func(*args)
File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/ScriptBinding.py", line 124, in run_module_event
return self._run_module_event(event)
File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/ScriptBinding.py", line 145, in _run_module_event
interp.restart_subprocess(with_cwd=False, filename=code.co_filename)
File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/PyShell.py", line 502, in restart_subprocess
console.write("\n{0} {1} {0}".format(halfbar, tag))
File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/PyShell.py", line 1294, in write
'Non-BMP character not supported in Tk')
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 49-49: Non-BMP character not supported in Tk
Which I guess is another related issue. |
New changeset dda625798111 by Terry Jan Reedy in branch '3.4':
New changeset 97d50e6247e1 by Terry Jan Reedy in branch '3.5':
New changeset 180bfaa7cdf8 by Terry Jan Reedy in branch 'default': |
I made a different fix to avoid the error posted when running. Sanad, PLEASE test running a file with an astral char, the same way you did before, to see if there are any other problems. I cannot get such a file into an Idle editor on Windows. I *think* this patch is enough, but cannot be sure. I am leaving this open both for your test and a possible 2.7 backport. |
Hey Terry, after testing the committed patch on my Linux Mint, I have found the following behaviour:
PS: since you are unable to test it on Windows, and many other developers might face the same problem, I'm uploading a screenshot of the IDLE windows after the patch is applied. |
Interesting, this doesn't work with non-UTF-8 locale.
$ touch astral𝔼.py
$ LC_ALL=en_US.iso88591 ./python -m idlelib.idle -e astral𝔼.py
Traceback (most recent call last):
File "/home/serhiy/py/cpython/Lib/runpy.py", line 170, in _run_module_as_main
"__main__", mod_spec)
File "/home/serhiy/py/cpython/Lib/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/serhiy/py/cpython/Lib/idlelib/idle.py", line 11, in <module>
idlelib.PyShell.main()
File "/home/serhiy/py/cpython/Lib/idlelib/PyShell.py", line 1553, in main
if flist.open(filename) is None:
File "/home/serhiy/py/cpython/Lib/idlelib/FileList.py", line 36, in open
edit = self.EditorWindow(self, filename, key)
File "/home/serhiy/py/cpython/Lib/idlelib/PyShell.py", line 123, in __init__
EditorWindow.__init__(self, *args)
File "/home/serhiy/py/cpython/Lib/idlelib/EditorWindow.py", line 288, in __init__
if io.loadfile(filename):
File "/home/serhiy/py/cpython/Lib/idlelib/IOBinding.py", line 244, in loadfile
self.updaterecentfileslist(filename)
File "/home/serhiy/py/cpython/Lib/idlelib/IOBinding.py", line 525, in updaterecentfileslist
self.editwin.update_recent_files_list(filename)
File "/home/serhiy/py/cpython/Lib/idlelib/EditorWindow.py", line 899, in update_recent_files_list
if '\0' in path or not os.path.exists(path[0:-1]):
File "/home/serhiy/py/cpython/Lib/genericpath.py", line 19, in exists
os.stat(path)
UnicodeEncodeError: 'latin-1' codec can't encode character '\U0001d53c' in position 22: ordinal not in range(256) |
These observations are noted when the system locale is set to 'iso-88591'
Exception in Tkinter callback
Traceback (most recent call last):
File "/home/sanad/devpy/pessoc/cpython/Lib/tkinter/__init__.py", line 1549, in __call__
return self.func(*args)
File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/MultiCall.py", line 176, in handler
r = l[i](event)
File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/IOBinding.py", line 339, in save
self.save_as(event)
File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/IOBinding.py", line 353, in save_as
if self.writefile(filename):
File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/IOBinding.py", line 379, in writefile
with open(filename, "wb") as f:
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001d53c' in position 39: ordinal not in range(128)
I'm trying to figure out the root of the problem; feel free to give your input. |
From msg248192 (Serhiy - which
On Windows, os.stat(astralpath) raises The EditorWindow line is testing for 'badfiles' to be excluded from the recent files list. We can work around the linux behavior with try-except. |
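A minimal sketch of the try-except workaround mentioned above, assuming the recent-files entries are newline-terminated path strings as in the traceback. The function name `filter_recent_files` is hypothetical, and this is not the committed change. On recent Python versions `os.path.exists` may already return False instead of raising for unencodable paths, in which case the `except` clause is simply never reached.

```python
import os

def filter_recent_files(lines):
    """Return the entries whose paths exist, skipping 'bad' ones.

    A path that cannot be encoded in the current locale (for example an
    astral character under a latin-1 locale) is treated as nonexistent
    rather than being allowed to raise UnicodeEncodeError.
    """
    kept = []
    for line in lines:
        path = line.rstrip("\n")  # entries carry a trailing newline
        if "\0" in path:
            continue
        try:
            exists = os.path.exists(path)
        except UnicodeEncodeError:
            exists = False
        if exists:
            kept.append(line)
    return kept
```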
It appears that the failures in msg248192 and msg248365 are issues with non-latin1 chars in general, not with astral chars in particular. Anyone who wants filenames with astral chars should be using a utf-8 locale. This issue is about Idle working around the tk BMP limit where it can so as to not prevent editing and running files with system-legal names. According to msg248188, the patch works. So I am closing the issue as fixed. |
New changeset d54aa163e4ec by Terry Jan Reedy in branch '2.7':
New changeset c1031eb12aa1 by Terry Jan Reedy in branch '3.4': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.