classification
Title: IDLE can crash if file name contains non-BMP Unicode characters
Type: behavior Stage: resolved
Components: IDLE Versions: Python 3.6, Python 3.5, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: kamisky, python-dev, roger.serwy, sanad, serhiy.storchaka, terry.reedy
Priority: high Keywords: patch

Created on 2015-03-15 08:51 by kamisky, last changed 2015-08-17 01:55 by python-dev. This issue is now closed.

Files
File name | Uploaded | Description
issue23672(updated third).patch | sanad, 2015-08-02 07:53 |
idlerun.png | sanad, 2015-08-07 13:08 | IDLE's run window
Messages (17)
msg238136 - (view) Author: Kamisky (kamisky) Date: 2015-03-15 08:51
I could run IDLE in the past. But today, when I try to launch IDLE, the icon appears in the Dock for a second and then disappears, and the application doesn't run. Moreover, when I run idle3 in Terminal, it says:
bogon:~ Kamisky$ idle3
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/bin/idle3", line 5, in <module>
    main()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/PyShell.py", line 1560, in main
    shell = flist.open_shell()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/PyShell.py", line 315, in open_shell
    self.pyshell = PyShell(self)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/PyShell.py", line 866, in __init__
    OutputWindow.__init__(self, flist, None, None)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/OutputWindow.py", line 16, in __init__
    EditorWindow.__init__(self, *args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/EditorWindow.py", line 301, in __init__
    self.update_recent_files_list()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/EditorWindow.py", line 927, in update_recent_files_list
    underline=0)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tkinter/__init__.py", line 2719, in add_command
    self.add('command', cnf or kw)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tkinter/__init__.py", line 2710, in add
    self._options(cnf, kw))
_tkinter.TclError: character U+1f393 is above the range (U+0000-U+FFFF) allowed by Tcl

Thanks.
msg238137 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2015-03-15 08:54
Try removing IDLE's recently-used file list:

rm ~/.idlerc/recent-files.lst
msg238140 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2015-03-15 10:27
At least on some platforms (e.g. OS X), it is easy to create files with legitimate names containing code points above the BMP limit (<= U+FFFF) currently imposed by Tcl/Tk.  For IDLE 3, I suspect _filename_to_unicode() in EditorWindow could be modified to check for such cases to prevent problems if such file names end up in recent-files.lst.  That might not be sufficient: there may be other problematic places.  I also was able to crash a current IDLE 2.7 just opening a file with such a name.
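
A minimal sketch of the kind of check suggested above. The helper name has_non_bmp is hypothetical and is not part of idlelib; the sketch assumes Python 3.3 or later, where a str is a sequence of code points.

```python
def has_non_bmp(text):
    """Return True if text contains any code point above U+FFFF."""
    return any(ord(ch) > 0xFFFF for ch in text)


# U+1F393 is the character named in the TclError above.
print(has_non_bmp('notes\U0001f393.py'))
print(has_non_bmp('notes.py'))
```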
msg238164 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-03-15 23:12
The full path of a file being edited also ends up in the title bar and the Window menu.  I do not know whether the title bar is displayed by tk or the OS (Windows obviously displays the title of taskbar icons) but the Window list is definitely by tk.  It seems to me that files need two names: the system name used to open (and re-open) a file (utf-8 bytes on *nix?) and a tk name (BMP unicode) for display in the various places.

The current limitation to BMP names is a limitation of the tcl/tk gui framework.  I would like to add a workaround but do not consider its absence a bug.  I am proposing on python-list the addition of some builtin means to replace non-BMP chars with \U000xxxxx escapes for display purposes.  This would be generally useful for tkinter programming.  The thread is "Add str.bmp() to only expand non-BMP chars, for tkinter use".

If this does not happen in 3.5, I would consider a patch to add a private function to EditorWindow.py to do the same.  It would be far less efficient, but fast enough for short path names.

The EditorWindow.py line numbers are slightly different from those in 3.4.3 (and 3.4.2, I believe), so I presume this is with 3.4.0 or 3.4.1.  The result with 3.4.3 should be unchanged.
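
As an illustration only, the escaping idea could look roughly like the sketch below. The name bmp_escape is hypothetical; it is neither the proposed str.bmp() nor an existing idlelib function.

```python
import re

# Any code point above the BMP (U+10000 through U+10FFFF).
_ASTRAL = re.compile('[\U00010000-\U0010ffff]')


def bmp_escape(text):
    """Replace each non-BMP character with a \\UXXXXXXXX escape; leave the rest."""
    return _ASTRAL.sub(lambda m: '\\U%08x' % ord(m.group()), text)


# The result contains only BMP characters, so it is safe for display in tk.
print(bmp_escape('grad\U0001f393.py'))
```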
msg238165 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-03-15 23:22
Thinking more, there are two issues here.  One is the fact that Idle stops when fed a filename with astral chars.  This *is* a bug and should be fixed in all versions, even if that fix is to display a message box saying that Idle cannot work properly with such files.

The second is the one I addressed in the first message, the inability to work properly with files.  Fixing that would obviate the need to display a message, but might be more work, especially on 2.7.

Kamisky, if you launch Idle and then try to open the file, do you see the name in the Open dialog?  I presume that if you do, and select it, Idle would stop just as it did in your report.
msg247646 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-07-30 02:25
_filename_to_unicode returns unicode names as is.  In 3.x, are filenames ever not unicode?  They come from either a file dialog (via tk, hence unicode), or sys.argv. I do not know about the latter, which is possibly OS-dependent.

This function is used in three places in idlelib, all within EditorWindow:
1.    def update_recent_files_list(self, new_file=None): ...
                ufile_name = self._filename_to_unicode(file_name)
2.    def short_title(self):  # reduce filename to basename
        return self._filename_to_unicode(filename)
3.    def long_title(self):
        return self._filename_to_unicode(self.io.filename or "")

The _f2u output is not saved to disk or used to open files; it is display only.  So replacing astral chars with either \Unnnnn escapes or the BMP box char should be fine.

1. The callback associated with each ufile_name encloses the original file_name, which is used to open the file.  The original filename is also saved back to disk before the _f2u call.

2&3. The titles are display only.  WindowList.py displays the long name for editor windows, but the callback is a wakeup function tied to the Window itself.
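
A rough sketch of the separation described in point 1, with no tk involved: the label is sanitized for display, while the callback keeps the original path. All names here (recent_file_entries, display_name, open_file) are hypothetical and do not match the idlelib code.

```python
def recent_file_entries(paths, display_name, open_file):
    """Pair a display-only label with a callback bound to the real path."""
    entries = []
    for path in paths:
        # Bind path as a default argument so each callback keeps its own value.
        entries.append((display_name(path), lambda path=path: open_file(path)))
    return entries


opened = []
entries = recent_file_entries(
    ['/tmp/a\U0001d53c.py'],
    # Display-only transformation: non-BMP characters become U+FFFD.
    display_name=lambda p: ''.join(c if ord(c) <= 0xFFFF else '\ufffd' for c in p),
    open_file=opened.append,
)
label, callback = entries[0]
callback()  # "opens" the file using the untouched original name
```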
msg247856 - (view) Author: sanad (sanad) * Date: 2015-08-02 07:53
Following the changes proposed by Terry and others in the comments, this is the patch I'm submitting.

The function _filename_to_unicode() does most of the work. It can receive two types of filename, 'str' and 'bytes'. When it is a str, all astral chars (characters outside the BMP) are replaced by the '�' (diamond question mark, U+FFFD) character via a regex substitution in the return statement.

If the received filename is bytes (b'..'), in whatever encoding, it is first decoded into str (as before), and before returning, all out-of-range Unicode characters are replaced by the '�' (diamond question mark) character.

The effect on behavior is:

1. IDLE is able to display the filename correctly in the title bar and in the file open dialog.
2. Any file whose name contains astral chars is imported without a crash.

This is my first patch. Please review it; if any errors are found, I will correct them and upload again :)
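
For readers without the patch at hand, the approach described above can be sketched as follows. This is not the patch itself: the function name filename_for_display, the default encoding, and the errors='replace' handling are assumptions made for the example.

```python
import re

# Any code point above the BMP (U+10000 through U+10FFFF).
ASTRAL = re.compile('[\U00010000-\U0010ffff]')


def filename_for_display(filename, encoding='utf-8'):
    """Return a str that tk can display; the result is for display only."""
    if isinstance(filename, bytes):
        # Assumption: undecodable bytes are also shown as U+FFFD.
        filename = filename.decode(encoding, errors='replace')
    return ASTRAL.sub('\ufffd', filename)


print(filename_for_display('a\U0001f393.py'))
print(filename_for_display('a\U0001f393.py'.encode('utf-8')))
```

The original name must still be kept for opening and saving the file, since the substitution is not reversible.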
msg247858 - (view) Author: sanad (sanad) * Date: 2015-08-02 08:32
Correction:

This patch fixes the problem of IDLE not opening when the recent file list has filenames outside the BMP, and the crashing of IDLE when using filenames with astral characters.

The added benefit is that you can create a file with such chars, save it, and access it again from the Recent Files list.

What this patch doesn't fix is:
1. The File Open dialog displays the filename incorrectly.
2. The File Open dialog doesn't allow a file with such a filename to be opened.
3. Trying to run a file with such a name gives a traceback from Tkinter. The following is the error message:

Exception in Tkinter callback
Traceback (most recent call last):
  File "/home/sanad/devpy/pessoc/cpython/Lib/tkinter/__init__.py", line 1549, in __call__
    return self.func(*args)
  File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/ScriptBinding.py", line 124, in run_module_event
    return self._run_module_event(event)
  File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/ScriptBinding.py", line 145, in _run_module_event
    interp.restart_subprocess(with_cwd=False, filename=code.co_filename)
  File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/PyShell.py", line 502, in restart_subprocess
    console.write("\n{0} {1} {0}".format(halfbar, tag))
  File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/PyShell.py", line 1294, in write
    'Non-BMP character not supported in Tk')
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 49-49: Non-BMP character not supported in Tk

Which I guess is another related issue.
msg247918 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-08-03 04:16
The problem with astral chars and open/save dialogs is the subject of #21084. The problem with printing astral chars generated by programs is #22742 and maybe #21084.  I added a fix for the very new display of filenames on the run separator bar, which is the problem discovered above.
msg248105 - (view) Author: Roundup Robot (python-dev) Date: 2015-08-06 04:55
New changeset dda625798111 by Terry Jan Reedy in branch '3.4':
Issue #23672: Allow Idle to edit and run files with astral chars in name.
https://hg.python.org/cpython/rev/dda625798111

New changeset 97d50e6247e1 by Terry Jan Reedy in branch '3.5':
Issue #23672:Merge with 3.4
https://hg.python.org/cpython/rev/97d50e6247e1

New changeset 180bfaa7cdf8 by Terry Jan Reedy in branch 'default':
Issue #23672:Merge with 3.5
https://hg.python.org/cpython/rev/180bfaa7cdf8
msg248106 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-08-06 04:59
I made a different fix to avoid the error posted when running.  Sanad, PLEASE test running a file with an astral char, the same way you did before, to see if there are any other problems.  I cannot get such a file into an Idle editor on Windows. I *think* this patch is enough, but cannot be sure.

I am leaving this open both for your test and a possible 2.7 backport.
msg248188 - (view) Author: sanad (sanad) * Date: 2015-08-07 13:08
Hey Terry, after testing the committed patch on my Linux Mint, I have found the following behaviour:

1. The issue of IDLE not starting when Recent File list has name outside BMP has been fixed.

2. The file name is correctly formatted and displayed in the file editor window title and in the recent file list (astral chars are replaced by the diamond question mark symbol).

3. Files with astral char names now RUN perfectly.

4. You can create a file with astral char in its name and run it effectively and re run it from the recent file list.

PS: since you are unable to test it on Windows, and many other developers might face the same problem, I'm uploading a screenshot of the IDLE windows after the patch is applied.
In the given test, I named my file "𝔼𝔼hello𝒵My𝔼Name𝒫.py"
and it was displayed as "��hello�My�Name�.py"
msg248192 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-08-07 13:53
Interesting, this doesn't work with non-UTF-8 locale.

$ touch astral𝔼.py
$ LC_ALL=en_US.iso88591 ./python -m idlelib.idle -e astral𝔼.py
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/serhiy/py/cpython/Lib/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/serhiy/py/cpython/Lib/idlelib/idle.py", line 11, in <module>
    idlelib.PyShell.main()
  File "/home/serhiy/py/cpython/Lib/idlelib/PyShell.py", line 1553, in main
    if flist.open(filename) is None:
  File "/home/serhiy/py/cpython/Lib/idlelib/FileList.py", line 36, in open
    edit = self.EditorWindow(self, filename, key)
  File "/home/serhiy/py/cpython/Lib/idlelib/PyShell.py", line 123, in __init__
    EditorWindow.__init__(self, *args)
  File "/home/serhiy/py/cpython/Lib/idlelib/EditorWindow.py", line 288, in __init__
    if io.loadfile(filename):
  File "/home/serhiy/py/cpython/Lib/idlelib/IOBinding.py", line 244, in loadfile
    self.updaterecentfileslist(filename)
  File "/home/serhiy/py/cpython/Lib/idlelib/IOBinding.py", line 525, in updaterecentfileslist
    self.editwin.update_recent_files_list(filename)
  File "/home/serhiy/py/cpython/Lib/idlelib/EditorWindow.py", line 899, in update_recent_files_list
    if '\0' in path or not os.path.exists(path[0:-1]):
  File "/home/serhiy/py/cpython/Lib/genericpath.py", line 19, in exists
    os.stat(path)
UnicodeEncodeError: 'latin-1' codec can't encode character '\U0001d53c' in position 22: ordinal not in range(256)
msg248365 - (view) Author: sanad (sanad) * Date: 2015-08-10 18:59
These observations are noted when the system locale is set to 'iso-88591'

1. OK, for some reason I'm able to execute this command without any error on Linux (the IDLE window opens with the file name shown as 'astralE.py'), because the character '𝔼' is automatically shown and treated as 'E' in both the terminal and the Python command-line interpreter (similarly for the chars '𝒵' = 'z' and '𝒫' = 'P').

2. But I'm unable to save or create a file with the filename 'astral𝔼.py' and hence unable to run it. The following errors are then thrown:

Exception in Tkinter callback
Traceback (most recent call last):
  File "/home/sanad/devpy/pessoc/cpython/Lib/tkinter/__init__.py", line 1549, in __call__
    return self.func(*args)
  File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/MultiCall.py", line 176, in handler
    r = l[i](event)
  File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/IOBinding.py", line 339, in save
    self.save_as(event)
  File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/IOBinding.py", line 353, in save_as
    if self.writefile(filename):
  File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/IOBinding.py", line 379, in writefile
    with open(filename, "wb") as f:
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001d53c' in position 39: ordinal not in range(128)

I'm trying to figure out the root of the problem, feel free to give your inputs
msg248700 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-08-17 01:08
From msg248192 (Serhiy - which 
...
>  File "/home/serhiy/py/cpython/Lib/idlelib/EditorWindow.py", line 899, in update_recent_files_list
>    if '\0' in path or not os.path.exists(path[0:-1]):
>  File "/home/serhiy/py/cpython/Lib/genericpath.py", line 19, in exists
>    os.stat(path)
> UnicodeEncodeError: 'latin-1' codec can't encode character '\U0001d53c' in position 22: ordinal not in range(256)

On Windows, os.stat(astralpath) raises
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'as\U00011111.py'
and os.path.exists(astralpath) catches the exception and returns False. It is a Linux issue, and perhaps a bug, that os.path.exists does not catch the exception. To me, as I read the docstring, it should always return True or False, with the latter the default, for any input string.

The EditorWindow line is testing for 'badfiles' to be excluded from the recent files list.  We can work around the Linux behavior with try-except.
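
A minimal sketch of such a workaround, assuming a hypothetical wrapper named path_exists (this is not the code that was committed). A name that cannot be encoded for the file system is treated as not existing.

```python
import os


def path_exists(path):
    """Like os.path.exists, but never raises for names the locale cannot encode."""
    try:
        return os.path.exists(path)
    except UnicodeEncodeError:
        # The name cannot be passed to the OS under the current locale,
        # so for the recent files list it is treated as a bad file.
        return False


print(path_exists(os.getcwd()))
print(path_exists('/no/such/dir/astral\U0001d53c.py'))
```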
msg248702 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-08-17 01:39
It appears that the failures in msg248192 and msg248365 are issues with non-latin1 chars in general, not with astral chars in particular.  Anyone who wants filenames with astral chars should be using a utf-8 locale.

This issue is about Idle working around the tk BMP limit where it can so as to not prevent editing and running files with system-legal names. According to msg248188, the patch works.  So I am closing the issue as fixed.
msg248704 - (view) Author: Roundup Robot (python-dev) Date: 2015-08-17 01:55
New changeset d54aa163e4ec by Terry Jan Reedy in branch '2.7':
Issue #23672: ACKS
https://hg.python.org/cpython/rev/d54aa163e4ec

New changeset c1031eb12aa1 by Terry Jan Reedy in branch '3.4':
Issue #23672: ACKS
https://hg.python.org/cpython/rev/c1031eb12aa1
History
Date | User | Action | Args
2015-08-17 01:55:56 | python-dev | set | messages: + msg248704
2015-08-17 01:39:25 | terry.reedy | set | status: open -> closed; messages: + msg248702; assignee: terry.reedy; resolution: fixed; stage: commit review -> resolved
2015-08-17 01:08:23 | terry.reedy | set | messages: + msg248700
2015-08-10 18:59:05 | sanad | set | messages: + msg248365
2015-08-07 13:53:24 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg248192
2015-08-07 13:08:02 | sanad | set | files: + idlerun.png; messages: + msg248188
2015-08-06 04:59:40 | terry.reedy | set | messages: + msg248106; stage: test needed -> commit review
2015-08-06 04:55:45 | python-dev | set | nosy: + python-dev; messages: + msg248105
2015-08-03 04:16:54 | terry.reedy | set | messages: + msg247918
2015-08-02 08:32:03 | sanad | set | messages: + msg247858
2015-08-02 07:53:20 | sanad | set | files: + issue23672(updated third).patch; nosy: + sanad; messages: + msg247856; keywords: + patch
2015-07-30 02:25:52 | terry.reedy | set | stage: needs patch -> test needed; messages: + msg247646; versions: + Python 3.6
2015-03-15 23:22:52 | terry.reedy | set | type: enhancement -> behavior; messages: + msg238165; versions: + Python 2.7, Python 3.4
2015-03-15 23:12:10 | terry.reedy | set | type: behavior -> enhancement; messages: + msg238164; versions: - Python 2.7, Python 3.4
2015-03-15 21:51:28 | rhettinger | set | priority: normal -> high
2015-03-15 10:27:51 | ned.deily | set | nosy: + terry.reedy, roger.serwy, - ned.deily
2015-03-15 10:27:22 | ned.deily | set | title: IDLE for osx 10.8.5 won't run -> IDLE can crash if file name contains non-BMP Unicode characters; nosy: ned.deily, kamisky; versions: + Python 2.7, Python 3.5; messages: + msg238140; stage: needs patch
2015-03-15 08:54:20 | ned.deily | set | nosy: + ned.deily; messages: + msg238137
2015-03-15 08:51:42 | kamisky | create |