IDLE can crash if file name contains non-BMP Unicode characters #67860

Closed
kamisky mannequin opened this issue Mar 15, 2015 · 17 comments
Assignees
Labels
topic-IDLE type-bug An unexpected behavior, bug, or error

Comments

@kamisky
Mannequin

kamisky mannequin commented Mar 15, 2015

BPO 23672
Nosy @terryjreedy, @serwy, @serhiy-storchaka
Files
  • issue23672(updated third).patch
  • idlerun.png: IDLE's run window
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/terryjreedy'
    closed_at = <Date 2015-08-17.01:39:25.132>
    created_at = <Date 2015-03-15.08:51:42.479>
    labels = ['expert-IDLE', 'type-bug']
    title = 'IDLE can crash if file name contains non-BMP Unicode characters'
    updated_at = <Date 2015-08-17.01:55:56.286>
    user = 'https://bugs.python.org/kamisky'

    bugs.python.org fields:

    activity = <Date 2015-08-17.01:55:56.286>
    actor = 'python-dev'
    assignee = 'terry.reedy'
    closed = True
    closed_date = <Date 2015-08-17.01:39:25.132>
    closer = 'terry.reedy'
    components = ['IDLE']
    creation = <Date 2015-03-15.08:51:42.479>
    creator = 'kamisky'
    dependencies = []
    files = ['40098', '40143']
    hgrepos = []
    issue_num = 23672
    keywords = ['patch']
    message_count = 17.0
    messages = ['238136', '238137', '238140', '238164', '238165', '247646', '247856', '247858', '247918', '248105', '248106', '248188', '248192', '248365', '248700', '248702', '248704']
    nosy_count = 6.0
    nosy_names = ['terry.reedy', 'roger.serwy', 'python-dev', 'serhiy.storchaka', 'kamisky', 'sanad']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue23672'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']

    @kamisky
    Mannequin Author

    kamisky mannequin commented Mar 15, 2015

    I could run IDLE in the past. But today, when I try to launch IDLE, the icon appears in the Dock for a second and then disappears, and the application doesn't run. Moreover, when I run idle3 in Terminal, it prints:
    bogon:~ Kamisky$ idle3
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.4/bin/idle3", line 5, in <module>
        main()
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/PyShell.py", line 1560, in main
        shell = flist.open_shell()
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/PyShell.py", line 315, in open_shell
        self.pyshell = PyShell(self)
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/PyShell.py", line 866, in __init__
        OutputWindow.__init__(self, flist, None, None)
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/OutputWindow.py", line 16, in __init__
        EditorWindow.__init__(self, *args)
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/EditorWindow.py", line 301, in __init__
        self.update_recent_files_list()
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/idlelib/EditorWindow.py", line 927, in update_recent_files_list
        underline=0)
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tkinter/__init__.py", line 2719, in add_command
        self.add('command', cnf or kw)
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tkinter/__init__.py", line 2710, in add
        self._options(cnf, kw))
    _tkinter.TclError: character U+1f393 is above the range (U+0000-U+FFFF) allowed by Tcl

    Thanks.

    @kamisky kamisky mannequin added type-bug An unexpected behavior, bug, or error topic-IDLE labels Mar 15, 2015
    @ned-deily
    Member

    Try removing IDLE's recently-used file list:

    rm ~/.idlerc/recent-files.lst

    @ned-deily
    Member

    At least on some platforms (e.g. OS X), it is easy to create files with legitimate names containing code points above the BMP limit (<= U+FFFF) currently imposed by Tcl/Tk. For IDLE 3, I suspect _filename_to_unicode() in EditorWindow could be modified to check for such cases to prevent problems if such file names end up in recent-files.lst. That might not be sufficient: there may be other problematic places. I also was able to crash a current IDLE 2.7 just opening a file with such a name.
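A check of the kind suggested above could look like the following minimal sketch. The helper name `has_non_bmp` is hypothetical and is not part of idlelib; the sketch assumes Python 3, where each element of a str is a full code point.

```python
def has_non_bmp(name):
    """Return True if name contains any code point above U+FFFF."""
    # Assumes Python 3: iterating a str yields whole code points,
    # so astral characters are never split into surrogate pairs.
    return any(ord(ch) > 0xFFFF for ch in name)


if __name__ == "__main__":
    print(has_non_bmp("hello.py"))
    print(has_non_bmp("grad\U0001f393.py"))  # U+1F393 is the character in the traceback
```

Such a test only detects problematic names; what to do with them (skip, escape, or replace) is a separate decision.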

    @ned-deily ned-deily changed the title from "IDLE for osx 10.8.5 won't run" to "IDLE can crash if file name contains non-BMP Unicode characters" Mar 15, 2015
    @terryjreedy
    Member

    The full path of a file being edited also ends up in the title bar and the Window menu. I do not know whether the title bar is displayed by tk or the OS (Windows obviously displays the title of taskbar icons), but the Window list is definitely displayed by tk. It seems to me that files need two names: the system name used to open (and re-open) a file (utf-8 bytes on *nix?) and a tk name (BMP unicode) for display in the various places.

    The current limitation to BMP names is a limitation of the tcl/tk gui framework. I would like to add a workaround but do not consider its absence a bug. I am proposing on python-list the addition of some builtin means to replace non-BMP chars with \U000xxxxx escapes for display purposes. This would be generally useful for tkinter programming. The thread is "Add str.bmp() to only expand non-BMP chars, for tkinter use".

    If this does not happen in 3.5, I would consider a patch to add a private function to EditorWindow.py to do the same. It would be far less efficient, but fast enough for short path names.
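One way such a private helper might look is sketched below. The name `escape_non_bmp` is hypothetical (it is neither the proposed `str.bmp()` nor an existing idlelib function), and the eight-digit `\U` escape format is an assumption based on the description above.

```python
import re

# Matches any single code point outside the Basic Multilingual Plane.
_ASTRAL = re.compile("[\U00010000-\U0010ffff]")


def escape_non_bmp(text):
    """Replace each non-BMP character with its \\UXXXXXXXX escape.

    BMP characters are left untouched, so the result is safe to hand
    to a Tk build that only accepts U+0000-U+FFFF.
    """
    return _ASTRAL.sub(lambda m: "\\U%08x" % ord(m.group()), text)


if __name__ == "__main__":
    print(escape_non_bmp("grad\U0001f393.py"))
```

The escaped form is for display only; the original string would still be needed to open the file.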

    The EditorWindow.py line numbers are slightly different from those in 3.4.3 (and 3.4.2, I believe), so I presume this is with 3.4.0 or 3.4.1. The result with 3.4.3 should be unchanged.

    @terryjreedy terryjreedy added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Mar 15, 2015
    @terryjreedy
    Member

    Thinking more, there are two issues here. One is the fact that Idle stops when fed a filename with astral chars. This *is* a bug and should be fixed in all versions, even if that fix is to display a message box saying that Idle cannot work properly with such files.

    The second is the one I addressed in the first message: the inability to work properly with such files. Fixing that would obviate the need to display a message, but might be more work, especially on 2.7.

    Kamisky, if you launch Idle and then try to open the file, do you see the name in the Open dialog? I presume that if you do, and select it, Idle would stop just as it did in your report.

    @terryjreedy terryjreedy added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Mar 15, 2015
    @terryjreedy
    Member

    _filename_to_unicode returns unicode names as is. In 3.x, are filenames ever not unicode? They come from either a file dialog (via tk, hence unicode), or sys.argv. I do not know about the latter, which is possibly OS-dependent.

    This function is used in three places in idlelib, all within EditorWindow:

    1. def update_recent_files_list(self, new_file=None): ...
      ufile_name = self._filename_to_unicode(file_name)
    2. def short_title(self): # reduce filename to basename
      return self._filename_to_unicode(filename)
    3. def long_title(self):
      return self._filename_to_unicode(self.io.filename or "")

    The _f2u output is not saved to disk or used to open files; it is display only. So replacing astral chars with either \Unnnnn escapes or the BMP box char should be fine.

    1. The callback associated with each ufile_name encloses the original file_name, which is used to open the file. The original filename is also saved back to disk before the _f2u call.

    2&3. The titles are display only. WindowList.py displays the long name for editor windows, but the callback is a wakeup function tied to the Window itself.
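The separation described above (a sanitized name for display, the original name for opening) can be sketched without any GUI code. Everything here is hypothetical: `make_recent_entries`, `to_display`, and `open_file` are illustrative names, not idlelib APIs.

```python
def make_recent_entries(file_names, to_display, open_file):
    """Build (label, callback) pairs for a recent-files style menu.

    The label goes through to_display (display only), while each
    callback closes over the original, unmodified file name.
    """
    entries = []
    for file_name in file_names:
        # Bind file_name as a default argument so each callback keeps
        # its own name instead of the loop variable's final value.
        def callback(fn=file_name):
            return open_file(fn)
        entries.append((to_display(file_name), callback))
    return entries


if __name__ == "__main__":
    opened = []
    replace_astral = lambda s: "".join(
        ch if ord(ch) <= 0xFFFF else "\ufffd" for ch in s)
    entries = make_recent_entries(
        ["/tmp/a\U0001d53c.py"], replace_astral, opened.append)
    label, callback = entries[0]
    callback()
    print(ascii(label), ascii(opened[0]))
```

Because the callback never sees the display string, lossy replacement in the label cannot affect which file is opened.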

    @sanad
    Mannequin

    sanad mannequin commented Aug 2, 2015

    On the lines of changes proposed by Terry and others in the comments, this is the patch I'm submitting.

    The function _filename_to_unicode() plays the major part here. It can receive two types of filename, 'str' and 'bytes'. When it is a str, all the astral chars (characters outside the BMP) are replaced by the '�' (diamond question mark) character via a regex substitution in the return statement.

    If the received filename is bytes (b'..'), it is first decoded into str (as before), and then all the out-of-range Unicode characters are replaced by the '�' (diamond question mark) character before returning.
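The behavior described in the two paragraphs above can be approximated as follows. This is a sketch, not the attached patch: the function name `filename_for_display` is hypothetical, and decoding bytes as UTF-8 with `errors="replace"` is an assumption made to keep the example self-contained.

```python
import re

# Any code point above U+FFFF, i.e. outside the BMP.
_ASTRAL = re.compile("[\U00010000-\U0010ffff]")


def filename_for_display(filename):
    """Return a str version of filename that contains only BMP characters."""
    if isinstance(filename, bytes):
        # Assumption: treat byte names as UTF-8; undecodable bytes
        # become U+FFFD rather than raising.
        filename = filename.decode("utf-8", errors="replace")
    # Replace every astral character with U+FFFD (the diamond question mark).
    return _ASTRAL.sub("\ufffd", filename)


if __name__ == "__main__":
    print(ascii(filename_for_display("a\U0001d53cb.py")))
```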

    The effect on behavior is:

    1. IDLE is able to correctly display the filename in the title bar and in the file open dialog.
    2. Any file whose name contains astral chars is imported without any crash.

    This is my first patch. Please review it; if any errors are found, I will correct them and upload again :)

    @sanad
    Mannequin

    sanad mannequin commented Aug 2, 2015

    Correction:

    This patch fixes the problem of idle not opening when recent file list has filenames outside BMP and the crashing of idle on using filenames with astral characters.

    The added benefit is that you can create a file with such chars and save it and access it again from the Recent Files List.

    What this patch doesn't fix:

    1. The File Open Dialog displays the filename incorrectly.
    2. The File Open Dialog doesn't allow a file with such a filename to be opened.
    3. When trying to run a file with such a name, it gives a traceback in Tkinter. The following is the error message:
    Exception in Tkinter callback
    Traceback (most recent call last):
      File "/home/sanad/devpy/pessoc/cpython/Lib/tkinter/__init__.py", line 1549, in __call__
        return self.func(*args)
      File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/ScriptBinding.py", line 124, in run_module_event
        return self._run_module_event(event)
      File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/ScriptBinding.py", line 145, in _run_module_event
        interp.restart_subprocess(with_cwd=False, filename=code.co_filename)
      File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/PyShell.py", line 502, in restart_subprocess
        console.write("\n{0} {1} {0}".format(halfbar, tag))
      File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/PyShell.py", line 1294, in write
        'Non-BMP character not supported in Tk')
    UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 49-49: Non-BMP character not supported in Tk

    Which I guess is another related issue.

    @terryjreedy
    Member

    The problem with astral chars and open/save dialogs is the subject of bpo-21084. The problem with printing astral chars generated by programs is bpo-22742 and maybe bpo-21084. I added a fix for the very new display of filenames on the run separator bar, which is the problem discovered above.

    @python-dev
    Mannequin

    python-dev mannequin commented Aug 6, 2015

    New changeset dda625798111 by Terry Jan Reedy in branch '3.4':
    Issue bpo-23672: Allow Idle to edit and run files with astral chars in name.
    https://hg.python.org/cpython/rev/dda625798111

    New changeset 97d50e6247e1 by Terry Jan Reedy in branch '3.5':
    Issue bpo-23672:Merge with 3.4
    https://hg.python.org/cpython/rev/97d50e6247e1

    New changeset 180bfaa7cdf8 by Terry Jan Reedy in branch 'default':
    Issue bpo-23672:Merge with 3.5
    https://hg.python.org/cpython/rev/180bfaa7cdf8

    @terryjreedy
    Member

    I made a different fix to avoid the error posted when running. Sanad, PLEASE test running a file with an astral char, the same way you did before, to see if there are any other problems. I cannot get such a file into an Idle editor on Windows. I *think* this patch is enough, but cannot be sure.

    I am leaving this open both for your test and a possible 2.7 backport.

    @sanad
    Mannequin

    sanad mannequin commented Aug 7, 2015

    Hey Terry, after testing the committed patch on my Linux Mint, I have found the following behaviour:

    1. The issue of IDLE not starting when Recent File list has name outside BMP has been fixed.

    2. The file name is correctly formatted and displayed in the file editor window title and in the recent file list (astral chars replaced by the diamond question mark symbol).

    3. Files with astral char names now RUN perfectly.

    4. You can create a file with astral char in its name and run it effectively and re run it from the recent file list.

    PS: since you are unable to test it on Windows, and many other developers might face the same, I'm uploading a screenshot of the IDLE windows after the patch is applied.
    In the given test, I named my file "𝔼𝔼hello𝒵My𝔼Name𝒫.py"
    and it was displayed as "��hello�My�Name�.py"

    @serhiy-storchaka
    Member

    Interesting, this doesn't work with a non-UTF-8 locale.

    $ touch astral𝔼.py
    $ LC_ALL=en_US.iso88591 ./python -m idlelib.idle -e astral𝔼.py
    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/runpy.py", line 170, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/serhiy/py/cpython/Lib/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/serhiy/py/cpython/Lib/idlelib/idle.py", line 11, in <module>
        idlelib.PyShell.main()
      File "/home/serhiy/py/cpython/Lib/idlelib/PyShell.py", line 1553, in main
        if flist.open(filename) is None:
      File "/home/serhiy/py/cpython/Lib/idlelib/FileList.py", line 36, in open
        edit = self.EditorWindow(self, filename, key)
      File "/home/serhiy/py/cpython/Lib/idlelib/PyShell.py", line 123, in __init__
        EditorWindow.__init__(self, *args)
      File "/home/serhiy/py/cpython/Lib/idlelib/EditorWindow.py", line 288, in __init__
        if io.loadfile(filename):
      File "/home/serhiy/py/cpython/Lib/idlelib/IOBinding.py", line 244, in loadfile
        self.updaterecentfileslist(filename)
      File "/home/serhiy/py/cpython/Lib/idlelib/IOBinding.py", line 525, in updaterecentfileslist
        self.editwin.update_recent_files_list(filename)
      File "/home/serhiy/py/cpython/Lib/idlelib/EditorWindow.py", line 899, in update_recent_files_list
        if '\0' in path or not os.path.exists(path[0:-1]):
      File "/home/serhiy/py/cpython/Lib/genericpath.py", line 19, in exists
        os.stat(path)
    UnicodeEncodeError: 'latin-1' codec can't encode character '\U0001d53c' in position 22: ordinal not in range(256)

    @sanad
    Mannequin

    sanad mannequin commented Aug 10, 2015

    These observations were made with the system locale set to 'iso-88591'.

    1. OK, for some reason I'm able to execute this command without any error on Linux (the idle window opens with the file name shown as 'astralE.py'), because the character '𝔼' is automatically shown and treated as 'E' in both the terminal and the python command line interpreter (similarly for the chars '𝒵' = 'z' and '𝒫' = 'P').

    2. But I'm unable to save/create a file with the filename 'astral𝔼.py' and hence unable to run it. The following errors are then thrown:

    Exception in Tkinter callback
    Traceback (most recent call last):
      File "/home/sanad/devpy/pessoc/cpython/Lib/tkinter/__init__.py", line 1549, in __call__
        return self.func(*args)
      File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/MultiCall.py", line 176, in handler
        r = l[i](event)
      File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/IOBinding.py", line 339, in save
        self.save_as(event)
      File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/IOBinding.py", line 353, in save_as
        if self.writefile(filename):
      File "/home/sanad/devpy/pessoc/cpython/Lib/idlelib/IOBinding.py", line 379, in writefile
        with open(filename, "wb") as f:
    UnicodeEncodeError: 'ascii' codec can't encode character '\U0001d53c' in position 39: ordinal not in range(128)

    I'm trying to figure out the root of the problem; feel free to give your input.

    @terryjreedy
    Member

    From msg248192 (Serhiy - which
    ...

      File "/home/serhiy/py/cpython/Lib/idlelib/EditorWindow.py", line 899, in update_recent_files_list
        if '\0' in path or not os.path.exists(path[0:-1]):
      File "/home/serhiy/py/cpython/Lib/genericpath.py", line 19, in exists
        os.stat(path)
    UnicodeEncodeError: 'latin-1' codec can't encode character '\U0001d53c' in position 22: ordinal not in range(256)

    On Windows, os.stat(astralpath) raises
    FileNotFoundError: [WinError 2] The system cannot find the file specified: 'as\U00011111.py'
    and os.path.exists(astralpath) catches the exception and returns False. It is a linux issue, and perhaps a bug, that os.path.exists does not catch the exception. To me, as I read the docstring, it should always return True or False, with the latter the default, for any input string.

    The EditorWindow line is testing for 'badfiles' to be excluded from the recent files list. We can work around the linux behavior with try-except.
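A try-except workaround of the kind mentioned above might look like this sketch. The wrapper name `path_exists` is hypothetical and is not the code that was committed. Note that since Python 3.8, `os.path.exists` itself returns False for paths that cannot be represented at the OS level, so the wrapper mainly matters on older versions.

```python
import os


def path_exists(path):
    """Like os.path.exists, but never raises for unencodable names."""
    try:
        return os.path.exists(path)
    except ValueError:
        # UnicodeEncodeError is a subclass of ValueError; it is raised on
        # some platforms and versions when the path cannot be encoded
        # with the filesystem encoding (e.g. latin-1 or ASCII locales).
        return False


if __name__ == "__main__":
    print(path_exists(os.getcwd()))
    print(path_exists("no_such_dir_\U0001d53c/x.py"))
```

With such a wrapper, an unencodable entry in the recent files list would simply be treated as a bad file and dropped.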

    @terryjreedy
    Member

    It appears that the failures in msg248192 and msg248365 are issues with non-latin1 chars in general, not with astral chars in particular. Anyone who wants filenames with astral chars should be using a utf-8 locale.
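The point that these failures are about the locale's encoding rather than about astral characters as such can be illustrated with a small sketch. The helper `can_encode` is hypothetical; it only checks whether a name is representable in a given codec, which is the step that fails in the two tracebacks above ('latin-1' and 'ascii').

```python
def can_encode(name, encoding):
    """Return True if name can be encoded with the given codec."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    name = "astral\U0001d53c.py"
    for codec in ("ascii", "latin-1", "utf-8"):
        print(codec, can_encode(name, codec))
```

Any character outside latin-1, astral or not, fails the same way under a latin-1 locale, while a UTF-8 locale can represent all of them.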

    This issue is about Idle working around the tk BMP limit where it can so as to not prevent editing and running files with system-legal names. According to msg248188, the patch works. So I am closing the issue as fixed.

    @terryjreedy terryjreedy self-assigned this Aug 17, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Aug 17, 2015

    New changeset d54aa163e4ec by Terry Jan Reedy in branch '2.7':
    Issue bpo-23672: ACKS
    https://hg.python.org/cpython/rev/d54aa163e4ec

    New changeset c1031eb12aa1 by Terry Jan Reedy in branch '3.4':
    Issue bpo-23672: ACKS
    https://hg.python.org/cpython/rev/c1031eb12aa1

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022