Regression: Windows-tkinter-idle, unicode, and 0xxx filename #63220

Closed
terryjreedy opened this issue Sep 14, 2013 · 16 comments
Labels
topic-IDLE topic-tkinter type-bug An unexpected behavior, bug, or error

Comments

@terryjreedy
Member

BPO 19020
Nosy @terryjreedy, @vstinner, @ezio-melotti, @serhiy-storchaka
Files
  • tkinter_configure_splitlist.patch
  • tkinter_checkParam_configure.patch: Test tuple values with configure
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-01-22.08:35:53.238>
    created_at = <Date 2013-09-14.22:11:21.970>
    labels = ['expert-IDLE', 'type-bug', 'expert-tkinter']
    title = 'Regression: Windows-tkinter-idle, unicode, and 0xxx filename'
    updated_at = <Date 2014-01-22.08:35:53.237>
    user = 'https://github.com/terryjreedy'

    bugs.python.org fields:

    activity = <Date 2014-01-22.08:35:53.237>
    actor = 'terry.reedy'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-01-22.08:35:53.238>
    closer = 'terry.reedy'
    components = ['IDLE', 'Tkinter']
    creation = <Date 2013-09-14.22:11:21.970>
    creator = 'terry.reedy'
    dependencies = []
    files = ['31766', '32556']
    hgrepos = []
    issue_num = 19020
    keywords = ['patch']
    message_count = 16.0
    messages = ['197736', '197738', '197758', '198804', '198811', '198860', '202496', '202516', '202542', '206556', '206570', '206586', '206587', '206868', '206924', '208758']
    nosy_count = 7.0
    nosy_names = ['terry.reedy', 'bsherwood', 'vstinner', 'gpolo', 'ezio.melotti', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue19020'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']

    @terryjreedy
    Member Author

    Current 3.3.2+ repository build (32 bit) and 3.4.0a2 repository build (32 bit) and installation (64 bit) have a problem that did not exist in my Win6 3.3.2 installation (64 bit). (Bruce Sherwood discovered the symptom with some installed version of 3.4.0a?.) Adding to the recent files list a path containing a directory or file whose name begins with '0' causes Idle to stop with a traceback such as the one given below. Currently, opening *any* file causes the recent files list to be rebuilt (as seen in the traceback). So the problem happens whether one opens a new file with a leading 0 somewhere in its path or such a path is merely already on the list.

    Exception in Tkinter callback
    Traceback (most recent call last):
      File "C:\Python34\lib\tkinter\__init__.py", line 1475, in __call__
        return self.func(*args)
      File "C:\Python34\lib\idlelib\IOBinding.py", line 179, in open
        flist.open(filename, self.loadfile)
      File "C:\Python34\lib\idlelib\FileList.py", line 34, in open
        return action(filename)
      File "C:\Python34\lib\idlelib\IOBinding.py", line 240, in loadfile
        self.updaterecentfileslist(filename)
      File "C:\Python34\lib\idlelib\IOBinding.py", line 521, in updaterecentfileslist
        self.editwin.update_recent_files_list(filename)
      File "C:\Python34\lib\idlelib\EditorWindow.py", line 915, in update_recent_files_list
        menu.delete(0, END)  # clear, and rebuild:
      File "C:\Python34\lib\tkinter\__init__.py", line 2739, in delete
        if 'command' in self.entryconfig(i):
      File "C:\Python34\lib\tkinter\__init__.py", line 2749, in entryconfigure
        return self._configure(('entryconfigure', index), cnf, kw)
      File "C:\Python34\lib\tkinter\__init__.py", line 1247, in _configure
        self.tk.call(_flatten((self._w, cmd)))):
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 14: invalid start byte

    The 'invalid start byte' error for 0xc0 has appeared before in other issues.

    Bruce got this with
    H:\HP_Documents\0PythonWork\AirplaneKinematics\accel2.py
    I get '12' instead of '14' with
    F:\Python\mypy\0... (file or directory; it must be 0, not 1).
    It appears that the string is being improperly 're-cooked' at some point, so that '\x' sequences count as 1 position rather than 2.

    At the point of failure, I believe tk is checking whether a particular menu item has an associated callback that should be deleted before the menu item itself is deleted (before later being rebuilt). (It is a separate issue whether we could just adjust the recent files list (a sub-submenu) rather than deleting and rebuilding it, possibly exactly as it was. Even if we did, the issue would still arise when opening a new '0' file, or when closing Idle.)

    @terryjreedy terryjreedy added the type-bug An unexpected behavior, bug, or error label Sep 14, 2013
    @serhiy-storchaka
    Member

    Perhaps this is caused by bpo-18101.

    @serhiy-storchaka
    Member

    There are two methods: splitlist() and split(). The splitlist() method splits only one level and always returns a tuple (even if it is empty or has one element). The split() method splits recursively and can return a string if the value is not splittable. That is, the result type of split() depends on whether the argument contains spaces.

    Before bpo-18101, split() was broken on Python 3: it split recursively only bytes objects, not Unicode strings. After the "fix", it also splits already-parsed tuples of strings.

    Perhaps split() shouldn't recursively parse tuples. But in that case its purpose is questionable. Should it recursively parse only a string argument? But how could it distinguish a list of the two strings "a" and "b c" from a list of the string "a" and the list ["b", "c"]? Both are spelled as {a {b c}} in Tcl.

    The proposed patch gets rid of split() in the code that parses the configure result and uses two levels of splitlist() instead. Perhaps we should replace all remaining split() calls with the appropriate number of splitlist() calls.
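A rough sketch of why the two approaches differ, assuming the wantobjects case in which call() has already returned a tuple of option tuples. naive_split and parse_configure are hypothetical stand-ins, not tkinter's API, and the option data is illustrative: naive_split mimics the post-bpo-18101 split() by recursing into tuples and splitting strings on whitespace (real Tcl list parsing also handles braces and backslashes), while parse_configure mimics the two-level splitlist() idea, relying on splitlist() returning a tuple argument unchanged.

```python
def naive_split(value):
    """Hypothetical model of split() after bpo-18101: recurse into tuples and
    split strings on whitespace (Tcl braces and backslashes are ignored)."""
    if isinstance(value, tuple):
        return tuple(naive_split(v) for v in value)
    if isinstance(value, str):
        words = value.split()
        if len(words) > 1:
            return tuple(words)
        return words[0] if words else ''
    return value  # ints, floats and other objects pass through unchanged


def parse_configure(raw):
    """Hypothetical two-level parse: split the outer list once and each option
    record once; the option values themselves are never split again."""
    # splitlist() returns a tuple argument as-is, so for tuple input both
    # levels reduce to plain iteration over the already-parsed structure.
    result = {}
    for record in raw:
        name = record[0][1:]  # '-text' -> 'text'
        result[name] = (name,) + tuple(record[1:])
    return result


# Illustrative shape of a configure result when wantobjects is true.
raw = (('-text', 'text', 'Text', '', 'any string'),
       ('-underline', 'underline', 'Underline', -1, 0))

print(naive_split(raw)[0][4])           # -> ('any', 'string'): value torn apart
print(parse_configure(raw)['text'][4])  # -> any string: value preserved
```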

    @serhiy-storchaka
    Member

    What would you say about this patch, Terry?

    @terryjreedy
    Member Author

    I cannot say much, since I do not know what .split and .splitlist do or are supposed to do. They have no docstrings. They are methods of tkinter.Tk().tk, the app or 'interpreter' returned by _tkinter.create. Modules/_tkinter.c maps them to the C functions TkappSplit(List), which have no comments.

    @serhiy-storchaka
    Member

    Tcl is a weakly typed language, and formally all Tcl values are strings. "123" is the integer 123, the string "123", and a Tcl list containing the one element "123" (which can be a number, a string, a list, etc.). In practice, for optimization, Tcl uses different specialized types internally, and Tkinter uses them when converting Tcl values to Python values (when wantobjects is true). When wantobjects is false, Tkinter always returns a string. If Tkinter encounters a Tcl type it does not know, it returns a Tcl_Obj. Tcl introduces new types in new versions, so a Tcl function that returned a string or a Tcl list in an old version can return a new type in a new version.

    So any Tkinter method that is supposed to return a "list" can return a tuple, a string, or a Tcl_Obj.

    splitlist() splits a string, a tuple or a Tcl_Obj to Python tuple.

    '' -> ()
    'abc' -> ('abc',)
    'abc def' -> ('abc', 'def')
    'abc {def ghi}' -> ('abc', 'def ghi')

    It always returns a tuple (of strings, if the argument is a string). If the argument is already a tuple, splitlist() just returns it. If the argument is a Tcl list, splitlist() returns a tuple containing its elements.

    split() is more "intelligent": it tries to guess the structure of the data and keeps splitting a "list" into subelements as long as that is possible.

    '' -> ''
    'abc' -> 'abc'
    'abc def' -> ('abc', 'def')
    'abc {def ghi}' -> ('abc', ('def', 'ghi'))

    If the argument is a tuple, split() recursively splits its elements. When the argument is a Tcl_Obj, split() returns a string if the Tcl list has 0 or 1 elements; otherwise it returns the same value as splitlist().
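To make the example mappings above concrete, here is a simplified pure-Python model of the two methods. tcl_splitlist and tcl_split are hypothetical names, not tkinter functions, and the parser only understands whitespace and {braced} elements; real Tcl list parsing also handles double quotes and backslash substitution, which is exactly where the bug in this issue comes from.

```python
def tcl_splitlist(value):
    """Simplified model of splitlist(): split ONE level of a Tcl list.

    Only whitespace and {braced} elements are understood; double quotes and
    backslash substitution are not modeled."""
    if isinstance(value, tuple):
        return value  # an already-parsed tuple is returned as-is
    items, i, n = [], 0, len(value)
    while i < n:
        if value[i].isspace():
            i += 1
        elif value[i] == '{':
            depth, j = 1, i + 1
            while j < n and depth:  # find the matching close brace
                if value[j] == '{':
                    depth += 1
                elif value[j] == '}':
                    depth -= 1
                j += 1
            if depth:
                raise ValueError('unmatched open brace in list')
            items.append(value[i + 1:j - 1])  # element without its outer braces
            i = j
        else:
            j = i
            while j < n and not value[j].isspace():
                j += 1
            items.append(value[i:j])
            i = j
    return tuple(items)


def tcl_split(value):
    """Simplified model of split(): keep splitting while the data looks like a list."""
    if isinstance(value, tuple):
        return tuple(tcl_split(item) for item in value)
    if not isinstance(value, str):
        return value          # numbers and other objects pass through
    try:
        items = tcl_splitlist(value)
    except ValueError:
        return value          # not a well-formed list: return the string itself
    if not items:
        return ''             # empty list -> empty string
    if len(items) == 1:
        return items[0]       # one element -> that element, as a string
    return tuple(tcl_split(item) for item in items)


for text in ('', 'abc', 'abc def', 'abc {def ghi}'):
    print(repr(text), tcl_splitlist(text), tcl_split(text))
```

The model also shows the ambiguity raised earlier: tcl_split('a {b c}') yields ('a', ('b', 'c')) even when the second element was meant as the single string 'b c'.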

    @serhiy-storchaka
    Member

    Tests added in bpo-19085 have a special case for tuple values because widget[name] and widget.configure(name) return different results in such cases. When this special case is removed, the following tests fail:

    ======================================================================
    FAIL: test_text (tkinter.test.test_ttk.test_widgets.ButtonTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 381, in test_text
        self.checkParams(widget, 'text', '', 'any string')
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
        self.checkParam(widget, name, value, **kwargs)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
        self.assertEqual2(t[4], expected, eq=eq)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
        self.assertEqual(actual, expected, msg)
    AssertionError: ('any', 'string') != 'any string'

    ======================================================================
    FAIL: test_offvalue (tkinter.test.test_ttk.test_widgets.CheckbuttonTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/tkinter/test/test_ttk/test_widgets.py", line 248, in test_offvalue
        self.checkParams(widget, 'offvalue', 1, 2.3, '', 'any string')
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
        self.checkParam(widget, name, value, **kwargs)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
        self.assertEqual2(t[4], expected, eq=eq)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
        self.assertEqual(actual, expected, msg)
    AssertionError: ('any', 'string') != 'any string'

    ======================================================================
    FAIL: test_onvalue (tkinter.test.test_ttk.test_widgets.CheckbuttonTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/tkinter/test/test_ttk/test_widgets.py", line 252, in test_onvalue
        self.checkParams(widget, 'onvalue', 1, 2.3, '', 'any string')
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
        self.checkParam(widget, name, value, **kwargs)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
        self.assertEqual2(t[4], expected, eq=eq)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
        self.assertEqual(actual, expected, msg)
    AssertionError: ('any', 'string') != 'any string'

    ======================================================================
    FAIL: test_text (tkinter.test.test_ttk.test_widgets.CheckbuttonTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 381, in test_text
        self.checkParams(widget, 'text', '', 'any string')
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
        self.checkParam(widget, name, value, **kwargs)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
        self.assertEqual2(t[4], expected, eq=eq)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
        self.assertEqual(actual, expected, msg)
    AssertionError: ('any', 'string') != 'any string'

    ======================================================================
    FAIL: test_values (tkinter.test.test_ttk.test_widgets.ComboboxTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/tkinter/test/test_ttk/test_widgets.py", line 363, in test_values
        self.checkParam(self.combo, 'values', (42, 3.14, '', 'any string'))
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
        self.assertEqual2(t[4], expected, eq=eq)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
        self.assertEqual(actual, expected, msg)
    AssertionError: Tuples differ: (42, 3.14, '', ('any', 'string')) != (42, 3.14, '', 'any string')

    First differing element 3:
    ('any', 'string')
    any string

    - (42, 3.14, '', ('any', 'string'))
    ?                -    ^^^^        -

    + (42, 3.14, '', 'any string')
    ?                    ^

    ======================================================================
    FAIL: test_text (tkinter.test.test_ttk.test_widgets.LabelFrameTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 381, in test_text
        self.checkParams(widget, 'text', '', 'any string')
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
        self.checkParam(widget, name, value, **kwargs)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
        self.assertEqual2(t[4], expected, eq=eq)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
        self.assertEqual(actual, expected, msg)
    AssertionError: ('any', 'string') != 'any string'

    ======================================================================
    FAIL: test_text (tkinter.test.test_ttk.test_widgets.LabelTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 381, in test_text
        self.checkParams(widget, 'text', '', 'any string')
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
        self.checkParam(widget, name, value, **kwargs)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
        self.assertEqual2(t[4], expected, eq=eq)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
        self.assertEqual(actual, expected, msg)
    AssertionError: ('any', 'string') != 'any string'

    ======================================================================
    FAIL: test_text (tkinter.test.test_ttk.test_widgets.RadiobuttonTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 381, in test_text
        self.checkParams(widget, 'text', '', 'any string')
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
        self.checkParam(widget, name, value, **kwargs)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
        self.assertEqual2(t[4], expected, eq=eq)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
        self.assertEqual(actual, expected, msg)
    AssertionError: ('any', 'string') != 'any string'

    ======================================================================
    FAIL: test_value (tkinter.test.test_ttk.test_widgets.RadiobuttonTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/tkinter/test/test_ttk/test_widgets.py", line 701, in test_value
        self.checkParams(widget, 'value', 1, 2.3, '', 'any string')
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
        self.checkParam(widget, name, value, **kwargs)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
        self.assertEqual2(t[4], expected, eq=eq)
      File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
        self.assertEqual(actual, expected, msg)
    AssertionError: ('any', 'string') != 'any string'

    With the tkinter_configure_splitlist.patch patch applied, they pass again.

    @terryjreedy
    Member Author

    I read your explanation alongside the code and understood part of it, but not all. I need another pass through it. I may try locally (and temporarily) printing to the console to see what is happening.

    I am also not clear on the relation between the UnicodeDecodeError and tuple splitting. Does '_flatten((self._w, cmd)))' call split or splitlist on the tuple arg? If so, do you know why a problem with that would lead to the UDError? Does your patch fix the leading '0' regression?

    @serhiy-storchaka
    Member

    > I am also not clear on the relation between the UnicodeDecodeError and tuple splitting. Does '_flatten((self._w, cmd)))' call split or splitlist on the tuple arg? If so, do you know why a problem with that would lead to the UDError? Does your patch fix the leading '0' regression?

    The traceback is misleading. Full statement is:

                for x in self.tk.split(
                        self.tk.call(_flatten((self._w, cmd)))):

    Here cmd is ('entryconfigure', index). The UnicodeDecodeError was raised not by _flatten() or call(), but by split().

    When running ./python -m idlelib.idle \\0.py, call() returns a tuple of tuples, which split() receives: (('-activebackground', '', '', '', ''), ('-activeforeground', '', '', '', ''), ('-accelerator', '', '', '', ''), ('-background', '', '', '', ''), ('-bitmap', '', '', '', ''), ('-columnbreak', '', '', 0, 0), ('-command', '', '', '', '3067328620open_recent_file'), ('-compound', 'compound', 'Compound', <index object: 'none'>, 'none'), ('-font', '', '', '', ''), ('-foreground', '', '', '', ''), ('-hidemargin', '', '', 0, 0), ('-image', '', '', '', ''), ('-label', '', '', '', '1 /home/serhiy/py/cpython/\0.py'), ('-state', '', '', <index object: 'normal'>, 'normal'), ('-underline', '', '', -1, 0)).

    If wantobjects in Lib/tkinter/__init__.py is set to 0, split() gets a string instead: r"{-activebackground {} {} {} {}} {-activeforeground {} {} {} {}} {-accelerator {} {} {} {}} {-background {} {} {} {}} {-bitmap {} {} {} {}} {-columnbreak {} {} 0 0} {-command {} {} {} 3067013228open_recent_file} {-compound compound Compound none none} {-font {} {} {} {}} {-foreground {} {} {} {}} {-hidemargin {} {} 0 0} {-image {} {} {} {}} {-label {} {} {} {1 /home/serhiy/py/cpython/\0.py}} {-state {} {} normal normal} {-underline {} {} -1 0}".

    split() then tries to split its argument recursively. When it splits '1 /home/serhiy/py/cpython/\0.py', it interprets '\0' as a backslash substitution of octal code 0, that is, the character with code 0. Tcl uses a modified UTF-8 encoding in which the null character is encoded as b'\xC0\x80'. This byte sequence is invalid UTF-8, which is why the UnicodeDecodeError was raised (the patch for bpo-13153 handles b'\xC0\x80' more correctly). If you try '\101.py', split() will translate it to 'A.py'.
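The mechanism can be reproduced without Tk. The sketch below is a partial, hypothetical model: tcl_backslash_subst applies only the octal and single-letter escapes of Tcl's backslash substitution (\x and \u escapes are not modeled), and to_modified_utf8 imitates only the C0 80 encoding of U+0000. Terry's path is elided in his report, so the 'test.py' tail used for it here is an assumption.

```python
import re

# Single-letter escapes from Tcl's backslash-substitution rules.
_SIMPLE_ESCAPES = {'a': '\a', 'b': '\b', 'f': '\f', 'n': '\n',
                   'r': '\r', 't': '\t', 'v': '\v'}


def tcl_backslash_subst(word):
    """Partial model of Tcl backslash substitution on an unbraced list element.

    Handles 1-3 digit octal escapes and the single-letter escapes above; any
    other backslashed character becomes the character itself.  \\x and \\u
    escapes are NOT modeled."""
    def repl(match):
        esc = match.group(1)
        if esc[0] in '01234567':
            return chr(int(esc, 8))           # e.g. \0 -> U+0000, \101 -> 'A'
        return _SIMPLE_ESCAPES.get(esc, esc)  # e.g. \t -> TAB, \P -> 'P'
    return re.sub(r'\\([0-7]{1,3}|.)', repl, word, flags=re.DOTALL)


def to_modified_utf8(text):
    """Imitate Tcl's internal encoding for NUL only: U+0000 -> b'\\xc0\\x80'."""
    return text.encode('utf-8').replace(b'\x00', b'\xc0\x80')


for path in (r'F:\Python\mypy\0test.py',
             r'H:\HP_Documents\0PythonWork\AirplaneKinematics\accel2.py'):
    cooked = tcl_backslash_subst(path)  # what the recursive split() sees
    try:
        to_modified_utf8(cooked).decode('utf-8')
    except UnicodeDecodeError as exc:
        print(repr(cooked), '->', exc.reason, 'at position', exc.start)
```

With these inputs the decode fails with 'invalid start byte' at positions 12 and 14, matching the positions Terry and Bruce reported, because each backslash sequence before the '\0' collapses to a single character. The model also turns the '\a' in Bruce's '\accel2.py' into a BEL character, which bears on Terry's question about '\t'.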

    @serhiy-storchaka
    Member

    What is your opinion, Terry?

    @terryjreedy
    Member Author

    What I think:

    1. Perhaps I should have noticed that
      self.tk.call(_flatten((self._w, cmd)))):
      has 3 '('s and 4 ')'s and looked at the previous line for the complete expression.

    2. Perhaps Python should switch os.sep ('\\') and os.altsep ('/') on Windows and otherwise 'sanitize', as needed, all file names it gets from Windows, so it always uses '/' internally as the path separator on Windows as well as *nix. The current situation has been a constant headache. (Example: until patched this year, patchcheck.py did not work completely on Windows.) Beyond the scope of this issue.

    3. Without waiting for 2. to happen, perhaps Idle should do so. Another example of the \ problem: if one recursively searches c:/programs/python34/lib/idlelib, the output window will put out entries with mixed usage:
      c:/programs/python34/lib/idlelib\idle_test\test_rstrip.py:
      This is confusing to read and not very useful when copied for pasting.
      Also beyond the scope of this issue.

    4. Without waiting for 3, and given that tk is (just sometimes?) cooking strings as if they were literals, Idle should at least sanitize (\ to /) the filenames it sends to tk, to avoid cooking altogether. Is tk also replacing the two-character sequence \t with the tab character?

    4a. I suspect the tk cooking behavior should be documented better than it is. I was not aware of it.

    5. Making the tkinter tests pass (when written correctly) is enough to justify a patch. Better soon than just before release.

    You did not directly say whether your patch fixes the Idle 0.py problem, but I presume the change to _configure() is intended to. In any case, I will try to test this on my system tomorrow.

    @serhiy-storchaka
    Member

    Sanitizing backslashes will not help when file names (or other returned strings, see msg202496) contain spaces or curly braces.

    @serhiy-storchaka
    Member

    And of course we can't "sanitize" filenames that contain backslashes on Unix.
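A small illustration of both points using pathlib. The paths are hypothetical, and str.split() stands in for the whitespace handling of the recursive split() (it ignores braces and backslashes): converting a Windows path to forward slashes removes the backslash escapes, but a label containing a space is still torn apart, and on POSIX a backslash is an ordinary filename character that cannot be rewritten.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows: as_posix() turns every '\' separator into '/'.
win_path = PureWindowsPath(r'C:\My Files\0test.py').as_posix()
print(win_path)                 # C:/My Files/0test.py

# The recent-files label still contains a space, so whitespace splitting
# (a stand-in for what the recursive split() does) breaks the path apart.
label = '1 ' + win_path
print(label.split())            # ['1', 'C:/My', 'Files/0test.py']

# POSIX: a backslash is part of the file name, not a separator.
posix_path = PurePosixPath('/tmp/a\\0.py')
print(posix_path.as_posix())    # /tmp/a\0.py
print(posix_path.name)          # a\0.py
```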

    @serhiy-storchaka
    Member

    If there are no objections I'll commit these patches tomorrow.

    @python-dev
    Mannequin

    python-dev mannequin commented Dec 25, 2013

    New changeset ff70c298dd60 by Serhiy Storchaka in branch '2.7':
    Issue bpo-19020: Tkinter now uses splitlist() instead of split() in configure
    http://hg.python.org/cpython/rev/ff70c298dd60

    New changeset a8f5f8c44dc8 by Serhiy Storchaka in branch '3.3':
    Issue bpo-19020: Tkinter now uses splitlist() instead of split() in configure
    http://hg.python.org/cpython/rev/a8f5f8c44dc8

    New changeset c6ba24ffa4ba by Serhiy Storchaka in branch 'default':
    Issue bpo-19020: Tkinter now uses splitlist() instead of split() in configure
    http://hg.python.org/cpython/rev/c6ba24ffa4ba

    @terryjreedy
    Member Author

    I am assuming that Serhiy meant to close this.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022