classification
Title: Regression: Windows-tkinter-idle, unicode, and 0xxx filename
Type: behavior Stage: commit review
Components: IDLE, Tkinter Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: bsherwood, ezio.melotti, gpolo, haypo, python-dev, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2013-09-14 22:11 by terry.reedy, last changed 2014-01-22 08:35 by terry.reedy. This issue is now closed.

Files
File name                           Uploaded                            Description
tkinter_configure_splitlist.patch   serhiy.storchaka, 2013-09-15 09:24
tkinter_checkParam_configure.patch  serhiy.storchaka, 2013-11-09 20:31  Test tuple values with configure
Messages (16)
msg197736 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-09-14 22:11
The current 3.3.2+ repository build (32 bit) and the 3.4.0a2 repository build (32 bit) and installation (64 bit) have a problem that did not exist in my Win6 3.3.2 installation (64 bit). (Bruce Sherwood discovered the symptom with some installed version of 3.4.0a?) Adding to the recent files list a path with a directory or file whose name begins with '0' causes Idle to stop with a traceback such as the one given below. Currently, opening *any* file causes the recent files list to be rebuilt (seen in the traceback). So the problem happens whether one opens a new file with a leading 0 somewhere in its path or there merely is one such file already on the list.

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Python34\lib\tkinter\__init__.py", line 1475, in __call__
    return self.func(*args)
  File "C:\Python34\lib\idlelib\IOBinding.py", line 179, in open
    flist.open(filename, self.loadfile)
  File "C:\Python34\lib\idlelib\FileList.py", line 34, in open
    return action(filename)
  File "C:\Python34\lib\idlelib\IOBinding.py", line 240, in loadfile
    self.updaterecentfileslist(filename)
  File "C:\Python34\lib\idlelib\IOBinding.py", line 521, in updaterecentfileslist
    self.editwin.update_recent_files_list(filename)
  File "C:\Python34\lib\idlelib\EditorWindow.py", line 915, in update_recent_files_list
    menu.delete(0, END)  # clear, and rebuild:
  File "C:\Python34\lib\tkinter\__init__.py", line 2739, in delete
    if 'command' in self.entryconfig(i):
  File "C:\Python34\lib\tkinter\__init__.py", line 2749, in entryconfigure
    return self._configure(('entryconfigure', index), cnf, kw)
  File "C:\Python34\lib\tkinter\__init__.py", line 1247, in _configure
    self.tk.call(_flatten((self._w, cmd)))):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 14: invalid start byte

'invalid start byte' 0xc0 has appeared before in other issues.

Bruce got this with
  H:\HP_Documents\0PythonWork\AirplaneKinematics\accel2.py
I get '12' instead of '14' with
  F:\Python\mypy\0... (file or directory, must be 0, not 1)
It appears that the string is being improperly 're-cooked' at some point, so that '\x' sequences are 1 position rather than 2.

At the point of failure, I believe tk is checking whether a particular menuitem has an associated callback that should be deleted before the menuitem itself is deleted (before later being rebuilt). (It is a separate issue of whether we could just adjust the recent file list (sub sub menu) rather than deleting and rebuilding it, possibly the same as it was. Even if we did, the issue would still arise when opening a new '0' file, or when closing Idle.)
msg197738 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-15 00:15
Perhaps this is caused by issue18101.
msg197758 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-15 09:24
There are two methods: splitlist() and split(). The splitlist() method splits only one level and always returns a tuple (even if it is empty or has one element). The split() method splits recursively and can return a string if its argument is not splittable. That is, the result type of split() depends on whether the argument contains spaces.

Before issue18101 split() was broken on Python 3. It recursively split only bytes objects, not unicode strings. After the "fix" it also splits already parsed tuples of strings.

Perhaps split() shouldn't recursively parse tuples. But in that case its purpose is questionable. Should it recursively parse only a string argument? But how can it distinguish the list of two strings "a" and "b c" from the list of the string "a" and the list ["b", "c"]? Both are spelled as {a {b c}} in Tcl.

The proposed patch gets rid of split() in the code that parses the configure result and uses two-level splitlist() instead. Perhaps we should replace all remaining split() calls with the appropriate number of splitlist() calls.
msg198804 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-01 20:27
What would you say about this patch, Terry?
msg198811 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-10-01 22:34
I cannot say much since I do not know what .split and .splitlist do or are supposed to do. They have no docstrings. They are methods of tkinter.Tk().tk, the app or 'interpreter' returned by _tkinter.create. Modules/_tkinter.c maps them to the C functions TkappSplit(List), which have no comments.
msg198860 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-02 18:44
Tcl is a weakly typed language, and formally all Tcl values are strings. "123" is the integer 123, the string "123", and a Tcl list containing one element "123" (which in turn can be a number, a string, a list, etc). As an optimization, Tcl actually uses different specialized types internally, and Tkinter uses them to convert Tcl values to Python values (when wantobjects is true). When wantobjects is false, Tkinter always returns a string. If Tkinter encounters a Tcl type unknown to it, it returns a Tcl_Obj. Tcl introduces new types in new versions, so a Tcl function that returned a string or a Tcl list in an old version can return a new type in a new version.

So any Tkinter method that is supposed to return a "list" can return a tuple, a string, or a Tcl_Obj.

splitlist() splits a string, a tuple, or a Tcl_Obj into a Python tuple.

'' -> ()
'abc' -> ('abc',)
'abc def' -> ('abc', 'def')
'abc {def ghi}' -> ('abc', 'def ghi')

It always returns a tuple (of strings if the argument is a string). If the argument is already a tuple, splitlist() just returns it. If the argument is a Tcl list, splitlist() returns a tuple containing its elements.

split() is more intelligent. It tries to guess the structure of the data and splits the "list" into subelements for as long as that is possible.

'' -> ''
'abc' -> 'abc'
'abc def' -> ('abc', 'def')
'abc {def ghi}' -> ('abc', ('def', 'ghi'))

If the argument is a tuple, split() recursively splits its elements. When the argument is a Tcl_Obj, split() returns a string if the Tcl list has 0 or 1 elements; otherwise it returns the same value as splitlist().
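
The two result tables for string arguments can be reproduced with a small pure-Python model. This is only a sketch under simplifying assumptions, not tkinter's implementation: the names toy_splitlist and toy_split are hypothetical, and the model handles only whitespace and balanced braces (no backslash substitution, no double quotes, no tuple or Tcl_Obj arguments).

```python
def toy_splitlist(text):
    """Split ONE level of a brace-quoted, whitespace-separated list.

    Toy model: always returns a tuple of strings.
    """
    items, buf, depth, in_word = [], [], 0, False
    for ch in text:
        if ch == "{":
            if depth > 0:          # nested brace: keep it as part of the element
                buf.append(ch)
            depth += 1
            in_word = True
        elif ch == "}":
            depth -= 1
            if depth > 0:
                buf.append(ch)
        elif ch.isspace() and depth == 0:
            if in_word:            # whitespace outside braces ends an element
                items.append("".join(buf))
                buf, in_word = [], False
        else:
            buf.append(ch)
            in_word = True
    if in_word:
        items.append("".join(buf))
    return tuple(items)


def toy_split(text):
    """Split recursively; the result type depends on the element count."""
    items = toy_splitlist(text)
    if len(items) == 0:
        return ""
    if len(items) == 1:
        return items[0]
    return tuple(toy_split(item) for item in items)


for sample in ("", "abc", "abc def", "abc {def ghi}"):
    print(repr(sample), "->", toy_splitlist(sample), "|", toy_split(sample))
```

In this model the element 'def ghi' stays a string under toy_splitlist() but becomes ('def', 'ghi') under toy_split(), which is the kind of over-splitting discussed in this issue.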
msg202496 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-09 20:31
Tests added in issue19085 have a special case for tuple values because widget[name] and widget.configure(name) return different results in such cases. When this special case is removed, the following tests fail:

======================================================================
FAIL: test_text (tkinter.test.test_ttk.test_widgets.ButtonTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 381, in test_text
    self.checkParams(widget, 'text', '', 'any string')
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
    self.checkParam(widget, name, value, **kwargs)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
    self.assertEqual2(t[4], expected, eq=eq)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
    self.assertEqual(actual, expected, msg)
AssertionError: ('any', 'string') != 'any string'

======================================================================
FAIL: test_offvalue (tkinter.test.test_ttk.test_widgets.CheckbuttonTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/tkinter/test/test_ttk/test_widgets.py", line 248, in test_offvalue
    self.checkParams(widget, 'offvalue', 1, 2.3, '', 'any string')
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
    self.checkParam(widget, name, value, **kwargs)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
    self.assertEqual2(t[4], expected, eq=eq)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
    self.assertEqual(actual, expected, msg)
AssertionError: ('any', 'string') != 'any string'

======================================================================
FAIL: test_onvalue (tkinter.test.test_ttk.test_widgets.CheckbuttonTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/tkinter/test/test_ttk/test_widgets.py", line 252, in test_onvalue
    self.checkParams(widget, 'onvalue', 1, 2.3, '', 'any string')
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
    self.checkParam(widget, name, value, **kwargs)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
    self.assertEqual2(t[4], expected, eq=eq)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
    self.assertEqual(actual, expected, msg)
AssertionError: ('any', 'string') != 'any string'

======================================================================
FAIL: test_text (tkinter.test.test_ttk.test_widgets.CheckbuttonTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 381, in test_text
    self.checkParams(widget, 'text', '', 'any string')
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
    self.checkParam(widget, name, value, **kwargs)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
    self.assertEqual2(t[4], expected, eq=eq)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
    self.assertEqual(actual, expected, msg)
AssertionError: ('any', 'string') != 'any string'

======================================================================
FAIL: test_values (tkinter.test.test_ttk.test_widgets.ComboboxTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/tkinter/test/test_ttk/test_widgets.py", line 363, in test_values
    self.checkParam(self.combo, 'values', (42, 3.14, '', 'any string'))
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
    self.assertEqual2(t[4], expected, eq=eq)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
    self.assertEqual(actual, expected, msg)
AssertionError: Tuples differ: (42, 3.14, '', ('any', 'string')) != (42, 3.14, '', 'any string')

First differing element 3:
('any', 'string')
any string

- (42, 3.14, '', ('any', 'string'))
?                -    ^^^^        -

+ (42, 3.14, '', 'any string')
?                    ^


======================================================================
FAIL: test_text (tkinter.test.test_ttk.test_widgets.LabelFrameTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 381, in test_text
    self.checkParams(widget, 'text', '', 'any string')
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
    self.checkParam(widget, name, value, **kwargs)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
    self.assertEqual2(t[4], expected, eq=eq)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
    self.assertEqual(actual, expected, msg)
AssertionError: ('any', 'string') != 'any string'

======================================================================
FAIL: test_text (tkinter.test.test_ttk.test_widgets.LabelTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 381, in test_text
    self.checkParams(widget, 'text', '', 'any string')
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
    self.checkParam(widget, name, value, **kwargs)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
    self.assertEqual2(t[4], expected, eq=eq)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
    self.assertEqual(actual, expected, msg)
AssertionError: ('any', 'string') != 'any string'

======================================================================
FAIL: test_text (tkinter.test.test_ttk.test_widgets.RadiobuttonTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 381, in test_text
    self.checkParams(widget, 'text', '', 'any string')
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
    self.checkParam(widget, name, value, **kwargs)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
    self.assertEqual2(t[4], expected, eq=eq)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
    self.assertEqual(actual, expected, msg)
AssertionError: ('any', 'string') != 'any string'

======================================================================
FAIL: test_value (tkinter.test.test_ttk.test_widgets.RadiobuttonTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/tkinter/test/test_ttk/test_widgets.py", line 701, in test_value
    self.checkParams(widget, 'value', 1, 2.3, '', 'any string')
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 89, in checkParams
    self.checkParam(widget, name, value, **kwargs)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 63, in checkParam
    self.assertEqual2(t[4], expected, eq=eq)
  File "/home/serhiy/py/cpython/Lib/tkinter/test/widget_tests.py", line 41, in assertEqual2
    self.assertEqual(actual, expected, msg)
AssertionError: ('any', 'string') != 'any string'

With tkinter_configure_splitlist.patch applied, they pass again.
msg202516 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-11-10 09:26
I read your explanation in relation to the code and got part of it but not all. I need to try another run through. I may try adding temporary local prints to the console to see what is happening.

I am also not clear on the relation between the UnicodeDecodeError and tuple splitting. Does '_flatten((self._w, cmd)))' call split or splitlist on the tuple arg? If so, do you know why a problem with that would lead to the UDError? Does your patch fix the leading '0' regression?
msg202542 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-10 18:02
> I am also not clear on the relation between the UnicodeDecodeError and tuple splitting. Does '_flatten((self._w, cmd)))' call split or splitlist on the tuple arg? If so, do you know why a problem with that would lead to the UDError? Does your patch fix the leading '0' regression?

The traceback is misleading. The full statement is:

            for x in self.tk.split(
                    self.tk.call(_flatten((self._w, cmd)))):

Here cmd is ('entryconfigure', index). The UnicodeDecodeError was raised neither by _flatten() nor by call(), but by split().

When running `./python -m idlelib.idle \\0.py`, call() returns, and split() gets, a tuple of tuples: (('-activebackground', '', '', '', ''), ('-activeforeground', '', '', '', ''), ('-accelerator', '', '', '', ''), ('-background', '', '', '', ''), ('-bitmap', '', '', '', ''), ('-columnbreak', '', '', 0, 0), ('-command', '', '', '', '3067328620open_recent_file'), ('-compound', 'compound', 'Compound', <index object: 'none'>, 'none'), ('-font', '', '', '', ''), ('-foreground', '', '', '', ''), ('-hidemargin', '', '', 0, 0), ('-image', '', '', '', ''), ('-label', '', '', '', '1 /home/serhiy/py/cpython/\\0.py'), ('-state', '', '', <index object: 'normal'>, 'normal'), ('-underline', '', '', -1, 0)).

When wantobjects in Lib/tkinter/__init__.py is set to 0, split() gets a string instead: r"{-activebackground {} {} {} {}} {-activeforeground {} {} {} {}} {-accelerator {} {} {} {}} {-background {} {} {} {}} {-bitmap {} {} {} {}} {-columnbreak {} {} 0 0} {-command {} {} {} 3067013228open_recent_file} {-compound compound Compound none none} {-font {} {} {} {}} {-foreground {} {} {} {}} {-hidemargin {} {} 0 0} {-image {} {} {} {}} {-label {} {} {} {1 /home/serhiy/py/cpython/\0.py}} {-state {} {} normal normal} {-underline {} {} -1 0}".

split() then tries to split its argument recursively. When it splits '1 /home/serhiy/py/cpython/\\0.py', it interprets '\\0' as a backslash substitution with octal code 0, which means the character with code 0. Tcl uses a modified UTF-8 encoding in which the null character is encoded as b'\xC0\x80'. This byte sequence is invalid UTF-8, which is why UnicodeDecodeError was raised (the patch for issue13153 handles b'\xC0\x80' more correctly). If you try '\101.py', split() will translate it to 'A.py'.
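
The chain described here (octal backslash substitution, then modified UTF-8, then a strict UTF-8 decode) can be imitated in pure Python. This is a rough sketch, not what Tcl or _tkinter actually execute: toy_backslash_subst and to_modified_utf8 are hypothetical helpers, only \ooo octal escapes are modelled, and the only modification to UTF-8 that is modelled is the two-byte encoding of the null character.

```python
import re

# Toy model of Tcl's \ooo substitution: one to three octal digits.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def toy_backslash_subst(text):
    """Replace each octal escape with the character it names."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


def to_modified_utf8(text):
    """Encode as UTF-8, but write the null character as the bytes C0 80."""
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


label = r"1 /home/serhiy/py/cpython/\0.py"
cooked = toy_backslash_subst(label)   # '\0' became a null character
raw = to_modified_utf8(cooked)        # the null character became b'\xc0\x80'

try:
    raw.decode("utf-8")               # strict UTF-8 rejects the 0xC0 byte
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
print(toy_backslash_subst(r"\101.py"))
```

The decode fails at the offset of the 0xC0 byte, so the reported position depends on how many bytes precede the leading-0 name in the string being split.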
msg206556 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-18 21:58
What is your opinion, Terry?
msg206570 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-12-19 00:47
What I think:

1. Perhaps I should have noticed that
        self.tk.call(_flatten((self._w, cmd)))):
has 3 '('s and 4 ')'s and looked at the previous line for the complete expression.

2. Perhaps Python should switch os.sep ('\\') and os.altsep ('/') on Windows and otherwise 'sanitize', as needed, all file names it gets from Windows, so it always uses '/' internally as the path separator on Windows as well as *nix. The current situation has been a constant headache. (Example: until patched this year, patchcheck.py did not work completely on Windows.) Beyond the scope of this issue.

3. Without waiting for 2. to happen, perhaps Idle should do so. Another example of the \ problem: if one recursively searches c:/programs/python34/lib/idlelib, the output window will put out entries with mixed usage:
 c:/programs/python34/lib/idlelib\idle_test\test_rstrip.py:
This is confusing to read and not very useful when copied for pasting.
Also beyond the scope of this issue.

4. Without waiting for 3, and given that tk is (just sometimes?) cooking strings as if they were literals, Idle should at least sanitize (\ to /) filenames it sends to tk to avoid cooking altogether. Is tk also replacing the 2 char sequence \t with the tab char?

4a. I suspect the tk cooking behavior should be documented better than it is. I was not aware of it.

5. Making the tkinter tests pass (when written correctly) is enough to justify a patch. Better soon than just before release.

You did not directly say whether your patch fixes the Idle 0.py problem, but I presume the change to _configure() is intended to. In any case, I will try to test this on my system tomorrow.
msg206586 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-19 07:43
Sanitizing backslashes will not help when file names (or other returned strings, see msg202496) contain spaces or curly braces.
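
A minimal sketch of why replacing backslashes is not enough. It assumes a hypothetical helper, naive_recursive_split, in which str.split() stands in for the Tcl list tokenizer, and a hypothetical menu label whose path already uses forward slashes but contains a space.

```python
def naive_recursive_split(value):
    """Recursively split strings on whitespace, descending into tuples.

    str.split() is only a stand-in (assumption) for Tcl list parsing.
    """
    if isinstance(value, tuple):
        return tuple(naive_recursive_split(item) for item in value)
    if isinstance(value, str):
        parts = value.split()
        if len(parts) > 1:
            return tuple(parts)
    return value


# Hypothetical, already parsed option tuple with a "sanitized" path.
option = ("-label", "", "", "", "1 C:/My Documents/0.py")
result = naive_recursive_split(option)
print(result[4])
```

Even without any backslash, the label is broken into three pieces, so recursing into already parsed values changes them regardless of the path separator.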
msg206587 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-19 07:48
And of course we can't "sanitize" filenames which contain a backslash on Unix.
msg206868 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-23 16:59
If there are no objections I'll commit these patches tomorrow.
msg206924 - (view) Author: Roundup Robot (python-dev) Date: 2013-12-25 14:43
New changeset ff70c298dd60 by Serhiy Storchaka in branch '2.7':
Issue #19020: Tkinter now uses splitlist() instead of split() in configure
http://hg.python.org/cpython/rev/ff70c298dd60

New changeset a8f5f8c44dc8 by Serhiy Storchaka in branch '3.3':
Issue #19020: Tkinter now uses splitlist() instead of split() in configure
http://hg.python.org/cpython/rev/a8f5f8c44dc8

New changeset c6ba24ffa4ba by Serhiy Storchaka in branch 'default':
Issue #19020: Tkinter now uses splitlist() instead of split() in configure
http://hg.python.org/cpython/rev/c6ba24ffa4ba
msg208758 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-01-22 08:35
I am assuming that Serhiy meant to close this.
History
Date                 User              Action  Args
2014-07-08 18:08:10  serhiy.storchaka  link    issue5712 superseder
2014-02-09 19:35:03  serhiy.storchaka  link    issue1602742 superseder
2014-01-22 08:35:53  terry.reedy       set     status: open -> closed; messages: + msg208758
2013-12-25 14:59:37  serhiy.storchaka  set     stage: patch review -> commit review; resolution: fixed; versions: + Python 2.7
2013-12-25 14:43:23  python-dev        set     nosy: + python-dev; messages: + msg206924
2013-12-23 16:59:06  serhiy.storchaka  set     messages: + msg206868
2013-12-19 07:48:53  serhiy.storchaka  set     messages: + msg206587
2013-12-19 07:43:39  serhiy.storchaka  set     messages: + msg206586
2013-12-19 00:47:59  terry.reedy       set     messages: + msg206570
2013-12-18 21:58:50  serhiy.storchaka  set     messages: + msg206556
2013-11-10 18:02:19  serhiy.storchaka  set     messages: + msg202542
2013-11-10 09:26:56  terry.reedy       set     messages: + msg202516
2013-11-09 20:31:51  serhiy.storchaka  set     files: + tkinter_checkParam_configure.patch; messages: + msg202496
2013-10-02 18:44:22  serhiy.storchaka  set     messages: + msg198860
2013-10-01 22:34:46  terry.reedy       set     messages: + msg198811
2013-10-01 20:49:12  serhiy.storchaka  set     messages: - msg198805
2013-10-01 20:29:03  serhiy.storchaka  set     messages: + msg198805
2013-10-01 20:27:04  serhiy.storchaka  set     messages: + msg198804
2013-09-15 09:24:17  serhiy.storchaka  set     files: + tkinter_configure_splitlist.patch; assignee: serhiy.storchaka; components: + IDLE, Tkinter; keywords: + patch; nosy: + gpolo; messages: + msg197758; stage: needs patch -> patch review
2013-09-15 00:15:15  serhiy.storchaka  set     messages: + msg197738
2013-09-14 22:11:22  terry.reedy       create