classification
Title: IDLE crashes on *Edit / Find in files ...* command
Type: behavior Stage: test needed
Components: IDLE Versions: Python 3.5, Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: fgracia, python-dev, roger.serwy, terry.reedy
Priority: normal Keywords: patch

Created on 2012-05-27 20:47 by fgracia, last changed 2014-10-02 17:47 by terry.reedy.

Files
File name Uploaded Description
prelim_14929.patch roger.serwy, 2012-05-27 22:27
Messages (9)
msg161727 - (view) Author: Francisco Gracia (fgracia) Date: 2012-05-27 20:47
There is little more that I can add to the title statement.

   1. Start IDLE
   2. Go to *Edit* menu option
   3. Select *Find in files...* option
   4. Put some word in the *Find* input box
   5. Press *Search files* button

When the command is issued the disappearance of IDLE's window is so quick that almost nothing can be seen. There is however a hint that a new window pops up with some message, that I have been unable to read. With some input strings or if the search is happening in a directory with many *.py files a quickly moving list of messages shows in this window, but the final result is the same: IDLE disappears immediately.
msg161738 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-05-27 21:38
When running IDLE from the terminal on Ubuntu, I get the following error:

Exception in Tkinter callback
Traceback (most recent call last):
  File "/home/serwy/python/cpython/Lib/tkinter/__init__.py", line 1442, in __call__
    return self.func(*args)
  File "./idlelib/GrepDialog.py", line 70, in default_command
    self.grep_it(prog, path)
  File "./idlelib/GrepDialog.py", line 90, in grep_it
    block = f.readlines(100000)
  File "/home/serwy/python/cpython/Lib/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 120: invalid continuation byte

The reason why IDLE closes suddenly on Windows is described in issue13582.
msg161741 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-05-27 22:27
The GrepDialog opens a file using plain "open", without specifying the encoding or how to handle errors. The docs for "open" say that "the default encoding is platform dependent (whatever locale.getpreferredencoding() returns)..." This can be problematic, as files can have different encodings and GrepDialog at present has no way to detect file encodings.

Attached is a preliminary patch to replace code points if the default decoder encounters errors.
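
For illustration only, here is a minimal sketch of the "replace on decode error" idea. It is not the contents of prelim_14929.patch; the helper name read_lines_tolerant and the utf-8 default are assumptions made for this example.

```python
import os
import tempfile


def read_lines_tolerant(path, encoding="utf-8"):
    """Read all lines of a file, replacing undecodable bytes with U+FFFD.

    Hypothetical helper for illustration; GrepDialog's real code may differ.
    """
    # errors="replace" makes the decoder substitute U+FFFD instead of
    # raising UnicodeDecodeError on bytes that are invalid in `encoding`.
    with open(path, encoding=encoding, errors="replace") as f:
        return f.readlines()


# Demo: a file containing a Latin-1 byte (0xe4) that is not valid UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".py") as tmp:
    tmp.write(b"# caf\xe4\nimport itertools\n")
demo_path = tmp.name

lines = read_lines_tolerant(demo_path)
os.remove(demo_path)
```

With this approach the search keeps going past a badly encoded file, at the cost of not matching text that contains the replaced characters.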
msg161753 - (view) Author: Roundup Robot (python-dev) Date: 2012-05-28 03:06
New changeset 12454f78967b by Terry Jan Reedy in branch '3.2':
Issue14929: Stop Idle 3.x from closing on Unicode decode errors when grepping.
http://hg.python.org/cpython/rev/12454f78967b

New changeset 058c3099e82d by Terry Jan Reedy in branch 'default':
Merge 3.2 #14929
http://hg.python.org/cpython/rev/058c3099e82d
msg161754 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-05-28 03:08
Interesting -- and nasty. I have used this successfully with both 3.2 and now 3.3 on Win7 to search idlelib/*.py files and on a directory of my own files, all written by Idle and been quite pleased with the speed.

I just tried searching /Lib/*.py for 'itertools' and quickly got 10 results (up to decimal.py) before Idle disappeared. OK, import idlelib/idle in console, repeat, and I get the same traceback up to

  File "C:\Programs\Python33\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1877: character maps to <undefined>

This is not a problem on 2.7.

Patch fixes problem, so I committed it so it will be in 3.3.0a4 (in a few days).

The default extension is .py. The default encoding for .py files is utf-8. I think that is the default for what Idle writes. So I think this should be the default encoding (explicitly given) at least for .py files.
msg161755 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-05-28 03:13
Hit send too soon;-). With that much done, we can think about a more complete fix. See last paragraph above. Also, perhaps dialog box could have encodings field. People should be able to grep python code for any legal identifier, and this means proper decoding according to the encoding they actually use.

Francisco, thanks for reporting this, and Roger, thanks for quick fix.
msg162208 - (view) Author: Francisco Gracia (fgracia) Date: 2012-06-03 12:07
While you are at it, here is another suggestion: what the *Find in files ...* dialog needs most urgently, in my opinion, is a field for clearly specifying the directory from which the user wants to launch the search.

Also, in my modest opinion, having an input field for encodings would be good, although detecting the encodings of the handled files is something that any self-respecting software should reliably do by itself.
msg222783 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-07-11 19:34
The 'In files:' field specifies the search directory. I opened #21960 about it being more informative when using Find in Files from the Shell window.

Perfect encoding detection is a fantasy; decent encoding detection by heuristics is slow and not something to do when searching 100s, 1000s, or more files. What would be reasonable is to check for an encoding cookie. Idle already does this when opening a .py file in the editor. That code should be encapsulated if it is not and reused.

An encoding field for non-python files would be possible, but is a lesser priority to me.
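
As a rough sketch of the encoding-cookie check, the standard library's tokenize.detect_encoding can read a BOM or a PEP 263 coding cookie from the first two lines of a file. This uses the stdlib function rather than Idle's own editor code, and the helper name detect_source_encoding is an assumption for this example.

```python
import io
import tokenize


def detect_source_encoding(data: bytes) -> str:
    """Return the encoding declared by a BOM or coding cookie.

    Falls back to 'utf-8' when neither is present. Hypothetical helper;
    it only inspects the first two lines, so it stays cheap per file.
    """
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


# A source file that declares Latin-1 and contains a non-ASCII byte.
latin = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"
enc = detect_source_encoding(latin)
text = latin.decode(enc)

# A source file with no cookie gets the default.
default_enc = detect_source_encoding(b"x = 1\n")
```

Note that detect_encoding raises SyntaxError for an invalid or unknown cookie, so a grep loop would probably still want an errors="replace" fallback.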
msg228247 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-10-02 17:47
As the 'crash' seems to have been solved, this should probably be closed and a new issue about encodings opened. Also, since the re module used to search lines within files can work with both bytes and strings, there might be an option to search undecoded bytes.
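
A minimal sketch of what searching undecoded bytes could look like, assuming the search pattern is encoded once and each file is read in binary mode. The function name grep_bytes and the utf-8 pattern encoding are assumptions for this example, not an existing Idle API.

```python
import re


def grep_bytes(pattern: str, data: bytes, encoding: str = "utf-8"):
    """Return (line number, raw line) pairs whose bytes match the pattern.

    Hypothetical helper: the pattern is compiled as a bytes regex, so the
    file contents never need to be decoded and cannot raise decode errors.
    """
    prog = re.compile(pattern.encode(encoding))
    hits = []
    for lineno, line in enumerate(data.splitlines(), 1):
        if prog.search(line):
            hits.append((lineno, line))
    return hits


# The second line holds a byte that is invalid UTF-8; it is simply skipped
# as a non-match instead of aborting the search.
sample = b"import itertools\nname = 'caf\xe4'\nfrom itertools import chain\n"
hits = grep_bytes("itertools", sample)
```

The trade-off is that a non-ASCII pattern only matches files stored in the same encoding used to encode the pattern.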
History
Date User Action Args
2014-10-02 17:47:46 terry.reedy set messages: + msg228247
2014-07-11 19:35:02 terry.reedy set stage: needs patch -> test needed; versions: + Python 3.4, Python 3.5, - Python 3.2, Python 3.3
2014-07-11 19:34:47 terry.reedy set messages: + msg222783
2012-06-03 12:07:30 fgracia set messages: + msg162208
2012-05-28 03:13:07 terry.reedy set messages: + msg161755; stage: needs patch
2012-05-28 03:08:55 terry.reedy set assignee: terry.reedy; messages: + msg161754; versions: + Python 3.3
2012-05-28 03:06:41 python-dev set nosy: + python-dev; messages: + msg161753
2012-05-27 22:27:47 roger.serwy set files: + prelim_14929.patch; keywords: + patch; messages: + msg161741
2012-05-27 21:38:34 roger.serwy set title: IDLE crashes on *Edit / Find in files ...* command (Python 3.2, Windows XP) -> IDLE crashes on *Edit / Find in files ...* command; nosy: + terry.reedy, roger.serwy; messages: + msg161738; type: behavior
2012-05-27 20:47:21 fgracia create