Title: Unit test needed for IDLE's's findfiles()
Type: enhancement Stage: resolved
Components: IDLE Versions: Python 3.8, Python 3.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: cheryl.sabella Nosy List: Al.Sweigart, cheryl.sabella, miss-islington, terry.reedy
Priority: normal Keywords: patch

Created on 2015-01-09 08:16 by Al.Sweigart, last changed 2019-03-23 17:51 by cheryl.sabella. This issue is now closed.

File name Uploaded Description Edit
idle_grep_findfiles_test.diff Al.Sweigart, 2015-01-09 08:15 review
Pull Requests
URL Status Linked Edit
PR 12203 merged cheryl.sabella, 2019-03-06 20:05
PR 12512 merged miss-islington, 2019-03-23 12:02
Messages (10)
msg233727 - (view) Author: Al Sweigart (Al.Sweigart) * Date: 2015-01-09 08:15's findfiles() method lacks a unit test.

The comments in the unit test stub in correctly notes that since findfiles() method does not rely on the GrepDialog class, it can be made into a function.

The attached patch does the following:

- Moves findfiles() to be a function in the module.
- Adds a unit test for findfiles()
- Adds sensible default arguments
- As suggested by the previous stub comments, findfiles() returns a generator instead of a full list
- Changes 'list' and 'dir' names since those are built-ins
- Uses os.walk() instead of reinventing it.

There were so many changes to make that I eventually just rewrote the entire findfiles function. I've checked that the new version returns the same strings as the old version.
msg233810 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-01-10 09:29
I have been putting off re-writing findfiles because it partly duplicates os.listdir, which should perhaps be used instead, except that a new, improved, os.scandir is in the works: PEP 471 and  #22524.

I believe filefiles currently searches depth first, whereas I would generally prefer breadth first.  For instance, I would like all hits in /lib together and then hits in /lib/idlelib, /lib/test, /lib/tinter, etc.  What do you think?
msg233834 - (view) Author: Al Sweigart (Al.Sweigart) * Date: 2015-01-11 04:31
I checked with a couple grep programs and they use depth first. Which makes sense, since you'd want the return order to be something like:


...instead of the bread-first:


However, it turns out this is moot since the first thing does with the returned results is sort them.
msg336437 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-02-24 06:07
Findfiles was more like os.walk than os.listdir.  os.walk now uses os.scandir instead of os.listdir.  From 3.7.2 doc:

"By default, errors from the scandir() call are ignored. If optional argument onerror is specified, it should be a function; it will be called with one argument, an OSError instance. It can report the error to continue with the walk, or raise the exception to abort the walk. Note that the filename is available as the filename attribute of the exception object."

We should delete the try: except: and pass in a function to print the name and error message.  This could be done after getting a working patch.

Defaults are not needed as findfiles is called with 3 positional args.  The calling method should do the replacement of '' with os.curdir().

The patch must be incomplete as generators do not have a sort method.  Replace the first 3 lines of grep_it with

        folder, filepat = os.path.split(path)
        if not folder:
            folder = os.curdir()
        filelist = sorted(findfiles(folder, filepat, self.recvar.get())

and replace 'list' with 'filelist' in the following code.

Cheryl, grab the issue if you want to do this.
msg336482 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-02-24 20:16
Sure, thanks.  grep was on list to add tests to, so I'll take a look at this as well.  Thanks!
msg337339 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-03-06 20:21
I've opened a PR with the changes.  I did several commits, one for each stage.  That is, the first one adds the test, then the second one moves `findfiles` to the module level, etc.

I added the `onerror` function because if a directory that doesn't exist is entered as the path, it's nice to see that message at the top of the output window instead of just seeing zero hits.  Plus, I had made a test for it.  :-)

One change that I didn't commit is that an alternative version of findfiles would be:

def findfiles(folder, pattern, recursive):
    prefix = '**/' if recursive else ''
    yield from(pathlib.Path(folder).glob(f'{prefix}{pattern}'))

The tests would have to be reworked, but manual testing showed it gave the same results, albeit without the `onerror`.

One other comment about the sorting.  If you change the `sorted()` in `grep_it()` to `list()` when you're looking at manual results, you'll see that `list()` shows all of one directory first, then all of the first child directory, etc (which makes sense since walk does each directory at a time).  It's a quick way to compare the depth-first vs breadth first results.
msg338653 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-03-23 07:46
Commit as-is.  I will consider the Path.glob findfiles while working on #36323.  Like walk, it yields in breadth first order.  Unlike walk, it requires relative paths (which #37323 may need anyway) and needs code to deal with that.

I like the breadth first listing from glob.  Would it make any sense to give users a choice?  What does unix grep do?
msg338668 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-03-23 11:33
New changeset d60f658fc0278f3fcdadec8ddcab35b8ae03e1d1 by Cheryl Sabella in branch 'master':
bpo-23205: IDLE: Add tests and refactor grep's findfiles (GH-12203)
msg338674 - (view) Author: miss-islington (miss-islington) Date: 2019-03-23 12:21
New changeset 5ab665005b7f8a21c133208f140389e3bb1a3294 by Miss Islington (bot) in branch '3.7':
bpo-23205: IDLE: Add tests and refactor grep's findfiles (GH-12203)
msg338691 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-03-23 17:51
On linux, grep does depth first, so searching for 'idle' from Lib.idlelib returns:

--- cut ---
all of idle_test/
--- cut ---

Although, within idle_test, the files aren't in alphabetical order.  Also, as you can see __init__ is before iomenu, so underscores seem to be ignored.  On SO, it looks like they suggest piping it to sort if one wants a given ordering.
Date User Action Args
2019-03-23 17:51:13cheryl.sabellasetmessages: + msg338691
2019-03-23 12:23:39cheryl.sabellasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-03-23 12:21:49miss-islingtonsetnosy: + miss-islington
messages: + msg338674
2019-03-23 12:02:23miss-islingtonsetstage: needs patch -> patch review
pull_requests: + pull_request12463
2019-03-23 11:33:52cheryl.sabellasetmessages: + msg338668
2019-03-23 07:46:06terry.reedysetmessages: + msg338653
2019-03-06 20:21:54cheryl.sabellasetmessages: + msg337339
stage: patch review -> needs patch
2019-03-06 20:05:23cheryl.sabellasetstage: needs patch -> patch review
pull_requests: + pull_request12198
2019-02-24 20:16:48cheryl.sabellasetassignee: cheryl.sabella
messages: + msg336482
2019-02-24 06:07:08terry.reedysetversions: + Python 3.7, Python 3.8, - Python 2.7, Python 3.4, Python 3.5
nosy: + cheryl.sabella

messages: + msg336437

stage: patch review -> needs patch
2015-01-11 04:31:10Al.Sweigartsetmessages: + msg233834
2015-01-10 09:29:49terry.reedysetversions: + Python 2.7, Python 3.4
nosy: + terry.reedy

messages: + msg233810

type: enhancement
stage: patch review
2015-01-09 08:16:00Al.Sweigartcreate