This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Add match parameter to filecmp.dircmp to ignore using patterns
Type: enhancement
Stage: needs patch
Components: Library (Lib)
Versions: Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: belopolsky, flxkid, gvanrossum, jafo, mamrhein, nikratio, vdupras
Priority: normal
Keywords: patch

Created on 2008-01-04 21:30 by flxkid, last changed 2022-04-11 14:56 by admin.

Files
File name Uploaded Description
filecmp.py.patch flxkid, 2008-01-05 02:28
wildcard.patch mamrhein, 2008-04-11 11:21 patch implementing an enhanced version of this feature
add_match_func.patch mamrhein, 2008-04-11 16:10 revised patch
add_match_func.patch mamrhein, 2008-04-23 16:43 2nd revised patch
issue1738py3k.diff BreamoreBoy, 2010-09-19 10:43
issue1738.diff nikratio, 2014-03-16 19:26
issue1738_r2.diff nikratio, 2014-03-18 04:12
issue1738_r3.diff nikratio, 2014-04-13 19:00
Messages (19)
msg59262 - (view) Author: Oliver Nelson (flxkid) Date: 2008-01-04 21:34
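dircmp's ignore and hide lists only take exact file names to ignore, not Unix
filename patterns.  This means you can't hide/ignore *.bak or something
similar.  Changing the _filter function to the following adds this:

from itertools import ifilterfalse  # Python 2; named filterfalse in Python 3
import fnmatch

def newfilter(flist, skip):
  # Drop every name in flist that matches any pattern in skip.
  for pattern in skip:
    flist = list(ifilterfalse(fnmatch.filter(flist, pattern).__contains__, flist))
  return flist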
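
For readers on Python 3, where itertools.ifilterfalse no longer exists, the same idea can be sketched roughly as follows. This is an illustrative rewrite, not the code from the attached patch; the name filter_patterns is made up for this example.

```python
from fnmatch import fnmatch

def filter_patterns(names, patterns):
    """Return the names that match none of the given shell-style patterns."""
    return [name for name in names
            if not any(fnmatch(name, pattern) for pattern in patterns)]

# Hypothetical directory listing, filtered with the patterns from the report.
kept = filter_patterns(["notes.txt", "notes.bak", "temp1.dat"],
                       ["*.bak", "temp*.*"])
print(kept)
```

Unlike the loop above, this version scans the list once and stops at the first pattern that matches a name.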
msg59265 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-04 23:06
I'm sorry, but can you rephrase that in the form of a patch? I can't
quite figure out what you're trying to say, except that it sounds like
it's scratching an itch of yours.
msg59276 - (view) Author: Oliver Nelson (flxkid) Date: 2008-01-05 02:21
Patch attached (sorry, this is my first bug report on an open source project).
dircmp has a list of files to ignore and hide.  These lists right now
are compared to the left and right lists using __contains__ to filter
out the ignore/hide lists.

This patch adds the ability to pass file patterns in addition to
filenames so that you can filter classes of files such as *.bak or temp*.*
msg59277 - (view) Author: Oliver Nelson (flxkid) Date: 2008-01-05 02:28
sorry...jacked up the patch file...new one attached
msg62893 - (view) Author: Virgil Dupras (vdupras) (Python triager) Date: 2008-02-24 11:26
The documentation doesn't say anything about dircmp being supposed to 
support pattern matching. This ticket is a feature request rather than a 
bug.
msg64138 - (view) Author: Sean Reifschneider (jafo) * (Python committer) Date: 2008-03-20 02:38
Please also include at least documentation changes, since this changes
the behavior of the module.  This would be in the file:
Doc/library/filecmp.rst

Also, if possible, a test would be great.  The file for this would be:
./Lib/test/test_filecmp.py
msg65348 - (view) Author: Michael Amrhein (mamrhein) Date: 2008-04-11 11:21
I've implemented an enhanced version of this feature by adding a keyword
'match' to the constructor of class 'dircmp'. It defaults to function
'fnmatch' imported from module 'fnmatch'.
This makes it possible to exclude directories and/or files by using patterns like
'*.tmp'.
By passing a different function, it is also possible to use more elaborate
patterns, for example, based on regular expressions.
Attached patch includes updates of documentation and test cases.
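
A rough sketch of the idea, independent of the attached patch: the filter takes a pluggable match callable that defaults to fnmatch, and a regular-expression matcher can be swapped in. The names filter_names and regex_match, and the call signature match(name, pattern), are assumptions made for this illustration; they are modelled on fnmatch.fnmatch and are not taken from the patch.

```python
import re
from fnmatch import fnmatch

def filter_names(names, skip, match=fnmatch):
    """Drop every name for which match(name, pattern) is true for a pattern in skip."""
    return [n for n in names if not any(match(n, p) for p in skip)]

def regex_match(name, pattern):
    """Alternative matcher: pattern is a regular expression for the whole name."""
    return re.fullmatch(pattern, name) is not None

names = ["main.py", "main.pyc", "build.tmp", "v1.log", "v22.log"]

# Default: shell-style wildcards.
print(filter_names(names, ["*.tmp", "*.pyc"]))

# Same filter, but patterns are interpreted as regular expressions.
print(filter_names(names, [r"v\d\.log"], match=regex_match))
```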
msg65351 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-11 14:00
+1 on adding the match argument.  Can you comment on how one would 
implement the old behavior? I would guess match=lambda x,y: x in y, 
which is not that bad, but maybe that should be the default and those 
who need pattern matching should use match=fnmatch.

On the patch itself, please don't change default arguments from None to 
lists or functions.  There is a subtle difference between the two forms. 
For example, in your code, if someone overrides filecmp.fnmatch before 
calling dircmp, the old fnmatch will still be used.  If you use match=None in the 
function declaration and a "match is None" check in the function body, then 
the new overridden value will be used in the above scenario.
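
The difference can be demonstrated with a small stand-alone sketch. Here a SimpleNamespace called settings stands in for a module attribute such as filecmp.fnmatch; all names in the example are hypothetical and exist only to show when the default is evaluated.

```python
from types import SimpleNamespace

def exact(name, pattern):
    return name == pattern

def match_everything(name, pattern):
    return True

# Stand-in for a module attribute that a caller might override later.
settings = SimpleNamespace(match=exact)

def skip_early(name, pattern, match=settings.match):
    # The default was evaluated once, when this def statement ran.
    return match(name, pattern)

def skip_late(name, pattern, match=None):
    # The default is looked up on every call, so later overrides are seen.
    if match is None:
        match = settings.match
    return match(name, pattern)

# Override the attribute after both functions have been defined.
settings.match = match_everything

print(skip_early("a.txt", "b.txt"))  # still calls exact
print(skip_late("a.txt", "b.txt"))   # calls match_everything
```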
msg65359 - (view) Author: Michael Amrhein (mamrhein) Date: 2008-04-11 16:10
Ok, I've set default arguments (back) to None. Revised patch attached.

Defaulting the match function to fnmatch doesn't change the behavior in
the "normal" case, i.e. when regular file / directory names are used,
like in the default value of ignore. It behaves differently in two cases:
a) A string given in ignore contains wildcard character(s):
In this case this parameter would have no effect in the previous
implementation, because the string would not match any file / directory
name exactly. In the changed implementation all files / directories
matching the pattern would be ignored. If the wildcard(s) were included
by intent, this is probably what was intended; if they were included by
mistake, both versions do not behave as intended.
b) File system is case-insensitive:
In this case the changed implementation will ignore files / directories
which the previous version did not ignore because of a case mismatch.
But, on such a file system this is what one would normally expect, I think.
So, in both cases, I feel the changed behavior is acceptable.
Or did I miss something?
msg65360 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-11 16:30
On Fri, Apr 11, 2008 at 12:10 PM, Michael Amrhein
<report@bugs.python.org> wrote:
>
..
>  a) A string given in ignore contains wildcard character(s):
>  In this case this parameter would have no effect in the previous
>  implementation, because the string would not match any file / directory
>  name exactly.

'*' is a perfectly legal filename character on most filesystems
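
To make the compatibility concern concrete, here is a small sketch with a hypothetical listing that contains a file literally named "a*". It uses fnmatch.fnmatchcase so that the result does not depend on the platform's case rules.

```python
from fnmatch import fnmatchcase

names = ["a*", "abc", "b"]

# Exact comparison: only the file literally named "a*" is skipped.
exact_kept = [n for n in names if n != "a*"]

# Pattern matching: the same ignore entry now also hides "abc".
pattern_kept = [n for n in names if not fnmatchcase(n, "a*")]

# With fnmatch, a literal asterisk has to be written as "[*]".
escaped_kept = [n for n in names if not fnmatchcase(n, "a[*]")]

print(exact_kept, pattern_kept, escaped_kept)
```

Existing callers that pass such a name in ignore would silently hide more entries if pattern matching became the default.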
msg65363 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-11 18:38
As you are working on this, please consider changing
self.hide+self.ignore in phase0 to chain(self.hide, self.ignore) where
chain should be imported from itertools. There is no need to create the
combined list (twice!) and not accepting arbitrary iterables for hide
and ignore seems to be against the zen of python.
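
A minimal sketch of that suggestion, outside of dircmp itself (filter_listing is a made-up name). One caveat, stated as an assumption about how it would be used: a chain object is a one-shot iterator, so it cannot simply be reused for both the left and the right listing. The sketch therefore collects it into a set once.

```python
from itertools import chain

def filter_listing(names, hide, ignore):
    """Drop names that appear in hide or ignore (any iterables of strings)."""
    # chain walks both inputs once; no intermediate hide + ignore list is built.
    skip = set(chain(hide, ignore))
    return [n for n in names if n not in skip]

hide = (".", "..")                    # a tuple
ignore = (n for n in ["RCS", "CVS"])  # a generator
print(filter_listing([".", "CVS", "README", "src"], hide, ignore))
```

With hide + ignore, the tuple and generator used here would raise a TypeError instead.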
msg65479 - (view) Author: Michael Amrhein (mamrhein) Date: 2008-04-14 20:06
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
...
> 
> '*' is a perfectly legal filename character on most filesystems
> 
Oops! Never thought of putting a '*' into a file name.
Obviously, I should have tried before ...

Ok, then I agree that, to avoid breaking existing code, the match
function should default to string comparison.
I'll provide a second revised patch in the next few days.
And I'll chain ignore and hide, as you proposed.
msg65629 - (view) Author: Michael Amrhein (mamrhein) Date: 2008-04-19 09:06
There is one small issue I would like to discuss:
While the comparison of directory and file names in phase1 is
case-insensitive on case-insensitive systems (os.path.normcase applied
to each name), the filtering of ignore and hide in phase0 isn't. 
I can't imagine a good reason for this and would like to change it by
also applying os.path.normcase to each name in ignore and hide.
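
The effect of such a change can be sketched as follows. os.path.normcase is ntpath.normcase on Windows and posixpath.normcase on POSIX systems, so the example calls both explicitly to show the two behaviours on any platform. The function filter_normalized is hypothetical and not part of filecmp.

```python
import ntpath
import posixpath

def filter_normalized(names, skip, normcase):
    """Drop names whose case-normalized form equals a normalized skip entry."""
    skip_normalized = {normcase(s) for s in skip}
    return [n for n in names if normcase(n) not in skip_normalized]

names = ["Thumbs.DB", "readme.txt"]

# Windows rules: names are lowercased, so "thumbs.db" hides "Thumbs.DB".
print(filter_normalized(names, ["thumbs.db"], ntpath.normcase))

# POSIX rules: normcase leaves names unchanged, so nothing is hidden.
print(filter_normalized(names, ["thumbs.db"], posixpath.normcase))
```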
msg65701 - (view) Author: Michael Amrhein (mamrhein) Date: 2008-04-23 16:43
Here's a 2nd revised patch, which
* adds a keyword 'match' to the constructor of class 'dircmp'
* defaults 'match' to str.__eq__
* modifies method 'phase0': apply os.path.normcase to each name in
ignore and hide
* modifies the docs accordingly, incl. an example for using pattern matching
* modifies the test case for the default matching
* adds a test case for using pattern matching (fnmatch.fnmatch)
msg107430 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-09 22:29
The patch does not apply to py3k branch.
msg116857 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-09-19 10:43
Patch worked fine with 2.7.  I reworked it for SVN trunk but got this failure.

FAILED (failures=1)
Traceback (most recent call last):
  File "test_filecmp.py", line 179, in <module>
    test_main()
  File "test_filecmp.py", line 176, in test_main
    support.run_unittest(FileCompareTestCase, DirCompareTestCase)
  File "c:\py3k\lib\test\support.py", line 1128, in run_unittest
    _run_suite(suite)
  File "c:\py3k\lib\test\support.py", line 1111, in _run_suite
    raise TestFailed(err)
test.support.TestFailed: Traceback (most recent call last):
  File "test_filecmp.py", line 158, in test_dircmp_fnmatch
    self.assertEqual(d.left_list, ['file'])
AssertionError: Lists differ: ['dir-ignore', 'file', 'file.t... != ['file']

First differing element 0:
dir-ignore
file

First list contains 2 additional elements.
First extra element 1:
file

- ['dir-ignore', 'file', 'file.tmp']
+ ['file']

I've attached a py3k patch as a different pair of eyes is more likely to spot a problem.
msg213748 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-16 19:26
I don't think that we can just introduce path normalization in phase0. Even though I agree that this would be the proper way to do it when reimplementing from scratch, it breaks backward compatibility.

There also is a small mistake in that the *match* attribute should also be used for subdirectories in the `phase4` method.
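
The point about subdirectories can be illustrated with a stand-alone recursive comparison. This is only a sketch of the principle and does not reproduce dircmp or the patch; visible_names and compare_tree are hypothetical helpers, and match is assumed to be called as match(name, pattern).

```python
import os
from fnmatch import fnmatch

def visible_names(path, ignore, match):
    """Return the entries of path that match none of the ignore patterns."""
    return {n for n in os.listdir(path)
            if not any(match(n, p) for p in ignore)}

def compare_tree(left, right, ignore, match, prefix="."):
    """Map each compared directory to (left_only, right_only) name lists."""
    l_names = visible_names(left, ignore, match)
    r_names = visible_names(right, ignore, match)
    report = {prefix: (sorted(l_names - r_names), sorted(r_names - l_names))}
    for name in sorted(l_names & r_names):
        sub_left = os.path.join(left, name)
        sub_right = os.path.join(right, name)
        if os.path.isdir(sub_left) and os.path.isdir(sub_right):
            # Pass ignore and match down explicitly; otherwise the
            # subdirectories would be filtered with different rules.
            report.update(compare_tree(sub_left, sub_right, ignore, match,
                                       os.path.join(prefix, name)))
    return report
```

A call such as compare_tree(a, b, ["*.tmp"], fnmatch) then applies the same pattern rules at every level of the two trees.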

Other than that, this patch looks good to me. I fixed the above issues, rebased on current hg tip, and added some missing markup in the documentation. After inspecting the code, it seems that there is no difference between directory entries being "hidden" by the *hide* parameter, and being "ignored" by the *ignore* parameter, so I also updated the documentation to make this less confusing.

I could not reproduce the test failure reported by Mark, but this is most likely because I could not find out on what base revision to apply his patch. 

I think this is ready for commit.
msg213941 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-18 04:12
Attached is an updated patch that addresses the comments from Rietveld. Thanks for the feedback!
msg216030 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-04-13 19:00
Updated patch to acknowledge original authors in Misc/ACKS.
History
Date User Action Args
2022-04-11 14:56:29 admin set github: 46079
2014-04-13 19:00:44 nikratio set files: + issue1738_r3.diff; messages: + msg216030
2014-03-18 04:12:34 nikratio set files: + issue1738_r2.diff; messages: + msg213941
2014-03-16 19:28:16 nikratio set title: Add match parameter to filecmp.dircmp to ignore name patterns -> Add match parameter to filecmp.dircmp to ignore using patterns
2014-03-16 19:27:44 nikratio set versions: + Python 3.5, - Python 3.2; title: filecmp.dircmp does exact match only -> Add match parameter to filecmp.dircmp to ignore name patterns
2014-03-16 19:26:58 nikratio set files: + issue1738.diff; nosy: + nikratio; messages: + msg213748
2014-02-03 19:42:36 BreamoreBoy set nosy: - BreamoreBoy
2010-09-19 10:43:21 BreamoreBoy set files: + issue1738py3k.diff; nosy: + BreamoreBoy; messages: + msg116857
2010-07-22 18:18:57 belopolsky set assignee: belopolsky ->
2010-06-09 22:29:46 belopolsky set assignee: belopolsky; messages: + msg107430; stage: needs patch
2010-06-09 22:14:10 terry.reedy set versions: - Python 2.6, Python 3.1, Python 2.7
2010-06-09 22:13:56 terry.reedy set versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.5
2008-04-23 16:43:51 mamrhein set files: + add_match_func.patch; messages: + msg65701
2008-04-19 09:06:51 mamrhein set messages: + msg65629
2008-04-14 20:06:22 mamrhein set messages: + msg65479
2008-04-11 18:38:27 belopolsky set messages: + msg65363
2008-04-11 16:30:31 belopolsky set messages: + msg65360
2008-04-11 16:10:49 mamrhein set files: + add_match_func.patch; messages: + msg65359
2008-04-11 14:00:39 belopolsky set nosy: + belopolsky; messages: + msg65351
2008-04-11 11:21:43 mamrhein set files: + wildcard.patch; nosy: + mamrhein; messages: + msg65348; versions: + Python 2.6
2008-03-20 02:38:56 jafo set priority: normal; nosy: + jafo; messages: + msg64138; keywords: + patch
2008-02-24 11:26:49 vdupras set nosy: + vdupras; type: behavior -> enhancement; messages: + msg62893; components: + Library (Lib), - None
2008-01-19 22:57:43 georg.brandl set files: - filecmp.py.patch
2008-01-05 02:28:58 flxkid set files: + filecmp.py.patch; messages: + msg59277
2008-01-05 02:22:00 flxkid set files: + filecmp.py.patch; messages: + msg59276
2008-01-04 23:06:00 gvanrossum set nosy: + gvanrossum; messages: + msg59265
2008-01-04 21:34:05 flxkid set messages: + msg59262
2008-01-04 21:30:31 flxkid create