classification
Title: Add fnmatch.filterfalse function
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: 30415 Superseder:
Assigned To: Nosy List: Roee Nizan, brett.cannon, rhettinger, serhiy.storchaka, steven.daprano, vstinner, wolma
Priority: normal Keywords: patch

Created on 2017-05-20 13:19 by steven.daprano, last changed 2019-08-21 16:32 by gvanrossum.

Files
File name Uploaded Description Edit
filterfalse.diff steven.daprano, 2017-05-20 13:19 review
filterfalse.alternate_patch wolma, 2017-05-20 17:08 review
Messages (17)
msg294029 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2017-05-20 13:19
There has been some discussion on Python-Ideas about adding fnmatch.filter_false to complement filter, for when you want to ignore matching files rather than non-matching ones.

https://mail.python.org/pipermail/python-ideas/2017-May/045694.html

I don't believe there have been any strong objections, and work-arounds have a considerable performance penalty (according to the OP, I have not confirmed that).

Here's a patch which refactors filter and adds filter_false.

Sorry I haven't got my head around the brave new world on Github, so I'm going old school here with a diff :-)
msg294031 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-20 14:31
In general LGTM. I left few minor comments on Rietveld. Needed the versionadded directive, Misc/NEWS and What's New entries.

But shouldn't the new function have name "filterfalse" for parallel with itertools.filterfalse? And this would better match the style of the fnmatch module, fnmatchcase doesn't contain an underscore.

Tests for fnmatch.filter are too poor. I think we should add more tests (this is other issue).
msg294036 - (view) Author: Wolfgang Maier (wolma) * Date: 2017-05-20 17:08
Hi,
seems I had the same thoughts as you, Steven. I had started working on a patch independently yesterday, but after making my changes to fnmatch itself, I found I had too many other things to do to address unittests and doc changes to turn this into a real patch - so thank you for spending time on all of this.
I just downloaded your patch and merged it with mine because I think my version of fnmatch.py may be simpler and slightly faster (though like you I didn't run performance tests). Feel free to do whatever you like with this alternate version - it's not that there is much new in it to take credit for :)

Another thing I noted: fnmatch takes a rather unusual approach to determine whether normcase is NOP or not. It imports posixpath only to see if it is the same as os.path. That looks pretty weird and wasteful to me (especially if you are running this on Windows and are effectively importing posixpath just for fun then). I think it would be better to just check os.name instead (like pathlib does for example). I moved the corresponding check to the top of the module to make this easier to address, should that be of interest. I'm also using the result of the check in fnmatch.fnmatch because I don't see any reason why you wouldn't want to benefit from it there as well.
msg294038 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-20 17:26
Your patch makes filter() returning normalized names. This may be not what is expected.
msg294039 - (view) Author: Wolfgang Maier (wolma) * Date: 2017-05-20 17:31
Does it? I thought it does so only if normalize_case is True.
Did I miss something?
msg294041 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-20 17:34
Yes, it does so only if normalize_case is True. Currently fnmatch.filter() returns non-normalized names.
msg294078 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2017-05-21 06:39
I'm happy for you to change the name to filterfalse.
msg294110 - (view) Author: Wolfgang Maier (wolma) * Date: 2017-05-21 19:31
@serhiy: my bad! I just hadn't realized this behavior of the original.
With this requirement I cannot see any simpler solution than Steven's.

Some other questions though to everyone involved:
1) what do you think about "os.path is posixpath" vs just checking os.name == 'posix' as I suggested earlier?

2) speaking of missing functionality in filter:
What do you think of a keyword argument like 'case' to both filter and filterfalse that, when True, would make these functions behave equivalently to
[n for n in names if fnmatchcase(n, pattern)]
The default would be False, of course. I know this would be inconsistent in terms of argument vs separate functions, but it is easy to explain and learn and without it Windows users of filter/filterfalse would really suffer from the much poorer performance due to the slow normcase call (even slower now with the new fspath protocol) even if they pass in normalized names already.

3) not directly related to this issue, but I came across it in this context:

isn't normcase in both posixpath and ntpath doing isinstance(str, bytes) checks that are redundant with os.fspath? Is this oversight or am I missing something again?
msg294111 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-21 20:39
> 1) what do you think about "os.path is posixpath" vs just checking os.name == 'posix' as I suggested earlier?

Currently os.path is posixpath if and only if os.name == 'posix'. But this can be changed in future. I think it is safer to the current check. The posixpath module doesn't contain classes, enums or namedtuples, it's import is fast.

And it would be safer to keep the check in the function rather than move it at module level.

> What do you think of a keyword argument like 'case' to both filter and filterfalse that, when True, would make these functions behave equivalently to
> [n for n in names if fnmatchcase(n, pattern)]

This is a different issue. fnmatch.filter() was added for speeding up glob.glob() (see issue409973). I don't know whether there is a good use case for filtercase().

> isn't normcase in both posixpath and ntpath doing isinstance(str, bytes) checks that are redundant with os.fspath?

Good catch! It seems to me that they are redundant. Please open a new issue for this.
msg294132 - (view) Author: Wolfgang Maier (wolma) * Date: 2017-05-22 08:32
> Good catch! It seems to me that they are redundant. Please open a new issue for this.

done: issue 30427
msg294133 - (view) Author: Wolfgang Maier (wolma) * Date: 2017-05-22 08:41
Yet another thing I just realized (sorry for being so annoying):

With os.normcase calling os.fspath in 3.6+ it is not really a NOP anymore even on posix. As a consequence, you can now do some weird things with fnmatch: in all cases, and only in these, where the module *does* call normcase you can pass in Path-like objects as the names.
This works with fnmatch.fnmatch on all platforms, never for fnmatchcase, and platform-dependently (on Windows, but not on posix platforms) for filter/filterfalse.
It's mostly that last case that worries me because it makes it easy to accidentally write code that is not platform-independent.

Also note that you can also pass a Path-like object as pat everywhere except in fnmatchcase.
msg294510 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2017-05-25 19:33
I don't think os.normcase() calling os.fspath() needs to be a worry. Pragmatically it's just getting a str/bytes from an object that knows how to return a str/bytes for a path. IOW it's no different than os.normcase(str(path)) and so embedding the call such that it's not a flat-out no-op shouldn't be a general concern.
msg350056 - (view) Author: Roee Nizan (Roee Nizan) Date: 2019-08-21 09:56
Was this abandoned? Is there a reason not to accept this?
msg350062 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-21 10:51
Rather than adding a new function, why not adding a parameter like sort(key=func, reverse=True)? Something like fnmatch.filterfalse(pat, invert=True)?

The Unix grep command has a --invert-match option:

       -v, --invert-match
              Invert the sense of matching, to select non-matching lines.

The Unix find and test commands use "!" to invert a condition. Example:

   find ! -name "*.py"  # list files which doesn't match ".py"
   test ! -e setup.py   # true if setup.py doesn't exist
msg350066 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-08-21 11:10
On Wed, Aug 21, 2019 at 10:51:02AM +0000, STINNER Victor wrote:

> Rather than adding a new function, why not adding a parameter like 
> sort(key=func, reverse=True)? Something like fnmatch.filterfalse(pat, 
> invert=True)?

Guido argues that as a general rule of thumb, we should avoid "constant 
bool parameters" and prefer seperate functions. For example, we have

- re.search and re.match, not re.search(start=True)

- str.find and str.rfind, not str.find(end=False)

If we typically call the function with a bool constant:

    filter(pat, invert=True)

rather than a variable or expression

    filter(pat, invert=condition or default)

that's a hint that the two cases probably should be seperate functions.

Martin Fowler agrees:

https://martinfowler.com/bliki/FlagArgument.html

as does Raymond Chen:

https://devblogs.microsoft.com/oldnewthing/20060828-18/?p=29953

(Of course there are cases where it is impractical to avoid bool flags 
-- if a function would otherwise take four flags, we would need sixteen 
functions! -- or there may be other reasons why we might go against that 
design rule.)
msg350090 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-21 16:01
> Rather than adding a new function, why not adding a parameter 

Guido, the reason iteratools has a separate filter() and filterfalse() tools is because of your API guidance to prefer separate functions rather than having flags in most cases.

Do you still feel that way and does it apply here?  Personally, I would want to have a separate function for fnmatch.filter_false() rather than adding complexity to the API for fnmatch.
msg350093 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-08-21 16:32
Yes, I still feel like this and I find it applies here. (Note that the module already has two variants of fnmatch(), so it's nothing new in this context.)
History
Date User Action Args
2019-08-21 16:32:30gvanrossumsetnosy: - gvanrossum
2019-08-21 16:32:16gvanrossumsetmessages: + msg350093
2019-08-21 16:01:18rhettingersetnosy: + rhettinger, gvanrossum
messages: + msg350090
2019-08-21 11:10:49steven.dapranosetmessages: + msg350066
2019-08-21 10:51:02vstinnersetnosy: + vstinner
messages: + msg350062
2019-08-21 09:56:34Roee Nizansetnosy: + Roee Nizan
messages: + msg350056
2017-05-25 19:33:10brett.cannonsetmessages: + msg294510
2017-05-25 12:01:36serhiy.storchakasetnosy: + brett.cannon
2017-05-22 08:41:37wolmasetmessages: + msg294133
2017-05-22 08:32:30wolmasetmessages: + msg294132
2017-05-21 20:39:22serhiy.storchakasetmessages: + msg294111
title: Add fnmatch.filter_false function -> Add fnmatch.filterfalse function
2017-05-21 19:31:54wolmasetmessages: + msg294110
2017-05-21 06:39:16steven.dapranosetmessages: + msg294078
2017-05-20 18:41:13serhiy.storchakasetdependencies: + Improve fnmatch testing
components: + Library (Lib)
2017-05-20 17:34:34serhiy.storchakasetmessages: + msg294041
2017-05-20 17:31:47wolmasetmessages: + msg294039
2017-05-20 17:26:41serhiy.storchakasetmessages: + msg294038
2017-05-20 17:08:40wolmasetfiles: + filterfalse.alternate_patch
nosy: + wolma
messages: + msg294036

2017-05-20 14:31:54serhiy.storchakasetnosy: + serhiy.storchaka

messages: + msg294031
stage: patch review
2017-05-20 13:19:02steven.dapranocreate