This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Title: '*' matches entire path in fnmatch
Type:
Stage:
Components: Library (Lib)
Versions:
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Alberto Galera, Jim Nasby, Toon Verstraelen, aaron-whitehouse, josh.r, serhiy.storchaka
Priority: normal
Keywords:

Created on 2016-11-16 20:12 by Jim Nasby, last changed 2022-04-11 14:58 by admin.

Messages (8)
msg280985 - (view) Author: Jim Nasby (Jim Nasby) * Date: 2016-11-16 20:12
A '*' in fnmatch.translate is converted into '.*', which will greedily match directory separators. This doesn't match shell behavior, in which * matches only within a single path component:

decibel@decina:[14:07]~$ls ~/tmp/*/1|head
ls: /Users/decibel/tmp/*/1: No such file or directory
decibel@decina:[14:07]~$ls ~/tmp/d*/base/1|head

From a POSIX standpoint, this could easily be fixed by using '[^/]*' instead of '.*'. I'm not sure how to make this work cross-platform, though.

It's worth noting that some programs (rsync, git) support **, which would correctly translate to '.*'.
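A minimal sketch of the difference described above. The example path and the hand-written regex are assumptions made for illustration; only fnmatch.fnmatchcase and re are real standard library calls.

```python
import fnmatch
import re

path = "tmp/d1/base/1"

# Current behaviour: '*' is translated to '.*', so it also spans "d1/base".
current = fnmatch.fnmatchcase(path, "tmp/*/1")
print(current)  # True

# POSIX-only idea from the report: '*' becomes '[^/]*' and stops at '/'.
posix_regex = re.compile(r"tmp/[^/]*/1\Z")
deep = posix_regex.match(path) is not None
shallow = posix_regex.match("tmp/d1/1") is not None
print(deep, shallow)  # False True
```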
msg281017 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2016-11-17 02:00
Presumably something like:

r'(?:' + r'|'.join({re.escape(os.path.sep), re.escape(os.path.altsep)}) + r')'

would cover it completely. I switched to using non-capturing groups over a character class both to deal with the fact that escaping doesn't work the same way for character classes and to cover the possibility (no idea here) that some terrible OS might have a multicharacter path separator.
msg281018 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2016-11-17 02:01
Oops, altsep is None, not the empty string when there is only one separator. And I didn't handle inverting the match. Sigh. You get the idea.
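One possible way to address both points (altsep being None, and inverting the match) is sketched below. The helper name non_separator_run is hypothetical, and the negative lookahead is just one option that would also tolerate a multicharacter separator.

```python
import os
import re

def non_separator_run(seps=None):
    """Return a regex fragment matching a run of characters with no separator in it.

    Hypothetical helper, not part of fnmatch.
    """
    if seps is None:
        # os.path.altsep is None on platforms with a single separator.
        seps = {s for s in (os.path.sep, os.path.altsep) if s}
    alternation = "|".join(re.escape(s) for s in sorted(seps))
    # Consume one character at a time, but never at the start of a separator.
    return r"(?:(?!" + alternation + r").)*"

fragment = non_separator_run({"/", "\\"})
plain = re.fullmatch(fragment, "abc", re.DOTALL) is not None
with_slash = re.fullmatch(fragment, "a/b", re.DOTALL) is not None
with_backslash = re.fullmatch(fragment, "a\\b", re.DOTALL) is not None
```

The fragment could then stand in for the '.*' that a single '*' currently produces.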
msg288608 - (view) Author: Aaron Whitehouse (aaron-whitehouse) Date: 2017-02-26 18:20
Note that somebody has forked the standard library to implement this:
This shows that the actual changes would be pretty small (though pywildcard is based on 2.x code and does not handle the cross-platform slashes you have been discussing).

It is also worth noting that the glob module in the standard library implements a "recursive" option that has similar behaviour (* does not span path separators whereas ** does) and essentially builds this on top of fnmatch for the actual filename matching.
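For illustration, the glob behaviour mentioned here can be observed on a throwaway directory tree. The directory and file names below are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build root/d1/1 and root/d1/base/1.
    os.makedirs(os.path.join(root, "d1", "base"))
    for parts in (("d1", "1"), ("d1", "base", "1")):
        open(os.path.join(root, *parts), "w").close()

    safe_root = glob.escape(root)  # guard against special characters in the temp path
    one_level = glob.glob(os.path.join(safe_root, "*", "1"))
    any_depth = glob.glob(os.path.join(safe_root, "**", "1"), recursive=True)

    # Report paths relative to the temporary root, in a stable order.
    one_level = sorted(os.path.relpath(p, root) for p in one_level)
    any_depth = sorted(os.path.relpath(p, root) for p in any_depth)

print(one_level)
print(any_depth)
```

Here '*' finds only the file one level down, while '**' with recursive=True also finds the deeper one.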

I do not think we can change the default behaviour of fnmatch at this point, but I would like to see this behaviour triggered by an optional argument to the various functions, e.g.:
fnmatch.fnmatch(filename, pattern, glob_asterisks=False)
fnmatch.fnmatchcase(filename, pattern, glob_asterisks=False)
fnmatch.filter(names, pattern, glob_asterisks=False)
fnmatch.translate(pattern, glob_asterisks=False)

In each case, if glob_asterisks (or whatever other name we came up with) is true, the behaviour would match the pywildcard behaviour, i.e.:
    **      matches everything
    *       matches in one path level
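A rough sketch of how such a translation might look. The function translate_glob is hypothetical (it is neither the pywildcard code nor a proposed patch), it assumes a single-character separator, it ignores character classes such as [abc], and treating '?' as not matching the separator is an extra assumption.

```python
import re

def translate_glob(pattern, sep="/"):
    """Hypothetical, simplified translation: '**' spans separators, '*' and '?' do not."""
    not_sep = "[^" + re.escape(sep) + "]"
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")           # matches everything
            i += 2
        elif pattern[i] == "*":
            out.append(not_sep + "*")  # stays within one path level
            i += 1
        elif pattern[i] == "?":
            out.append(not_sep)        # assumption: one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "(?s:" + "".join(out) + r")\Z"

single = re.compile(translate_glob("tmp/*/1"))
double = re.compile(translate_glob("tmp/**/1"))
print(bool(single.match("tmp/d1/1")))       # True
print(bool(single.match("tmp/d1/base/1")))  # False
print(bool(double.match("tmp/d1/base/1")))  # True
```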

I look after the glob matching code in duplicity and would like to start using the standard library to do filename matching for us, but we need the above behaviour. I am happy to do the patching if there is a realistic chance of it being accepted.
msg290624 - (view) Author: Aaron Whitehouse (aaron-whitehouse) Date: 2017-03-27 16:01
Posted to the [Python-ideas] mailing list, as this proposes a change to the standard library:

Nobody has responded so far, however. I take this as at least no vehement objection to the idea.
msg307867 - (view) Author: Alberto Galera (Alberto Galera) Date: 2017-12-08 20:09
I see that the lib I made a few years ago (python-wildcard) has been mentioned here.

That little fork was created because of this issue:
msg339054 - (view) Author: Toon Verstraelen (Toon Verstraelen) * Date: 2019-03-28 15:59
For consistency with the corresponding feature in the glob function since Python 3.5, I would suggest adding an extra optional argument 'recursive' instead of 'glob_asterisks'. With the default recursive=False, one gets the old behavior; with recursive=True, it can handle the '**' and '*' as in pywildcard.

I realize that with recursive=False, the behavior is not exactly consistent with glob, but I'd still prefer the same name for the optional argument. It is the common terminology for this type of feature. See
msg339256 - (view) Author: Toon Verstraelen (Toon Verstraelen) * Date: 2019-03-31 12:59
Just for reference, here are a few more implementations of the same idea, next to pywildcard, sometimes combined with other useful features:


The last one is rather active, with regular releases, the most recent on March 24, 2019.
Date                 User              Action  Args
2022-04-11 14:58:39  admin             set     github: 72904
2019-03-31 12:59:09  Toon Verstraelen  set     messages: + msg339256
2019-03-28 17:42:37  xtreak            set     nosy: + serhiy.storchaka
2019-03-28 15:59:15  Toon Verstraelen  set     nosy: + Toon Verstraelen
                                               messages: + msg339054
2017-12-08 20:09:36  Alberto Galera    set     nosy: + Alberto Galera
                                               messages: + msg307867
2017-03-27 16:01:52  aaron-whitehouse  set     messages: + msg290624
2017-02-26 18:20:35  aaron-whitehouse  set     nosy: + aaron-whitehouse
                                               messages: + msg288608
                                               title: '*' matches entire path in fnmatch.translate -> '*' matches entire path in fnmatch
2016-11-17 02:01:17  josh.r            set     messages: + msg281018
2016-11-17 02:00:14  josh.r            set     nosy: + josh.r
                                               messages: + msg281017
2016-11-16 20:12:07  Jim Nasby         create