New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fnmatch, glob: Allow curly brace expansion #53793
Comments
The attached patch allows for shell curly braces with fnmatch.filter(). This makes the following possible:
>>> import fnmatch
>>> import os
>>>
>>> for file in os.listdir('.'):
... if fnmatch.fnmatch(file, '*.{txt,csv}'):
... print file
...
file.csv
file.txt
foo.txt
This is especially convenient with the glob module:
>>> import glob
>>> glob.glob('*.{txt,csv}')
['file.csv', 'file.txt', 'foo.txt'] Hopefully, this makes fnmatch match better the behavior that people expect from a shell-style pattern matcher. Please note: I attached a patch that applies on the Python trunk, but only tested it on Python 2.5 on Windows. However, the fnmatch module doesn't seem to have changed substantially in between. |
Oops, I meant that it allows for curly braces in fnmatch.translate(), which makes it available in the whole fnmatch module. |
Thanks for the patch. + if j < n and pat[j] == '}': I don't get what the purpose of these two lines is. Forbid empty patterns? + while i < n and pat[j] != '}': You probably mean "while j < n" instead of "while i < n". You should also add a test for unmatched braces. Currently: $ ./python -c "import fnmatch; print(fnmatch.translate('{x'))"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/antoine/py3k/__svn__/Lib/fnmatch.py", line 129, in translate
while i < n and pat[j] != '}':
IndexError: string index out of range |
Thanks for this suggestion and patch. In general I think more tests would be good, A test for {} would clarify what you are expecting there. |
In Bash, * and ? match only characters in existing files, but {a,b} always produces two filenames, even if the files don’t exist. Do we want to mimic this behavior in fnmatch? |
Ah, I had forgotten that detail, Éric. No, it doesn't seem as if implementing braces as matchers is appropriate. fnmatch is only implementing the shell file name globbing. Doing the equivalent of brace expansion would have to be done before applying globbing, to be consistent with the shell. Which is too bad. Unfortunately I think we should probably reject this, though it could be discussed on python-ideas to see if the idea can lead to something both consistent with the shell and useful. |
I don't get the "shell consistency" argument. First, there is no single definition of "the shell". Second, users of Python generally don't care what a shell would do, they simply want to achieve useful tasks (which filename matching is arguably part of). |
I'm not sure it has to be consistent with the shell to be useful, as long as the behavior is documented and we possibly add a note explaining the differences from the shell. But I agree that a discussion on python-ideas would be helpful. |
My view is that people using fnmatch/glob are expecting to get back the same list of files that they would if they ran 'echo <globpattern>' in the shell. The major shells (sh, bash, zsh, csh) seem to be pretty consistent in this regard (though sh does less brace expansion than the others...but is almost always actually bash these days). If you just wanted to provide a flexible way for people to match files, then instead of fnmatch/glob, we should have a function that walks down a directory tree applying a regular expression to the filenames it encounters and returning the rooted pathnames of the matches. That function is easy enough to write using standard library facilities. The special magic of fnmatch/glob is that it does a not-so-easy-to-get-right transformation of *shell* globbing rules into regular expressions behind the scenes. That is, in my view its *purpose* is to be compatible with the "normal rules" for unix shell globbing. So currently I'm about -0.5 on this feature. |
But it's not the case since we currently don't process braces anyway.
Excluding the 95% (or so) of Windows users, I suppose.
I've never thought that the purpose of glob or fnmatch was to reproduce |
Well, Windows supports * and ? globs, but not brace expansion, as far as I can tell (at least on XP, which is what I currently have access to). In fact, I don't believe I've run into brace expansion anywhere except in the unix shell, whereas as you say * and ? globbing is fairly common, so that might be another reason *not* to add it :) Unfortunately for that argument, Windows XP CMD doesn't appear to support [] globbing. I'm not going to try block this if other people want it. As you say, there is no real standard here to adhere to. |
I don't see any reason to turn this down except, perhaps, for keeping something simple. Certainly I don't believe that Windows users will be confused by the fact that there are wildcards other than "*" and "?". fnmatch already implements [] and [!] which are not supported on Windows. |
Wow, I certainly didn't expect to generate so much controversy. :-/ First of all, thanks for the comments on the patch Antoine and David.
I copy-pasted the handling of the '[' character in the same file, and adapted it. This test was to properly handle sequences like '[]]', and you are right, it has nothing to do in this patch, I just forgot to remove it.
Yes, that's a typo. :-/
I know, I just thought I would try to remain consistent with the way the '[' char was handled.
I realised that after submitting the patch, yes. Actually, there are several other cases that I didn't properly handle, like a closing brace without a matching opening brace, or even nested braces (which are perfectly acceptable in the context of a shell like Bash). I'm working on an improved patch that would correctly handle those cases (with much more unit tests!), I guess I just hit the submit button too quickly. :) --- Now, about whether or not this is appropriate in fnmatch, I agree with David that if we want to remain really consistent with shell implementations, then fnmatch probably isn't the appropriate place to do so. In this case, I guess the correct way to implement it would be to expand the braces and generate several patterns that would all be fed to different fnmatch calls? Implementing it in fnmatch just seemed so convenient, replacing the braces with '(...|...)' constructs in a regex. People seem to agree that a thread on python-ideas would be good to discuss this change, but this ticket already generated some discussion. Should I start the thread on the mailing-list anyway or is this ticket an appropriate forum for further discussion? |
python-idea is read by more people. |
I agree with Antoine that this would be useful functionality and that matching "the" shell is futile here. A quick check on an old linux server: bash and ksh do brace expansion before expanding '*', but that csh does both at the same time. That is, in a directory with foo.py and no .h files 'echo *.{py,h}' returns foo.py with csh and '*.h foo.py' with bash. I'm +1 on matching the behavior of csh here. |
It's worth noting that the sh-like shells are far more widely used than the csh-like shells, so csh-like behavior may surprise more people. From the sh-like shell perspective, the {...,...} syntax just isn't part of the globbing handling. |
I finally found the time to follow up on this issue, sorry for the absence of response. The thread on Python-Ideas didn't really lead to a consensus (nor did it generate a lot of discussion). Some wanted to see this in fnmatch, others in glob and others in shutils. Most thought glob was the appropriate place though, and this is also my opinion. From the Python documentation, fnmatch is a « Unix filename pattern matching » while glob is a « Unix style pathname pattern expansion ». This makes it clear to me that curly expansion has its place in glob, that would then use fnmatch to match the resulting list of expanded paths. Here is a patch against the py3k branch. The patch contains both the implementation, unit tests, and some changes to the documentation. Note that could I only run the unit tests on Linux (Fedora 14 x86_64) which is the only system I have at hand. |
Latest patch looks good. |
Same patch, but rebased to the current trunk so it still applies. |
This is the right patch, sorry for all the mail spam. :-/ |
Thanks for the research and the updated patch. Unfortunately as a feature request this is going to have to wait for 3.3 since we missed the pre-beta window. |
Ok. This is my first patch to Python, so I'm not sure what I should do to get this in. Is keeping the patch in sync with the trunk enough? Is there something else, like some more formal process to follow? |
Nope, you've got it. After the final release of Python 3.2, please post to the issue to remind us about it, and someone will commit the patch. (For future Python releases we expect that the delays in our ability to commit feature patches will be much shorter, but this is the way it works right now.) |
So, now that Python 3.2 was released, here is a patch rebased on top of the py3k branch. |
The sys module is imported in glob but never used. It's not related to this feature request but I saw it when implementing the patch, so here is a second patch removing the import. |
I removed the unused import (mostly as a simple test of mercurial, it's my first commit there). |
Does it mean that Python development is not being done in SVN, as the documentations state it? My patches have all been based on the SVN py3k branch, please tell me if I must base them on something else instead. |
I've just rebuilt on Windows against tip. test_glob is failing: test test_glob failed -- Traceback (most recent call last):
File "c:\work-in-progress\python\cpython-9584\lib\test\test_glob.py", line 135, in test_glob_curly_braces
os.path.join('a', 'bcd', 'efg')]))
File "c:\work-in-progress\python\cpython-9584\lib\test\test_glob.py", line 53, in assertSequencesEqual_noorder
self.assertEqual(set(l1), set(l2))
AssertionError: Items in the first set but not the second:
'@test_2788_tmp_dir\\a/bcd\\efg'
'@test_2788_tmp_dir\\a/bcd\\EF'
Items in the second set but not the first:
'@test_2788_tmp_dir\\a\\bcd\\EF'
'@test_2788_tmp_dir\\a\\bcd\\efg' |
+ if sub.find(',') != -1: |
Help is still needed to debug the failing test_glob.py on Windows. I just tried this patch against 3.4trunk. The first hunk of the glob.py patch doesn't apply because 'for name in glob1(None, basename): yield name' was changed to 'yield from glob1(None, basename)'. The tests still pass w/ Ezio's suggested change to use (',' in sub). |
I've got a patch for this which applies cleanly to the 3.4 tip. I still need to sort out the Windows issues (which I don't think will be difficult; it looks like a test issue, not a code issue) |
I have to apologize for not following up on this patch. At first I had no time to go on pushing for it, and then (after a change of job), I completely forgot about it. :( I guess rebasing the patch on the latest tip is not that useful if you already have done it Tim, and I don't have access to a Windows box to figure out the issue in the tests. At this point is there anything else I can do to help getting it integrated? |
I'm planning to refactor the tests and the code very slightly. When I've |
Exactly. Although that's just because at the time I sent the patch, I didn't know about ranges in shells. So I just implemented the part of curly brace expansion that I knew of and felt would be useful. If ranges are an expected feature from shells, then it would probably be a worthwhile addition. |
Attached is a refactored version of Mathieu's patch which, when applied to tip, passes all tests. |
Something went wrong with that patch; it doesn't include all the changes to test_glob. I'll upload a newer patch later. |
Scratch that last comment: the patch does apply. I've tested it against Windows & Ubuntu. If no one comes in with any objections I'll commit it within the next day. |
New changeset dafca4714298 by Tim Golden in branch 'default': |
>>> glob.glob('*.ac')
['configure.ac']
>>> glob.glob('*.sub')
['config.sub']
>>> glob.glob('*.{sub,ac}')
['config.sub'] And since these changes are backward incompatible (and moreover, now it is impossible to glob for paths that contain braces), I would prefer a keyword to switch this option (off by default) or a separate function. See "man glob" [1] for flag GLOB_BRACE. |
>>> glob.glob('*.{sub,ac}')
['config.sub'] I'm surprised this broke, this is one of the behaviour I thought I had implemented in my original patch. :-/
I am absolutely sure this was working in my original submission, I had even added unit tests for it: |
Must have been something I did. I'll revert the commit and re-test. |
New changeset ec897bb38708 by Tim Golden in branch 'default': |
It can't be working in any implementation. >>> os.makedirs('a{b,c}d/e')
>>> os.listdir('a{b,c}d')
['e']
>>> glob.glob('a{b,c}d/*')
[] Was ['a{b,c}d/e'] in 3.3. |
Well even in the original [working] version, the scope of this change |
I'm talking about glob.glob(), not about os.listdir(). Bakward incompatible features should be off by default. |
Sorry, I misunderstood the point you were making with the |
This is another question. glob.glob(pattern, glob.GLOB_XXX | glob.GLOB_YYY) looks not much shorter than glob.glob(pattern, use_xxx=True, use_yyy=True). And not all features can be configured by only one bit. See os.access() and os.utime() as examples for many options. I hope that before 3.4 feature freeze we can change the interface, if needed. I also believe that this feature may not be regarded as mature without any sort of escaping (which also should be off by default for compatibility). And we need glob.escape() function for escaping arbitrary path. Since it is quite a complicated function, we need more examples in the documentation, which show different corner cases, nesting and interaction with wildcards. |
Given the backward compatibility concerns, and the fact that brace expansion and wildcard expansion are conceptually separate operations, perhaps what we should have a is a glob.expand_braces (or some such name) function? (Note: I haven't looked at whether or not this would actually work as an API, I'm just throwing out an idea.) |
In this case it may be appropriate to place this function into shutil or string module. |
Given that this isn't going to go ahead in its current form, and will need wider discussion on python-dev, I'm unassigning myself and I've removed the flawed version of the patch which I'd posted. |
Hi All, I have created new patch for handling curly brace expansion. The patch is available on Github PR bpo-6299 |
Hi All, this is a humble ping. |
@matúš Valo, thank you for the pull request and for your interest in contributing to CPython. It seems that consensus hadn't been reached during the previous discussion on this issue, which is why msg175217 suggested creating a discussion on python-dev. Most likely, your patch won't be reviewed until some more discussion takes place. Please open a topic on python-dev and then link to that topic once the details are hashed out. Thanks! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: