|
msg113750 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2010-08-13 08:36 |
The attached patch allows for shell curly braces with fnmatch.filter().
This makes the following possible:
>>> import fnmatch
>>> import os
>>>
>>> for file in os.listdir('.'):
...     if fnmatch.fnmatch(file, '*.{txt,csv}'):
...         print file
...
file.csv
file.txt
foo.txt
This is especially convenient with the glob module:
>>> import glob
>>> glob.glob('*.{txt,csv}')
['file.csv', 'file.txt', 'foo.txt']
Hopefully, this makes fnmatch better match the behavior that people expect from a shell-style pattern matcher.
Please note: I attached a patch that applies on the Python trunk, but only tested it on Python 2.5 on Windows. However, the fnmatch module doesn't seem to have changed substantially in between.
|
|
msg113751 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2010-08-13 08:41 |
> The attached patch allows for shell curly braces with fnmatch.filter().
Oops, I meant that it allows for curly braces in fnmatch.translate(), which makes it available in the whole fnmatch module.
|
|
msg113753 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-08-13 10:23 |
Thanks for the patch.
+ if j < n and pat[j] == '}':
+     j = j+1
I don't get what the purpose of these two lines is. Forbid empty patterns?
+ while i < n and pat[j] != '}':
+     j = j+1
You probably mean "while j < n" instead of "while i < n".
Regardless, it's simpler to use "j = pat.find('}', j)".
You should also add a test for unmatched braces. Currently:
$ ./python -c "import fnmatch; print(fnmatch.translate('{x'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/antoine/py3k/__svn__/Lib/fnmatch.py", line 129, in translate
    while i < n and pat[j] != '}':
IndexError: string index out of range
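For readers following along, here is a minimal sketch of the str.find() approach suggested above. The helper name split_first_brace_group is hypothetical and is not taken from the patch; the sketch only shows that a return value of -1 from find() lets an unmatched brace be detected without indexing past the end of the string. Nested braces are not handled.

```python
def split_first_brace_group(pat):
    """Split pat into (prefix, alternatives, suffix) around its first {...} group.

    Hypothetical helper for illustration only. Returns None when there is
    no complete group, so a caller can treat the brace as a literal
    character instead of raising IndexError.
    """
    i = pat.find('{')
    if i == -1:
        return None          # no opening brace at all
    j = pat.find('}', i + 1)
    if j == -1:
        return None          # unmatched '{': nothing to expand
    return pat[:i], pat[i + 1:j].split(','), pat[j + 1:]


print(split_first_brace_group('*.{txt,csv}'))  # ('*.', ['txt', 'csv'], '')
print(split_first_brace_group('{x'))           # None
```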
|
|
msg113756 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-08-13 12:05 |
Thanks for this suggestion and patch.
In general I think more tests would be good. A test for {} would clarify what you are expecting there.
|
|
msg113758 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2010-08-13 12:34 |
In Bash, * and ? match only characters in existing files, but {a,b} always produces two filenames, even if the files don’t exist. Do we want to mimic this behavior in fnmatch?
|
|
msg113762 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-08-13 13:03 |
Ah, I had forgotten that detail, Éric.
No, it doesn't seem as if implementing braces as matchers is appropriate. fnmatch is only implementing the shell file name globbing. Doing the equivalent of brace expansion would have to be done before applying globbing, to be consistent with the shell. Which is too bad.
Unfortunately I think we should probably reject this, though it could be discussed on python-ideas to see if the idea can lead to something both consistent with the shell and useful.
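To make the ordering argument concrete, here is a rough sketch of brace expansion done as a separate step before matching. The names expand_braces and match_any are hypothetical, only non-nested groups are handled, and the result is not claimed to agree with any particular shell in edge cases (for example a group with a single alternative).

```python
import fnmatch


def expand_braces(pattern):
    """Yield the patterns obtained by expanding {a,b,...} groups left to right.

    Hypothetical helper: non-nested groups only; an unmatched brace is
    left in place as a literal character.
    """
    start = pattern.find('{')
    end = pattern.find('}', start + 1) if start != -1 else -1
    if end == -1:
        yield pattern
        return
    prefix, suffix = pattern[:start], pattern[end + 1:]
    for alternative in pattern[start + 1:end].split(','):
        # Recurse so that later groups in the suffix are expanded too.
        for expanded in expand_braces(prefix + alternative + suffix):
            yield expanded


def match_any(name, pattern):
    """Return True if name matches at least one brace-expanded pattern."""
    return any(fnmatch.fnmatch(name, sub) for sub in expand_braces(pattern))


print(list(expand_braces('*.{txt,csv}')))  # ['*.txt', '*.csv']
```

With this split, fnmatch itself never sees a brace group; it only receives the ordinary patterns produced by the expansion step.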
|
|
msg113765 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-08-13 13:08 |
> Doing the equivalent of brace expansion would have to be done before
> applying globbing, to be consistent with the shell.
I don't get the "shell consistency" argument. First, there is no single definition of "the shell". Second, users of Python generally don't care what a shell would do, they simply want to achieve useful tasks (which filename matching is arguably part of).
|
|
msg113766 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2010-08-13 13:11 |
I'm not sure it has to be consistent with the shell to be useful, as long as the behavior is documented and we possibly add a note explaining the differences from the shell. But I agree that a discussion on python-ideas would be helpful.
|
|
msg113767 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-08-13 13:27 |
My view is that people using fnmatch/glob are expecting to get back the same list of files that they would if they ran 'echo <globpattern>' in the shell. The major shells (sh, bash, zsh, csh) seem to be pretty consistent in this regard (though sh does less brace expansion than the others...but is almost always actually bash these days).
If you just wanted to provide a flexible way for people to match files, then instead of fnmatch/glob, we should have a function that walks down a directory tree applying a regular expression to the filenames it encounters and returning the rooted pathnames of the matches. That function is easy enough to write using standard library facilities. The special magic of fnmatch/glob is that it does a not-so-easy-to-get-right transformation of *shell* globbing rules into regular expressions behind the scenes. That is, in my view its *purpose* is to be compatible with the "normal rules" for unix shell globbing.
So currently I'm about -0.5 on this feature.
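As an illustration of the regex-based alternative described in this message, a sketch along these lines would do; the function name find_files is hypothetical, and only os.walk() and re from the standard library are used.

```python
import os
import re


def find_files(root, regex):
    """Yield rooted pathnames under root whose file name matches regex.

    Hypothetical helper: the regular expression is applied to the bare
    file name (not the whole path), anchored at its start by re.match().
    """
    matcher = re.compile(regex)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if matcher.match(name):
                yield os.path.join(dirpath, name)


if __name__ == '__main__':
    # List every .txt or .csv file below the current directory.
    for path in find_files('.', r'.*\.(txt|csv)$'):
        print(path)
```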
|
|
msg113773 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-08-13 13:49 |
> My view is that people using fnmatch/glob are expecting to get back
> the same list of files that they would if they ran 'echo
> <globpattern>' in the shell.
But it's not the case since we currently don't process braces anyway.
> The major shells (sh, bash, zsh, csh) seem to be pretty consistent in
> this regard (though sh does less brace expansion than the others...but
> is almost always actually bash these days).
Excluding the 95% (or so) of Windows users, I suppose.
> The special magic of fnmatch/glob is that it does a
> not-so-easy-to-get-right transformation of *shell* globbing rules into
> regular expressions behind the scenes. That is, in my view its
> *purpose* is to be compatible with the "normal rules" for unix shell
> globbing.
I've never thought that the purpose of glob or fnmatch was to reproduce
shell rules. It's simply a convenient primitive. Wildcard expansion
exists in lots of software other than Unix shells.
|
|
msg113787 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-08-13 15:37 |
Well, Windows supports * and ? globs, but not brace expansion, as far as I can tell (at least on XP, which is what I currently have access to).
In fact, I don't believe I've run into brace expansion anywhere except in the unix shell, whereas as you say * and ? globbing is fairly common, so that might be another reason *not* to add it :)
Unfortunately for that argument, Windows XP CMD doesn't appear to support [] globbing.
I'm not going to try to block this if other people want it. As you say, there is no real standard here to adhere to.
|
|
msg113789 - (view) |
Author: Tim Golden (tim.golden)  |
Date: 2010-08-13 15:46 |
I don't see any reason to turn this down except, perhaps, for keeping something simple.
Certainly I don't believe that Windows users will be confused by the fact that there are wildcards other than "*" and "?". fnmatch already implements [] and [!] which are not supported on Windows.
|
|
msg113888 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2010-08-14 11:29 |
Wow, I certainly didn't expect to generate so much controversy. :-/
First of all, thanks for the comments on the patch Antoine and David.
> I don't get what the purpose of these two lines is. Forbid empty patterns?
I copy-pasted the handling of the '[' character in the same file, and adapted it. This test was to properly handle sequences like '[]]', and you are right, it has nothing to do in this patch, I just forgot to remove it.
> You probably mean "while j < n" instead of "while i < n".
Yes, that's a typo. :-/
> Regardless, it's simpler to use "j = pat.find('}', j)".
I know, I just thought I would try to remain consistent with the way the '[' char was handled.
> You should also add a test for unmatched braces. Currently:
I realised that after submitting the patch, yes. Actually, there are several other cases that I didn't properly handle, like a closing brace without a matching opening brace, or even nested braces (which are perfectly acceptable in the context of a shell like Bash).
I'm working on an improved patch that would correctly handle those cases (with many more unit tests!), I guess I just hit the submit button too quickly. :)
---
Now, about whether or not this is appropriate in fnmatch, I agree with David that if we want to remain really consistent with shell implementations, then fnmatch probably isn't the appropriate place to do so.
In this case, I guess the correct way to implement it would be to expand the braces and generate several patterns that would all be fed to different fnmatch calls?
Implementing it in fnmatch just seemed so convenient, replacing the braces with '(...|...)' constructs in a regex.
People seem to agree that a thread on python-ideas would be good to discuss this change, but this ticket already generated some discussion. Should I start the thread on the mailing-list anyway or is this ticket an appropriate forum for further discussion?
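For reference, the '(...|...)' idea mentioned above can be sketched as follows. This is not the code from the attached patch: translate_braces is a hypothetical, deliberately simplified translator that understands only '*', '?' and one level of braces, and treats the alternatives inside a group as literal text.

```python
import re


def translate_braces(pat):
    """Translate a simplified glob pattern into a regular expression string.

    Hypothetical sketch: supports '*', '?' and non-nested {a,b} groups.
    An unmatched '{' is treated as a literal character.
    """
    i, n = 0, len(pat)
    res = []
    while i < n:
        c = pat[i]
        i += 1
        if c == '*':
            res.append('.*')
        elif c == '?':
            res.append('.')
        elif c == '{':
            j = pat.find('}', i)
            if j == -1:
                res.append(re.escape(c))       # unmatched brace: literal
            else:
                alternatives = pat[i:j].split(',')
                # {a,b} becomes a non-capturing alternation (?:a|b)
                res.append('(?:' + '|'.join(re.escape(a) for a in alternatives) + ')')
                i = j + 1
        else:
            res.append(re.escape(c))
    return '(?s:' + ''.join(res) + r')\Z'


regex = re.compile(translate_braces('*.{txt,csv}'))
print(bool(regex.match('file.csv')))  # True
print(bool(regex.match('file.py')))   # False
```

Note that this matcher-style approach answers "does this name match?", which is exactly why it cannot reproduce the shell behavior of emitting a word for an alternative that matches no file.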
|
|
msg113897 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2010-08-14 13:45 |
python-ideas is read by more people.
|
|
msg114115 - (view) |
Author: Ronald Oussoren (ronaldoussoren) *  |
Date: 2010-08-17 13:08 |
I agree with Antoine that this would be useful functionality and that matching "the" shell is futile here.
A quick check on an old Linux server: bash and ksh do brace expansion before expanding '*', but csh does both at the same time.
That is, in a directory with foo.py and no .h files 'echo *.{py,h}' returns foo.py with csh and '*.h foo.py' with bash.
I'm +1 on matching the behavior of csh here.
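The difference between the two behaviors can be modelled roughly as below. Both function names are hypothetical, the input is assumed to be the list of sub-patterns already produced by brace expansion, and the order of the resulting words is not meant to reproduce any shell exactly.

```python
import fnmatch


def echo_bash_like(subpatterns, names):
    """Model 'expand braces first': an unmatched sub-pattern is kept as a literal word."""
    words = []
    for sub in subpatterns:
        matches = sorted(fnmatch.filter(names, sub))
        words.extend(matches or [sub])
    return words


def echo_csh_like(subpatterns, names):
    """Model 'expand and match together': an unmatched sub-pattern is dropped."""
    words = []
    for sub in subpatterns:
        words.extend(fnmatch.filter(names, sub))
    return sorted(words)


names = ['foo.py']                 # directory with foo.py and no .h files
subpatterns = ['*.py', '*.h']      # what '*.{py,h}' expands to
print(echo_bash_like(subpatterns, names))
print(echo_csh_like(subpatterns, names))
```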
|
|
msg114119 - (view) |
Author: Fred L. Drake, Jr. (fdrake)  |
Date: 2010-08-17 13:18 |
It's worth noting that the sh-like shells are far more widely used than the csh-like shells, so csh-like behavior may surprise more people.
From the sh-like shell perspective, the {...,...} syntax just isn't part of the globbing handling.
|
|
msg120082 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2010-10-31 20:47 |
I finally found the time to follow up on this issue, sorry for the absence of response.
The thread on Python-Ideas didn't really lead to a consensus (nor did it generate a lot of discussion).
Some wanted to see this in fnmatch, others in glob and others in shutil. Most thought glob was the appropriate place though, and this is also my opinion.
From the Python documentation, fnmatch is a « Unix filename pattern matching » while glob is a « Unix style pathname pattern expansion ».
This makes it clear to me that curly expansion has its place in glob, which would then use fnmatch to match the resulting list of expanded paths.
Here is a patch against the py3k branch.
The patch contains the implementation, unit tests, and some changes to the documentation.
Note that I could only run the unit tests on Linux (Fedora 14 x86_64), which is the only system I have at hand.
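The design described in this message (expand braces at the glob level, then let the existing matching do the rest) could look roughly like the sketch below. It is not the attached patch: expand_braces and glob_with_braces are hypothetical names, nested braces are not supported, and sub-patterns that match nothing simply contribute no paths.

```python
import glob
import itertools
import re

# A brace group with no nested braces inside it.
_GROUP = re.compile(r'\{([^{}]*)\}')


def expand_braces(pattern):
    """Return every combination of alternatives for the {a,b} groups in pattern."""
    # re.split() with one capturing group alternates literal text (even
    # indexes) and group bodies (odd indexes).
    parts = _GROUP.split(pattern)
    choices = [part.split(',') if index % 2 else [part]
               for index, part in enumerate(parts)]
    return [''.join(combo) for combo in itertools.product(*choices)]


def glob_with_braces(pathname):
    """Return the paths matching any brace-expanded form of pathname, without duplicates."""
    seen = set()
    results = []
    for sub in expand_braces(pathname):
        for path in glob.glob(sub):
            if path not in seen:
                seen.add(path)
                results.append(path)
    return results


print(expand_braces('*.{txt,csv}'))  # ['*.txt', '*.csv']
```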
|
|
msg122614 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2010-11-28 03:40 |
Latest patch looks good.
|
|
msg124422 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2010-12-21 08:44 |
Same patch, but rebased to the current trunk so it still applies.
|
|
msg124423 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2010-12-21 08:45 |
This is the right patch, sorry for all the mail spam. :-/
|
|
msg124459 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-12-21 21:30 |
Thanks for the research and the updated patch. Unfortunately as a feature request this is going to have to wait for 3.3 since we missed the pre-beta window.
|
|
msg124461 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2010-12-21 21:42 |
> Thanks for the research and the updated patch. Unfortunately as
> a feature request this is going to have to wait for 3.3 since we
> missed the pre-beta window.
Ok.
This is my first patch to Python, so I'm not sure what I should do to get this in.
Is keeping the patch in sync with the trunk enough? Is there something else, like some more formal process to follow?
|
|
msg124462 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-12-21 21:46 |
Nope, you've got it.
After the final release of Python 3.2, please post to the issue to remind us about it, and someone will commit the patch. (For future Python releases we expect that the delays in our ability to commit feature patches will be much shorter, but this is the way it works right now.)
|
|
msg130098 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2011-03-05 03:42 |
So, now that Python 3.2 has been released, here is a patch rebased on top of the py3k branch.
|
|
msg130099 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2011-03-05 03:44 |
The sys module is imported in glob but never used.
It's not related to this feature request but I saw it when implementing the patch, so here is a second patch removing the import.
|
|
msg130506 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2011-03-10 13:45 |
I removed the unused import (mostly as a simple test of mercurial, it's my first commit there).
|
|
msg131613 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2011-03-21 04:21 |
> "I removed the unused import (mostly as a simple test of mercurial, it's my first commit there)."
Does it mean that Python development is not being done in SVN, as the documentation states?
My patches have all been based on the SVN py3k branch, please tell me if I must base them on something else instead.
|
|
msg131622 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2011-03-21 08:48 |
Yes, we recently switched to Mercurial. See http://docs.python.org/devguide/faq.html
You shouldn't need to change your patches just because of the switch from svn.
|
|
msg135558 - (view) |
Author: Mathieu Bridon (bochecha) |
Date: 2011-05-09 02:53 |
Is anybody still reading this? :-/
Could somebody commit the patch, reject it, or tell me what else I need to do?
|
|
msg135587 - (view) |
Author: Tim Golden (tim.golden)  |
Date: 2011-05-09 13:32 |
I've just rebuilt on Windows against tip. test_glob is failing:
test test_glob failed -- Traceback (most recent call last):
  File "c:\work-in-progress\python\cpython-9584\lib\test\test_glob.py", line 135, in test_glob_curly_braces
    os.path.join('a', 'bcd', 'efg')]))
  File "c:\work-in-progress\python\cpython-9584\lib\test\test_glob.py", line 53, in assertSequencesEqual_noorder
    self.assertEqual(set(l1), set(l2))
AssertionError: Items in the first set but not the second:
'@test_2788_tmp_dir\\a/bcd\\efg'
'@test_2788_tmp_dir\\a/bcd\\EF'
Items in the second set but not the first:
'@test_2788_tmp_dir\\a\\bcd\\EF'
'@test_2788_tmp_dir\\a\\bcd\\efg'
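The two sets in this failure differ only in separator style ('a/bcd' versus 'a\\bcd'). Whether the right fix belongs in the test or in the patch is not settled here; as one possible illustration, a comparison that normalizes separators first could look like this. The helper name same_paths is hypothetical, and ntpath is used so the sketch behaves the same on any platform.

```python
import ntpath


def same_paths(paths1, paths2, normalize=ntpath.normpath):
    """Compare two collections of paths, ignoring order and separator style."""
    return set(map(normalize, paths1)) == set(map(normalize, paths2))


got = ['@test_2788_tmp_dir\\a/bcd\\efg', '@test_2788_tmp_dir\\a/bcd\\EF']
expected = ['@test_2788_tmp_dir\\a\\bcd\\EF', '@test_2788_tmp_dir\\a\\bcd\\efg']
print(set(got) == set(expected))   # False: raw strings differ
print(same_paths(got, expected))   # True: equal once separators are normalized
```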
|
|
msg135588 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2011-05-09 14:04 |
+ if sub.find(',') != -1:
Please use the 'in' operator here.
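A small sketch of the suggested change, with a hypothetical helper name; the membership test gives the same result as comparing find() against -1 and reads more directly.

```python
def has_alternatives(sub):
    """Return True if the text between braces contains a comma.

    Hypothetical helper name; only the membership test is the point.
    """
    return ',' in sub  # same result as: sub.find(',') != -1


print(has_alternatives('txt,csv'))  # True
print(has_alternatives('txt'))      # False
```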
|
|
| Date | User | Action | Args |
| 2011-05-09 14:04:55 | ezio.melotti | set | nosy: + ezio.melotti; messages: + msg135588 |
| 2011-05-09 13:32:24 | tim.golden | set | messages: + msg135587 |
| 2011-05-09 02:53:10 | bochecha | set | messages: + msg135558 |
| 2011-03-21 08:48:26 | eric.smith | set | nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha; messages: + msg131622 |
| 2011-03-21 04:21:25 | bochecha | set | nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha; messages: + msg131613 |
| 2011-03-10 13:45:57 | eric.smith | set | nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha; messages: + msg130506 |
| 2011-03-05 03:44:24 | bochecha | set | files: + 0002-Remove-unused-import.patch; nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha; messages: + msg130099 |
| 2011-03-05 03:43:00 | bochecha | set | files: - 0001-Curly-brace-expansion-in-glob.patch; nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha |
| 2011-03-05 03:42:35 | bochecha | set | files: + 0001-Curly-brace-expansion-in-glob.patch; nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha; messages: + msg130098 |
| 2010-12-21 21:46:58 | r.david.murray | set | nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha; messages: + msg124462 |
| 2010-12-21 21:42:25 | bochecha | set | nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha; messages: + msg124461 |
| 2010-12-21 21:30:00 | r.david.murray | set | nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha; messages: + msg124459; versions: + Python 3.3, - Python 3.2 |
| 2010-12-21 08:45:36 | bochecha | set | files: + 0001-Curly-brace-expansion-in-glob.patch; nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha; messages: + msg124423 |
| 2010-12-21 08:44:57 | bochecha | set | files: - 0001-Curly-brace-expansion-in-glob.patch.old; nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha |
| 2010-12-21 08:44:13 | bochecha | set | files: + 0001-Curly-brace-expansion-in-glob.patch.old; nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha; messages: + msg124422 |
| 2010-12-21 08:43:37 | bochecha | set | files: - 0001-Curly-brace-expansion-in-glob.patch; nosy: fdrake, ronaldoussoren, pitrou, eric.smith, tim.golden, kveretennicov, eric.araujo, r.david.murray, bochecha |
| 2010-11-28 03:40:21 | eric.araujo | set | messages: + msg122614 |
| 2010-10-31 20:47:39 | bochecha | set | files: - curly-fnmatch.patch |
| 2010-10-31 20:47:15 | bochecha | set | files: + 0001-Curly-brace-expansion-in-glob.patch; messages: + msg120082; title: Allow curly braces in fnmatch -> Allow curly brace expansion |
| 2010-09-27 20:39:42 | kveretennicov | set | nosy: + kveretennicov |
| 2010-08-17 20:26:02 | terry.reedy | set | resolution: rejected -> ; stage: committed/rejected -> patch review |
| 2010-08-17 13:18:39 | fdrake | set | nosy: + fdrake; messages: + msg114119 |
| 2010-08-17 13:08:23 | ronaldoussoren | set | nosy: + ronaldoussoren; messages: + msg114115 |
| 2010-08-14 13:45:52 | eric.araujo | set | messages: + msg113897 |
| 2010-08-14 11:29:25 | bochecha | set | messages: + msg113888 |
| 2010-08-13 15:46:06 | tim.golden | set | nosy: + tim.golden; messages: + msg113789 |
| 2010-08-13 15:37:55 | r.david.murray | set | messages: + msg113787 |
| 2010-08-13 13:49:45 | pitrou | set | messages: + msg113773 |
| 2010-08-13 13:27:29 | r.david.murray | set | messages: + msg113767 |
| 2010-08-13 13:11:23 | eric.smith | set | nosy: + eric.smith; messages: + msg113766 |
| 2010-08-13 13:08:25 | pitrou | set | status: pending -> open; messages: + msg113765 |
| 2010-08-13 13:03:50 | r.david.murray | set | status: open -> pending; resolution: rejected; messages: + msg113762; stage: patch review -> committed/rejected |
| 2010-08-13 12:34:23 | eric.araujo | set | nosy: + eric.araujo; messages: + msg113758 |
| 2010-08-13 12:05:54 | r.david.murray | set | versions: + Python 3.2; nosy: + r.david.murray; messages: + msg113756; stage: patch review |
| 2010-08-13 10:23:12 | pitrou | set | nosy: + pitrou; messages: + msg113753 |
| 2010-08-13 08:41:11 | bochecha | set | messages: + msg113751 |
| 2010-08-13 08:36:51 | bochecha | create | |