Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fnmatch, glob: Allow curly brace expansion #53793

Open
bochecha mannequin opened this issue Aug 13, 2010 · 55 comments
Open

fnmatch, glob: Allow curly brace expansion #53793

bochecha mannequin opened this issue Aug 13, 2010 · 55 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@bochecha
Copy link
Mannequin

bochecha mannequin commented Aug 13, 2010

BPO 9584
Nosy @freddrake, @ronaldoussoren, @pitrou, @ericvsmith, @tjguk, @ezio-melotti, @merwok, @stevendaprano, @bitdancer, @bochecha, @Julian, @serhiy-storchaka, @csabella, @matusvalo
PRs
  • bpo-9584: Added brace expressions to the glob and fnmatch. #6299
  • Files
  • 0001-Curly-brace-expansion-in-glob.patch: Curly brace expansion in glob
  • 0002-Remove-unused-import.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2010-08-13.08:36:51.246>
    labels = ['type-feature', 'library']
    title = 'fnmatch, glob: Allow curly brace expansion'
    updated_at = <Date 2019-05-14.12:56:15.952>
    user = 'https://github.com/bochecha'

    bugs.python.org fields:

    activity = <Date 2019-05-14.12:56:15.952>
    actor = 'cheryl.sabella'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2010-08-13.08:36:51.246>
    creator = 'bochecha'
    dependencies = []
    files = ['21001', '21002']
    hgrepos = []
    issue_num = 9584
    keywords = ['patch']
    message_count = 55.0
    messages = ['113750', '113751', '113753', '113756', '113758', '113762', '113765', '113766', '113767', '113773', '113787', '113789', '113888', '113897', '114115', '114119', '120082', '122614', '124422', '124423', '124459', '124461', '124462', '130098', '130099', '130506', '131613', '131622', '135558', '135587', '135588', '174712', '174879', '174880', '174882', '174883', '174916', '174919', '174938', '174963', '174974', '174978', '174980', '174984', '174991', '174995', '174996', '174997', '175001', '175190', '175193', '175217', '314633', '335173', '342458']
    nosy_count = 18.0
    nosy_names = ['fdrake', 'ronaldoussoren', 'pitrou', 'eric.smith', 'tim.golden', 'kveretennicov', 'ezio.melotti', 'eric.araujo', 'Arfrever', 'steven.daprano', 'r.david.murray', 'bochecha', 'Julian', 'python-dev', 'serhiy.storchaka', 'ysangkok', 'cheryl.sabella', 'Mat\xc3\xba\xc5\xa1 Valo']
    pr_nums = ['6299']
    priority = 'low'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue9584'
    versions = ['Python 3.5']

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Aug 13, 2010

    The attached patch allows for shell curly braces with fnmatch.filter().

    This makes the following possible:
    >>> import fnmatch
    >>> import os
    >>>
    >>> for file in os.listdir('.'):
    ...     if fnmatch.fnmatch(file, '*.{txt,csv}'):
    ...         print file
    ...
    file.csv
    file.txt
    foo.txt
    
    This is especially convenient with the glob module:
    >>> import glob
    >>> glob.glob('*.{txt,csv}')
    ['file.csv', 'file.txt', 'foo.txt']

    Hopefully, this makes fnmatch match better the behavior that people expect from a shell-style pattern matcher.

    Please note: I attached a patch that applies on the Python trunk, but only tested it on Python 2.5 on Windows. However, the fnmatch module doesn't seem to have changed substantially in between.

    @bochecha bochecha mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Aug 13, 2010
    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Aug 13, 2010

    The attached patch allows for shell curly braces with fnmatch.filter().

    Oops, I meant that it allows for curly braces in fnmatch.translate(), which makes it available in the whole fnmatch module.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 13, 2010

    Thanks for the patch.

    + if j < n and pat[j] == '}':
    + j = j+1

    I don't get what the purpose of these two lines is. Forbid empty patterns?

    + while i < n and pat[j] != '}':
    + j = j+1

    You probably mean "while j < n" instead of "while i < n".
    Regardless, it's simpler to use "j = pat.find('}', j)".

    You should also add a test for unmatched braces. Currently:

    $ ./python -c "import fnmatch; print(fnmatch.translate('{x'))"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/antoine/py3k/__svn__/Lib/fnmatch.py", line 129, in translate
        while i < n and pat[j] != '}':
    IndexError: string index out of range

    @bitdancer
    Copy link
    Member

    Thanks for this suggestion and patch.

    In general I think more tests would be good, A test for {} would clarify what you are expecting there.

    @merwok
    Copy link
    Member

    merwok commented Aug 13, 2010

    In Bash, * and ? match only characters in existing files, but {a,b} always produces two filenames, even if the files don’t exist. Do we want to mimic this behavior in fnmatch?

    @bitdancer
    Copy link
    Member

    Ah, I had forgotten that detail, Éric.

    No, it doesn't seem as if implementing braces as matchers is appropriate. fnmatch is only implementing the shell file name globbing. Doing the equivalent of brace expansion would have to be done before applying globbing, to be consistent with the shell. Which is too bad.

    Unfortunately I think we should probably reject this, though it could be discussed on python-ideas to see if the idea can lead to something both consistent with the shell and useful.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 13, 2010

    Doing the equivalent of brace expansion would have to be done before
    applying globbing, to be consistent with the shell.

    I don't get the "shell consistency" argument. First, there is no single definition of "the shell". Second, users of Python generally don't care what a shell would do, they simply want to achieve useful tasks (which filename matching is arguably part of).

    @ericvsmith
    Copy link
    Member

    I'm not sure it has to be consistent with the shell to be useful, as long as the behavior is documented and we possibly add a note explaining the differences from the shell. But I agree that a discussion on python-ideas would be helpful.

    @bitdancer
    Copy link
    Member

    My view is that people using fnmatch/glob are expecting to get back the same list of files that they would if they ran 'echo <globpattern>' in the shell. The major shells (sh, bash, zsh, csh) seem to be pretty consistent in this regard (though sh does less brace expansion than the others...but is almost always actually bash these days).

    If you just wanted to provide a flexible way for people to match files, then instead of fnmatch/glob, we should have a function that walks down a directory tree applying a regular expression to the filenames it encounters and returning the rooted pathnames of the matches. That function is easy enough to write using standard library facilities. The special magic of fnmatch/glob is that it does a not-so-easy-to-get-right transformation of *shell* globbing rules into regular expressions behind the scenes. That is, in my view its *purpose* is to be compatible with the "normal rules" for unix shell globbing.

    So currently I'm about -0.5 on this feature.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 13, 2010

    My view is that people using fnmatch/glob are expecting to get back
    the same list of files that they would if they ran 'echo
    <globpattern>' in the shell.

    But it's not the case since we currently don't process braces anyway.

    The major shells (sh, bash, zsh, csh) seem to be pretty consistent in
    this regard (though sh does less brace expansion than the others...but
    is almost always actually bash these days).

    Excluding the 95% (or so) of Windows users, I suppose.

    The special magic of fnmatch/glob is that it does a
    not-so-easy-to-get-right transformation of *shell* globbing rules into
    regular expressions behind the scenes. That is, in my view its
    *purpose* is to be compatible with the "normal rules" for unix shell
    globbing.

    I've never thought that the purpose of glob or fnmatch was to reproduce
    shell rules. It's simply a convenient primitive. Wildcard expansion
    exists in lots of other software than Unix shells.

    @bitdancer
    Copy link
    Member

    Well, Windows supports * and ? globs, but not brace expansion, as far as I can tell (at least on XP, which is what I currently have access to).

    In fact, I don't believe I've run into brace expansion anywhere except in the unix shell, whereas as you say * and ? globbing is fairly common, so that might be another reason *not* to add it :)

    Unfortunately for that argument, Windows XP CMD doesn't appear to support [] globbing.

    I'm not going to try block this if other people want it. As you say, there is no real standard here to adhere to.

    @tjguk
    Copy link
    Member

    tjguk commented Aug 13, 2010

    I don't see any reason to turn this down except, perhaps, for keeping something simple.

    Certainly I don't believe that Windows users will be confused by the fact that there are wildcards other than "*" and "?". fnmatch already implements [] and [!] which are not supported on Windows.

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Aug 14, 2010

    Wow, I certainly didn't expect to generate so much controversy. :-/

    First of all, thanks for the comments on the patch Antoine and David.

    I don't get what the purpose of these two lines is. Forbid empty patterns?

    I copy-pasted the handling of the '[' character in the same file, and adapted it. This test was to properly handle sequences like '[]]', and you are right, it has nothing to do in this patch, I just forgot to remove it.

    You probably mean "while j < n" instead of "while i < n".

    Yes, that's a typo. :-/

    Regardless, it's simpler to use "j = pat.find('}', j)".

    I know, I just thought I would try to remain consistent with the way the '[' char was handled.

    You should also add a test for unmatched braces. Currently:

    I realised that after submitting the patch, yes. Actually, there are several other cases that I didn't properly handle, like a closing brace without a matching opening brace, or even nested braces (which are perfectly acceptable in the context of a shell like Bash).

    I'm working on an improved patch that would correctly handle those cases (with much more unit tests!), I guess I just hit the submit button too quickly. :)

    ---

    Now, about whether or not this is appropriate in fnmatch, I agree with David that if we want to remain really consistent with shell implementations, then fnmatch probably isn't the appropriate place to do so.

    In this case, I guess the correct way to implement it would be to expand the braces and generate several patterns that would all be fed to different fnmatch calls?

    Implementing it in fnmatch just seemed so convenient, replacing the braces with '(...|...)' constructs in a regex.

    People seem to agree that a thread on python-ideas would be good to discuss this change, but this ticket already generated some discussion. Should I start the thread on the mailing-list anyway or is this ticket an appropriate forum for further discussion?

    @merwok
    Copy link
    Member

    merwok commented Aug 14, 2010

    python-idea is read by more people.

    @ronaldoussoren
    Copy link
    Contributor

    I agree with Antoine that this would be useful functionality and that matching "the" shell is futile here.

    A quick check on an old linux server: bash and ksh do brace expansion before expanding '*', but that csh does both at the same time.

    That is, in a directory with foo.py and no .h files 'echo *.{py,h}' returns foo.py with csh and '*.h foo.py' with bash.

    I'm +1 on matching the behavior of csh here.

    @freddrake
    Copy link
    Member

    It's worth noting that the sh-like shells are far more widely used than the csh-like shells, so csh-like behavior may surprise more people.

    From the sh-like shell perspective, the {...,...} syntax just isn't part of the globbing handling.

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Oct 31, 2010

    I finally found the time to follow up on this issue, sorry for the absence of response.

    The thread on Python-Ideas didn't really lead to a consensus (nor did it generate a lot of discussion).

    Some wanted to see this in fnmatch, others in glob and others in shutils. Most thought glob was the appropriate place though, and this is also my opinion.

    From the Python documentation, fnmatch is a « Unix filename pattern matching » while glob is a « Unix style pathname pattern expansion ».

    This makes it clear to me that curly expansion has its place in glob, that would then use fnmatch to match the resulting list of expanded paths.

    Here is a patch against the py3k branch.

    The patch contains both the implementation, unit tests, and some changes to the documentation.

    Note that could I only run the unit tests on Linux (Fedora 14 x86_64) which is the only system I have at hand.

    @bochecha bochecha mannequin changed the title Allow curly braces in fnmatch Allow curly brace expansion Oct 31, 2010
    @merwok
    Copy link
    Member

    merwok commented Nov 28, 2010

    Latest patch looks good.

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Dec 21, 2010

    Same patch, but rebased to the current trunk so it still applies.

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Dec 21, 2010

    This is the right patch, sorry for all the mail spam. :-/

    @bitdancer
    Copy link
    Member

    Thanks for the research and the updated patch. Unfortunately as a feature request this is going to have to wait for 3.3 since we missed the pre-beta window.

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Dec 21, 2010

    Thanks for the research and the updated patch. Unfortunately as
    a feature request this is going to have to wait for 3.3 since we
    missed the pre-beta window.

    Ok.

    This is my first patch to Python, so I'm not sure what I should do to get this in.

    Is keeping the patch in sync with the trunk enough? Is there something else, like some more formal process to follow?

    @bitdancer
    Copy link
    Member

    Nope, you've got it.

    After the final release of Python 3.2, please post to the issue to remind us about it, and someone will commit the patch. (For future Python releases we expect that the delays in our ability to commit feature patches will be much shorter, but this is the way it works right now.)

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Mar 5, 2011

    So, now that Python 3.2 was released, here is a patch rebased on top of the py3k branch.

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Mar 5, 2011

    The sys module is imported in glob but never used.

    It's not related to this feature request but I saw it when implementing the patch, so here is a second patch removing the import.

    @ericvsmith
    Copy link
    Member

    I removed the unused import (mostly as a simple test of mercurial, it's my first commit there).

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Mar 21, 2011

    "I removed the unused import (mostly as a simple test of mercurial, it's my first commit there)."

    Does it mean that Python development is not being done in SVN, as the documentations state it?

    My patches have all been based on the SVN py3k branch, please tell me if I must base them on something else instead.

    @tjguk
    Copy link
    Member

    tjguk commented May 9, 2011

    I've just rebuilt on Windows against tip. test_glob is failing:

    test test_glob failed -- Traceback (most recent call last):
      File "c:\work-in-progress\python\cpython-9584\lib\test\test_glob.py", line 135, in test_glob_curly_braces
        os.path.join('a', 'bcd', 'efg')]))
      File "c:\work-in-progress\python\cpython-9584\lib\test\test_glob.py", line 53, in assertSequencesEqual_noorder
        self.assertEqual(set(l1), set(l2))
    AssertionError: Items in the first set but not the second:
    '@test_2788_tmp_dir\\a/bcd\\efg'
    '@test_2788_tmp_dir\\a/bcd\\EF'
    Items in the second set but not the first:
    '@test_2788_tmp_dir\\a\\bcd\\EF'
    '@test_2788_tmp_dir\\a\\bcd\\efg'

    @ezio-melotti
    Copy link
    Member

    + if sub.find(',') != -1:
    Please use the 'in' operator here.

    @akuchling
    Copy link
    Member

    Help is still needed to debug the failing test_glob.py on Windows.

    I just tried this patch against 3.4trunk. The first hunk of the glob.py patch doesn't apply because 'for name in glob1(None, basename): yield name' was changed to 'yield from glob1(None, basename)'. The tests still pass w/ Ezio's suggested change to use (',' in sub).

    @tjguk
    Copy link
    Member

    tjguk commented Nov 5, 2012

    I've got a patch for this which applies cleanly to the 3.4 tip. I still need to sort out the Windows issues (which I don't think will be difficult; it looks like a test issue, not a code issue)

    @tjguk tjguk self-assigned this Nov 5, 2012
    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Nov 5, 2012

    I have to apologize for not following up on this patch. At first I had no time to go on pushing for it, and then (after a change of job), I completely forgot about it. :(

    I guess rebasing the patch on the latest tip is not that useful if you already have done it Tim, and I don't have access to a Windows box to figure out the issue in the tests.

    At this point is there anything else I can do to help getting it integrated?

    @tjguk
    Copy link
    Member

    tjguk commented Nov 5, 2012

    I'm planning to refactor the tests and the code very slightly. When I've
    got a reworked patch I'll ping it back to you to ensure it matches your
    intent. IIUC you're implementing comma-separated lists {abc,def} and
    nested braces {a{b,c}d,efg} but not ranges {a..z}.

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Nov 5, 2012

    " IIUC you're implementing comma-separated lists {abc,def} and nested braces {a{b,c}d,efg} but not ranges {a..z}."

    Exactly.

    Although that's just because at the time I sent the patch, I didn't know about ranges in shells.

    So I just implemented the part of curly brace expansion that I knew of and felt would be useful.

    If ranges are an expected feature from shells, then it would probably be a worthwhile addition.

    @tjguk
    Copy link
    Member

    tjguk commented Nov 5, 2012

    Attached is a refactored version of Mathieu's patch which, when applied to tip, passes all tests.

    @tjguk
    Copy link
    Member

    tjguk commented Nov 5, 2012

    Something went wrong with that patch; it doesn't include all the changes to test_glob. I'll upload a newer patch later.

    @tjguk
    Copy link
    Member

    tjguk commented Nov 5, 2012

    Scratch that last comment: the patch does apply. I've tested it against Windows & Ubuntu. If no one comes in with any objections I'll commit it within the next day.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 6, 2012

    New changeset dafca4714298 by Tim Golden in branch 'default':
    bpo-9584: Add {} list expansion to glob. Original patch by Mathieu Bridon
    http://hg.python.org/cpython/rev/dafca4714298

    @serhiy-storchaka
    Copy link
    Member

    >>> glob.glob('*.ac')
    ['configure.ac']
    >>> glob.glob('*.sub')
    ['config.sub']
    >>> glob.glob('*.{sub,ac}')
    ['config.sub']

    And since these changes are backward incompatible (and moreover, now it is impossible to glob for paths that contain braces), I would prefer a keyword to switch this option (off by default) or a separate function. See "man glob" [1] for flag GLOB_BRACE.

    [1] http://linux.die.net/man/3/glob

    @bochecha
    Copy link
    Mannequin Author

    bochecha mannequin commented Nov 6, 2012

    >>> glob.glob('*.{sub,ac}')
    ['config.sub']

    I'm surprised this broke, this is one of the behaviour I thought I had implemented in my original patch. :-/

    (and moreover, now it is impossible to glob for paths that contain braces)

    I am absolutely sure this was working in my original submission, I had even added unit tests for it:
    + # test some edge cases where braces must not be expanded
    + eq(self.glob('c{}d'), [self.norm('c{}d')])
    + eq(self.glob('c{d{e,f}g'), map(self.norm, ['c{deg', 'c{dfg']))
    + eq(self.glob('c{d,e}{f}g'), map(self.norm, ['cd{f}g', 'ce{f}g']))
    + eq(self.glob('c{d,e}f}g'), map(self.norm, ['cdf}g', 'cef}g']))

    @tjguk
    Copy link
    Member

    tjguk commented Nov 6, 2012

    Must have been something I did. I'll revert the commit and re-test.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 6, 2012

    New changeset ec897bb38708 by Tim Golden in branch 'default':
    Reversed changes from bpo-9584
    http://hg.python.org/cpython/rev/ec897bb38708

    @serhiy-storchaka
    Copy link
    Member

    I am absolutely sure this was working in my original submission, I had even added unit tests for it:

    It can't be working in any implementation.

    >>> os.makedirs('a{b,c}d/e')
    >>> os.listdir('a{b,c}d')
    ['e']
    >>> glob.glob('a{b,c}d/*')
    []

    Was ['a{b,c}d/e'] in 3.3.

    @tjguk
    Copy link
    Member

    tjguk commented Nov 6, 2012

    Well even in the original [working] version, the scope of this change
    was limited to glob.glob. os.listdir doesn't currently support any form
    of expansion (at least not on Windows) and nor does os.makedirs. I don't
    see any problem in restricting this change -- and any future extensions
    -- to glob.glob whose purpose is precisely to return one or more files
    matching a, possibly wildcard, pattern.

    @serhiy-storchaka
    Copy link
    Member

    I'm talking about glob.glob(), not about os.listdir(). Bakward incompatible features should be off by default.

    @tjguk
    Copy link
    Member

    tjguk commented Nov 6, 2012

    Sorry, I misunderstood the point you were making with the
    os.listdir/makedirs examples. Fair point about backwards compatibility.
    This may make this change untenable as no-one will want a series of
    use_feature_xxx flags, one for each change we introduce to glob.glob.
    Unless we throw in every known expansion / matching right now and have a
    single use-extended-features flag.

    @serhiy-storchaka
    Copy link
    Member

    This may make this change untenable as no-one will want a series of
    use_feature_xxx flags, one for each change we introduce to glob.glob.
    Unless we throw in every known expansion / matching right now and have a
    single use-extended-features flag.

    This is another question. glob.glob(pattern, glob.GLOB_XXX | glob.GLOB_YYY) looks not much shorter than glob.glob(pattern, use_xxx=True, use_yyy=True). And not all features can be configured by only one bit. See os.access() and os.utime() as examples for many options. I hope that before 3.4 feature freeze we can change the interface, if needed.

    I also believe that this feature may not be regarded as mature without any sort of escaping (which also should be off by default for compatibility). And we need glob.escape() function for escaping arbitrary path.

    Since it is quite a complicated function, we need more examples in the documentation, which show different corner cases, nesting and interaction with wildcards.

    @bitdancer
    Copy link
    Member

    Given the backward compatibility concerns, and the fact that brace expansion and wildcard expansion are conceptually separate operations, perhaps what we should have a is a glob.expand_braces (or some such name) function? (Note: I haven't looked at whether or not this would actually work as an API, I'm just throwing out an idea.)

    @serhiy-storchaka
    Copy link
    Member

    Given the backward compatibility concerns, and the fact that brace expansion and wildcard expansion are conceptually separate operations, perhaps what we should have a is a glob.expand_braces (or some such name) function?

    In this case it may be appropriate to place this function into shutil or string module.

    @tjguk
    Copy link
    Member

    tjguk commented Nov 9, 2012

    Given that this isn't going to go ahead in its current form, and will need wider discussion on python-dev, I'm unassigning myself and I've removed the flawed version of the patch which I'd posted.

    @tjguk tjguk removed their assignment Nov 9, 2012
    @vstinner vstinner changed the title Allow curly brace expansion fnmatch, glob: Allow curly brace expansion Dec 5, 2014
    @matusvalo
    Copy link
    Mannequin

    matusvalo mannequin commented Mar 29, 2018

    Hi All,

    I have created new patch for handling curly brace expansion. The patch is available on Github PR bpo-6299

    @matusvalo
    Copy link
    Mannequin

    matusvalo mannequin commented Feb 10, 2019

    Hi All,

    this is a humble ping.

    @csabella
    Copy link
    Contributor

    @matúš Valo, thank you for the pull request and for your interest in contributing to CPython. It seems that consensus hadn't been reached during the previous discussion on this issue, which is why msg175217 suggested creating a discussion on python-dev. Most likely, your patch won't be reviewed until some more discussion takes place. Please open a topic on python-dev and then link to that topic once the details are hashed out. Thanks!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    Status: No status
    Development

    No branches or pull requests