
Add a function to escape metacharacters in glob/fnmatch #52649

Closed
georgehu mannequin opened this issue Apr 15, 2010 · 36 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@georgehu
Mannequin

georgehu mannequin commented Apr 15, 2010

BPO 8402
Nosy @terryjreedy, @ericvsmith, @ezio-melotti, @merwok, @vadmium, @serhiy-storchaka
Files
  • issue8402.patch
  • issue8402.1.patch
  • fnmatch_escape.patch
  • fnmatch_escape_2.patch
  • fnmatch_implementation.py
  • glob_escape.patch
  • glob_escape_2.patch
  • glob_escape_3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2013-11-18.11:09:40.359>
    created_at = <Date 2010-04-15.00:51:25.788>
    labels = ['type-feature', 'library']
    title = 'Add a function to escape metacharacters in glob/fnmatch'
    updated_at = <Date 2013-11-18.11:09:40.358>
    user = 'https://bugs.python.org/georgehu'

    bugs.python.org fields:

    activity = <Date 2013-11-18.11:09:40.358>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2013-11-18.11:09:40.359>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2010-04-15.00:51:25.788>
    creator = 'george.hu'
    dependencies = []
    files = ['27551', '27570', '27579', '27582', '29343', '29380', '32673', '32687']
    hgrepos = []
    issue_num = 8402
    keywords = ['patch', 'needs review']
    message_count = 36.0
    messages = ['103160', '103163', '103164', '103165', '103168', '103171', '103173', '103174', '103175', '106545', '106548', '106550', '109682', '109743', '147434', '172635', '172810', '172919', '172922', '172948', '172951', '172958', '172973', '172977', '172979', '175767', '177000', '183676', '183679', '183966', '203179', '203221', '203274', '203277', '203278', '203279']
    nosy_count = 15.0
    nosy_names = ['terry.reedy', 'eric.smith', 'kveretennicov', 'ezio.melotti', 'eric.araujo', 'mrabarnett', 'l0nwlf', 'george.hu', 'docs@python', 'Aquinas', 'python-dev', 'martin.panter', 'Tilka', 'serhiy.storchaka', 'a1abhishek']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue8402'
    versions = ['Python 3.4']

    @georgehu
    Mannequin Author

    georgehu mannequin commented Apr 15, 2010

    I have this problem in Python 2.5.4 under Windows.
    I'm trying to return a list of files in a directory by using glob. It kept returning an empty list until I adjusted the folder name by removing the "[" character from it. Not sure if this is a bug.

    glob.glob("c:\abc\afolderwith[test]\*") returns empty list
    glob.glob("c:\abc\afolderwithtest]\*") returns files

    @georgehu georgehu mannequin added the type-bug An unexpected behavior, bug, or error label Apr 15, 2010
    @l0nwlf
    Mannequin

    l0nwlf mannequin commented Apr 15, 2010

    When you do:
    glob.glob("c:\abc\afolderwith[test]\*") returns empty list
    it looks for all files in three directories:
    c:\abc\afolderwitht\*
    c:\abc\afolderwithe\*
    c:\abc\afolderwiths\*

    Of course they do not exist, so it returns an empty list.

    06:35:05 l0nwlf-MBP:Desktop $ ls -R test
    1 2 3
    06:35:15 l0nwlf-MBP:Desktop $ ls -R test1
    alpha beta gamma

    >>> glob.glob('/Users/l0nwlf/Desktop/test[123]/*')
    ['/Users/l0nwlf/Desktop/test1/alpha', '/Users/l0nwlf/Desktop/test1/beta', '/Users/l0nwlf/Desktop/test1/gamma']

    As you can see, by giving the argument test[123] it looked for test1, test2, test3. Since test1 existed, it gave all the files present within it.
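The same rule can be checked without touching the filesystem by matching names directly with fnmatch, the module glob uses for each path component. The names below are made up for illustration; fnmatchcase is used to avoid OS-dependent case normalization. "[test]" is a character set matching exactly one of t, e or s, so the literal bracketed name does not match.

```python
from fnmatch import fnmatchcase

# Hypothetical directory names; only the last one contains literal brackets.
names = ["afolderwitht", "afolderwithe", "afolderwiths", "afolderwith[test]"]

# "[test]" is a character set: it matches ONE character from {t, e, s}.
matches = [n for n in names if fnmatchcase(n, "afolderwith[test]")]
print(matches)  # ['afolderwitht', 'afolderwithe', 'afolderwiths']
```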

    @ericvsmith
    Member

    See the explanation at http://docs.python.org/library/fnmatch.html#module-fnmatch , which uses the same rules.

    @georgehu
    Mannequin Author

    georgehu mannequin commented Apr 15, 2010

    Ok, what if the name of the directory contains "[]" characters? What is the escape string for that?

    @georgehu georgehu mannequin reopened this Apr 15, 2010
    @ericvsmith
    Member

    The documentation for fnmatch.translate, which is what ultimately gets called, says:
    There is no way to quote meta-characters.
    Sorry.

    If you want to see this changed, you could open a feature request. If you have a patch, that would help!

    You probably want to research what the Unix shells use for escaping globs.

    @l0nwlf
    Mannequin

    l0nwlf mannequin commented Apr 15, 2010

    The glob module does not provide what you want.
    As a workaround you can try:

    os.listdir("c:\abc\afolderwith[test]")
    07:02:52 l0nwlf-MBP:Desktop $ ls -R test\[123\]/
    1 2 3
    >>> os.listdir('/Users/l0nwlf/Desktop/test[123]')
    ['1', '2', '3']

    Changing type to 'Feature Request'

    @l0nwlf l0nwlf mannequin added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Apr 15, 2010
    @georgehu
    Mannequin Author

    georgehu mannequin commented Apr 15, 2010

    Well, listdir doesn't support wildcards, for example
    listdir("*.app"). I know glob does Unix-shell-style expansion, but
    my program runs under Windows; it's a tiny script to walk through a
    huge directory on my NAS, and many directories there have "[]" and
    "()" characters in their names. Maybe the only way is to write a filter
    on top of listdir.



    @georgehu georgehu mannequin changed the title glob returns empty list with "[" character in the folder name glob returns empty list with " Apr 15, 2010
    @georgehu
    Mannequin Author

    georgehu mannequin commented Apr 15, 2010

    Well, listdir doesn't support wildcards, for example listdir("*.app"). I know glob does Unix-shell-style expansion, but my program runs under Windows; it's a tiny script to walk through a huge directory on my NAS, and many directories there have "[]" and "()" characters in their names. Maybe the only way is to write a filter on top of listdir.
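The filter-on-listdir approach can be sketched as follows. This is a minimal sketch, not code from the thread: list_matching is a hypothetical helper name, and it assumes wildcards are only needed in the final path component. os.listdir() takes the directory name literally, so brackets in it are harmless; only the pattern is interpreted by fnmatch.

```python
import fnmatch
import os
import tempfile


def list_matching(directory, pattern):
    # os.listdir() treats `directory` literally, so "[" and "]" in it are
    # not wildcards; only `pattern` is interpreted by fnmatch.filter().
    return sorted(fnmatch.filter(os.listdir(directory), pattern))


# Demo: a directory whose own name contains brackets.
with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, "afolderwith[test]")
    os.mkdir(folder)
    for name in ("one.app", "two.app", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    found = list_matching(folder, "*.app")

print(found)  # ['one.app', 'two.app']
```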

    @l0nwlf
    Mannequin

    l0nwlf mannequin commented Apr 15, 2010

    You repeated the same comment twice and added an 'unnamed' file. I assume you did it by mistake.

    @l0nwlf l0nwlf mannequin changed the title glob returns empty list with " glob returns empty list with "[" character in the folder name Apr 15, 2010
    @Aquinas
    Mannequin

    Aquinas mannequin commented May 26, 2010

    Shouldn't the title be updated to indicate that fnmatch is the true source of the behavior? (I'm basing this on http://docs.python.org/library/glob.html, which indicates that fnmatch is invoked by glob.) I'm not using glob, but fnmatch, in my attempt to find filenames that look like "Ajax_[version2].txt".

    If nothing else, it would have helped me if the documentation stated whether or not the brackets could be escaped. It doesn't appear from my tests (trying "Ajax_\[version2\].txt" and "Ajax_\\[version2\\].txt") that 'escaping' is possible, but if the filter pattern gets turned into a regular expression, I think escaping *would* be possible. Is that a reasonable assumption?

    I'm running 2.5.1 under Windows, and this is my first ever post to the bugs list.
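The result of that test can be reproduced with fnmatch.fnmatchcase (used here to avoid OS-specific case normalization). A backslash is not an escape character in fnmatch patterns: in "Ajax_\[version2\].txt" the first backslash is a literal character, and "[version2\]" is a character set that happens to include a backslash. The second filename below is made up to show what that pattern does match.

```python
from fnmatch import fnmatchcase

pattern = r"Ajax_\[version2\].txt"  # attempted "escaping" with backslashes

# The literal filename does NOT match...
print(fnmatchcase("Ajax_[version2].txt", pattern))  # False

# ...but a literal backslash followed by ONE character from the set
# {v, e, r, s, i, o, n, 2, \} does.
print(fnmatchcase("Ajax_\\v.txt", pattern))  # True
```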

    @Aquinas
    Mannequin

    Aquinas mannequin commented May 26, 2010

    Following up...
    I saw Eric Smith's 2nd note (2010-04-15 @1:27) about fnmatch.translate documentation stating that
    "There is no way to quote meta-characters."

    When I looked at:
    http://docs.python.org/library/fnmatch.html#module-fnmatch

    I did not see this statement anywhere. Would this absence be because someone is working on making this enhancement?

    @ericvsmith
    Member

    I don't think so. That quote came from the docstring for fnmatch.translate.

    >>> help(fnmatch.translate)
    Help on function translate in module fnmatch:
    translate(pat)
        Translate a shell PATTERN to a regular expression.
        
        There is no way to quote meta-characters.

    @terryjreedy
    Member

    The 3.1.2 doc for fnmatch.translate no longer says "There is no way to quote meta-characters." If that is still true (no quoting method is given that I can see), then that removal is something of a regression.

    @ericvsmith
    Member

    The note about no quoting meta-chars is in the docstring for fnmatch.translate, not the documentation. I still see it in 3.1. I have a to-do item to add this to the actual documentation. I'll add an issue.

    @Tilka
    Mannequin

    Tilka mannequin commented Nov 11, 2011

    As a workaround, it is possible to make every glob character a character set of one character (wrapping it with []). The gotcha here is that you can't just use multiple replaces, because you would escape the escape brackets.

    Here is a function adapted from [1]:

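    import re

    def escape_glob(path):
        # Wrap each glob metacharacter in a one-character set so that it
        # matches itself literally, e.g. "[" -> "[[]" and "*" -> "[*]".
        transdict = {
                '[': '[[]',
                ']': '[]]',
                '*': '[*]',
                '?': '[?]',
                }
        # Replace all metacharacters in a single regex pass, so the brackets
        # added by one replacement are never escaped again.
        rc = re.compile('|'.join(map(re.escape, transdict)))
        return rc.sub(lambda m: transdict[m.group(0)], path)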

    [1] http://www.daniweb.com/software-development/python/code/216636

    @Tilka Tilka mannequin added the stdlib Python modules in the Lib dir label Nov 11, 2011
    @a1abhishek
    Mannequin

    a1abhishek mannequin commented Oct 11, 2012

    I agree with answer number 6. The resolution mentioned is quite easy and very effective.

    Thanks

    @mmaker
    Mannequin

    mmaker mannequin commented Oct 13, 2012

    The attached patch adds support for '\\' escaping to fnmatch, and consequently to glob.

    @merwok
    Member

    merwok commented Oct 14, 2012

    I have comments on the patch but a review link does not appear. Could you update your clone to latest default revision and regenerate the patch? Thanks.

    @merwok merwok changed the title glob returns empty list with "[" character in the folder name Add a way to escape metacharacters in glob/fnmatch Oct 14, 2012
    @mmaker
    Mannequin

    mmaker mannequin commented Oct 14, 2012

    Noblesse oblige :)

    @serhiy-storchaka
    Member

    The attached patch adds support for '\\' escaping to fnmatch, and consequently to glob.

    This is a backward incompatible change. For example glob.glob(r'C:\Program Files\*') will be broken.

    As flacs says, a way to escape metacharacters in glob/fnmatch already exists. If someone wants to match the literal name "Ajax_[version2].txt", they should use the pattern "Ajax_[[]version2].txt". The documentation should explicitly mention this approach.

    It would also be good to add a new fnmatch.escape() function.
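A short check of the "[[]" workaround with fnmatch.fnmatchcase, using the filename from the earlier comment. It shows that the one-character set matches a literal "[", while the unescaped pattern treats "[version2]" as a set matching a single character.

```python
from fnmatch import fnmatchcase

name = "Ajax_[version2].txt"

# "[[]" is a set containing only "[", so it matches a literal "[".
# A "]" outside of a set is already literal and needs no wrapping.
escaped = "Ajax_[[]version2].txt"

print(fnmatchcase(name, escaped))                # True
print(fnmatchcase(name, "Ajax_[version2].txt"))  # False: "[version2]" is a set
```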

    @serhiy-storchaka serhiy-storchaka added the docs Documentation in the Doc dir label Oct 15, 2012
    @serhiy-storchaka
    Member

    Here is a patch which adds an fnmatch.escape() function.

    @serhiy-storchaka
    Member

    I am not sure if escape() should support bytes. translate() doesn't.

    @ezio-melotti
    Member

    I think the escaping workaround should be documented in the glob and/or fnmatch docs. This way users can simply do:

    import glob
    glob.glob(r"c:\abc\afolderwith[[]test]\*")

    rather than

    import glob
    import fnmatch
    glob.glob(fnmatch.escape(r"c:\abc\afolderwith[test]") + r"\*")

    The function might still be useful with patterns constructed programmatically, but I'm not sure how common the problem really is.

    @serhiy-storchaka
    Member

    I think the escaping workaround should be documented in the glob and/or fnmatch docs.

    See bpo-16240. This issue is left open for the enhancement.

    @serhiy-storchaka
    Member

    Patch updated (thanks Ezio for review and comments).

    @serhiy-storchaka serhiy-storchaka changed the title Add a way to escape metacharacters in glob/fnmatch Add a function to escape metacharacters in glob/fnmatch Oct 15, 2012
    @serhiy-storchaka serhiy-storchaka removed the docs Documentation in the Doc dir label Nov 1, 2012
    @ezio-melotti
    Member

    The workaround is now documented.
    I'm still not sure if this should still be added, or if it should be closed as rejected now that the workaround is documented.
    A third option would be adding it as a recipe in the docs, given that the whole function boils down to a single re.sub (the user can take care of picking the bytes/str regex depending on the input).
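A sketch of such a recipe, assuming escaping means wrapping "*", "?" and "[" in one-character sets (a lone "]" is already literal outside a set, so it is left alone). escape_pattern is a hypothetical name; this version picks a str or bytes regex based on the input and does not treat Windows drive prefixes specially.

```python
import re

# Glob metacharacters that need wrapping; "]" is literal on its own.
_MAGIC_STR = re.compile(r"([*?[])")
_MAGIC_BYTES = re.compile(rb"([*?[])")


def escape_pattern(pathname):
    """Wrap *, ? and [ in one-character sets so fnmatch/glob match them literally."""
    if isinstance(pathname, bytes):
        return _MAGIC_BYTES.sub(rb"[\1]", pathname)
    return _MAGIC_STR.sub(r"[\1]", pathname)


print(escape_pattern("Ajax_[version2].txt"))  # Ajax_[[]version2].txt
print(escape_pattern(b"a*b"))                 # b'a[*]b'
```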

    @serhiy-storchaka
    Member

    It is good if the stdlib has a function for escaping special characters, even if that function is simple. There are already escape functions for re and sgml/xml/html.

    The private function glob.glob1 is used in Lib/msilib and Tools/msi to prevent unexpected globbing in the parent directory name. In external code, glob.glob1(dirname, pattern) should be replaced by glob.glob(os.path.join(fnmatch.escape(dirname), pattern)).
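The suggested replacement can be tried with the function that was eventually committed for this issue, glob.escape() (new in Python 3.4), standing in for the proposed fnmatch.escape(). glob_in_dir is a hypothetical helper name, and the directory and file names are made up; the demo creates them in a temporary directory.

```python
import glob
import os
import tempfile


def glob_in_dir(dirname, pattern):
    # Escape the directory part so that only `pattern` acts as a wildcard.
    return sorted(glob.glob(os.path.join(glob.escape(dirname), pattern)))


with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, "afolderwith[test]")
    os.mkdir(folder)
    for name in ("a.app", "b.app", "c.txt"):
        open(os.path.join(folder, name), "w").close()
    found = [os.path.basename(p) for p in glob_in_dir(folder, "*.app")]

print(found)  # ['a.app', 'b.app']
```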

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Mar 7, 2013

    I've attached fnmatch_implementation.py, which is a simple pure-Python implementation of the fnmatch function.

    It's not as susceptible to catastrophic backtracking as the current re-based one. For example:

    fnmatch('a' * 50, '*a*' * 50)

    completes quickly.
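fnmatch_implementation.py itself is not reproduced here. As an illustration of why a non-regex matcher avoids the problem, the sketch below uses a different, well-known technique: a two-pointer wildcard match that remembers only the most recent "*" and backtracks to it, giving O(len(name) * len(pattern)) worst-case time. It handles only "*" and "?" (no [...] sets), so it is not a drop-in replacement for fnmatch; wildcard_match is a hypothetical name.

```python
def wildcard_match(name, pattern):
    """Match `name` against a pattern using only '*' and '?' (no [...] sets)."""
    n = p = 0
    star = -1    # index in `pattern` of the most recent '*'
    star_n = 0   # position in `name` where that '*' started matching
    while n < len(name):
        if p < len(pattern) and pattern[p] == "*":
            # Record the star; first try letting it match nothing.
            star, star_n = p, n
            p += 1
        elif p < len(pattern) and (pattern[p] == "?" or pattern[p] == name[n]):
            n += 1
            p += 1
        elif star != -1:
            # Mismatch: let the last '*' absorb one more character, retry.
            star_n += 1
            n = star_n
            p = star + 1
        else:
            return False
    # Any pattern left over must consist only of '*'.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)


print(wildcard_match("a" * 50, "*a*" * 50))  # True
print(wildcard_match("a" * 49, "*a*" * 50))  # False
```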

    @serhiy-storchaka
    Member

    I think it should be a separate issue.

    @serhiy-storchaka
    Member

    Escaping for glob on Windows is not so trivial. Special characters in the drive part have no special meaning and should not be escaped, i.e. escape('//?/c:/Quo vadis?.txt') should return '//?/c:/Quo vadis[?].txt'. Perhaps we should move the escape function to the glob module (because this is a peculiarity of glob).

    Here is a patch for glob.escape().
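A minimal sketch of the drive-aware rule described above, assuming the drive is split off with ntpath.splitdrive() (imported explicitly so the example behaves the same on any OS) and only the remainder is escaped. escape_windows_pattern is a hypothetical name, not the code from the patch.

```python
import ntpath
import re

# Glob metacharacters to wrap in one-character sets.
_MAGIC = re.compile(r"([*?[])")


def escape_windows_pattern(pathname):
    # Leave the drive part (e.g. "c:" or "//?/c:") untouched, since wildcards
    # there have no special meaning; escape only the rest of the path.
    drive, rest = ntpath.splitdrive(pathname)
    return drive + _MAGIC.sub(r"[\1]", rest)


print(escape_windows_pattern("//?/c:/Quo vadis?.txt"))  # //?/c:/Quo vadis[?].txt
```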

    @serhiy-storchaka
    Member

    Could anyone please review the patch before feature freeze?

    @serhiy-storchaka
    Member

    Updated patch addresses Ezio's and Eric's comments.

    @serhiy-storchaka
    Member

    Updated patch addresses Eric's comment.

    @ericvsmith
    Member

    Looks good to me.

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 18, 2013

    New changeset 5fda36bff39d by Serhiy Storchaka in branch 'default':
    Issue bpo-8402: Added the escape() function to the glob module.
    http://hg.python.org/cpython/rev/5fda36bff39d

    @serhiy-storchaka
    Member

    Thank you Ezio and Eric for your reviews.

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 18, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    5 participants