Title: crash error in glob.glob; directories with brackets
Components: Extension Modules
Versions: Python 2.2
Status: closed
Nosy List: progoth, rhettinger, tim.peters
Priority: normal

Created on 2003-05-15 16:06 by progoth, last changed 2003-05-17 23:43 by tim.peters. This issue is now closed.

File name Uploaded Description
progoth, 2003-05-15 16:06 unzip this and run to see it happen, probably only works on win32 due to using \ as directory delimiter
globfix.patch progoth, 2003-05-15 17:06 the patch I just made to fix this problem
Messages (6)
msg16028 - Author: Steven Scott (progoth) Date: 2003-05-15 16:06
I'm attaching a zip file containing a python file and
directory structure to test this.

I ran into this bug in real life work, so, as contrived
as the bug test may look, it happens.

I was writing a function which recurses through
directories and does stuff with the files it finds.

glob.glob() doesn't return any files inside a directory
named [_]
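
A minimal sketch of why a directory named [_] yields no results, assuming the name is passed through fnmatch-style matching as the traceback below suggests. The variable names are illustrative only. As a pattern, [_] is a character class containing a single underscore, so it matches the name _ rather than the literal three-character name:

```python
import fnmatch

# As a pattern, "[_]" is a character class holding one character, "_",
# so it matches the one-character name "_" ...
matches_underscore = fnmatch.fnmatchcase("_", "[_]")

# ... and it does not match a name that is literally "[_]".
matches_literal_name = fnmatch.fnmatchcase("[_]", "[_]")

print(matches_underscore, matches_literal_name)
```

fnmatchcase is used here instead of fnmatch so the comparison does not depend on the platform's case-normalization rules.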

glob.glob() crashes on a directory named [A--_B].  I
tried a few different combinations of characters inside
brackets, but this was the only one I could get it to
crash on.

The crash happens during regular expression
compilation, as can probably be surmised from the
characters that cause it ( [] ).  It may also be a
combination of that and using \ as the directory
delimiter, since this is win32.

  File "C:\temp\globbug\", line 5, in test
    fs = glob.glob( path + '\\*' )
  File "C:\Python22\lib\", line 24, in glob
    list = glob(dirname)
  File "C:\Python22\lib\", line 37, in glob
    sublist = glob1(dirname, basename)
  File "C:\Python22\lib\", line 50, in glob1
    return fnmatch.filter(names,pattern)
  File "C:\Python22\lib\", line 47, in filter
    _cache[pat] = re.compile(res)
  File "C:\Python22\lib\", line 179, in compile
    return _compile(pattern, flags)
  File "C:\Python22\lib\", line 229, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
msg16029 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-05-16 04:24
This doesn't seem like a bug to me.  Those strange names 
have the Unix style magic characters in them.  
Unfortunately, brackets are valid file/dir names in 

If anything were changed, I would prefer strengthening the 
magic character recognizer from:
   magic_check = re.compile('[*?[]')
to something that can treat ill-formed bracket expressions 
as being non-magic.

When posting a bug report, please avoid zip files and 
multiple test scripts.  It is enough to include in the text of 
the report something like this:
    glob.glob('[_]/*')   # fails to recognize a win directory
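
A small sketch of the check described above, using the magic_check expression exactly as quoted. The helper name has_magic and the sample paths are illustrative assumptions. It shows that any name containing an opening bracket is classified as a pattern, whether or not the bracket expression is well formed:

```python
import re

# Same expression as quoted above: matches any one of *, ? or [.
magic_check = re.compile('[*?[]')


def has_magic(s):
    """Return True if s contains a glob metacharacter (*, ? or [)."""
    return magic_check.search(s) is not None


# Directory names with brackets are treated as patterns, not literal names.
print(has_magic('[_]'))
print(has_magic('[A--_B]'))
print(has_magic('plain_dir'))
```

A stricter recognizer, as suggested, would also have to decide which bracket expressions count as ill-formed; that part is not sketched here.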
msg16030 - Author: Steven Scott (progoth) Date: 2003-05-16 04:41
Brackets are valid in file/dir names in Unix, too.  In fact, if I'm not mistaken, the
only 2 characters not allowed in Unix file names are / and \0.  I don't see
how it's not a bug if glob tries to read the files in a directory that exists
and crashes (or doesn't read them).

As for how it should be fixed, I have no idea.  My patch isn't very elegant.

BTW, I just ran this on Unix (after changing the \\ to / in the test script) and
it showed exactly the same behavior.
msg16031 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-05-16 18:20
Okay.  See if you can come up with a more elegant patch 
that only touches the glob module.  If you can see a 
straightforward way to test it, then some unit tests would 
be nice also.
msg16032 - Author: Steven Scott (progoth) Date: 2003-05-16 19:32
So a co-worker pointed out that you could have directories
like mine but, say, numbered.
Suppose you wanted a pattern like '[A--_B]?' to get them
all.  That's not a valid directory name, so glob definitely
needs to do some wildcard expansion, but it doesn't need to
mess with what's inside the brackets.
fnmatch probably shouldn't throw an exception in any
case.  Regardless, we're of the opinion that the only
logical way around this issue of wildcard characters in
filenames is to have the programmer escape them manually,
so r"\[A--_B]?" would be what is needed.
python/glob/fnmatch can't read the programmer's mind to know
which wildcard characters in a pattern are meant as pattern
syntax and which are meant literally.
To take this route, fnmatch would have to be modified to
recognize characters that are \-escaped, because it doesn't
at the moment.
Or maybe that's not the best solution.
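
The observation that fnmatch does not recognize \-escaped characters can be checked with a short sketch. The sample names and variable names are illustrative assumptions. The backslash in the pattern is matched as an ordinary character, and the * after it keeps its wildcard meaning:

```python
import fnmatch

# The pattern is a backslash followed by "*". The backslash is taken
# literally, so a matching name has to start with a backslash.
star_name_matches = fnmatch.fnmatchcase("*", r"\*")

# A name that starts with a backslash matches, because "*" still
# expands to "any run of characters".
backslash_name_matches = fnmatch.fnmatchcase(r"\abc", r"\*")

print(star_name_matches, backslash_name_matches)
```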
msg16033 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-05-17 23:43
The heart of the problem seems to be the comment in's translate() docstring:

    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    """

So it looks like an undocumented design limitation.
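
For readers on later Python versions only: glob.escape() was added in Python 3.4 and is not available in the Python 2.2 discussed in this report. A hedged sketch of how it handles the directory name from this issue follows; the sample names are illustrative assumptions. It wraps each of *, ? and [ in brackets so that the character matches literally, and real wildcards can then be appended:

```python
import fnmatch
import glob

name = "[A--_B]"

# glob.escape() wraps each of *, ? and [ in brackets so that it is
# matched literally instead of being read as pattern syntax.
escaped = glob.escape(name)

# The escaped text can be combined with a real wildcard, such as a
# trailing "?" for numbered directories.
pattern = escaped + "?"

print(escaped)
print(fnmatch.fnmatchcase("[A--_B]1", pattern))
```

The same pattern could be passed to glob.glob() to list such directories on disk; fnmatchcase is used here only so that the sketch runs without creating any files.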
Date User Action Args
2003-05-15 16:06:09 progoth create