classification
Title: crash error in glob.glob; directories with brackets
Type:
Stage:
Components: Extension Modules
Versions: Python 2.2
process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: progoth, rhettinger, tim.peters
Priority: normal
Keywords:

Created on 2003-05-15 16:06 by progoth, last changed 2003-05-17 23:43 by tim.peters. This issue is now closed.

Files
File name Uploaded Description
globbug.zip progoth, 2003-05-15 16:06 unzip this and run bug.py to see it happen, probably only works on win32 due to using \ as directory delimiter
globfix.patch progoth, 2003-05-15 17:06 the patch I just made to fix this problem
Messages (6)
msg16028 - (view) Author: Steven Scott (progoth) Date: 2003-05-15 16:06
I'm attaching a zip file containing a python file and
directory structure to test this.

I ran into this bug in real life work, so, as contrived
as the bug test may look, it happens.

I was writing a function which recurses through
directories and does stuff with the files it finds.

glob.glob() doesn't return any files inside a directory
named [_]
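A likely explanation for the missing files, sketched with fnmatch (which glob uses for matching, per the traceback further down): the pattern [_] is read as a character class matching the single character _, so it never matches a directory literally named [_]. fnmatchcase is used here only to keep the check independent of platform case rules.

```python
import fnmatch

# '[_]' is parsed as a one-character class, not as a literal
# three-character directory name.
matches_underscore = fnmatch.fnmatchcase('_', '[_]')   # class matches '_'
matches_literal = fnmatch.fnmatchcase('[_]', '[_]')    # literal name vs. class

print(matches_underscore, matches_literal)
```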

glob.glob() crashes on a directory named [A--_B].  I
tried a few different combinations of characters inside
brackets, but this was the only one I could get it to
crash on.

The crash happens during regular expression
compilation, as the characters that trigger it ( [] )
suggest.  Using \ as the directory delimiter may also
contribute, since this is win32.

  File "C:\temp\globbug\bug.py", line 5, in test
    fs = glob.glob( path + '\\*' )
  File "C:\Python22\lib\glob.py", line 24, in glob
    list = glob(dirname)
  File "C:\Python22\lib\glob.py", line 37, in glob
    sublist = glob1(dirname, basename)
  File "C:\Python22\lib\glob.py", line 50, in glob1
    return fnmatch.filter(names,pattern)
  File "C:\Python22\lib\fnmatch.py", line 47, in filter
    _cache[pat] = re.compile(res)
  File "C:\Python22\lib\sre.py", line 179, in compile
    return _compile(pattern, flags)
  File "C:\Python22\lib\sre.py", line 229, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
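A minimal sketch of why the compile step fails, assuming the bracket expression reaches the regex engine unchanged (an inference from the traceback, not confirmed here). Inside a character class, A-- is read as a range from 'A' (code 65) down to '-' (code 45), which is a descending range. The helper name compile_error is hypothetical, and the class is compiled directly with re rather than through fnmatch, since fnmatch's translation may differ between Python versions.

```python
import re
import warnings

def compile_error(pattern):
    """Return the re error message for `pattern`, or None if it compiles."""
    with warnings.catch_warnings():
        # Newer Pythons may warn about '--' inside a set; not relevant here.
        warnings.simplefilter("ignore")
        try:
            re.compile(pattern)
        except re.error as exc:
            return str(exc)
    return None

# 'A--' is a range whose end ('-') sorts before its start ('A').
message = compile_error('[A--_B]')
# An ordinary ascending range compiles without complaint.
ok = compile_error('[A-B_]')

print(message, ok)
```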
msg16029 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-05-16 04:24
This doesn't seem like a bug to me.  Those strange names 
have the Unix-style magic characters in them.  
Unfortunately, brackets are valid in file/dir names on 
Windows.

If anything were changed, I would prefer strengthening the 
magic character recognizer from:
   magic_check = re.compile('[*?[]')
to something that can treat ill-formed bracket expressions 
as being non-magic.
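
One way such a stricter check might look, as a rough sketch only: '*' and '?' always count as magic, while '[' counts only when a closing ']' can be found for it, following the usual fnmatch convention that a leading '!' or ']' belongs to the set. The names has_closed_bracket and has_magic_strict are hypothetical and not part of the glob module. Note that this would not by itself help with [_] or [A--_B], which are well-formed bracket expressions.

```python
def has_closed_bracket(s):
    """True if some '[' in `s` is followed by a ']' that closes it."""
    i = s.find('[')
    while i != -1:
        j = i + 1
        if j < len(s) and s[j] == '!':   # leading '!' negates the set
            j += 1
        if j < len(s) and s[j] == ']':   # a leading ']' is a set member
            j += 1
        if s.find(']', j) != -1:
            return True
        i = s.find('[', i + 1)
    return False

def has_magic_strict(s):
    """Treat unclosed bracket expressions as ordinary characters."""
    return '*' in s or '?' in s or has_closed_bracket(s)

print(has_magic_strict('a[b'), has_magic_strict('[_]'))
```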

When posting a bug report, please avoid zip files and 
multiple test scripts.  It is enough to include in the text of 
the report something like this:
    glob.glob('[_]/*')   # fails to recognize a win directory
    
msg16030 - (view) Author: Steven Scott (progoth) Date: 2003-05-16 04:41
Brackets are valid in file/dir names on Unix, too.  In fact, if I'm not mistaken, the 
only two characters not allowed in Unix file names are / and \0.  I don't see 
how it's not a bug if glob tries to read the files in a directory that exists 
and crashes (or doesn't read them).

As for how it should be fixed, I have no idea.  My patch isn't very elegant.

BTW, I just ran this on Unix (after changing the \\ to / in the test script) and 
saw exactly the same behavior.
msg16031 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-05-16 18:20
Okay.  See if you can come up with a more elegant patch 
that only touches the glob module.  If you can see a 
straightforward way to test it, then some unit tests would 
be nice also.
msg16032 - (view) Author: Steven Scott (progoth) Date: 2003-05-16 19:32
So a co-worker pointed out that you could have directories
like mine but, say, numbered:
[A--_B]1
[A--_B]2
etc.
Say you wanted a pattern like '[A--_B]?' to get them
all.  That's not a valid directory, so it definitely needs
to do some wildcard expansion, but it doesn't need to mess
with what's inside the brackets.
fnmatch probably shouldn't throw an exception in any
case.  Regardless, we're of the opinion that the only
logical way around this issue of wildcard characters in
filenames is to have the programmer escape them manually,
so r"\[A--_B]?" would be what is needed.
python/glob/fnmatch can't read the programmer's mind to tell
which wildcard characters in a pattern are meant as pattern
syntax and which are meant literally.
To take this route, fnmatch would have to be modified to
recognize characters that are \-escaped, because it doesn't
at the moment.
Or maybe that's not the best solution.
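A small check of the claim that fnmatch does not treat a backslash as an escape, sketched with a simpler bracket expression than the one in the report. In the pattern r'\[a]' the backslash is matched as an ordinary character and [a] is still read as a character class.

```python
import fnmatch

pattern = r'\[a]'

# The backslash is literal and '[a]' is still a class matching 'a',
# so the pattern matches the two-character string backslash + 'a' ...
matches_backslash_a = fnmatch.fnmatchcase('\\a', pattern)
# ... and does not match the literal three-character name '[a]'.
matches_bracket_name = fnmatch.fnmatchcase('[a]', pattern)

print(matches_backslash_a, matches_bracket_name)
```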
msg16033 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-05-17 23:43
The heart of the problem seems to be the comment in 
fnmatch.py's translate() docstring:

    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    """
So it looks like an undocumented design limitation.
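For readers on much later Python versions than the 2.2 discussed here: glob.escape (added in Python 3.4, so not available to the original reporter) quotes magic characters by wrapping each in brackets, for example [ becomes [[]. A hedged sketch of listing a directory whose name contains brackets follows; the helper name list_dir_literal and the temporary directory layout are illustrative assumptions, not taken from the attached test files.

```python
import glob
import os
import tempfile

def list_dir_literal(path):
    """List entries directly under `path`, treating `path` literally."""
    return sorted(glob.glob(os.path.join(glob.escape(path), '*')))

escaped_name = glob.escape('[A--_B]')   # each magic character is bracketed

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, '[_]')
    os.mkdir(target)
    open(os.path.join(target, 'a.txt'), 'w').close()

    # Unescaped: '[_]' is taken as a pattern matching a directory named '_'.
    unescaped = glob.glob(os.path.join(target, '*'))
    # Escaped: the directory name is matched literally.
    escaped = [os.path.basename(p) for p in list_dir_literal(target)]

print(escaped_name, unescaped, escaped)
```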
History
Date User Action Args
2003-05-15 16:06:09 progoth create