a recursive glob pattern fails to list files in the current directory #69770
Comments
On Arch Linux, during an upgrade the package manager backs up some files in /etc with a .pacnew extension. On my system there are 20 such files: 9 .pacnew files located in /etc and 11 .pacnew files in subdirectories of /etc. The following commands are run from /etc:
$ shopt -s globstar
$ ls **/*.pacnew | wc -w
20
$ ls *.pacnew | wc -w
9
With Python:
$ python
Python 3.6.0a0 (default:72cca30f4707, Nov 2 2015, 14:17:31)
[GCC 5.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import glob
>>> len(glob.glob('./**/*.pacnew', recursive=True))
20
>>> len(glob.glob('*.pacnew'))
9
>>> len(glob.glob('**/*.pacnew', recursive=True))
11
The '**/*.pacnew' pattern does not list the files located directly in /etc, only those located in the subdirectories of /etc.
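The report above can be reproduced on a small scale without touching /etc. The following is a minimal sketch, assuming a throwaway tree with one .pacnew file at the top level and one in a subdirectory; the helper `count_matches` and the file names are hypothetical, not part of the original report. On an interpreter that includes the fix discussed later in this thread, all recursive patterns should see both files; on the affected build, the last pattern would be expected to miss the top-level file.

```python
import glob
import os
import tempfile


def count_matches(root, patterns):
    """Count glob matches for each relative pattern, evaluated from inside root."""
    previous = os.getcwd()
    os.chdir(root)  # relative patterns are resolved against the current directory
    try:
        return {p: len(glob.glob(p, recursive=True)) for p in patterns}
    finally:
        os.chdir(previous)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: root/top.pacnew and root/sub/nested.pacnew
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("top.pacnew", os.path.join("sub", "nested.pacnew")):
        with open(os.path.join(root, rel), "w"):
            pass
    counts = count_matches(root, ["./**/*.pacnew", "*.pacnew", "**/*.pacnew"])

for pattern, n in counts.items():
    print(pattern, n)
```

The three patterns mirror the three calls in the interactive session, so the counts can be compared line by line with the 20 / 9 / 11 figures reported there.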
I believe this behavior matches the documentation: "If the pattern is followed by an os.sep, only directories and subdirectories match." ('the pattern' being '**') I wonder if '***.pacnew' would work.
I no longer remember whether it was a deliberate design or just an implementation detail. In any case, it is not documented.
No, it is not related. It is that './**/' will list only directories, not regular files.
No, only ** as a whole path component works.
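A short sketch of the point about '***.pacnew', assuming the same kind of throwaway tree as in the report (the helper `match_relative` and the file names are hypothetical). Because '***' is not a whole path component equal to '**', it is treated as an ordinary wildcard and does not descend into subdirectories.

```python
import glob
import os
import tempfile


def match_relative(root, pattern):
    """Glob for pattern directly under root and return paths relative to root."""
    # glob.escape() protects any magic characters in the temporary directory name.
    full = glob.glob(os.path.join(glob.escape(root), pattern), recursive=True)
    return sorted(os.path.relpath(p, root) for p in full)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: root/top.pacnew and root/sub/nested.pacnew
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("top.pacnew", os.path.join("sub", "nested.pacnew")):
        with open(os.path.join(root, rel), "w"):
            pass
    triple_star = match_relative(root, "***.pacnew")
    single_star = match_relative(root, "*.pacnew")

print(triple_star)
print(single_star)
```

Both calls should return only the top-level file, which is why '***.pacnew' is not a workaround for the reported behavior.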
Ah, I see, 'pattern' there means the whole pattern. That certainly isn't clear.
Likely it was an implementation artifact. The current implementation is simpler and better fitted to the existing glob design. The problem was that '**/a' should list 'a' and 'd/a', but '**/' should list only 'd/', and not ''. Here is a patch that makes '**' also match zero directories. The old tests pass, and new tests are added to cover this case.
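The intended semantics described above can be checked with a small sketch, assuming an interpreter that includes the patch and a hypothetical tree containing a file 'a', a directory 'd', and a file 'd/a'. The helper `glob_from` is illustrative only and is not code from the patch.

```python
import glob
import os
import tempfile


def glob_from(root, pattern):
    """Evaluate a relative recursive pattern from inside root; return sorted matches."""
    previous = os.getcwd()
    os.chdir(root)
    try:
        return sorted(glob.glob(pattern, recursive=True))
    finally:
        os.chdir(previous)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: root/a, root/d/, root/d/a
    os.mkdir(os.path.join(root, "d"))
    for rel in ("a", os.path.join("d", "a")):
        with open(os.path.join(root, rel), "w"):
            pass
    files_named_a = glob_from(root, "**/a")   # '**' may match zero directories
    directories = glob_from(root, "**/")      # trailing separator: directories only

print(files_named_a)
print(directories)
```

With '**' matching zero directories, the first pattern should find both 'a' and 'd/a', while the second should list the directory 'd' with a trailing separator and omit the empty string.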
FWIW the patch looks good to me. I find the code in glob.py difficult to read, as it happily joins regular filenames together with os.path.join() or attempts to list the files contained in a regular file (sic). The attached diff makes the code more correct and easier to understand. It is meant to be applied on top of Serhiy's patch.
glob('invalid_dir/**', recursive=True) triggers the assert that was added by my patch in _rlistdir(). This new patch fixes this: when there is no magic character in the dirname part of a split() and dirname is not an existing directory, there is nothing to yield and the processing of pathname must stop (and thus, in this case, no call is made to glob2() when basename is '**').
New changeset 4532c4f37429 by Serhiy Storchaka in branch '3.5'
New changeset 175cd763de57 by Serhiy Storchaka in branch 'default'
New changeset fefc10de2775 by Serhiy Storchaka in branch '3.5'
New changeset 128e61cb3de2 by Serhiy Storchaka in branch 'default'
Xavier, please open a new issue for the glob() optimization.
New issue bpo-25596 entered: regular files handled as directories in the glob module. Thanks for fixing this, Serhiy.