classification
Title: a recursive glob pattern fails to list files in the current directory
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.6, Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: pitrou, python-dev, r.david.murray, serhiy.storchaka, xdegaye
Priority: normal
Keywords: patch

Created on 2015-11-08 15:54 by xdegaye, last changed 2015-11-10 11:21 by xdegaye. This issue is now closed.

Files
File name Uploaded Description
rglob_zero_dirs.patch serhiy.storchaka, 2015-11-09 07:14
rglob_zero_dirs_2.patch serhiy.storchaka, 2015-11-09 08:45
rglob_isdir.diff xdegaye, 2015-11-09 13:03
rglob_isdir_2.diff xdegaye, 2015-11-09 18:02
Messages (10)
msg254344 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2015-11-08 15:54
On Arch Linux, during an upgrade, the package manager backs up some files in /etc with a .pacnew extension. On my system there are 20 such files: 9 .pacnew files located in /etc and 11 .pacnew files in subdirectories of /etc. The following commands are run from /etc:

    $ shopt -s globstar
    $ ls **/*.pacnew | wc -w
    20
    $ ls *.pacnew | wc -w
    9

With Python:

    $ python
    Python 3.6.0a0 (default:72cca30f4707, Nov  2 2015, 14:17:31) 
    [GCC 5.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import glob
    >>> len(glob.glob('./**/*.pacnew', recursive=True))
    20
    >>> len(glob.glob('*.pacnew'))
    9
    >>> len(glob.glob('**/*.pacnew', recursive=True))
    11

The '**/*.pacnew' pattern does not list the files in /etc, only those located in the subdirectories of /etc.
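
A minimal sketch for reproducing this without touching /etc. It builds a throwaway tree in a temporary directory; the file names top.pacnew and sub/nested.pacnew and the helper count_matches() are made up for illustration and are not part of the original report.

```python
import glob
import os
import tempfile


def count_matches(pattern):
    """Count recursive glob matches for `pattern` inside a throwaway tree."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: one .pacnew file at the top, one in a subdirectory.
        os.mkdir(os.path.join(root, "sub"))
        for rel in ("top.pacnew", os.path.join("sub", "nested.pacnew")):
            open(os.path.join(root, rel), "w").close()
        old_cwd = os.getcwd()
        os.chdir(root)  # the report uses patterns relative to the current directory
        try:
            return len(glob.glob(pattern, recursive=True))
        finally:
            os.chdir(old_cwd)


for pattern in ("./**/*.pacnew", "*.pacnew", "**/*.pacnew"):
    print(pattern, count_matches(pattern))
```

On an interpreter that includes the fix for this issue, the first and third patterns should report the same count. On an affected interpreter, the third pattern would be expected to miss the top-level file, as in the session above.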
msg254352 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-08 17:46
I believe this behavior matches the documentation:

  "If the pattern is followed by an os.sep, only directories and subdirectories match."

('the pattern' being '**')

I wonder if '***.pacnew' would work.
msg254354 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-08 18:30
I no longer remember whether it was a deliberate design or just an implementation detail. In any case it is not documented.

> I believe this behavior matches the documentation:

No, that is unrelated. That sentence means that './**/' will list only directories, not regular files.

> I wonder if '***.pacnew' would work.

No, only ** as a whole path component works.
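
A small sketch of the "whole path component" rule, assuming a made-up tree with top.pacnew and sub/nested.pacnew (the names and the matches() helper are illustrative only). Since '***' is not a standalone '**' component, it should be treated like an ordinary wildcard and stay within a single directory level.

```python
import glob
import os
import tempfile


def matches(pattern):
    """Return sorted recursive glob matches inside a small throwaway tree."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout, not taken from the report.
        os.mkdir(os.path.join(root, "sub"))
        for rel in ("top.pacnew", os.path.join("sub", "nested.pacnew")):
            open(os.path.join(root, rel), "w").close()
        old_cwd = os.getcwd()
        os.chdir(root)
        try:
            return sorted(glob.glob(pattern, recursive=True))
        finally:
            os.chdir(old_cwd)


# '***.pacnew' is handled like '*.pacnew': no descent into subdirectories.
print(matches("***.pacnew"))
print(matches("*.pacnew"))
```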
msg254366 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-09 04:39
Ah, I see, 'pattern' there means the whole pattern.  That certainly isn't clear.
msg254370 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-09 07:14
Likely it was an implementation artifact. The current implementation is simpler but better fitted to the existing glob design. The problem was that '**/a' should list 'a' and 'd/a', but '**/' should list only 'd/', and not ''.

Here is a patch that makes '**' also match zero directories. The old tests pass, and new tests are added to cover this case.
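
The intended semantics can be checked with a sketch like the following, which recreates the 'a' and 'd/a' layout from the description in a temporary directory. The helper name matches_in_tree() is hypothetical, and the results are only expected to match on an interpreter that includes this fix.

```python
import glob
import os
import tempfile


def matches_in_tree(pattern):
    """Return sorted recursive glob matches in a tree holding 'a' and 'd/a'."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "d"))
        for rel in ("a", os.path.join("d", "a")):
            open(os.path.join(root, rel), "w").close()
        old_cwd = os.getcwd()
        os.chdir(root)
        try:
            return sorted(glob.glob(pattern, recursive=True))
        finally:
            os.chdir(old_cwd)


# '**' matching zero directories: 'a' itself is listed alongside 'd/a'.
print(matches_in_tree("**/a"))
# A trailing separator still restricts matches to directories, and '' is excluded.
print(matches_in_tree("**/"))
```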
msg254386 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2015-11-09 13:03
FWIW the patch looks good to me.

I find the code in glob.py difficult to read as it happily joins regular filenames together with os.path.join() or attempts to list the files contained in a regular file (sic).  The attached diff makes the code more correct and easier to understand. It is meant to be applied on top of Serhiy's patch.
msg254397 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2015-11-09 18:02
glob('invalid_dir/**', recursive=True) triggers the assert that was added by my patch in _rlistdir().

This new patch fixes this: when there is no magic character in the dirname part of a split(), and dirname is not an existing directory, then there is nothing to yield and the processing of pathname must stop (and thus in this case, no call is made to glob2() when basename is '**').
msg254412 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-09 21:19
New changeset 4532c4f37429 by Serhiy Storchaka in branch '3.5':
Issue #25584: Fixed recursive glob() with patterns starting with '**'.
https://hg.python.org/cpython/rev/4532c4f37429

New changeset 175cd763de57 by Serhiy Storchaka in branch 'default':
Issue #25584: Fixed recursive glob() with patterns starting with '**'.
https://hg.python.org/cpython/rev/175cd763de57

New changeset fefc10de2775 by Serhiy Storchaka in branch '3.5':
Issue #25584: Added "escape" to the __all__ list in the glob module.
https://hg.python.org/cpython/rev/fefc10de2775

New changeset 128e61cb3de2 by Serhiy Storchaka in branch 'default':
Issue #25584: Added "escape" to the __all__ list in the glob module.
https://hg.python.org/cpython/rev/128e61cb3de2
msg254414 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-09 21:58
Please open a new issue for the glob() optimization, Xavier.
msg254441 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2015-11-10 11:21
New issue 25596 entered: regular files handled as directories in the glob module.

Thanks for fixing this, Serhiy.
History
Date  User  Action  Args
2015-11-10 11:21:41  xdegaye  set  messages: + msg254441
2015-11-09 21:58:34  serhiy.storchaka  set  status: open -> closed; resolution: fixed; messages: + msg254414; stage: patch review -> resolved
2015-11-09 21:19:30  python-dev  set  nosy: + python-dev; messages: + msg254412
2015-11-09 18:02:23  xdegaye  set  files: + rglob_isdir_2.diff; messages: + msg254397
2015-11-09 13:03:51  xdegaye  set  files: + rglob_isdir.diff; messages: + msg254386
2015-11-09 08:45:36  serhiy.storchaka  set  files: + rglob_zero_dirs_2.patch
2015-11-09 07:14:45  serhiy.storchaka  set  files: + rglob_zero_dirs.patch; versions: + Python 3.5; messages: + msg254370; keywords: + patch; stage: patch review
2015-11-09 04:39:57  r.david.murray  set  messages: + msg254366
2015-11-08 18:30:59  serhiy.storchaka  set  assignee: serhiy.storchaka; messages: + msg254354
2015-11-08 17:46:12  r.david.murray  set  nosy: + r.david.murray, pitrou; messages: + msg254352
2015-11-08 15:54:35  xdegaye  create