a recursive glob pattern fails to list files in the current directory #69770

Closed
xdegaye mannequin opened this issue Nov 8, 2015 · 10 comments
Assignees: serhiy-storchaka
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@xdegaye
Mannequin

xdegaye mannequin commented Nov 8, 2015

BPO 25584
Nosy @pitrou, @bitdancer, @xdegaye, @serhiy-storchaka
Files
  • rglob_zero_dirs.patch
  • rglob_zero_dirs_2.patch
  • rglob_isdir.diff
  • rglob_isdir_2.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-11-09.21:58:34.687>
    created_at = <Date 2015-11-08.15:54:35.860>
    labels = ['type-bug', 'library']
    title = 'a recursive glob pattern fails to list files in the current directory'
    updated_at = <Date 2015-11-10.11:21:41.904>
    user = 'https://github.com/xdegaye'

    bugs.python.org fields:

    activity = <Date 2015-11-10.11:21:41.904>
    actor = 'xdegaye'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-11-09.21:58:34.687>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2015-11-08.15:54:35.860>
    creator = 'xdegaye'
    dependencies = []
    files = ['40986', '40987', '40988', '40989']
    hgrepos = []
    issue_num = 25584
    keywords = ['patch']
    message_count = 10.0
    messages = ['254344', '254352', '254354', '254366', '254370', '254386', '254397', '254412', '254414', '254441']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'r.david.murray', 'xdegaye', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue25584'
    versions = ['Python 3.5', 'Python 3.6']

    @xdegaye
    Mannequin Author

    xdegaye mannequin commented Nov 8, 2015

    On Arch Linux, the package manager backs up some files in /etc with a .pacnew extension during an upgrade. On my system there are 20 such files: 9 .pacnew files located directly in /etc and 11 .pacnew files in subdirectories of /etc. The following commands are run from /etc:

        $ shopt -s globstar
        $ ls **/*.pacnew | wc -w
        20
        $ ls *.pacnew | wc -w
        9

    With python:

        $ python
        Python 3.6.0a0 (default:72cca30f4707, Nov  2 2015, 14:17:31) 
        [GCC 5.2.0] on linux
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import glob
        >>> len(glob.glob('./**/*.pacnew', recursive=True))
        20
        >>> len(glob.glob('*.pacnew'))
        9
        >>> len(glob.glob('**/*.pacnew', recursive=True))
        11

    The '**/*.pacnew' pattern does not list the files in /etc, only those located in the subdirectories of /etc.
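
The comparison above can be reproduced without touching /etc. The following is a minimal sketch that assumes a throwaway directory tree; the helper names `build_tree` and `count_pacnew` and the file names are hypothetical and not part of the report. On an interpreter that includes the fix from this issue, the first and last counts should agree; on the interpreter quoted above, the last pattern would miss the top-level files.

```python
import glob
import os
import tempfile


def build_tree(root):
    # Hypothetical layout: 2 .pacnew files at the top level, 1 in a subdirectory.
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.pacnew", "b.pacnew", os.path.join("sub", "c.pacnew")):
        open(os.path.join(root, rel), "w").close()


def count_pacnew(root):
    """Count matches for the three patterns from the report, run from inside root."""
    old_cwd = os.getcwd()
    os.chdir(root)
    try:
        return {
            "./**/*.pacnew": len(glob.glob("./**/*.pacnew", recursive=True)),
            "*.pacnew": len(glob.glob("*.pacnew")),
            "**/*.pacnew": len(glob.glob("**/*.pacnew", recursive=True)),
        }
    finally:
        # Restore the working directory so the temporary tree can be removed.
        os.chdir(old_cwd)


with tempfile.TemporaryDirectory() as tmp:
    build_tree(tmp)
    counts = count_pacnew(tmp)

for pattern, number in counts.items():
    print(pattern, number)
```

The working directory is changed on purpose: the reported behavior concerns a relative pattern that starts with '**', so an absolute prefix would not exercise the same code path.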

    @xdegaye xdegaye mannequin added the stdlib and type-bug labels Nov 8, 2015
    @bitdancer
    Member

    I believe this behavior matches the documentation:

    "If the pattern is followed by an os.sep, only directories and subdirectories match."

    ('the pattern' being '**')

    I wonder if '***.pacnew' would work.

    @serhiy-storchaka
    Member

    I no longer remember whether it was a deliberate design or just an implementation detail. In any case, it is not documented.

    > I believe this behavior matches the documentation:

    No, it is not related. The documented sentence means that a pattern such as './**/' lists only directories, not regular files.

    > I wonder if '***.pacnew' would work.

    No, '**' works only as a whole path component.
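
Both points in this reply can be checked with a short experiment. This is only a sketch under an assumed directory layout; the helper `run_in` and the file names are hypothetical. It lists what a pattern ending in a separator returns and what '***.pacnew' returns when '**' is not a path component on its own.

```python
import glob
import os
import tempfile


def run_in(directory, func):
    """Run func() with the working directory temporarily set to directory."""
    old_cwd = os.getcwd()
    os.chdir(directory)
    try:
        return func()
    finally:
        os.chdir(old_cwd)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical tree: top.pacnew, d/mid.pacnew, d/e/deep.pacnew
    os.makedirs(os.path.join(tmp, "d", "e"))
    for rel in ("top.pacnew",
                os.path.join("d", "mid.pacnew"),
                os.path.join("d", "e", "deep.pacnew")):
        open(os.path.join(tmp, rel), "w").close()

    # A trailing separator restricts the matches to directories.
    dirs_only = run_in(tmp, lambda: glob.glob("./**/", recursive=True))
    # '***' is not a whole path component, so it behaves like a plain '*'
    # and does not descend into subdirectories.
    triple_star = run_in(tmp, lambda: glob.glob("***.pacnew", recursive=True))

print(dirs_only)
print(triple_star)
```

The exact spelling of the directory entries may differ between Python versions, so the sketch prints them rather than assuming a particular form.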

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 8, 2015
    @bitdancer
    Member

    Ah, I see, 'pattern' there means the whole pattern. That certainly isn't clear.

    @serhiy-storchaka
    Member

    Likely it was an implementation artifact. The current implementation is simpler but fits the existing glob design. The problem was that '**/a' should list 'a' and 'd/a', but '**/' should list only 'd/', and not ''.

    Here is a patch that makes '**' also match zero directories. The old tests still pass, and new tests are added to cover this case.
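
The intended semantics described here ('**/a' lists 'a' and 'd/a', while '**/' lists only 'd/') can be sketched as a small check. It assumes an interpreter that already contains this fix and a hypothetical tree holding a file 'a' and a directory 'd' with its own file 'a'.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical tree: a regular file 'a' and a directory 'd' containing 'a'.
    os.mkdir(os.path.join(tmp, "d"))
    for rel in ("a", os.path.join("d", "a")):
        open(os.path.join(tmp, rel), "w").close()

    old_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        # '**' may match zero directories, so the top-level 'a' is included.
        files = sorted(glob.glob("**/a", recursive=True))
        # With a trailing separator only directories match, and the empty
        # string (zero directories, no name) is not reported.
        dirs = sorted(glob.glob("**/", recursive=True))
    finally:
        os.chdir(old_cwd)

print(files)
print(dirs)
```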

    @xdegaye
    Mannequin Author

    xdegaye mannequin commented Nov 9, 2015

    FWIW the patch looks good to me.

    I find the code in glob.py difficult to read, as it happily joins regular filenames together with os.path.join() or attempts to list the files contained in a regular file (sic). The attached diff makes the code more correct and easier to understand. It is meant to be applied on top of Serhiy's patch.

    @xdegaye
    Mannequin Author

    xdegaye mannequin commented Nov 9, 2015

    glob('invalid_dir/**', recursive=True) triggers the assert that was added by my patch in _rlistdir().

    This new patch fixes this: when there is no magic character in the dirname part of a split(), and dirname is not an existing directory, then there is nothing to yield and the processing of pathname must stop (and thus in this case, no call is made to glob2() when basename is '**').

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 9, 2015

    New changeset 4532c4f37429 by Serhiy Storchaka in branch '3.5':
    Issue bpo-25584: Fixed recursive glob() with patterns starting with '**'.
    https://hg.python.org/cpython/rev/4532c4f37429

    New changeset 175cd763de57 by Serhiy Storchaka in branch 'default':
    Issue bpo-25584: Fixed recursive glob() with patterns starting with '**'.
    https://hg.python.org/cpython/rev/175cd763de57

    New changeset fefc10de2775 by Serhiy Storchaka in branch '3.5':
    Issue bpo-25584: Added "escape" to the __all__ list in the glob module.
    https://hg.python.org/cpython/rev/fefc10de2775

    New changeset 128e61cb3de2 by Serhiy Storchaka in branch 'default':
    Issue bpo-25584: Added "escape" to the __all__ list in the glob module.
    https://hg.python.org/cpython/rev/128e61cb3de2
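
The second pair of changesets exports `glob.escape` through `__all__`. As a brief illustration (the file name below is a made-up example), `escape` wraps the special characters '*', '?' and '[' in brackets so that they are matched literally:

```python
import glob

# After the changesets above, 'escape' is part of the module's public names.
print("escape" in glob.__all__)

# Escape a hypothetical file name that contains glob metacharacters.
escaped = glob.escape("report[1]*.txt")
print(escaped)
```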

    @serhiy-storchaka
    Member

    Please open a new issue for the glob() optimization, Xavier.

    @xdegaye
    Mannequin Author

    xdegaye mannequin commented Nov 10, 2015

    New issue bpo-25596 entered: regular files handled as directories in the glob module.

    Thanks for fixing this, Serhiy.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022