Path.glob() sometimes misses files that match #83075

Closed
parmentelat mannequin opened this issue Nov 22, 2019 · 7 comments
Labels: 3.7 (EOL) [end of life] · 3.8 [only security fixes] · 3.9 [only security fixes] · type-bug [An unexpected behavior, bug, or error]

Comments

parmentelat mannequin commented Nov 22, 2019

BPO 38894
Nosy @blueyed, @pablogsal, @miss-islington, @parmentelat
PRs
  • bpo-38894: Fix pathlib.Path.glob in the presence of symlinks and insufficient permissions #18815
  • [3.8] bpo-38894: Fix pathlib.Path.glob in the presence of symlinks and insufficient permissions (GH-18815) #18830
  • [3.7] bpo-38894: Fix pathlib.Path.glob in the presence of symlinks and insufficient permissions (GH-18815) #18831
  • [WIP/RFC] pathlib: revisit error handling #23025
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/pablogsal'
    closed_at = <Date 2020-03-07.18:12:08.669>
    created_at = <Date 2019-11-22.14:06:07.742>
    labels = ['3.8', 'type-bug', '3.7', '3.9']
    title = 'Path.glob() sometimes misses files that match'
    updated_at = <Date 2020-10-29.12:06:04.323>
    user = 'https://github.com/parmentelat'

    bugs.python.org fields:

    activity = <Date 2020-10-29.12:06:04.323>
    actor = 'blueyed'
    assignee = 'pablogsal'
    closed = True
    closed_date = <Date 2020-03-07.18:12:08.669>
    closer = 'pablogsal'
    components = []
    creation = <Date 2019-11-22.14:06:07.742>
    creator = 'thierry.parmentelat'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38894
    keywords = ['patch', '3.7regression', '3.8regression']
    message_count = 7.0
    messages = ['357284', '357288', '363537', '363552', '363608', '363609', '363610']
    nosy_count = 5.0
    nosy_names = ['blueyed', 'pablogsal', 'miss-islington', 'thierry.parmentelat', 'Matt Wozniski']
    pr_nums = ['18815', '18830', '18831', '23025']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue38894'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    parmentelat mannequin commented Nov 22, 2019

    I have observed this on a Linux box running Fedora 29:

    $ python3 --version
    Python 3.7.5
    $ uname -a
    Linux faraday.inria.fr 5.3.11-100.fc29.x86_64 #1 SMP Tue Nov 12 20:41:25 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
    $ cat /etc/fedora-release
    Fedora release 29 (Twenty Nine)

    ============ steps to reproduce:

    This assumes that /root is not readable by lambda users

    ----- as root:

    # mkdir /tmp/foo
    # cd /tmp/foo
    # touch a b d e
    # ln -s /root/anywhere c

    # ls -l
    total 0
    -rw-r--r-- 1 root root 0 Nov 22 14:51 a
    -rw-r--r-- 1 root root 0 Nov 22 14:51 b
    lrwxrwxrwx 1 root root 14 Nov 22 14:53 c -> /root/anywhere
    -rw-r--r-- 1 root root 0 Nov 22 14:51 d
    -rw-r--r-- 1 root root 0 Nov 22 14:51 e

    ----- as a lambda user:

    We can see all files:

    $ ls -l /tmp/foo
    total 0
    -rw-r--r-- 1 root root  0 Nov 22 14:51 a
    -rw-r--r-- 1 root root  0 Nov 22 14:51 b
    lrwxrwxrwx 1 root root 14 Nov 22 14:53 c -> /root/anywhere
    -rw-r--r-- 1 root root  0 Nov 22 14:51 d
    -rw-r--r-- 1 root root  0 Nov 22 14:51 e

    and with glob.glob() too:

    In [1]: import glob

    In [2]: for filename in glob.glob("/tmp/foo/*"):
       ...:     print(filename)
       ...:
    /tmp/foo/c
    /tmp/foo/e
    /tmp/foo/d
    /tmp/foo/b
    /tmp/foo/a

    BUT Path.glob() does not work as expected: it yields nothing at all.

    In [3]: from pathlib import Path

    In [4]: for filename in Path("/tmp/foo/").glob("*"):
       ...:     print(filename)
       ...:

    ----- If I now go back as root and remove the problematic file in /tmp/foo:

    # rm /tmp/foo/c

    ----- and try again as a lambda user

    In [5]: for filename in Path("/tmp/foo/").glob("*"):
       ...:     print(filename)
       ...:
    /tmp/foo/e
    /tmp/foo/d
    /tmp/foo/b
    /tmp/foo/a

    ============ discussion

    In my case, in a real application, I was getting *some* files, not an empty list like here.

    I ran strace on that real application. It's fairly clear from that output that the odd symlink causes the scan to stop instead of continuing with the remaining files (see the snippet below). Of course, the order in which entries are read from disk affects the behaviour; that's why I created the symlink last. That may need to be changed to reproduce the problem in another setup.

    ============ strace extract

    <snip>
    getdents64(3, /* 189 entries */, 32768) = 8640
    getdents64(3, /* 0 entries */, 32768) = 0
    close(3) = 0
    stat("/var/lib/rhubarbe-images/centos.ndz", {st_mode=S_IFREG|0644, st_size=1002438656, ...}) = 0
    stat("/var/lib/rhubarbe-images/oai-enb.ndz", {st_mode=S_IFREG|0644, st_size=2840592384, ...}) = 0
    <snip>
    stat("/var/lib/rhubarbe-images/ubuntu-floodlight.ndz", {st_mode=S_IFREG|0644, st_size=2559574016, ...}) = 0
    stat("/var/lib/rhubarbe-images/ndnsim.ndz", {st_mode=S_IFREG|0644, st_size=4153409536, ...}) = 0

    ==> that's the line about the broken symlink in my real app
    stat("/var/lib/rhubarbe-images/push-to-preplab.sh", 0x7ffd3ac4a140) = -1 EACCES (Permission denied)
    ==> and here it stops scanning files while there are still quite a lot to be dealt with

    write(1, "/var/lib/rhubarbe-images/fedora-"..., 82/var/lib/rhubarbe-images/fedora-31.ndz
    /var/lib/rhubarbe-images/fedora-31-ssh.ndz
    ) = 82
    rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fc583705e70}, {sa_handler=0x7fc583936f10, sa_mask=[], \
    sa_flags=SA_RESTORER, sa_restorer=0x7fc583705e70}, 8) = 0
    sigaltstack(NULL, {ss_sp=0x560a7dac3330, ss_flags=0, ss_size=16384}) = 0
    sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}, NULL) = 0
    exit_group(0) = ?
    +++ exited with 0 +++
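The strace pins the failure on a single stat() call returning EACCES. Below is a minimal sketch for spotting such entries from Python without strace, assuming POSIX semantics: it walks one directory with os.scandir() and records every entry whose DirEntry.is_dir() raises OSError, instead of stopping at the first one. The helper name find_failing_entries is hypothetical and not part of pathlib.

```python
import os
import sys


def find_failing_entries(directory):
    """Return sorted (name, error type) pairs for entries whose is_dir() raises.

    DirEntry.is_dir() follows symlinks by default, so a symlink whose target
    cannot be stat()ed (EACCES, as in the strace above) raises PermissionError.
    FileNotFoundError is swallowed by is_dir() itself, so plain dangling
    symlinks are not reported.
    """
    failing = []
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                entry.is_dir()
            except OSError as exc:
                # Record the failure and keep scanning the remaining entries.
                failing.append((entry.name, type(exc).__name__))
    return sorted(failing)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, error in find_failing_entries(target):
        print(f"{name}: {error}")
```

Run it as the affected regular user; root bypasses the permission check, so it would report nothing for the same directory.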

    parmentelat (mannequin) added the 3.7 (EOL) and type-bug labels Nov 22, 2019
    parmentelat mannequin commented Nov 22, 2019

    To clarify: by 'lambda user' I mean a regular, non-root user that has no permission to read /root.

    MattWozniski mannequin commented Mar 6, 2020

    A simple test case for this issue:

    ~>mkdir tmp
    ~>cd tmp
    tmp>touch 1.txt
    tmp>ln -s subdir/file 2.txt
    tmp>touch 3.txt
    tmp>ls -l
    total 0
    -rw-rw-r-- 1 mwoznisk general 0 Mar 6 14:52 1.txt
    lrwxrwxrwx 1 mwoznisk general 11 Mar 6 14:52 2.txt -> subdir/file
    -rw-rw-r-- 1 mwoznisk general 0 Mar 6 14:52 3.txt
    tmp>python3.8 -c "import pathlib; print(list(pathlib.Path('.').glob('*')))"
    [PosixPath('1.txt'), PosixPath('2.txt'), PosixPath('3.txt')]
    tmp>mkdir subdir
    tmp>python3.8 -c "import pathlib; print(list(pathlib.Path('.').glob('*')))"
    [PosixPath('1.txt'), PosixPath('2.txt'), PosixPath('3.txt'), PosixPath('subdir')]

    So far so good, but if the subdirectory isn't readable, things fall apart:

    tmp>chmod 000 subdir
    tmp>python3.8 -c "import pathlib; print(list(pathlib.Path('.').glob('*')))"
    [PosixPath('1.txt')]
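The same test case as a self-contained Python script. This is a sketch that assumes a POSIX system and a non-root user (root ignores the chmod 000, so the bug cannot show up); the function name reproduce() is illustrative. On an affected interpreter the returned list is truncated, and which names survive depends on the order in which os.scandir() returns them; on a fixed interpreter it contains all four names.

```python
import pathlib
import tempfile


def reproduce():
    """Recreate the 1.txt / 2.txt / 3.txt / subdir layout and glob it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "1.txt").touch()
        # 2.txt is a relative symlink into subdir, as in the shell session.
        (root / "2.txt").symlink_to("subdir/file")
        (root / "3.txt").touch()
        subdir = root / "subdir"
        subdir.mkdir()
        subdir.chmod(0o000)  # no effect for root, who bypasses permission checks
        try:
            return sorted(p.name for p in root.glob("*"))
        finally:
            # Restore permissions so TemporaryDirectory can clean up.
            subdir.chmod(0o700)


if __name__ == "__main__":
    print(reproduce())
```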

    Looks like this is caused by entry.is_dir() in pathlib._WildcardSelector raising a PermissionError when it checks whether a symlink pointing into an unreadable directory is a directory. EACCES isn't in IGNORED_ERROS (sic), so the exception propagates out of the loop over directory entries and ends the scan early; the "except PermissionError:" block in _select_from then swallows it, so the failure is silent.
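The mechanism described above can be modeled without touching the filesystem. This is a simplified, hypothetical model, not the pathlib source and not the GH-18815 patch: FakeEntry stands in for os.DirEntry, IGNORED_ERRNOS mirrors the kind of errno tuple the comment refers to (its exact contents here are an assumption), select_buggy lets a non-ignored error escape the loop into an outer `except PermissionError`, and select_fixed handles the error per entry and only calls is_dir() when the pattern needs directories.

```python
import errno
import fnmatch
import os

# Assumption: modeled on the errno tuple pathlib ignores; EACCES is not in it.
IGNORED_ERRNOS = (errno.ENOENT, errno.ENOTDIR, errno.EBADF, errno.ELOOP)


class FakeEntry:
    """Hypothetical stand-in for os.DirEntry."""

    def __init__(self, name, is_directory=False, stat_errno=None):
        self.name = name
        self._is_directory = is_directory
        self._stat_errno = stat_errno

    def is_dir(self):
        if self._stat_errno is not None:
            # OSError(EACCES, ...) is automatically a PermissionError (PEP 3151).
            raise OSError(self._stat_errno, os.strerror(self._stat_errno), self.name)
        return self._is_directory


def select_buggy(entries, pattern, dironly=False):
    """is_dir() is evaluated for every entry; EACCES aborts the whole scan."""
    try:
        for entry in entries:
            try:
                entry_is_dir = entry.is_dir()
            except OSError as exc:
                if exc.errno not in IGNORED_ERRNOS:
                    raise  # leaves the for loop entirely
                entry_is_dir = False
            if dironly and not entry_is_dir:
                continue
            if fnmatch.fnmatchcase(entry.name, pattern):
                yield entry.name
    except PermissionError:
        return  # error swallowed: the caller just sees fewer results


def select_fixed(entries, pattern, dironly=False):
    """is_dir() only when needed, and a failure skips just that entry."""
    for entry in entries:
        if dironly:
            try:
                if not entry.is_dir():
                    continue
            except OSError:
                continue  # cannot tell if it is a directory: skip it, keep going
        if fnmatch.fnmatchcase(entry.name, pattern):
            yield entry.name


if __name__ == "__main__":
    entries = [
        FakeEntry("1.txt"),
        FakeEntry("2.txt", stat_errno=errno.EACCES),  # symlink into mode-000 subdir
        FakeEntry("3.txt"),
        FakeEntry("subdir", is_directory=True),
    ]
    print(list(select_buggy(entries, "*")))  # ['1.txt']
    print(list(select_fixed(entries, "*")))  # ['1.txt', '2.txt', '3.txt', 'subdir']
```

With the entries in the order of the shell session, select_buggy returns only ['1.txt'], matching the output above. Moving the broken entry to the end of the list makes select_buggy return everything before it, which is why the results in the original report depended on the on-disk order.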

    MattWozniski (mannequin) added the 3.8 label and removed the 3.7 (EOL) label Mar 6, 2020
    pablogsal self-assigned this Mar 6, 2020
    pablogsal added the 3.7 (EOL) and 3.9 labels Mar 6, 2020
    @pablogsal (Member)

    Ok, I managed to reproduce.

    This seems to be a regression introduced by #11988 in issue https://bugs.python.org/issue36035.

    @pablogsal (Member)

    New changeset eb7560a by Pablo Galindo in branch 'master':
    bpo-38894: Fix pathlib.Path.glob in the presence of symlinks and insufficient permissions (GH-18815)

    @miss-islington (Contributor)

    New changeset cca0b31 by Miss Islington (bot) in branch '3.7':
    bpo-38894: Fix pathlib.Path.glob in the presence of symlinks and insufficient permissions (GH-18815)

    @miss-islington (Contributor)

    New changeset 928b4dd by Miss Islington (bot) in branch '3.8':
    bpo-38894: Fix pathlib.Path.glob in the presence of symlinks and insufficient permissions (GH-18815)
