classification
Title: Path.glob() sometimes misses files that match
Type: behavior Stage: resolved
Components: Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pablogsal Nosy List: Matt Wozniski, miss-islington, pablogsal, thierry.parmentelat
Priority: normal Keywords: 3.7regression, 3.8regression, patch

Created on 2019-11-22 14:06 by thierry.parmentelat, last changed 2020-03-07 18:12 by pablogsal. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18815 merged pablogsal, 2020-03-06 21:08
PR 18830 merged miss-islington, 2020-03-07 17:53
PR 18831 merged miss-islington, 2020-03-07 17:53
Messages (7)
msg357284 - Author: Thierry Parmentelat (thierry.parmentelat) Date: 2019-11-22 14:06
I have observed this on a Linux box running Fedora 29.

$ python3 --version
Python 3.7.5
$ uname -a
Linux faraday.inria.fr 5.3.11-100.fc29.x86_64 #1 SMP Tue Nov 12 20:41:25 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/fedora-release
Fedora release 29 (Twenty Nine)

============ steps to reproduce:

This assumes that /root is not readable by lambda users

----- as root:

# mkdir /tmp/foo
# cd /tmp/foo
# touch a b d e
# ln -s /root/anywhere c

# ls -l
total 0
-rw-r--r-- 1 root root  0 Nov 22 14:51 a
-rw-r--r-- 1 root root  0 Nov 22 14:51 b
lrwxrwxrwx 1 root root 14 Nov 22 14:53 c -> /root/anywhere
-rw-r--r-- 1 root root  0 Nov 22 14:51 d
-rw-r--r-- 1 root root  0 Nov 22 14:51 e


----- as a lambda user:

we can see all files

$ ls -l /tmp/foo
total 0
-rw-r--r-- 1 root root  0 Nov 22 14:51 a
-rw-r--r-- 1 root root  0 Nov 22 14:51 b
lrwxrwxrwx 1 root root 14 Nov 22 14:53 c -> /root/anywhere
-rw-r--r-- 1 root root  0 Nov 22 14:51 d
-rw-r--r-- 1 root root  0 Nov 22 14:51 e

and with glob.glob() too

In [1]: import glob

In [2]: for filename in glob.glob("/tmp/foo/*"):
   ...:     print(filename)
   ...:
/tmp/foo/c
/tmp/foo/e
/tmp/foo/d
/tmp/foo/b
/tmp/foo/a


BUT Path.glob() is not working as expected

In [3]: from pathlib import Path

In [4]: for filename in Path("/tmp/foo/").glob("*"):
   ...:     print(filename)
   ...:



----- If I now go back as root and remove the problematic file in /tmp/foo

# rm /tmp/foo/c


----- and try again as a lambda user

In [5]: for filename in Path("/tmp/foo/").glob("*"):
   ...:     print(filename)
   ...:
/tmp/foo/e
/tmp/foo/d
/tmp/foo/b
/tmp/foo/a
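
A quick way to check whether a given directory is affected is to compare Path.glob("*") against os.listdir(), which only reads the directory and does not stat its entries. The following is a minimal sketch, not part of the original report: the helper name entries_missed_by_path_glob is made up for illustration, and os.listdir() is used as the baseline instead of glob.glob() because glob.glob("*") skips dotfiles while Path.glob("*") does not.

```python
import os
import tempfile
from pathlib import Path


def entries_missed_by_path_glob(directory):
    """Return names that os.listdir() reports but Path.glob('*') does not.

    Hypothetical helper for illustration; an empty list means the two agree.
    """
    directory = Path(directory)
    expected = set(os.listdir(directory))
    found = {p.name for p in directory.glob("*")}
    return sorted(expected - found)


if __name__ == "__main__":
    # Demo on a throwaway directory containing only regular files.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a", "b", "d", "e"):
            (Path(tmp) / name).touch()
        print(entries_missed_by_path_glob(tmp))
```

Pointing the helper at /tmp/foo as the non-root user, with the symlink c still in place, should list the entries that Path.glob() dropped on an affected interpreter.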


============ discussion

In my case, in a real application, I was getting *some* files, not an empty list like here.

I ran strace on that real application.
It's fairly clear from that output that the odd symlink causes the scan of the directory to stop instead of continuing (see the extract below).
Of course, the order in which entries are read from disk affects the behaviour; that's why I created the symlink last. That step might need to be changed to reproduce successfully on another setup.



============ strace extract

<snip>
getdents64(3, /* 189 entries */, 32768) = 8640
getdents64(3, /* 0 entries */, 32768)   = 0
close(3)                                = 0
stat("/var/lib/rhubarbe-images/centos.ndz", {st_mode=S_IFREG|0644, st_size=1002438656, ...}) = 0
stat("/var/lib/rhubarbe-images/oai-enb.ndz", {st_mode=S_IFREG|0644, st_size=2840592384, ...}) = 0
<snip>
stat("/var/lib/rhubarbe-images/ubuntu-floodlight.ndz", {st_mode=S_IFREG|0644, st_size=2559574016, ...}) = 0
stat("/var/lib/rhubarbe-images/ndnsim.ndz", {st_mode=S_IFREG|0644, st_size=4153409536, ...}) = 0

==> that's the line about the broken symlink in my real app
stat("/var/lib/rhubarbe-images/push-to-preplab.sh", 0x7ffd3ac4a140) = -1 EACCES (Permission denied)
==> and here it stops scanning files while there are still quite a lot to be dealt with

write(1, "/var/lib/rhubarbe-images/fedora-"..., 82/var/lib/rhubarbe-images/fedora-31.ndz
/var/lib/rhubarbe-images/fedora-31-ssh.ndz
) = 82
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fc583705e70}, {sa_handler=0x7fc583936f10, sa_mask=[], \
sa_flags=SA_RESTORER, sa_restorer=0x7fc583705e70}, 8) = 0
sigaltstack(NULL, {ss_sp=0x560a7dac3330, ss_flags=0, ss_size=16384}) = 0
sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}, NULL) = 0
exit_group(0)                           = ?
+++ exited with 0 +++
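
For reference, the EACCES returned by stat() in the extract above surfaces in Python as PermissionError, a subclass of OSError. A small sketch of that mapping follows; the helper name exception_type_for is hypothetical and only serves the illustration.

```python
import errno
import os


def exception_type_for(code):
    """Return the OSError subclass Python selects for a given errno value."""
    return type(OSError(code, os.strerror(code)))


if __name__ == "__main__":
    for code in (errno.EACCES, errno.ENOENT, errno.ENOTDIR):
        print(errno.errorcode[code], "->", exception_type_for(code).__name__)
```

This is why code that only tolerates "file not found" style errors still sees an exception when the target of a symlink sits inside a directory the user cannot search.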
msg357288 - Author: Thierry Parmentelat (thierry.parmentelat) Date: 2019-11-22 14:19
To clarify, when I said 'lambda user' I meant a regular, non-root user that has no permission to read /root.
msg363537 - Author: Matt Wozniski (Matt Wozniski) Date: 2020-03-06 20:00
A simple test case for this issue:

~>mkdir tmp
~>cd tmp
tmp>touch 1.txt
tmp>ln -s subdir/file 2.txt
tmp>touch 3.txt
tmp>ls -l
total 0
-rw-rw-r-- 1 mwoznisk general  0 Mar  6 14:52 1.txt
lrwxrwxrwx 1 mwoznisk general 11 Mar  6 14:52 2.txt -> subdir/file
-rw-rw-r-- 1 mwoznisk general  0 Mar  6 14:52 3.txt
tmp>python3.8 -c "import pathlib; print(list(pathlib.Path('.').glob('*')))"
[PosixPath('1.txt'), PosixPath('2.txt'), PosixPath('3.txt')]
tmp>mkdir subdir
tmp>python3.8 -c "import pathlib; print(list(pathlib.Path('.').glob('*')))"
[PosixPath('1.txt'), PosixPath('2.txt'), PosixPath('3.txt'), PosixPath('subdir')]

So far so good, but if the subdirectory isn't readable, things fall apart:

tmp>chmod 000 subdir
tmp>python3.8 -c "import pathlib; print(list(pathlib.Path('.').glob('*')))"
[PosixPath('1.txt')]
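
The shell session above can also be scripted. This is a rough sketch under a few assumptions: it runs on a POSIX system, as a non-root user (root bypasses the chmod 000 restriction), and the function name reproduce is made up here. It builds the same layout in a temporary directory and returns what Path.glob("*") reports.

```python
import os
import shutil
import tempfile
from pathlib import Path


def reproduce():
    """Rebuild the layout from the shell session and return the glob('*') names."""
    root = Path(tempfile.mkdtemp())
    subdir = root / "subdir"
    try:
        (root / "1.txt").touch()
        # Relative symlink into subdir, created before subdir exists.
        os.symlink("subdir/file", root / "2.txt")
        (root / "3.txt").touch()
        subdir.mkdir()
        subdir.chmod(0o000)  # makes the symlink target unreachable for non-root users
        return sorted(p.name for p in root.glob("*"))
    finally:
        if subdir.exists():
            subdir.chmod(0o700)  # restore permissions so cleanup can succeed
        shutil.rmtree(root)


if __name__ == "__main__":
    print(reproduce())
```

On an interpreter that has the bug, the returned list is expected to be shorter than the four entries in the directory; which entries go missing depends on the order in which the directory is read.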

Looks like this is caused by entry.is_dir() in pathlib._WildcardSelector raising a PermissionError when trying to check whether a symlink pointing into an unreadable directory is a directory. EACCES isn't in IGNORED_ERROS (sic), so the exception breaks out of the loop over directory entries, and the "except PermissionError:" block in _select_from swallows it, which makes the failure silent.
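
The control flow described above can be modelled without touching the filesystem. The sketch below is a simplified illustration, not the pathlib source and not the merged patch: FakeEntry, buggy_select and tolerant_select are hypothetical names. It shows how a try/except wrapped around the whole loop silently truncates the results, while handling the error per entry lets the scan continue.

```python
class FakeEntry:
    """Stand-in for os.DirEntry (hypothetical, for illustration only)."""

    def __init__(self, name, unreadable_target=False):
        self.name = name
        self._unreadable_target = unreadable_target

    def is_dir(self):
        # Mimic a symlink whose target lies inside an unreadable directory.
        if self._unreadable_target:
            raise PermissionError(13, "Permission denied", self.name)
        return False


def buggy_select(entries):
    """The except clause wraps the whole loop: one bad entry ends the scan."""
    try:
        for entry in entries:
            entry.is_dir()
            yield entry.name
    except PermissionError:
        return  # failure is swallowed; remaining entries are never visited


def tolerant_select(entries):
    """The except clause wraps only the per-entry check: the scan continues."""
    for entry in entries:
        try:
            entry.is_dir()
        except PermissionError:
            pass  # cannot inspect the target, but the entry itself still matches
        yield entry.name


entries = [
    FakeEntry("1.txt"),
    FakeEntry("2.txt", unreadable_target=True),
    FakeEntry("3.txt"),
]
buggy = list(buggy_select(entries))
tolerant = list(tolerant_select(entries))
print(buggy)
print(tolerant)
```

With the entries in this order, the first variant stops after 1.txt, matching the single-element list in the test case, while the second returns all three names.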
msg363552 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-06 21:01
Ok, I managed to reproduce.

This seems to be a regression introduced by https://github.com/python/cpython/pull/11988 in issue https://bugs.python.org/issue36035.
msg363608 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-07 17:53
New changeset eb7560a73d46800e4ade4a8869139b48e6c92811 by Pablo Galindo in branch 'master':
bpo-38894: Fix pathlib.Path.glob in the presence of symlinks and insufficient permissions (GH-18815)
https://github.com/python/cpython/commit/eb7560a73d46800e4ade4a8869139b48e6c92811
msg363609 - Author: miss-islington (miss-islington) Date: 2020-03-07 18:10
New changeset cca0b31fb8ed7d25ede68f314d4a85bb07d6ca6f by Miss Islington (bot) in branch '3.7':
bpo-38894: Fix pathlib.Path.glob in the presence of symlinks and insufficient permissions (GH-18815)
https://github.com/python/cpython/commit/cca0b31fb8ed7d25ede68f314d4a85bb07d6ca6f
msg363610 - Author: miss-islington (miss-islington) Date: 2020-03-07 18:11
New changeset 928b4dd0edf0022190a8a296c8ea65e7ef55c694 by Miss Islington (bot) in branch '3.8':
bpo-38894: Fix pathlib.Path.glob in the presence of symlinks and insufficient permissions (GH-18815)
https://github.com/python/cpython/commit/928b4dd0edf0022190a8a296c8ea65e7ef55c694
History
Date User Action Args
2020-03-07 18:12:08 pablogsal set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2020-03-07 18:11:28 miss-islington set messages: + msg363610
2020-03-07 18:10:13 miss-islington set messages: + msg363609
2020-03-07 17:53:40 miss-islington set pull_requests: + pull_request18188
2020-03-07 17:53:33 miss-islington set nosy: + miss-islington
    pull_requests: + pull_request18187
2020-03-07 17:53:23 pablogsal set messages: + msg363608
2020-03-06 21:08:05 pablogsal set keywords: + patch
    stage: patch review
    pull_requests: + pull_request18173
2020-03-06 21:01:22 pablogsal set messages: + msg363552
2020-03-06 21:00:29 pablogsal set messages: - msg363548
2020-03-06 20:37:44 pablogsal set keywords: + 3.7regression, 3.8regression
    messages: + msg363548
    versions: + Python 3.7, Python 3.9
2020-03-06 20:07:40 pablogsal set assignee: pablogsal
    nosy: + pablogsal
2020-03-06 20:00:35 Matt Wozniski set nosy: + Matt Wozniski
    messages: + msg363537
    versions: + Python 3.8, - Python 3.7
2019-11-22 14:19:22 thierry.parmentelat set messages: + msg357288
2019-11-22 14:06:07 thierry.parmentelat create