classification
Title: pathlib.Path().rglob() is fooled by symlink loops
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: gvanrossum, josh.r, martin.panter, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-01-05 02:13 by gvanrossum, last changed 2016-01-07 23:11 by gvanrossum. This issue is now closed.

Messages (14)
msg257511 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-05 02:13
I created a symlink loop as follows:

mkdir tmp
cd tmp
ln -s ../tmp baz
cd ..

Then I tried to list it recursively using rglob():

>>> list(pathlib.Path('tmp').rglob('**/*')

This caused an infinite regress:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 1065, in rglob
    for p in selector.select_from(self):
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 549, in _select_from
    for p in successor_select(starting_point, is_dir, exists, listdir):
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 548, in _select_from
    for starting_point in self._iterate_directories(parent_path, is_dir, listdir):
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 538, in _iterate_directories
    for p in self._iterate_directories(path, is_dir, listdir):
[...]
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 538, in _iterate_directories
    for p in self._iterate_directories(path, is_dir, listdir):
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 537, in _iterate_directories
    if is_dir(path):
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 1303, in is_dir
    return S_ISDIR(self.stat().st_mode)
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 1111, in stat
    return self._accessor.stat(self)
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 371, in wrapped
    return strfunc(str(pathobj), *args)
OSError: [Errno 62] Too many levels of symbolic links: 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz'
msg257516 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2016-01-05 05:25
So would the goal of a fix be something like dev/inode memoization to prevent traversing the same link twice? To provide an argument to prevent following symlinks at all (as in stuff like os.walk), possibly defaulting to followlinks=False? It looks like languages like Ruby never follow symlinks when performing recursive globs to avoid this issue ( https://stackoverflow.com/questions/357754/can-i-traverse-symlinked-directories-in-ruby-with-a-glob ). It's probably easiest to just block recursing through symlinks (at least by default); memoizing risks unbounded memory overhead for large directory trees.
msg257519 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-05 06:21
glob.glob() just stops too deep recursion.

>>> import glob
>>> glob.glob('tmp/**/*')
['tmp/baz/baz']
>>> glob.glob('tmp/**/*', recursive=True)
['tmp/baz', 'tmp/baz/baz', 'tmp/baz/baz/baz', 'tmp/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz']
msg257522 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-05 07:03
I agree it's easiest just not to traverse symlinks matching **. glob.py
avoids the errors (but not the senseless recursion) by simply ignoring
OSError coming out of listdir(). That might be a good idea anyways (it
might even be a simpler way to avoid the PermissionError reported in
#24120).
msg257620 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-06 17:45
I'm going to fix this by skipping all symlinks in _RecursiveWildcardSelector.
msg257628 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-01-06 18:36
New changeset 18f5b125a863 by Guido van Rossum in branch '3.4':
Issue #26012: Don't traverse into symlinks for ** pattern in pathlib.Path.[r]glob().
https://hg.python.org/cpython/rev/18f5b125a863

New changeset 9826dbad1252 by Guido van Rossum in branch '3.5':
Issue #26012: Don't traverse into symlinks for ** pattern in pathlib.Path.[r]glob(). (Merge 3.4->3.5)
https://hg.python.org/cpython/rev/9826dbad1252

New changeset 36864abbfe02 by Guido van Rossum in branch 'default':
Issue #26012: Don't traverse into symlinks for ** pattern in pathlib.Path.[r]glob(). (Merge 3.5->3.6)
https://hg.python.org/cpython/rev/36864abbfe02
msg257683 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-07 11:43
This seems to have broken test_pathlib, both on my computer and some Linux and BSD buildbots:

======================================================================
FAIL: test_rglob_symlink_loop (test.test_pathlib.PathTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/media/disk/home/proj/python/cpython/Lib/test/test_pathlib.py", line 1455, in test_rglob_symlink_loop
    self.assertEqual(given, {p / x for x in expect})
AssertionError: Items in the second set but not the first:
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirA/linkC')
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/fileB')
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/linkD')

======================================================================
FAIL: test_rglob_symlink_loop (test.test_pathlib.PosixPathTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/media/disk/home/proj/python/cpython/Lib/test/test_pathlib.py", line 1455, in test_rglob_symlink_loop
    self.assertEqual(given, {p / x for x in expect})
AssertionError: Items in the second set but not the first:
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirA/linkC')
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/fileB')
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/linkD')
msg257699 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-07 16:44
Oops, sorry about that. I will see if I can figure out how to repro this -- on my own Mac it succeeds. In the mean time if you could have a look yourself (since you can repro it on your own computer) I'd be grateful. Initial hunches: maybe a case-insensitivity crept into the test?
msg257708 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-07 18:44
Looks like the root cause of the breakage is actually that issue #24120 isn't quite fixed yet. I'm now able to repro on a Linux VM.
msg257711 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-01-07 18:58
New changeset 4043e08e6e52 by Guido van Rossum in branch '3.4':
Add another try/except PermissionError to avoid depending on listdir order. Fix issues #24120 and #26012.
https://hg.python.org/cpython/rev/4043e08e6e52

New changeset 8a3b0c1fb3d3 by Guido van Rossum in branch '3.5':
Add another try/except PermissionError to avoid depending on listdir order. Fix issues #24120 and #26012. (Merge 3.4->3.5)
https://hg.python.org/cpython/rev/8a3b0c1fb3d3

New changeset 398cb8c183da by Guido van Rossum in branch 'default':
Add another try/except PermissionError to avoid depending on listdir order. Fix issues #24120 and #26012. (Merge 3.5->3.6)
https://hg.python.org/cpython/rev/398cb8c183da
msg257712 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-07 18:59
Should be fixed for real now.
msg257717 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-07 21:15
Thanks, your latest change seems to have fixed it on my Linux computer, and most of the buildbots. However now there is a failure on <http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x>:

======================================================================
FAIL: test_rglob_common (test.test_pathlib.PathTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1460, in test_rglob_common
    "linkB/fileB", "dirA/linkC/fileB"])
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1451, in _check
    self.assertEqual(set(glob), { P(BASE, q) for q in expected })
AssertionError: Items in the second set but not the first:
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/linkB/fileB')
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirB/linkD/fileB')
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirA/linkC/fileB')

======================================================================
FAIL: test_rglob_common (test.test_pathlib.WindowsPathTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1460, in test_rglob_common
    "linkB/fileB", "dirA/linkC/fileB"])
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1451, in _check
    self.assertEqual(set(glob), { P(BASE, q) for q in expected })
AssertionError: Items in the second set but not the first:
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/linkB/fileB')
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirB/linkD/fileB')
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirA/linkC/fileB')

I don’t have a Windows setup so I can’t really help much with this one.
msg257718 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-07 21:30
I think I understand the Windows failure. I uncommented some tests that were previously broken due to the symlink loop and/or PermissionError, but one of these has a different expected outcome if symlinks don't work. I've pushed a hopeful fix for that (no Windows box here either, but the buildbots are good for this :-).
msg257726 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-07 23:11
Looks like that Windows buildbot is now green. Of the stable bots only OpenIndiana is red, and it's unrelated.
History
Date User Action Args
2016-01-07 23:11:09gvanrossumsetmessages: + msg257726
2016-01-07 21:30:15gvanrossumsetmessages: + msg257718
2016-01-07 21:15:54martin.pantersetmessages: + msg257717
2016-01-07 18:59:52gvanrossumsetstatus: open -> closed

messages: + msg257712
2016-01-07 18:58:59python-devsetmessages: + msg257711
2016-01-07 18:44:49gvanrossumsetmessages: + msg257708
2016-01-07 16:44:14gvanrossumsetmessages: + msg257699
2016-01-07 11:43:31martin.pantersetstatus: closed -> open
nosy: + martin.panter
messages: + msg257683

2016-01-06 18:37:14gvanrossumsetstatus: open -> closed
type: behavior
resolution: fixed
stage: resolved
2016-01-06 18:36:40python-devsetnosy: + python-dev
messages: + msg257628
2016-01-06 17:45:20gvanrossumsetassignee: gvanrossum
messages: + msg257620
2016-01-05 07:03:20gvanrossumsetmessages: + msg257522
2016-01-05 06:21:55serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg257519
2016-01-05 05:25:17josh.rsetnosy: + josh.r
messages: + msg257516
2016-01-05 02:39:44ned.deilysetnosy: + pitrou
2016-01-05 02:14:14gvanrossumsetcomponents: + Library (Lib)
2016-01-05 02:13:51gvanrossumcreate