Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pathlib.Path().rglob() is fooled by symlink loops #70200

Closed
gvanrossum opened this issue Jan 5, 2016 · 14 comments
Closed

pathlib.Path().rglob() is fooled by symlink loops #70200

gvanrossum opened this issue Jan 5, 2016 · 14 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@gvanrossum
Copy link
Member

BPO 26012
Nosy @gvanrossum, @pitrou, @vadmium, @serhiy-storchaka, @MojoVampire

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/gvanrossum'
closed_at = <Date 2016-01-07.18:59:52.608>
created_at = <Date 2016-01-05.02:13:51.418>
labels = ['type-bug', 'library']
title = 'pathlib.Path().rglob() is fooled by symlink loops'
updated_at = <Date 2016-01-07.23:11:09.234>
user = 'https://github.com/gvanrossum'

bugs.python.org fields:

activity = <Date 2016-01-07.23:11:09.234>
actor = 'gvanrossum'
assignee = 'gvanrossum'
closed = True
closed_date = <Date 2016-01-07.18:59:52.608>
closer = 'gvanrossum'
components = ['Library (Lib)']
creation = <Date 2016-01-05.02:13:51.418>
creator = 'gvanrossum'
dependencies = []
files = []
hgrepos = []
issue_num = 26012
keywords = []
message_count = 14.0
messages = ['257511', '257516', '257519', '257522', '257620', '257628', '257683', '257699', '257708', '257711', '257712', '257717', '257718', '257726']
nosy_count = 6.0
nosy_names = ['gvanrossum', 'pitrou', 'python-dev', 'martin.panter', 'serhiy.storchaka', 'josh.r']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue26012'
versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']

@gvanrossum
Copy link
Member Author

I created a symlink loop as follows:

mkdir tmp
cd tmp
ln -s ../tmp baz
cd ..

Then I tried to list it recursively using rglob():

>> list(pathlib.Path('tmp').rglob('**/*')

This caused an infinite regress:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 1065, in rglob
    for p in selector.select_from(self):
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 549, in _select_from
    for p in successor_select(starting_point, is_dir, exists, listdir):
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 548, in _select_from
    for starting_point in self._iterate_directories(parent_path, is_dir, listdir):
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 538, in _iterate_directories
    for p in self._iterate_directories(path, is_dir, listdir):
[...]
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 538, in _iterate_directories
    for p in self._iterate_directories(path, is_dir, listdir):
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 537, in _iterate_directories
    if is_dir(path):
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 1303, in is_dir
    return S_ISDIR(self.stat().st_mode)
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 1111, in stat
    return self._accessor.stat(self)
  File "/Users/guido/src/cpython36/Lib/pathlib.py", line 371, in wrapped
    return strfunc(str(pathobj), *args)
OSError: [Errno 62] Too many levels of symbolic links: 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz'

@gvanrossum gvanrossum added the stdlib Python modules in the Lib dir label Jan 5, 2016
@MojoVampire
Copy link
Mannequin

MojoVampire mannequin commented Jan 5, 2016

So would the goal of a fix be something like dev/inode memoization to prevent traversing the same link twice? To provide an argument to prevent following symlinks at all (as in stuff like os.walk), possibly defaulting to followlinks=False? It looks like languages like Ruby never follow symlinks when performing recursive globs to avoid this issue ( https://stackoverflow.com/questions/357754/can-i-traverse-symlinked-directories-in-ruby-with-a-glob ). It's probably easiest to just block recursing through symlinks (at least by default); memoizing risks unbounded memory overhead for large directory trees.

@serhiy-storchaka
Copy link
Member

glob.glob() just stops too deep recursion.

>>> import glob
>>> glob.glob('tmp/**/*')
['tmp/baz/baz']
>>> glob.glob('tmp/**/*', recursive=True)
['tmp/baz', 'tmp/baz/baz', 'tmp/baz/baz/baz', 'tmp/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz']

@gvanrossum
Copy link
Member Author

I agree it's easiest just not to traverse symlinks matching **. glob.py
avoids the errors (but not the senseless recursion) by simply ignoring
OSError coming out of listdir(). That might be a good idea anyways (it
might even be a simpler way to avoid the PermissionError reported in
bpo-24120).

@gvanrossum
Copy link
Member Author

I'm going to fix this by skipping all symlinks in _RecursiveWildcardSelector.

@gvanrossum gvanrossum self-assigned this Jan 6, 2016
@python-dev
Copy link
Mannequin

python-dev mannequin commented Jan 6, 2016

New changeset 18f5b125a863 by Guido van Rossum in branch '3.4':
Issue bpo-26012: Don't traverse into symlinks for ** pattern in pathlib.Path.[r]glob().
https://hg.python.org/cpython/rev/18f5b125a863

New changeset 9826dbad1252 by Guido van Rossum in branch '3.5':
Issue bpo-26012: Don't traverse into symlinks for ** pattern in pathlib.Path.[r]glob(). (Merge 3.4->3.5)
https://hg.python.org/cpython/rev/9826dbad1252

New changeset 36864abbfe02 by Guido van Rossum in branch 'default':
Issue bpo-26012: Don't traverse into symlinks for ** pattern in pathlib.Path.[r]glob(). (Merge 3.5->3.6)
https://hg.python.org/cpython/rev/36864abbfe02

@gvanrossum gvanrossum added the type-bug An unexpected behavior, bug, or error label Jan 6, 2016
@vadmium
Copy link
Member

vadmium commented Jan 7, 2016

This seems to have broken test_pathlib, both on my computer and some Linux and BSD buildbots:

======================================================================
FAIL: test_rglob_symlink_loop (test.test_pathlib.PathTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/media/disk/home/proj/python/cpython/Lib/test/test_pathlib.py", line 1455, in test_rglob_symlink_loop
    self.assertEqual(given, {p / x for x in expect})
AssertionError: Items in the second set but not the first:
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirA/linkC')
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/fileB')
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/linkD')

======================================================================
FAIL: test_rglob_symlink_loop (test.test_pathlib.PosixPathTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/media/disk/home/proj/python/cpython/Lib/test/test_pathlib.py", line 1455, in test_rglob_symlink_loop
    self.assertEqual(given, {p / x for x in expect})
AssertionError: Items in the second set but not the first:
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirA/linkC')
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/fileB')
PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/linkD')

@vadmium vadmium reopened this Jan 7, 2016
@gvanrossum
Copy link
Member Author

Oops, sorry about that. I will see if I can figure out how to repro this -- on my own Mac it succeeds. In the mean time if you could have a look yourself (since you can repro it on your own computer) I'd be grateful. Initial hunches: maybe a case-insensitivity crept into the test?

@gvanrossum
Copy link
Member Author

Looks like the root cause of the breakage is actually that issue bpo-24120 isn't quite fixed yet. I'm now able to repro on a Linux VM.

@python-dev
Copy link
Mannequin

python-dev mannequin commented Jan 7, 2016

New changeset 4043e08e6e52 by Guido van Rossum in branch '3.4':
Add another try/except PermissionError to avoid depending on listdir order. Fix issues bpo-24120 and bpo-26012.
https://hg.python.org/cpython/rev/4043e08e6e52

New changeset 8a3b0c1fb3d3 by Guido van Rossum in branch '3.5':
Add another try/except PermissionError to avoid depending on listdir order. Fix issues bpo-24120 and bpo-26012. (Merge 3.4->3.5)
https://hg.python.org/cpython/rev/8a3b0c1fb3d3

New changeset 398cb8c183da by Guido van Rossum in branch 'default':
Add another try/except PermissionError to avoid depending on listdir order. Fix issues bpo-24120 and bpo-26012. (Merge 3.5->3.6)
https://hg.python.org/cpython/rev/398cb8c183da

@gvanrossum
Copy link
Member Author

Should be fixed for real now.

@vadmium
Copy link
Member

vadmium commented Jan 7, 2016

Thanks, your latest change seems to have fixed it on my Linux computer, and most of the buildbots. However now there is a failure on <http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x\>:

======================================================================
FAIL: test_rglob_common (test.test_pathlib.PathTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1460, in test_rglob_common
    "linkB/fileB", "dirA/linkC/fileB"])
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1451, in _check
    self.assertEqual(set(glob), { P(BASE, q) for q in expected })
AssertionError: Items in the second set but not the first:
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/linkB/fileB')
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirB/linkD/fileB')
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirA/linkC/fileB')

======================================================================
FAIL: test_rglob_common (test.test_pathlib.WindowsPathTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1460, in test_rglob_common
    "linkB/fileB", "dirA/linkC/fileB"])
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1451, in _check
    self.assertEqual(set(glob), { P(BASE, q) for q in expected })
AssertionError: Items in the second set but not the first:
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/linkB/fileB')
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirB/linkD/fileB')
WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirA/linkC/fileB')

I don’t have a Windows setup so I can’t really help much with this one.

@gvanrossum
Copy link
Member Author

I think I understand the Windows failure. I uncommented some tests that were previously broken due to the symlink loop and/or PermissionError, but one of these has a different expected outcome if symlinks don't work. I've pushed a hopeful fix for that (no Windows box here either, but the buildbots are good for this :-).

@gvanrossum
Copy link
Member Author

Looks like that Windows buildbot is now green. Of the stable bots only OpenIndiana is red, and it's unrelated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants