classification
Title: pathlib.Path().rglob() breaks with broken symlinks
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Jörg Stucke, brett.cannon, eamanu, eryksun, matrixise, miss-islington, pitrou
Priority: normal Keywords: patch

Created on 2019-02-19 12:09 by Jörg Stucke, last changed 2019-05-21 19:05 by pitrou. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 11964 closed eamanu, 2019-02-21 00:50
PR 11988 merged Jörg Stucke, 2019-02-22 13:29
PR 13468 closed miss-islington, 2019-05-21 17:44
PR 13469 merged miss-islington, 2019-05-21 18:31
Messages (12)
msg335937 - (view) Author: Jörg Stucke (Jörg Stucke) * Date: 2019-02-19 12:09
When using rglob() to iterate over the files of a directory containing a broken symlink (a link pointing to itself) rglob breaks with "[Errno 40] Too many levels of symbolic links" (OS: Linux).

Steps to reproduce:

mkdir tmp
touch foo
ln -s foo tmp/foo
cd tmp
file foo
foo: broken symbolic link to foo

python3
>>> from pathlib import Path
>>> for f in Path().rglob("*"):
        print(x)

foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/pathlib.py", line 1105, in rglob
    for p in selector.select_from(self):
  File "/usr/local/lib/python3.8/pathlib.py", line 552, in _select_from
    for starting_point in self._iterate_directories(parent_path, is_dir, scandir):
  File "/usr/local/lib/python3.8/pathlib.py", line 536, in _iterate_directories
    entry_is_dir = entry.is_dir()
OSError: [Errno 40] Too many levels of symbolic links: './foo'


What is more, stat(), is_dir(), is_file() and exists() also do not like those broken links:
>>> Path("foo").is_file()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/pathlib.py", line 1361, in is_file
    return S_ISREG(self.stat().st_mode)
  File "/usr/local/lib/python3.8/pathlib.py", line 1151, in stat
    return self._accessor.stat(self)
OSError: [Errno 40] Too many levels of symbolic links: 'foo'


Is this intended behaviour or is this a bug? I guess it's not intended, since it makes it impossible to iterate over such a directory with rglob(). I could not find anything similar in the bug tracker, but https://bugs.python.org/issue26012 seems to be related.

Tested with Python 3.8.0a1, 3.6.7 and 3.5.2 (OS: Linux Mint 19)
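
The shell steps above can also be reproduced from Python. This is a minimal sketch, assuming a POSIX system where symlinks can be created without special privileges; the helper names make_self_loop and stat_errno are made up for illustration and are not part of pathlib. It creates a link whose relative target is its own name and records which errno os.stat() reports when it tries to follow it (ELOOP, errno 40, on Linux).

```python
import errno
import os
import tempfile
from pathlib import Path


def make_self_loop(directory, name="foo"):
    """Create a symlink inside `directory` whose relative target is its own name."""
    link = Path(directory) / name
    os.symlink(name, link)  # the target "foo" resolves to the link itself
    return link


def stat_errno(path):
    """Return the errno raised by os.stat(path), or None if stat succeeds."""
    try:
        os.stat(path)  # follows symlinks, so a self-loop cannot be resolved
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as tmp:
    link = make_self_loop(tmp)
    is_link = link.is_symlink()    # lstat-based, does not follow the link
    loop_errno = stat_errno(link)  # stat-based, follows the link

print(is_link, loop_errno == errno.ELOOP)
```

Path.is_symlink() still works on such a link because it does not follow it; only calls that resolve the target fail.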
msg335940 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2019-02-19 12:23
I can confirm this issue with Python 3.7,

but your script is wrong (you declare f and use x in your script).

/tmp$ mkdir demo
/tmp$ cd demo/
/t/demo$ mkdir tmp
/t/demo$ touch foo
/t/demo$ ln -s foo tmp/foo
/t/demo$ cd tmp/
/t/d/tmp$ file foo
foo: broken symbolic link to foo

/t/d/tmp$ python3
Python 3.7.2 (default, Jan 16 2019, 19:49:22) 
[GCC 8.2.1 20181215 (Red Hat 8.2.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pathlib import Path
>>> for p in Path().rglob('*'):
...     print(p)
... 
foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.7/pathlib.py", line 1105, in rglob
    for p in selector.select_from(self):
  File "/usr/lib64/python3.7/pathlib.py", line 552, in _select_from
    for starting_point in self._iterate_directories(parent_path, is_dir, scandir):
  File "/usr/lib64/python3.7/pathlib.py", line 536, in _iterate_directories
    entry_is_dir = entry.is_dir()
OSError: [Errno 40] Too many levels of symbolic links: './foo'
>>>
msg335966 - (view) Author: Jörg Stucke (Jörg Stucke) * Date: 2019-02-19 16:22
A possible solution for Python 3.7+ could be to add "ELOOP" to _IGNORED_ERROS in pathlib, but I'm not sure what the side effects would be.
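
To illustrate what ignoring ELOOP during a directory walk could look like, here is a rough sketch built on os.scandir(). It is not the pathlib implementation: walk_ignoring_loops is a hypothetical helper, and the choice to skip symlinked directories is an assumption made here so the sketch cannot recurse forever.

```python
import errno
import os
import tempfile


def walk_ignoring_loops(root):
    """Yield every path below `root`, treating ELOOP entries as non-directories."""
    with os.scandir(root) as entries:
        for entry in entries:
            yield entry.path
            try:
                is_dir = entry.is_dir()  # follows symlinks, may raise ELOOP
            except OSError as exc:
                if exc.errno != errno.ELOOP:
                    raise
                is_dir = False  # an unresolvable link cannot be descended into
            # Assumption for this sketch: do not descend into symlinked dirs.
            if is_dir and not entry.is_symlink():
                yield from walk_ignoring_loops(entry.path)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "a.txt"), "w").close()
    os.symlink("foo", os.path.join(tmp, "foo"))  # link pointing to itself
    walked = sorted(os.path.relpath(p, tmp) for p in walk_ignoring_loops(tmp))

print(walked)
```

The looping link is still reported as an entry; it is only excluded from recursion, and any other OSError is re-raised.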
msg336095 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2019-02-20 14:49
3.5 is in security mode, we can remove 3.5 from the list for this issue.
msg336150 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-02-20 20:25
I consider this an enhancement since you do have a loop in your symlinks and so having it not exactly work isn't totally shocking. But that doesn't mean that if someone can come up with a reasonable solution to fixing this annoyance it wouldn't be accepted or appreciated!
msg336151 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-02-20 20:26
And I don't know what a good solution would be. :) I.e. should some other exception be raised? Should it be ignored? I just don't know personally.
msg336181 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-02-21 04:01
In Windows, the error for a path reparse (e.g. symlink or junction) that can't be resolved is ERROR_CANT_RESOLVE_FILENAME. Another common error is ERROR_INVALID_REPARSE_DATA. This can occur if the reparse data is malformed or if the target device is invalid for the reparse type. For example, a junction is restricted to [bind-]mounting local volumes, so it's invalid if it targets a remote device.

Windows errors to ignore should be added to _IGNORED_WINERRORS in Lib/pathlib.py. For example:

    _IGNORED_WINERRORS = (
        # ...
        1921, # ERROR_CANT_RESOLVE_FILENAME - similar to POSIX ELOOP
        4392, # ERROR_INVALID_REPARSE_DATA  - also for disallowed device targets
    )
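
As a sketch of how such tuples might be consulted, the predicate below checks both the POSIX errno and the Windows error code of an exception. The names IGNORED_ERRNOS, IGNORED_WINERRORS and should_ignore are illustrative stand-ins chosen for this example, and the errno tuple is an assumed subset. The winerror attribute is read with getattr() because it is only populated on Windows.

```python
import errno

# Illustrative stand-ins for this sketch, not the tuples defined in Lib/pathlib.py.
IGNORED_ERRNOS = (errno.ENOENT, errno.ELOOP)
IGNORED_WINERRORS = (
    1921,  # ERROR_CANT_RESOLVE_FILENAME
    4392,  # ERROR_INVALID_REPARSE_DATA
)


def should_ignore(exc):
    """Return True if `exc` carries an errno or Windows error code we tolerate."""
    return (getattr(exc, "errno", None) in IGNORED_ERRNOS
            or getattr(exc, "winerror", None) in IGNORED_WINERRORS)


print(should_ignore(OSError(errno.ELOOP, "too many levels of symbolic links")))
print(should_ignore(OSError(errno.EACCES, "permission denied")))
```

A caller would wrap the failing stat() or is_dir() call in try/except OSError and re-raise whenever should_ignore() returns False.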
msg336211 - (view) Author: Emmanuel Arias (eamanu) * Date: 2019-02-21 12:01
Hello!

I made a PR: https://github.com/python/cpython/pull/11964

But I need help testing it :-(

Could anyone help me, please?
msg336306 - (view) Author: Jörg Stucke (Jörg Stucke) * Date: 2019-02-22 13:41
I tried to add a test in https://github.com/python/cpython/pull/11988
To fix the tests that were failing as a result, I had to add a try/except block to the _WildcardSelector as well (analogous to the _RecursiveWildcardSelector).

I could only check on Linux and I have no idea how it behaves on any other OS.
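
A regression check in this spirit might look like the sketch below. It is not the test from the PR: it assumes a POSIX system and an interpreter that already contains the fix, and names_found_by_rglob is a hypothetical helper. It places a self-referencing link next to a regular file and confirms that rglob() completes and still reports the regular file.

```python
import os
import tempfile
from pathlib import Path


def names_found_by_rglob(root):
    """Return the sorted entry names that Path(root).rglob('*') yields."""
    return sorted(p.name for p in Path(root).rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "regular.txt").write_text("data")
    # A link whose target is its own name, i.e. a symlink loop.
    os.symlink("brokenLinkLoop", Path(tmp, "brokenLinkLoop"))
    found = names_found_by_rglob(tmp)  # should not raise on a fixed interpreter

print(found)
```

On an interpreter without the fix, the call to rglob() is expected to raise OSError instead of returning.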
msg336315 - (view) Author: Jörg Stucke (Jörg Stucke) * Date: 2019-02-22 14:40
As expected, the Windows CI build failed.
All test failures were caused by:
OSError: [WinError 1921] The name of the file cannot be resolved by the system: 'C:\\projects\\cpython\\build\\test_python_936\\@test_936_tmp\\brokenLinkLoop'

Therefore, I added WinError 1921 to _IGNORED_WINERRORS as suggested by Eryk Sun.
msg343075 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-05-21 17:44
New changeset d5c120f7eb6f2a9cdab282a5d588afed307a23df by Antoine Pitrou (Jörg Stucke) in branch 'master':
bpo-36035: fix Path.rglob for broken links (GH-11988)
https://github.com/python/cpython/commit/d5c120f7eb6f2a9cdab282a5d588afed307a23df
msg343082 - (view) Author: miss-islington (miss-islington) Date: 2019-05-21 19:05
New changeset aea49b18752880e5d0260f16ca7ff2c6dce78515 by Miss Islington (bot) in branch '3.7':
[3.7] bpo-36035: fix Path.rglob for broken links (GH-11988) (GH-13469)
https://github.com/python/cpython/commit/aea49b18752880e5d0260f16ca7ff2c6dce78515
History
Date User Action Args
2019-05-21 19:05:33 pitrou set status: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: - Python 3.6
2019-05-21 19:05:12 miss-islington set nosy: + miss-islington
messages: + msg343082
2019-05-21 18:31:20 miss-islington set pull_requests: + pull_request13380
2019-05-21 17:44:56 pitrou set messages: + msg343075
2019-05-21 17:44:52 miss-islington set pull_requests: + pull_request13379
2019-02-22 14:40:41 Jörg Stucke set messages: + msg336315
2019-02-22 13:41:10 Jörg Stucke set messages: + msg336306
2019-02-22 13:29:39 Jörg Stucke set pull_requests: + pull_request12011
2019-02-21 12:01:43 eamanu set nosy: + eamanu
messages: + msg336211
2019-02-21 04:01:04 eryksun set nosy: + eryksun
messages: + msg336181
2019-02-21 00:50:34 eamanu set keywords: + patch
stage: patch review
pull_requests: + pull_request11990
2019-02-20 20:26:03 brett.cannon set messages: + msg336151
2019-02-20 20:25:03 brett.cannon set type: behavior -> enhancement
messages: + msg336150
nosy: + brett.cannon
2019-02-20 14:49:42 matrixise set messages: + msg336095
versions: - Python 3.5
2019-02-19 16:22:17 Jörg Stucke set messages: + msg335966
2019-02-19 12:23:33 matrixise set nosy: + pitrou
2019-02-19 12:23:22 matrixise set nosy: + matrixise
messages: + msg335940
versions: + Python 3.7
2019-02-19 12:09:57 Jörg Stucke create