pathlib.Path().rglob() breaks with broken symlinks #80216

Closed
jstucke mannequin opened this issue Feb 19, 2019 · 12 comments
Labels
3.7 (EOL): end of life; 3.8: only security fixes; stdlib: Python modules in the Lib dir; type-feature: a feature request or enhancement

Comments

@jstucke
Mannequin

jstucke mannequin commented Feb 19, 2019

BPO 36035
Nosy @brettcannon, @pitrou, @eryksun, @matrixise, @eamanu, @miss-islington, @jstucke
PRs
  • bpo-36035: pathlib.Path().rglob() breaks with broken symlinks #11964
  • bpo-36035: fix rglob for broken links #11988
  • [3.7] bpo-36035: fix Path.rglob for broken links (GH-11988) #13468
  • [3.7] bpo-36035: fix Path.rglob for broken links (GH-11988) #13469
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-05-21.19:05:33.349>
    created_at = <Date 2019-02-19.12:09:56.999>
    labels = ['3.7', '3.8', 'type-feature', 'library']
    title = 'pathlib.Path().rglob() breaks with broken symlinks'
    updated_at = <Date 2019-05-21.19:05:33.348>
    user = 'https://github.com/jstucke'

    bugs.python.org fields:

    activity = <Date 2019-05-21.19:05:33.348>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-05-21.19:05:33.349>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2019-02-19.12:09:56.999>
    creator = 'Jörg Stucke'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36035
    keywords = ['patch']
    message_count = 12.0
    messages = ['335937', '335940', '335966', '336095', '336150', '336151', '336181', '336211', '336306', '336315', '343075', '343082']
    nosy_count = 7.0
    nosy_names = ['brett.cannon', 'pitrou', 'eryksun', 'matrixise', 'eamanu', 'miss-islington', 'Jörg Stucke']
    pr_nums = ['11964', '11988', '13468', '13469']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue36035'
    versions = ['Python 3.7', 'Python 3.8']

    @jstucke
    Mannequin Author

    jstucke mannequin commented Feb 19, 2019

    When using rglob() to iterate over the files of a directory containing a broken symlink (a link pointing to itself), rglob() fails with "[Errno 40] Too many levels of symbolic links" (OS: Linux).

    Steps to reproduce:

    mkdir tmp
    touch foo
    ln -s foo tmp/foo
    cd tmp
    file foo
    foo: broken symbolic link to foo

    python3
    >>> from pathlib import Path
    >>> for f in Path().rglob("*"):
            print(x)
    
    foo
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.8/pathlib.py", line 1105, in rglob
        for p in selector.select_from(self):
      File "/usr/local/lib/python3.8/pathlib.py", line 552, in _select_from
        for starting_point in self._iterate_directories(parent_path, is_dir, scandir):
      File "/usr/local/lib/python3.8/pathlib.py", line 536, in _iterate_directories
        entry_is_dir = entry.is_dir()
    OSError: [Errno 40] Too many levels of symbolic links: './foo'
    
    
    What is more, stat(), is_dir(), is_file() and exists() also raise the same error on such links:
    >>> Path("foo").is_file()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.8/pathlib.py", line 1361, in is_file
        return S_ISREG(self.stat().st_mode)
      File "/usr/local/lib/python3.8/pathlib.py", line 1151, in stat
        return self._accessor.stat(self)
    OSError: [Errno 40] Too many levels of symbolic links: 'foo'

    Is this intended behaviour or is this a bug? I guess it's not intended, since it makes it impossible to iterate over such a directory with rglob(). I could not find anything similar in the bug tracker, but https://bugs.python.org/issue26012 seems to be related.

    Tested with Python 3.8.0a1, 3.6.7 and 3.5.2 (OS: Linux Mint 19)
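
A minimal sketch of the same setup as a script, assuming a POSIX system where the process may create symlinks. The helper names `make_link_loop` and `stat_errno` are made up for this illustration. It builds the self-referential link in a temporary directory and records which errno `os.stat()` reports for it:

```python
import errno
import os
import tempfile
from pathlib import Path


def make_link_loop(directory: Path) -> Path:
    """Create a symlink named 'foo' whose relative target is itself."""
    link = directory / "foo"
    os.symlink("foo", link)  # the target 'foo' resolves back to the link
    return link


def stat_errno(path: Path):
    """Return the errno raised by os.stat(path), or None if it succeeds."""
    try:
        os.stat(path)  # follows symlinks, so a loop cannot be resolved
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as tmp:
    link = make_link_loop(Path(tmp))
    is_loop = stat_errno(link) == errno.ELOOP
    print("is_symlink:", link.is_symlink())
    print("stat failed with ELOOP:", is_loop)
```

Whether `Path.rglob()` itself raises on such a directory depends on the Python version in use, so the sketch only checks the underlying `stat()` failure.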

    @jstucke jstucke mannequin added type-bug An unexpected behavior, bug, or error 3.8 only security fixes stdlib Python modules in the Lib dir labels Feb 19, 2019
    @matrixise
    Member

    I confirm this issue with python 3.7

    but your script is wrong (you declare f and use x in your script)

    /tmp$ mkdir demo
    /tmp$ cd demo/
    /t/demo$ mkdir tmp
    /t/demo$ touch foo
    /t/demo$ ln -s foo tmp/foo
    /t/demo$ cd tmp/
    /t/d/tmp$ file foo
    foo: broken symbolic link to foo

    /t/d/tmp$ python3
    Python 3.7.2 (default, Jan 16 2019, 19:49:22) 
    [GCC 8.2.1 20181215 (Red Hat 8.2.1-6)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from pathlib import Path
    >>> for p in Path().rglob('*'):
    ...     print(p)
    ... 
    foo
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.7/pathlib.py", line 1105, in rglob
        for p in selector.select_from(self):
      File "/usr/lib64/python3.7/pathlib.py", line 552, in _select_from
        for starting_point in self._iterate_directories(parent_path, is_dir, scandir):
      File "/usr/lib64/python3.7/pathlib.py", line 536, in _iterate_directories
        entry_is_dir = entry.is_dir()
    OSError: [Errno 40] Too many levels of symbolic links: './foo'
    >>>

    @matrixise matrixise added the 3.7 (EOL) end of life label Feb 19, 2019
    @jstucke
    Mannequin Author

    jstucke mannequin commented Feb 19, 2019

    A possible solution for python 3.7+ could be to add "ELOOP" to _IGNORED_ERROS in pathlib but I'm not exactly sure of the side effects.
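
A rough sketch of what "ignoring ELOOP" could look like in isolation. This is not pathlib's code: `IGNORED_ERRNOS` and `safe_is_dir` are hypothetical names, and the set of errno values is chosen only for illustration. The idea is to treat an unresolvable path as "not a directory" instead of letting the OSError propagate:

```python
import errno
import os
import stat

# Hypothetical tuple for illustration; not pathlib's actual definition.
IGNORED_ERRNOS = (errno.ENOENT, errno.ENOTDIR, errno.EBADF, errno.ELOOP)


def safe_is_dir(path) -> bool:
    """Return True if path is a directory, False if it cannot be resolved."""
    try:
        return stat.S_ISDIR(os.stat(path).st_mode)
    except OSError as exc:
        if exc.errno not in IGNORED_ERRNOS:
            raise  # unrelated failures (e.g. permissions) still surface
        return False
```

One side effect of this design is that a symlink loop becomes indistinguishable from a missing file for callers that only look at the boolean result.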

    @matrixise
    Member

    3.5 is in security mode, we can remove 3.5 from the list for this issue.

    @brettcannon
    Member

    I consider this an enhancement since you do have a loop in your symlinks and so having it not exactly work isn't totally shocking. But that doesn't mean that if someone can come up with a reasonable solution to fixing this annoyance it wouldn't be accepted or appreciated!

    @brettcannon brettcannon added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Feb 20, 2019
    @brettcannon
    Member

    And I don't know what a good solution would be. :) I.e. should some other exception be raised? Should it be ignored? I just don't know personally.

    @eryksun
    Contributor

    eryksun commented Feb 21, 2019

    In Windows, the error for a path reparse (e.g. symlink or junction) that can't be resolved is ERROR_CANT_RESOLVE_FILENAME. Another common error is ERROR_INVALID_REPARSE_DATA. This can occur if the reparse data is malformed or if the target device is invalid for the reparse type. For example, a junction is restricted to [bind-]mounting local volumes, so it's invalid if it targets a remote device.

    Windows errors to ignore should be added to _IGNORED_WINERRORS in Lib/pathlib.py. For example:

        _IGNORED_WINERRORS = (
            # ...
            1921, # ERROR_CANT_RESOLVE_FILENAME - similar to POSIX ELOOP
            4392, # ERROR_INVALID_REPARSE_DATA  - also for disallowed device targets
        )
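
A sketch of how POSIX errno values and Windows error codes might be checked together. The names `IGNORED_ERRNOS`, `IGNORED_WINERRORS` and `should_ignore` are hypothetical stand-ins for this example, not a copy of what Lib/pathlib.py contains. It relies only on the documented `errno` and `winerror` attributes of OSError (`winerror` is set on Windows only, hence the `getattr` default):

```python
import errno

# Illustrative values only; assumed for this sketch.
IGNORED_ERRNOS = (errno.ENOENT, errno.ENOTDIR, errno.EBADF, errno.ELOOP)
IGNORED_WINERRORS = (
    1921,  # ERROR_CANT_RESOLVE_FILENAME
    4392,  # ERROR_INVALID_REPARSE_DATA
)


def should_ignore(exc: OSError) -> bool:
    """Return True if exc describes a path that merely cannot be resolved."""
    return (
        getattr(exc, "errno", None) in IGNORED_ERRNOS
        or getattr(exc, "winerror", None) in IGNORED_WINERRORS
    )
```

A caller would wrap the failing stat-like call in `try`/`except OSError` and re-raise whenever `should_ignore()` returns False.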

    @eamanu
    Mannequin

    eamanu mannequin commented Feb 21, 2019

    Hello!

    I made a PR: #11964

    But I need help testing it :-(

    Could anyone help me, please?

    @jstucke
    Mannequin Author

    jstucke mannequin commented Feb 22, 2019

    I tried to add a test file in #11988.
    To fix the tests that then broke, I also had to add a try/except block to _WildcardSelector (analogous to the one in _RecursiveWildcardSelector).

    I could only check on Linux and I have no idea how it behaves on any other OS.
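
To illustrate the general shape of such a try/except around the directory check, here is a standalone sketch built on `os.scandir()`. It is not the code from the PR: `iter_dirs` and `IGNORED_ERRNOS` are hypothetical names, and the walk deliberately does not descend into symlinked directories to keep the example short:

```python
import errno
import os

# Illustrative tuple; assumed for this sketch.
IGNORED_ERRNOS = (errno.ENOENT, errno.ENOTDIR, errno.EBADF, errno.ELOOP)


def iter_dirs(root):
    """Yield root and every real directory below it, skipping broken entries."""
    yield root
    with os.scandir(root) as entries:
        children = list(entries)
    for entry in children:
        try:
            # is_dir() follows symlinks, so a link loop raises OSError here
            is_dir = entry.is_dir()
        except OSError as exc:
            if exc.errno not in IGNORED_ERRNOS:
                raise
            continue  # unresolvable entry: skip it, keep walking
        if is_dir and not entry.is_symlink():
            yield from iter_dirs(entry.path)
```

With this structure a single looping link no longer aborts the whole traversal, while errors outside the ignored set are still reported.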

    @jstucke
    Mannequin Author

    jstucke mannequin commented Feb 22, 2019

    As expected, the Windows CI build failed.
    All test failures were caused by:
    OSError: [WinError 1921] The name of the file cannot be resolved by the system: 'C:\\projects\\cpython\\build\\test_python_936\\@test_936_tmp\\brokenLinkLoop'

    Therefore, I added WinError 1921 to _IGNORED_WINERRORS as suggested by Eryk Sun.

    @pitrou
    Member

    pitrou commented May 21, 2019

    New changeset d5c120f by Antoine Pitrou (Jörg Stucke) in branch 'master':
    bpo-36035: fix Path.rglob for broken links (GH-11988)
    d5c120f

    @miss-islington
    Contributor

    New changeset aea49b1 by Miss Islington (bot) in branch '3.7':
    [3.7] bpo-36035: fix Path.rglob for broken links (GH-11988) (GH-13469)
    aea49b1

    @pitrou pitrou closed this as completed May 21, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022