[doc] os.walk() consider some symlinks as dirs instead of non-dirs #57179

Closed
socketpair mannequin opened this issue Sep 13, 2011 · 5 comments
Labels
3.11 only security fixes docs Documentation in the Doc dir type-feature A feature request or enhancement

Comments

@socketpair
Mannequin

socketpair mannequin commented Sep 13, 2011

BPO 12970
Nosy @ncoghlan, @vstinner, @benhoyt, @4kir4, @socketpair
Files
  • z.patch: patch for the problem
  • docs-walk-issue12970.patch: Update os.walk() docs
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2011-09-13.13:02:11.368>
    labels = ['3.11', 'type-feature', 'docs']
    title = '[doc] os.walk() consider some symlinks as dirs instead of non-dirs'
    updated_at = <Date 2021-11-28.13:04:55.425>
    user = 'https://github.com/socketpair'

    bugs.python.org fields:

    activity = <Date 2021-11-28.13:04:55.425>
    actor = 'iritkatriel'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation']
    creation = <Date 2011-09-13.13:02:11.368>
    creator = 'socketpair'
    dependencies = []
    files = ['23140', '36138']
    hgrepos = []
    issue_num = 12970
    keywords = ['patch']
    message_count = 5.0
    messages = ['143965', '143967', '152805', '223367', '224167']
    nosy_count = 9.0
    nosy_names = ['ncoghlan', 'vstinner', 'benhoyt', 'docs@python', 'akira', 'socketpair', 'alexey-smirnov', 'Sung-Yu.Chen', 'ukl']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue12970'
    versions = ['Python 3.11']

    @socketpair
    Mannequin Author

    socketpair mannequin commented Sep 13, 2011

    Consider code:

    import os

    path = "."  # any directory tree to walk
    for (root, dirs, nondirs) in os.walk(path, followlinks=False):
        print(nondirs)

    This code does not print symlinks that point to directories. I think this is a bug.

    In other words: if followlinks is True, some symlinks should be treated as dirs. If not, every symlink should be treated as a non-dir.

    Patch included.

    Also, please document this nuance.
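    A minimal sketch of the behaviour described above, assuming a POSIX system where os.symlink() works without extra privileges (the names real_dir, file.txt and link are made up for illustration). With followlinks=False, os.walk() still lists a symlink to a directory in dirnames; it only declines to descend into it:

```python
import os
import tempfile

# Build a small tree: top/real_dir/, top/file.txt, top/link -> real_dir
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "real_dir"))
    open(os.path.join(top, "file.txt"), "w").close()
    os.symlink(os.path.join(top, "real_dir"), os.path.join(top, "link"))

    # Only the first (top-level) triple is needed to see the classification
    root, dirnames, filenames = next(os.walk(top, followlinks=False))
    print(sorted(dirnames))   # ['link', 'real_dir']
    print(sorted(filenames))  # ['file.txt']
```

    So the symlink never shows up in the non-dirs list, which is why the loop above never prints it.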

    @socketpair socketpair mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Sep 13, 2011
    @socketpair socketpair mannequin assigned docspython Sep 13, 2011
    @socketpair socketpair mannequin added the docs Documentation in the Doc dir label Sep 13, 2011
    @socketpair
    Mannequin Author

    socketpair mannequin commented Sep 13, 2011

    Also, there is a missed optimisation for followlinks=False: both stat() and lstat() are called, where a single lstat() would do.

    The code could be rewritten as follows (though I don't know about cross-platform issues):
    ---------------------------------

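    import os
    import stat

    # Follow the symlink only when followlinks is set; otherwise inspect the link itself
    if followlinks:
        mode = os.stat(path).st_mode
    else:
        mode = os.lstat(path).st_mode

    # lstat() reports a symlink as S_ISLNK, not S_ISDIR, so it lands in nondirs
    if stat.S_ISDIR(mode):
        dirs.append(path)
    else:
        nondirs.append(path)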

    This would be much cleaner than the current implementation (with or without my patch).
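    For comparison, a sketch of the classification proposed above, written against os.scandir() (available since Python 3.5, i.e. after this comment was posted) instead of separate stat()/lstat() calls. split_entries() is a hypothetical helper, not part of the os module; DirEntry.is_dir(follow_symlinks=False) answers the same question as stat.S_ISDIR(os.lstat(path).st_mode). Assumes a POSIX system where os.symlink() is permitted:

```python
import os
import stat
import tempfile


def split_entries(path, followlinks=False):
    """Hypothetical helper: split one directory's entries into (dirs, nondirs).

    With followlinks=False the link itself is classified (lstat-like), so a
    symlink to a directory lands in nondirs, as proposed in this issue.
    """
    dirs, nondirs = [], []
    with os.scandir(path) as it:
        for entry in it:
            # follow_symlinks=False mirrors lstat(); True mirrors stat()
            if entry.is_dir(follow_symlinks=followlinks):
                dirs.append(entry.name)
            else:
                nondirs.append(entry.name)
    return sorted(dirs), sorted(nondirs)


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "real_dir"))
    link = os.path.join(top, "link")
    os.symlink(os.path.join(top, "real_dir"), link)

    # stat() follows the link and sees a directory; lstat() sees the link itself
    stat_is_dir = stat.S_ISDIR(os.stat(link).st_mode)
    lstat_is_link = stat.S_ISLNK(os.lstat(link).st_mode)

    without_follow = split_entries(top, followlinks=False)
    with_follow = split_entries(top, followlinks=True)

print(without_follow)  # (['real_dir'], ['link'])
print(with_follow)     # (['link', 'real_dir'], [])
```

    With followlinks=False the link ends up in nondirs, which is the behaviour requested here; the real os.walk() keeps it in dirs instead.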

    @socketpair socketpair mannequin changed the title os.wlak() consider some symlinks as dirs instead of non-dirs os.walk() consider some symlinks as dirs instead of non-dirs Sep 13, 2011
    @ncoghlan
    Contributor

    ncoghlan commented Feb 7, 2012

    This behaviour came up recently when implementing os.fwalk() [1]. There are problems with all 3 possible approaches (list as dirs, list as files, don't list at all) when followlinks is False. Since all alternatives are potentially surprising, the current behaviour wins by default (as people will already have written their code to cope with that behaviour and there's no net gain in changing the default, since the desired treatment of such links will vary according to the task at hand).

    As a result, I'm converting this to a pure documentation issue - the os.walk() docs should definitely mention this subtlety. The behaviour won't be changing, though.

    [1] http://bugs.python.org/issue13734,#msg151077

    @ncoghlan ncoghlan added type-feature A feature request or enhancement and removed stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Feb 7, 2012
    @ukl
    Mannequin

    ukl mannequin commented Jul 17, 2014

    I like the function as it is documented, i.e. "filenames is a list of the names of the non-directory files in dirpath.". This includes all symlinks (at least in the followlinks=False case).

    I'd say that including symlinks to files but not symlinks to directories is an order of magnitude more surprising than treating a symlink to a directory as a file. And if you consider this a shortcoming of the documentation, it isn't (IMHO) a subtlety. The (my?) intuition says: all entries of a root (apart from . and .., as documented) are included in either dirnames or filenames.

    Yes, changing behaviour here might break some code, but this applies to all changes.

    For some use cases it might be right to just skip over symlinks to dirs, but if it's not, you have to opendir() and read all root entries again in the loop to find all symlinks, which effectively means reimplementing os.walk.

    @4kir4
    Mannequin

    4kir4 mannequin commented Jul 28, 2014

    I've updated os.walk() documentation to mention that *dirnames* list
    includes symlinks to directories.

    To imitate the other two cases:

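    • treat the symlinks as files:
        for dirpath, dirnames, files in os.walk(top):
            dirs = []
            for name in dirnames:
                # Symlinks to directories move to files; real directories stay
                (files if islink(join(dirpath, name)) else dirs).append(name)
            dirnames = dirs
    • don't include in either of the lists:
        for dirpath, dirnames, files in os.walk(top):
            # Drop symlinks to directories in place; with followlinks=False
            # os.walk() never descends into them, so this only affects the body
            dirnames[:] = [name for name in dirnames
                           if not islink(join(dirpath, name))]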

    where islink = os.path.islink and join = os.path.join.
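    To try both recipes on a concrete tree, they can be wrapped as generators. walk_dir_links_as_files() and walk_skip_dir_links() are hypothetical names chosen for this sketch; the loop bodies follow the recipes above, except that the first assigns dirnames[:] in place rather than rebinding the name. Assumes a POSIX system where os.symlink() is permitted:

```python
import os
import tempfile
from os.path import islink, join


def walk_dir_links_as_files(top):
    """Hypothetical wrapper: os.walk() triples with dir symlinks moved to files."""
    for dirpath, dirnames, files in os.walk(top):
        dirs = []
        for name in dirnames:
            (files if islink(join(dirpath, name)) else dirs).append(name)
        dirnames[:] = dirs  # filter in place so the yielded list is updated
        yield dirpath, dirnames, files


def walk_skip_dir_links(top):
    """Hypothetical wrapper: os.walk() triples with dir symlinks dropped."""
    for dirpath, dirnames, files in os.walk(top):
        dirnames[:] = [name for name in dirnames
                       if not islink(join(dirpath, name))]
        yield dirpath, dirnames, files


with tempfile.TemporaryDirectory() as top:
    os.mkdir(join(top, "real_dir"))
    open(join(top, "file.txt"), "w").close()
    os.symlink(join(top, "real_dir"), join(top, "link"))

    # Inspect only the top-level triple from each wrapper
    _, d1, f1 = next(walk_dir_links_as_files(top))
    _, d2, f2 = next(walk_skip_dir_links(top))

print(d1, sorted(f1))  # ['real_dir'] ['file.txt', 'link']
print(d2, f2)          # ['real_dir'] ['file.txt']
```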

    I've uploaded the documentation patch. Please review.

    @iritkatriel iritkatriel added the 3.11 only security fixes label Nov 28, 2021
    @iritkatriel iritkatriel changed the title os.walk() consider some symlinks as dirs instead of non-dirs [doc] os.walk() consider some symlinks as dirs instead of non-dirs Nov 28, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @JelleZijlstra JelleZijlstra self-assigned this Oct 7, 2022
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 7, 2022
    (cherry picked from commit 0f498f1)
    
    Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 7, 2022
    (cherry picked from commit 0f498f1)
    
    Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
    miss-islington added a commit that referenced this issue Oct 8, 2022
    (cherry picked from commit 0f498f1)
    
    Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
    miss-islington added a commit that referenced this issue Oct 8, 2022
    (cherry picked from commit 0f498f1)
    
    Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
    @slateny slateny closed this as completed Oct 8, 2022
    carljm added a commit to carljm/cpython that referenced this issue Oct 8, 2022
    * main: (38 commits)
      pythongh-92886: make test_ast pass with -O (assertions off) (pythonGH-98058)
      pythongh-92886: make test_coroutines pass with -O (assertions off) (pythonGH-98060)
      pythongh-57179: Add note on symlinks for os.walk (python#94799)
      pythongh-94808: Fix regex on exotic platforms (python#98036)
      pythongh-90085: Remove vestigial -t and -c timeit options (python#94941)
      pythonGH-83901: Improve Signature.bind error message for missing keyword-only params (python#95347)
      pythongh-61105: Add default param, note on using cookiejar subclass (python#95427)
      pythongh-96288: Add a sentence to `os.mkdir`'s docstring. (python#96271)
      pythongh-96073: fix backticks in NEWS entry (pythonGH-98056)
      pythongh-92886: [clinic.py] raise exception on invalid input instead of assertion (pythonGH-98051)
      pythongh-97997: Add col_offset field to tokenizer and use that for AST nodes (python#98000)
      pythonGH-88968: Reject socket that is already used as a transport (python#98010)
      pythongh-96346: Use double caching for re._compile() (python#96347)
      pythongh-91708: Revert params note in urllib.parse.urlparse table (python#96699)
      pythongh-96265: Fix some formatting in faq/design.rst (python#96924)
      pythongh-73196: Add namespace/scope clarification for inheritance section (python#92840)
      pythongh-97646: Change `.js` and `.mjs` files mimetype to conform to RFC 9239 (python#97934)
      pythongh-97923: Always run Ubuntu SSL tests with others in CI (python#97940)
      pythongh-97956: Mention `generate_global_objects.py` in `AC How-To` (python#97957)
      pythongh-96959: Update HTTP links which are redirected to HTTPS (python#98039)
      ...
    carljm added a commit to carljm/cpython that referenced this issue Oct 9, 2022
    * main: (5519 commits)
      Minor edits to the Descriptor HowTo Guide (pythonGH-24901)
      Fix link to Lifecycle of a Pull Request in CONTRIBUTING (python#98102)
      pythonGH-94597: deprecate `SafeChildWatcher`, `FastChildWatcher` and `MultiLoopChildWatcher` child watchers  (python#98089)
      Auto-cancel old builds when new commit pushed to branch (python#98009)
      pythongh-95011: Migrate syslog module to Argument Clinic (pythonGH-95012)
      pythongh-68686: Retire eptag ptag scripts (python#98064)
      pythongh-97922: Run the GC only on eval breaker (python#97920)
      GitHub Workflows security hardening (python#96492)
      Add `@ezio-melotti` as codeowner for `.github/`. (python#98079)
      pythongh-97913 Docs: Add walrus operator to the index (python#97921)
      [doc] Fix broken links to C extensions accelerating stdlib modules (python#96914)
      pythongh-97822: Fix http.server documentation reference to test() function (python#98027)
      pythongh-91052: Add PyDict_Unwatch for unwatching a dictionary (python#98055)
      pythonGH-98023: Change default child watcher to PidfdChildWatcher on supported systems (python#98024)
      pythonGH-94182: Run the PidfdChildWatcher on the running loop (python#94184)
      pythongh-92886: make test_ast pass with -O (assertions off) (pythonGH-98058)
      pythongh-92886: make test_coroutines pass with -O (assertions off) (pythonGH-98060)
      pythongh-57179: Add note on symlinks for os.walk (python#94799)
      pythongh-94808: Fix regex on exotic platforms (python#98036)
      pythongh-90085: Remove vestigial -t and -c timeit options (python#94941)
      ...
    mpage pushed a commit to mpage/cpython that referenced this issue Oct 11, 2022
    4 participants