classification
Title: os.walk() consider some symlinks as dirs instead of non-dirs
Type: enhancement Stage:
Components: Documentation Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Sung-Yu.Chen, akira, alexey-smirnov, benhoyt, docs@python, ncoghlan, socketpair, ukl, vstinner
Priority: normal Keywords: patch

Created on 2011-09-13 13:02 by socketpair, last changed 2014-07-28 14:16 by akira.

Files
File name Uploaded Description Edit
z.patch socketpair, 2011-09-13 13:02 patch for the problem
docs-walk-issue12970.patch akira, 2014-07-28 14:16 Update os.walk() docs review
Messages (5)
msg143965 - (view) Author: Марк Коренберг (socketpair) * Date: 2011-09-13 13:02
Consider code:

for (root, dirs, nondirs) in os.walk(path, followlinks=False):
    print (nondirs)

This code will not print symlinks that refer to some dir. I think it is the bug.

In other words: If followlinks is True, we should consider some symlinks as dirs. If not, any symlink is the non-dir.

Patch included.

Also, please fix documentation about this nuance.
msg143967 - (view) Author: Марк Коренберг (socketpair) * Date: 2011-09-13 13:16
Also, there is some mis-optimisation for followlinks=False: stat() and then lstat() will be called. Instead of one lstat().

Code may be rewritten as (but I don't know about cross-platform issues):
---------------------------------
if followlinks:
    mode = os.stat(path).st_mode
else:
    mode = os.lstat(path).st_mode

if stat.S_ISDIR(mode):
    dirs.append(path)
else:
    nondir.append(path)
---------------------------------
It will be much cleaner than current (or patched with my patch) implementation
msg152805 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-02-07 11:54
This behaviour came up recently when implementing os.fwalk() [1]. There are problems with all 3 possible approaches (list as dirs, list as files, don't list at all) when followlinks is False. Since all alternatives are potentially surprising, the current behaviour wins by default (as people will already have written their code to cope with that behaviour and there's no net gain in changing the default, since the desired treatment of such links will vary according to the task at hand).

As a result, I'm converting this to a pure documentation issue - the os.walk() docs should definitely mention this subtlety. The behaviour won't be changing, though.

[1] http://bugs.python.org/issue13734,#msg151077
msg223367 - (view) Author: Uwe Kleine-König (ukl) * Date: 2014-07-17 20:20
I like the function as it is documented, i.e. "filenames is a list of the names of the non-directory files in dirpath.". This includes all symlinks (in the followlinks=False cast at least).

I'd say not including symlinks to directories but symlinks to files is a magnitude more surprising than treating a symlink to a directory as a file. And if you consider this as a short comming of the documentation this isn't (IMHO) a subtlety. The (my?) intuition says: all entries of a root (apart from . and .. as documented) are included in either dirnames or filenames.

Yes, changing behaviour here might break some code, but this applies to all changes.

For some usecases it might be right to just skip over symlinks-to-dirs, but if it's not you have to opendir + read all root entries again in the loop to find all symlinks which effectively means reimplementing os.walk.
msg224167 - (view) Author: Akira Li (akira) * Date: 2014-07-28 14:16
I've updated os.walk() documentation to mention that *dirnames* list
includes symlinks to directories.

To imitate the other two cases:

- treat the symlinks as files:

    for dirpath, dirnames, files in os.walk(top):
        dirs = []
        for name in dirnames:
            (files if islink(join(dirpath, name)) else dirs).append(name)
        dirnames = dirs

- don't include in either of the lists:

    for dirpath, dirnames, files in os.walk(top):
        dirnames[:] = [name for name in dirnames
                       if not islink(join(dirpath, name))]

where islink = os.path.islink and join = os.path.join.

I've uploaded the documentation patch. Please, review.
History
Date User Action Args
2014-07-28 14:16:38akirasetfiles: + docs-walk-issue12970.patch
nosy: + akira
messages: + msg224167

2014-07-28 13:06:14vstinnersetnosy: + benhoyt
2014-07-17 20:20:59uklsetnosy: + ukl
messages: + msg223367
2012-02-07 11:54:27ncoghlansetversions: - Python 3.1, Python 3.4
nosy: + ncoghlan

messages: + msg152805

components: - Library (Lib)
type: behavior -> enhancement
2012-02-07 02:06:21Sung-Yu.Chensetnosy: + Sung-Yu.Chen
2011-09-14 06:08:33alexey-smirnovsetnosy: + alexey-smirnov
2011-09-13 13:24:15socketpairsettitle: os.wlak() consider some symlinks as dirs instead of non-dirs -> os.walk() consider some symlinks as dirs instead of non-dirs
2011-09-13 13:16:04socketpairsetmessages: + msg143967
2011-09-13 13:06:17vstinnersetnosy: + vstinner
2011-09-13 13:02:11socketpaircreate