Author benhoyt
Recipients benhoyt, python-dev, scott.dial, serhiy.storchaka, vstinner
Date 2015-03-12.02:53:48
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1426128829.99.0.763230702833.issue23605@psf.upfronthosting.co.za>
In-reply-to
Content
To Victor and Serhiy:

1) Serhiy's point about not needing to build the symlinks set when followlinks is True is a good one, because it'll never get used. Just a "if not followlinks: ..." around that try/except would do it. Though this is a small optimization, as I expect 95% of people use os.walk() with followlinks=False, which is the default.

2) Much as I don't want to admit it :-), I think Serhiy has a point regarding the change in behaviour. Can we accept this tiny change? This happens because the previous implementation of os.walk() calls path.islink(), which does a real syscall, after the yield returns.

So if the caller modifies "dirnames" and adds a symlink to a directory it won't be in the symlinks set. Or if (as Serhiy's example shows) they change a symlink-to-a-directory to a directory the new implementation doesn't do another syscall to check, so differs from the old one -- in Serhiy's example, the old os.walk() will recurse into the changed symlink-to-a-directory-that's-now-a-directory, whereas the new os.walk() won't recurse.

Arguably the new behaviour is just as good here, but it is different. And Serhiy's function unsymlink() is a reasonable example of how this might play out.

The os.walk() docs explicitly allow modifying "dirnames" -- not just removing and reordering, but also adding entries, which could surface the difference in behaviour: "the caller can modify the dirnames list in-place ... even to inform walk() about directories the caller creates or renames before it resumes walk() again". See here:

https://docs.python.org/3.4/library/os.html#os.walk

So I do think it's a real issue. Will this really be an issue in practice? I don't know; I'm tempted to think not.

Obviously if we have to call stat/islink again, it defeats half of the purpose of scandir. But I don't see a way around it if we want to keep the exact old os.walk() behaviour. We could fix the behaviour if a directory/symlink was added to "dirnames" fairly easily by testing whether it changed, but I don't see a way to fix the unsymlinks() example without a syscall.

Thoughts?
History
Date User Action Args
2015-03-12 02:53:50benhoytsetrecipients: + benhoyt, scott.dial, vstinner, python-dev, serhiy.storchaka
2015-03-12 02:53:49benhoytsetmessageid: <1426128829.99.0.763230702833.issue23605@psf.upfronthosting.co.za>
2015-03-12 02:53:49benhoytlinkissue23605 messages
2015-03-12 02:53:48benhoytcreate