This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: os.walk: bottom-up
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: BreamoreBoy, docs@python, patrick.vrijlandt, pitrou, python-dev, rosslagerwall, zbysz
Priority: normal Keywords: patch

Created on 2012-01-13 11:03 by patrick.vrijlandt, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue13779.patch zbysz, 2012-02-22 16:01 add one sentence clarification to the docs review
Messages (10)
msg151169 - (view) Author: patrick vrijlandt (patrick.vrijlandt) Date: 2012-01-13 11:03
PythonWin 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32.
Portions Copyright 1994-2008 Mark Hammond - see 'Help/About PythonWin' for further copyright information.
>>> import os
>>> os.makedirs("g:/a/b/c")
>>> os.listdir("g:/a")
['b']
>>> for root, dirs, files in os.walk("g:/a", topdown = False):
... 	print(root, dirs, files, os.listdir(root))
... 	os.rmdir(root)
... 	
g:/a\b\c [] [] []
g:/a\b ['c'] [] []
g:/a ['b'] [] []
>>> 

From the documentation of os.walk:
If topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up).

As the above example shows, the directories are generated in the correct order, "generated" referring to yield from generator os.walk. However, the generated (files? and) dirs do not necessarily reflect the current situation as produced by os.listdir.

Therefore, this does not clear the entire directory tree as I would expect.

>>> os.makedirs("g:/a/b/c")
>>> for root, dirs, files in os.walk("g:/a", topdown = False):
... 	print(root, dirs, files, os.listdir(root))	
... 	if not (files + dirs):
... 	    os.rmdir(root)
... 
g:/a\b\c [] [] []
g:/a\b ['c'] [] []
g:/a ['b'] [] ['b']

I think that at least the documentation should be more clear on this issue. I would like even better, if files + dirs would match os.listdir on the moment they are generated (=yielded).
msg151366 - (view) Author: Zbyszek Jędrzejewski-Szmek (zbysz) * Date: 2012-01-16 14:34
Hi,
I think that the documentation is pretty clear ("[if topdown=False] ...  the directories in dirnames have already been generated by the time dirnames itself is generated"). And such behaviour is what one would expect, since it is the result of the simplest implementation (listdir(), yield <subdir>, yield <subdir>, yeild <subdir>, yeild <dir>).

I don't think that the implementation will be changed, since it is pretty to do listdir() yourself if needed, and it would make the function slower for the common use case.

Improving the documentation is always possible, what sentence would you see added (what would make the behaviour clearer to you?) ?
msg151370 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-16 15:03
> However, the generated (files? and) dirs do not necessarily reflect the current situation as produced by os.listdir.

What do you mean exactly? Another process has re-created "b" in parallel?
This race condition is pretty impossible to solve in the general case (unless the filesystem is transactional): even if you call os.listdir() again, a new file or dir could appear just before the following call to os.rmdir().

(just for the record, shutil.rmtree already allows you to delete a filesystem tree)
msg151375 - (view) Author: patrick vrijlandt (patrick.vrijlandt) Date: 2012-01-16 15:56
The documentation of this function is generally ambiguous, because os.walk is a generator. Thus "generate" means (1) yielded from a generator and (2) prepared for later use within the generator. To avoid the ambiguity, "generate" should be avoided if possible. (1) might be replaced by "return" and (2) by "prepare".

@zbysz
The paragraph you cite is about "Modifying dirnames ". If you are not "Modifying dirnames" (like me) this section easily skips your attention.

My problem was, of course, that I had myself changed the directory structure - It's not a race condition with another process but an effect  of the loop os.walk was managing.

@pitrou
shutil.rmtree cannot help me, because I'm only deleting empty dirs.

Proposal (very verbose I'm afraid):
If you change the directory structure below dirpath while topdown=True, you can modify dirnames in-place so that walk will visit the right directories. If you change the directory structure below dirpath while topdown=False (maybe while you where there), dirnames and filenames can still reflect the old situation and it might be necessary to call os.listdir again.
msg151385 - (view) Author: Zbyszek Jędrzejewski-Szmek (zbysz) * Date: 2012-01-16 16:34
> The documentation of this function is generally ambiguous, because
> os.walk is a generator. Thus "generate" means (1) yielded from a 
> generator and (2) prepared for later use within the generator. To avoid 
> the ambiguity, "generate" should be avoided if possible. (1) might be 
> replaced by "return" and (2) by "prepare".
It think that you are wrong here: generate is consistently used in
meaning (1) in the docstring. Also "return" means something completely different. "generate" is better than "prepare", because it is more specific,
especially from the POV of the user of the function, who only cares about
when something is yielded, not when it is prepared.

> Proposal (very verbose I'm afraid):
> If you change the directory structure below dirpath while topdown=True, 
> you can modify dirnames in-place so that walk will visit the right 
> directories. If you change the directory structure below dirpath while 
> topdown=False (maybe while you where there), dirnames and filenames can 
> still reflect the old situation and it might be necessary to call 
> os.listdir again.
Hm, I think that the difference in behaviour between topdown=False and topdown=True is smaller then this proposal implies.
When topdown=True, the list is not updated after changes on disk. So both removing and adding directories does _not_ cause them to added or removed from the list of subdirectories to visit. Nevertheless, the default behaviour on
error is to do nothing, so it _looks_ like they were skipped.

So I think that the documentation could only clarify that the list of subdirectories is queried first, before the dir or the subdirectories are "generated".
msg151386 - (view) Author: Zbyszek Jędrzejewski-Szmek (zbysz) * Date: 2012-01-16 16:36
>>> os.makedirs('/tmp/a/b/c')
>>> gen  = os.walk('/tmp/a')
>>> next(gen)
('/tmp/a', ['b'], [])
>>> os.makedirs('/tmp/a/b2')
>>> next(gen)
('/tmp/a/b', ['c'], [])
>>> next(gen)
('/tmp/a/b/c', [], [])

>>> gen  = os.walk('/tmp/a', onerror=print)
>>> next(gen)
('/tmp/a', ['b2', 'b'], [])
>>> os.rmdir('/tmp/a/b2')
>>> next(gen)
[Errno 2] No such file or directory: '/tmp/a/b2'
('/tmp/a/b', ['c'], [])
msg153964 - (view) Author: Zbyszek Jędrzejewski-Szmek (zbysz) * Date: 2012-02-22 16:01
Maybe this could be closed? I'm attaching a small patch with a documentation clarification. (Adds "In either case the list of
subdirectories is retrieved before the tuples for the directory and
its subdirectories are generated.").
msg154053 - (view) Author: patrick vrijlandt (patrick.vrijlandt) Date: 2012-02-23 08:00
Good solution. +1 for closing.

Patrick
msg220636 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-15 13:29
The OP is happy with the proposed doc patch so can we have this committed please.
msg220694 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-06-16 03:52
New changeset 351c1422848f by Benjamin Peterson in branch '2.7':
clarify when the list of subdirectories is read (closes #13779)
http://hg.python.org/cpython/rev/351c1422848f

New changeset b10322b5ef0f by Benjamin Peterson in branch '3.4':
clarify when the list of subdirectories is read (closes #13779)
http://hg.python.org/cpython/rev/b10322b5ef0f

New changeset 1411df211159 by Benjamin Peterson in branch 'default':
merge 3.4 (#13779)
http://hg.python.org/cpython/rev/1411df211159
History
Date User Action Args
2022-04-11 14:57:25adminsetgithub: 57988
2014-06-16 03:52:12python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg220694

resolution: fixed
stage: resolved
2014-06-15 13:29:44BreamoreBoysetnosy: + BreamoreBoy
messages: + msg220636
2012-02-23 08:00:40patrick.vrijlandtsetmessages: + msg154053
2012-02-22 16:01:17zbyszsetfiles: + issue13779.patch
keywords: + patch
messages: + msg153964
2012-01-16 16:36:39zbyszsetmessages: + msg151386
2012-01-16 16:34:12zbyszsetmessages: + msg151385
2012-01-16 15:56:26patrick.vrijlandtsetmessages: + msg151375
2012-01-16 15:03:22pitrousetnosy: + pitrou
messages: + msg151370
2012-01-16 14:34:46zbyszsetnosy: + zbysz
messages: + msg151366
2012-01-14 05:25:38rosslagerwallsetnosy: + rosslagerwall
2012-01-13 11:03:09patrick.vrijlandtcreate