Issue 26860: Make os.walk and os.fwalk yield namedtuple instead of tuple

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/71047

classification

Title:	Make os.walk and os.fwalk yield namedtuple instead of tuple
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.6

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	ethan.furman, giampaolo.rodola, loewis, palaviv, rhettinger, serhiy.storchaka
Priority:	normal	Keywords:	patch

Created on 2016-04-26 13:45 by palaviv, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
os-walk-result-namedtuple.patch	palaviv, 2016-04-26 13:45		review

Messages (13)
msg264285 - (view)	Author: Aviv Palivoda (palaviv) *	Date: 2016-04-26 13:45
I am suggesting that os.walk and os.fwalk yield a named tuple instead of the regular tuple they currently yield. The use case for this change can be seen in the following example:

    def walk_wrapper(walk_it):
        for dir_entry in walk_it:
            if dir_entry[0] == "aaa":
                yield dir_entry

Because walk_it can be either os.walk or os.fwalk, I need to access dir_entry via index. My change will allow me to change this function to:

    def walk_wrapper(walk_it):
        for dir_entry in walk_it:
            if dir_entry.dirpath == "aaa":
                yield dir_entry

Which is clearer and more readable.
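The proposal can be approximated today without changing the os module. The following is a minimal sketch, not the attached patch: the `walk_result` type and the `named_walk` helper are hypothetical names chosen for illustration, and the field names are taken from the os.walk documentation.

```python
import os
import shutil
import tempfile
from collections import namedtuple

# Hypothetical result type; field names mirror the os.walk documentation.
walk_result = namedtuple("walk_result", "dirpath dirnames filenames")


def named_walk(top, **kwargs):
    """Yield os.walk() results as named tuples (illustrative helper)."""
    for entry in os.walk(top, **kwargs):
        yield walk_result(*entry)


def walk_wrapper(walk_it, wanted):
    """Yield only the entries whose dirpath equals `wanted`."""
    for dir_entry in walk_it:
        if dir_entry.dirpath == wanted:
            yield dir_entry


# Build a small tree so the example is self-contained.
demo_root = tempfile.mkdtemp()
target = os.path.join(demo_root, "aaa")
os.mkdir(target)
with open(os.path.join(target, "file.txt"), "w") as f:
    f.write("x")

matches = list(walk_wrapper(named_walk(demo_root), target))

# Remove the temporary tree again.
shutil.rmtree(demo_root)
```

Because a named tuple is still a tuple, existing callers that index or unpack the result keep working with such a wrapper.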
msg264288 - (view)	Author: Ethan Furman (ethan.furman) *	Date: 2016-04-26 13:58
Quick review of patch looks good. I'll try to look it over more closely later.
msg264418 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2016-04-28 06:42
Classes are normally named with CamelCase. Also, "walk_result" or "WalkResult" seems like an odd name that doesn't really fit. DirEntry or DirInfo is a better match (see the OP's example, "for dir_entry in walk_it: ...").

The "versionchanged" should be a "versionadded".

The docs should use "named tuple" instead of "namedtuple". The former is the generic term used in the glossary to describe the instances. The latter is the factory function that creates a new tuple subclass.

The attribute descriptions for the docs are pretty good. They should also be applied as actual docstrings in the code.

The docs and code for fwalk() need to be harmonized with walk() so that the tuple fields use the same names: change (root, dirs, files) to (dirpath, dirnames, filenames).
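The naming and docstring suggestions in this review could look roughly like the sketch below. `DirInfo` is only one of the names floated in the review, and the docstring wording is an assumption made for illustration, not text from the patch.

```python
from collections import namedtuple

# Hypothetical CapWords name; "DirInfo" is one of the names suggested in the review.
DirInfo = namedtuple("DirInfo", ["dirpath", "dirnames", "filenames"])

# Docstrings can be assigned directly to the class and to each field.
DirInfo.__doc__ = "Result of one step of a directory tree walk."
DirInfo.dirpath.__doc__ = "Path to the directory being visited."
DirInfo.dirnames.__doc__ = "Names of the subdirectories in dirpath."
DirInfo.filenames.__doc__ = "Names of the non-directory files in dirpath."

info = DirInfo("top", ["sub"], ["a.txt"])
```

With the docstrings in place, `help(DirInfo)` shows the field descriptions alongside the generated class.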
msg264421 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-04-28 08:00
Sorry, but I disagree with Raymond on many points.

> Classes are normally named with CamelCase. Also, "walk_result" or "WalkResult" seems like an odd name that doesn't really fit. DirEntry or DirInfo is a better match (see the OP's example, "for dir_entry in walk_it: ...")

See "stat_result", "statvfs_result", "waitid_result", "uname_result", and "times_result". DirEntry is already used in the os module. And if this feature is accepted, separate types are needed for walk() and fwalk() results.

> The "versionchanged" should be a "versionadded".

os.walk() is not new. Just its result is changed. Class "walk_result" can be tagged with "versionadded", but I'm not sure there is a need to document it separately. The documentation of the os module is already too large. "uname_result" and "times_result" are not documented.

> The docs and code for fwalk() need to be harmonized with walk() so that the tuple fields use the same names: change (root, dirs, files) to (dirpath, dirnames, filenames).

(root, dirs, files) is shorter than (dirpath, dirnames, filenames) and these names were used with os.walk() and os.fwalk() for years.

In general, I have doubts about this feature.

1. There is a small backward incompatibility. At least pickle is not backward compatible, and I guess other serialization methods are not either.

2. os.walk() and os.fwalk() are intended to be used in a for loop with immediate unpacking of the result tuple:

    for root, dirs, files in os.walk(...):
        ...

Adding a named tuple doesn't add any benefit for the common case. In the OP's case, you can either use an fwalk-based implementation of walk (issue15200):

    def fwalk_as_walk(*args, **kwargs):
        for x in os.fwalk(*args, **kwargs):
            yield x[:-1]

or just ignore the rest of the tuple items:

    for root, *_ in walk_it:
        ...

3. Using namedtuple is slower and consumes more memory than using tuple. Even for FS-related operations like os.walk() this can matter. A lot of code is optimized for exact tuples; with namedtuple this optimization is lost.

4. New names (dirpath, dirnames, filenames) are questionable. Why not use underscores (dir_names)? "dir" in dirpath refers to the directory currently being processed, but "dir" in dirnames refers to its subdirectories. Currently you are free to use short names (root, dirs, files) from examples or whatever you prefer, but with namedtuple you are stuck with standard names forever. There are no names that satisfy everybody.

5. Third-party walk-like iterators generate tuples, so you can't use attribute access in overly general code.
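The two workarounds from point 2 can be sketched without touching the file system. The `dirs_named` and `drop_dirfd` helpers and the sample data are hypothetical stand-ins: the 3-tuples imitate os.walk() results and the 4-tuples imitate os.fwalk() results, whose last item is a directory file descriptor.

```python
def dirs_named(walk_it, wanted):
    """Yield entries whose first item equals `wanted`, for 3- or 4-tuples."""
    for entry in walk_it:
        root, *_ = entry  # ignore however many items follow the path
        if root == wanted:
            yield entry


def drop_dirfd(fwalk_it):
    """Adapt fwalk-style 4-tuples to walk-style 3-tuples."""
    for x in fwalk_it:
        yield x[:-1]


# Stand-ins for real iterators, so the sketch runs on any platform.
walk_like = [("aaa", ["sub"], ["f.txt"]), ("aaa/sub", [], [])]
fwalk_like = [("aaa", ["sub"], ["f.txt"], 3), ("aaa/sub", [], [], 4)]

from_walk = list(dirs_named(walk_like, "aaa"))
from_fwalk = list(dirs_named(fwalk_like, "aaa"))
adapted = list(drop_dirfd(fwalk_like))
```

Star-unpacking keeps the generic code independent of the tuple length, which is the property the named tuple proposal was trying to obtain through attribute access.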
msg264478 - (view)	Author: Aviv Palivoda (palaviv) *	Date: 2016-04-29 08:42
In regard to Raymond's points, I agree with Serhiy's comments. As for Serhiy's doubts:

> 3. Using namedtuple is slower and consumes more memory than using tuple. Even for FS-related operations like os.walk() this can matter. A lot of code is optimized for exact tuples; with namedtuple this optimization is lost.

I did some testing on my own PC:

    ./python -m timeit -s "from os import walk" "for x in walk('Lib'): pass"

    Regular tuple: 7.53 msec
    Named tuple: 7.66 msec

> 4. New names (dirpath, dirnames, filenames) are questionable. Why not use underscores (dir_names)? "dir" in dirpath refers to the directory currently being processed, but "dir" in dirnames refers to its subdirectories. Currently you are free to use short names (root, dirs, files) from examples or whatever you prefer, but with namedtuple you are stuck with standard names forever. There are no names that satisfy everybody.

I agree that no names will satisfy everybody, but I think the names that are currently in the documentation are the most obvious choice.

As for points 1, 2 and 5, this feature doesn't break any of the old walk API.

One more point I would like input on is the testing. I can remove the walk method from the WalkTests and FwalkTests classes and use the new named tuple attributes in the tests. Do you think that is better, or should we keep the tests with the old API (access using indexes)?
msg264503 - (view)	Author: Ethan Furman (ethan.furman) *	Date: 2016-04-29 14:45
I'm not clear on what you are asking, but regardless we should have both the old (by-index) tests and the new by-attribute tests.
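Keeping both kinds of tests might look like the sketch below. It does not reproduce the WalkTests or FwalkTests classes from the CPython test suite; `walk_result` and `WalkResultAccessTests` are hypothetical names, and the entry is built by hand rather than produced by os.walk().

```python
import unittest
from collections import namedtuple

# Hypothetical result type standing in for whatever the patch would yield.
walk_result = namedtuple("walk_result", "dirpath dirnames filenames")


class WalkResultAccessTests(unittest.TestCase):
    def setUp(self):
        self.entry = walk_result("top", ["sub"], ["a.txt"])

    def test_by_index(self):
        # Old API: positional access and unpacking must keep working.
        self.assertEqual(self.entry[0], "top")
        dirpath, dirnames, filenames = self.entry
        self.assertEqual((dirpath, dirnames, filenames),
                         ("top", ["sub"], ["a.txt"]))

    def test_by_attribute(self):
        # New API: the same values are reachable by field name.
        self.assertEqual(self.entry.dirpath, "top")
        self.assertEqual(self.entry.dirnames, ["sub"])
        self.assertEqual(self.entry.filenames, ["a.txt"])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(WalkResultAccessTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```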
msg264506 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2016-04-29 16:09
https://www.python.org/dev/peps/pep-0008/#class-names -- "Class names should normally use the CapWords convention."

Examples:
---------
crypt.py
    6:from collections import namedtuple as _namedtuple
    13:class _Method(_namedtuple('_Method', 'name ident salt_chars total_size')):

difflib.py
    34:from collections import namedtuple as _namedtuple
    36:Match = _namedtuple('Match', 'a b size')

dis.py
    163:_Instruction = collections.namedtuple("_Instruction",
    280: Generates a sequence of Instruction namedtuples giving the details of each

doctest.py
    107:from collections import namedtuple
    109:TestResults = namedtuple('TestResults', 'failed attempted')

functools.py
    21:from collections import namedtuple
    345:_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])

inspect.py
    51:from collections import namedtuple, OrderedDict
    323:Attribute = namedtuple('Attribute', 'name kind defining_class object')
    968:Arguments = namedtuple('Arguments', 'args, varargs, varkw')
    1008:ArgSpec = namedtuple('ArgSpec', 'args varargs keywords defaults')
    1032:FullArgSpec = namedtuple('FullArgSpec',
    1124:ArgInfo = namedtuple('ArgInfo', 'args varargs keywords locals')
    1317:ClosureVars = namedtuple('ClosureVars', 'nonlocals globals builtins unbound')
    1372:Traceback = namedtuple('Traceback', 'filename lineno function code_context index')
    1412:FrameInfo = namedtuple('FrameInfo', ('frame',) + Traceback._fields)

nntplib.py
    159:GroupInfo = collections.namedtuple('GroupInfo',
    162:ArticleInfo = collections.namedtuple('ArticleInfo',

No doubt, there are exceptions to the rule in the standard library, which is less consistent than we might like: "stat_result". That said, stat_result is a structseq and many C type names are old or violate the rules (list vs List, etc). New named tuples should follow PEP 8 and use the CapWords convention unless there is a strong reason not to in a particular case.
msg264507 - (view)	Author: Aviv Palivoda (palaviv) *	Date: 2016-04-29 16:26
Thanks for the response, Ethan. I think that I will leave the tests as they are in the current patch.

> No doubt, there are exceptions to the rule in the standard library, which is less consistent than we might like: "stat_result". That said, stat_result is a structseq and many C type names are old or violate the rules (list vs List, etc). New named tuples should follow PEP 8 and use the CapWords convention unless there is a strong reason not to in a particular case.

I actually thought we should stay consistent with other "result"-like objects. I can see your point that new named tuples should follow PEP 8, and DirEntry is an example of a new "result" class that follows PEP 8. What names do you suggest? Maybe DirInfo and FDirInfo?
msg291346 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-04-08 21:40
Should we have concerns about performance? Accessing a namedtuple value is almost 4x slower compared to a plain tuple [1], and os.walk() may iterate hundreds of times.

[1] http://stackoverflow.com/questions/2646157/what-is-the-fastest-to-access-struct-like-object-in-python
msg291352 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-04-09 06:08
I would expect that the field access time is inconsequential compared to just about every other aspect of os.walk().
msg291354 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-04-09 07:28
namedtuple's attribute access was optimized in recent years. In 3.7 it is 30% faster than in 3.4. So now it is only 3x slower compared to a plain tuple. On the other hand, os.walk() and os.fwalk() were optimized too. In 3.7 they are up to 3.5x faster than in 3.4 (with hot caches). I didn't make measurements, but I expect that using namedtuples with os.walk() can decrease its performance at least by a few percent.

My main concern is that this feature will increase the complexity of the documentation of the os module (very little) and may encourage writing less clear code (but this is just my own preference; others may find the new style clearer).
msg291355 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-04-09 07:29
s/at least/at most/
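The access-time figures quoted in this exchange can be checked locally with a small timeit sketch such as the one below. The `walk_result` name is hypothetical, and the numbers depend heavily on the interpreter version and machine, so no expected output is given.

```python
import timeit
from collections import namedtuple

# Hypothetical result type used only for the measurement.
walk_result = namedtuple("walk_result", "dirpath dirnames filenames")

plain = ("top", ["sub"], ["a.txt"])
named = walk_result(*plain)

# Time repeated lookups of the first field: by index on a plain tuple,
# by attribute on the named tuple.
loops = 200_000
index_time = timeit.timeit("entry[0]", globals={"entry": plain}, number=loops)
attr_time = timeit.timeit("entry.dirpath", globals={"entry": named}, number=loops)

print(f"index access:     {index_time:.4f} s")
print(f"attribute access: {attr_time:.4f} s")
print(f"ratio:            {attr_time / index_time:.2f}x")
```

This measures field access only; it says nothing about the cost of the directory scanning itself, which is the comparison Raymond's reply is about.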
msg291402 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-04-10 00:55
There doesn't seem to be a consensus that the proposal is a net win. Serhiy made a persuasive argument that the added complexity isn't worth it. I'll leave this open for a day or two so that anyone else can make their case. Otherwise, I'll mark this as closed/rejected.

History
Date	User	Action	Args
2022-04-11 14:58:30	admin	set	github: 71047
2017-04-11 02:02:11	rhettinger	set	status: open -> closed resolution: rejected stage: resolved
2017-04-10 00:56:10	rhettinger	set	messages: - msg291388
2017-04-10 00:55:45	rhettinger	set	messages: + msg291402
2017-04-09 19:03:04	rhettinger	set	assignee: rhettinger messages: + msg291388
2017-04-09 07:29:02	serhiy.storchaka	set	messages: + msg291355
2017-04-09 07:28:29	serhiy.storchaka	set	messages: + msg291354
2017-04-09 06:08:56	rhettinger	set	messages: + msg291352
2017-04-08 21:40:52	giampaolo.rodola	set	nosy: + giampaolo.rodola messages: + msg291346
2016-05-04 13:09:28	ppperry	set	title: os.walk and os.fwalk yield namedtuple instead of tuple -> Make os.walk and os.fwalk yield namedtuple instead of tuple
2016-04-29 16:26:58	palaviv	set	messages: + msg264507
2016-04-29 16:09:11	rhettinger	set	messages: + msg264506
2016-04-29 14:45:22	ethan.furman	set	messages: + msg264503
2016-04-29 08:42:25	palaviv	set	messages: + msg264478
2016-04-28 08:00:48	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg264421
2016-04-28 06:42:47	rhettinger	set	nosy: + rhettinger messages: + msg264418
2016-04-26 13:58:00	ethan.furman	set	messages: + msg264288
2016-04-26 13:53:40	ethan.furman	set	nosy: + ethan.furman
2016-04-26 13:45:06	palaviv	create