This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: `pathlib.Path.iterdir()` wastes memory by using `os.listdir()` rather than `os.scandir()`
Type: resource usage
Stage: resolved
Components: Library (Lib)
Versions: Python 3.9
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: barneygale, pitrou, remi.lapeyre, serhiy.storchaka, xtreak
Priority: normal
Keywords: patch

Created on 2020-03-09 00:17 by barneygale, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL: PR 18865
Status: closed
Linked: barneygale, 2020-03-09 00:22
Messages (9)
msg363689 - (view) Author: Barney Gale (barneygale) * Date: 2020-03-09 00:17
`pathlib.Path.iterdir()` uses `os.listdir()` rather than `os.scandir()`. I think this has a small performance cost, per PEP 471:

> It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

As `scandir()` is already available from `_NormalAccessor` it's a simple patch to use `scandir()` instead.
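
To make the proposal concrete, here is a minimal sketch of the two approaches. The helper names `iterdir_listdir` and `iterdir_scandir` are hypothetical and are not pathlib internals; the sketch only illustrates the list-versus-iterator difference that PEP 471 describes.

```python
import os
import tempfile
from pathlib import Path


def iterdir_listdir(path):
    """Yield child paths via os.listdir(): the full list of names is built first."""
    for name in os.listdir(path):
        yield Path(path, name)


def iterdir_scandir(path):
    """Yield child paths via os.scandir(): entries are read lazily."""
    # The with-block closes the directory handle once iteration finishes.
    with os.scandir(path) as entries:
        for entry in entries:
            yield Path(path, entry.name)


# Hypothetical demo directory, created only for this example.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "c.txt"):
        Path(tmp, name).touch()
    names_listdir = sorted(p.name for p in iterdir_listdir(tmp))
    names_scandir = sorted(p.name for p in iterdir_scandir(tmp))
```

Both helpers yield the same children; they differ only in when the directory is read and how long the handle stays open.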
msg363721 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-03-09 12:03
This optimisation was also hinted at in https://bugs.python.org/issue26032#msg257653
msg363722 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-09 12:32
It is not so easy. There was a reason why it was not done earlier. scandir() consumes a resource that is more limited than memory: file descriptors. The iterator should also be closed explicitly rather than left to the garbage collector.

Consider the example:

def traverse(path, visit):
    for child in path.iterdir():
        if child.is_dir():
            # Recurse into the child; the parent's iterator stays alive meanwhile.
            traverse(child, visit)
        else:
            visit(child)

With your optimization it may fail with OSError: Too many open files.
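
The concern can be illustrated with a small sketch, assuming a scandir-based traversal that recurses while the parent iterator is still open. The function `traverse_lazy` and the `stats` counter are hypothetical, added only to show that one directory handle is held per level of nesting.

```python
import os
import tempfile


def traverse_lazy(path, visit, depth=1, stats=None):
    """Recurse while the parent's scandir iterator is still open."""
    stats = stats if stats is not None else {"max_open": 0}
    with os.scandir(path) as entries:
        # One iterator is open per active recursion level, so depth
        # equals the number of directory handles held at this point.
        stats["max_open"] = max(stats["max_open"], depth)
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                traverse_lazy(entry.path, visit, depth + 1, stats)
            else:
                visit(entry.path)
    return stats


# Hypothetical demo tree: tmp/d0/d1/d2/d3/leaf.txt
with tempfile.TemporaryDirectory() as tmp:
    deep = os.path.join(tmp, "d0", "d1", "d2", "d3")
    os.makedirs(deep)
    open(os.path.join(deep, "leaf.txt"), "w").close()
    visited = []
    stats = traverse_lazy(tmp, visited.append)
    visited_names = [os.path.basename(p) for p in visited]
```

On a sufficiently deep tree, the number of simultaneously open handles grows with the depth, which is how the per-process file descriptor limit can be reached.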
msg363734 - (view) Author: Barney Gale (barneygale) * Date: 2020-03-09 14:30
Ah, right you are! The globbing helpers call `list(os.scandir(...))` - perhaps we should do the same here?
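
A minimal sketch of that idea, assuming the entries are materialised and the handle closed before any recursion happens. The helper `scandir_list` is hypothetical and is not the actual globbing code.

```python
import os
import tempfile


def scandir_list(path):
    """Read every entry, then close the directory handle before returning."""
    with os.scandir(path) as entries:
        return list(entries)


def traverse(path, visit):
    # No directory handle is open while the loop body recurses.
    for entry in scandir_list(path):
        if entry.is_dir(follow_symlinks=False):
            traverse(entry.path, visit)
        else:
            visit(entry.path)


# Hypothetical demo tree: tmp/a.txt and tmp/sub/b.txt
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "a.txt"), "w").close()
    open(os.path.join(tmp, "sub", "b.txt"), "w").close()
    visited = []
    traverse(tmp, visited.append)
    visited_names = sorted(os.path.basename(p) for p in visited)
```

This avoids holding one descriptor per recursion level, at the cost of building the full list up front, just as `os.listdir()` does.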
msg363736 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-09 14:46
It would be a slower and less reliable implementation of os.listdir().
msg363741 - (view) Author: Barney Gale (barneygale) * Date: 2020-03-09 14:56
Less reliable how? Doesn't appear any slower:

barney.gale@heilbron:~$ python3 -m timeit -s "import os; os.listdir('/usr/local')"
100000000 loops, best of 3: 0.0108 usec per loop
barney.gale@heilbron:~$ python3 -m timeit -s "import os; list(os.scandir('/usr/local'))"
100000000 loops, best of 3: 0.00919 usec per loop
msg363742 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-03-09 15:19
This is not how timeit works: with `-s`, the whole command is passed as setup code, so you just measured the time taken by an empty loop. See `python3 -m timeit -h` for how to call it. I think a correct invocation would be:

(venv) ➜  ~ python3 -m timeit -s 'from os import scandir' "list(scandir('/usr/local'))"
10000 loops, best of 5: 24.3 usec per loop
(venv) ➜  ~ python3 -m timeit -s 'from os import listdir' "listdir('/usr/local')"
10000 loops, best of 5: 22.2 usec per loop

so it looks like scandir has a small overhead when accumulating all results and not using the extra info it returns.
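
The same distinction between setup code and the timed statement can be sketched with the `timeit` module's Python API. The directory contents, the loop count, and the `ns` namespace are assumptions made for this example; absolute timings will vary by machine and filesystem.

```python
import os
import tempfile
import timeit

# Hypothetical benchmark directory holding 200 empty files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(200):
        open(os.path.join(tmp, f"f{i}.txt"), "w").close()

    ns = {"os": os, "path": tmp}

    # Mistake: the call is passed as setup, so the timed statement is the
    # default "pass" and only an empty loop is measured.
    t_empty = timeit.timeit(setup="os.listdir(path)", number=1000, globals=ns)

    # Correct: the call itself is the timed statement.
    t_listdir = timeit.timeit("os.listdir(path)", number=1000, globals=ns)
    t_scandir = timeit.timeit("list(os.scandir(path))", number=1000, globals=ns)
```

Comparing `t_listdir` with `t_scandir` gives the relevant figure; `t_empty` shows how misleadingly small the setup-only measurement is.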
msg363753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-09 17:07
> Less reliable how?

See issue39916.

> so it looks like scandir has a small overhead when accumulating all results and not using the extra info it returns.

Try with larger directories. The difference may not be so small.

$ python3 -m timeit -s 'from os import scandir' "list(scandir('/usr/include'))"
10000 loops, best of 3: 176 usec per loop
$ python3 -m timeit -s 'from os import listdir' "listdir('/usr/include')"
10000 loops, best of 3: 114 usec per loop
msg363763 - (view) Author: Barney Gale (barneygale) * Date: 2020-03-09 19:21
Thanks Rémi and Serhiy! Closing this ticket as the patch doesn't provide any sort of improvement.
History
Date                 User              Action  Args
2022-04-11 14:59:27  admin             set     github: 84088
2020-03-09 19:21:58  barneygale        set     messages: + msg363763
2020-03-09 19:17:55  barneygale        set     status: open -> closed; resolution: not a bug; stage: patch review -> resolved
2020-03-09 17:07:40  serhiy.storchaka  set     messages: + msg363753
2020-03-09 15:19:36  remi.lapeyre      set     nosy: + remi.lapeyre; messages: + msg363742
2020-03-09 14:56:02  barneygale        set     messages: + msg363741
2020-03-09 14:46:13  serhiy.storchaka  set     messages: + msg363736
2020-03-09 14:30:19  barneygale        set     messages: + msg363734
2020-03-09 12:32:54  serhiy.storchaka  set     messages: + msg363722
2020-03-09 12:03:14  xtreak            set     nosy: + xtreak, serhiy.storchaka, pitrou; messages: + msg363721
2020-03-09 00:22:05  barneygale        set     keywords: + patch; stage: patch review; pull_requests: + pull_request18223
2020-03-09 00:17:57  barneygale        create