msg257352 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-01-02 17:57 |
Since 3.5 os.walk() consumes a lot of file descriptors. Its number equals to the deep of directories tree. Since the number of file descriptors is limited, this can cause problems.
This was the main reason for rejecting fwalk-based implementation of os.walk() (issue15200).
|
msg257404 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-01-03 08:48 |
Here are two alternative solutions for this issue.
|
msg258018 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-01-11 21:10 |
I don't like the first solution (it just collects all scandir() results in a list, which defeats one of the purposes of using scandir()), but I don't understand the second solution.
Ideally the number of FDs used should be limited by the depth of the tree being walked, right?
|
msg258022 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-01-11 21:35 |
Both patches are basically equivalent. The first one collects all scandir() results in a list, the second one collects only directory names in a list. The purpose of using os.scandir() in os.walk() was a speed up (issue23605), and both patches preserve it.
Yes, the number of FDs used is equivalent to the depth of the tree which can be very deep (I just created a tree depth of 1000 levels). And what is worse, all these FDs can be effectively leaked if the walking was not finished. This is unwanted behavior change.
|
msg258025 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-01-11 21:42 |
I am all for preventing the leaks. But using FDs proportional to the tree
depth seems reasonable to me. (If you are worried about some kind of DoS
attack on the algorithm by someone who can build a tree with depth 1000,
well, if they can do that they can also create a flat folder with a million
files in it.)
Is there a potential hybrid strategy, where for small directories we close
the FD but for large directories we keep it open?
|
msg258096 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-01-12 10:06 |
> But using FDs proportional to the tree depth seems reasonable to me.
This was the main reason for rejecting issue15200. May be we can allow this for globbing, but only explicitly enabled with special parameter or separate functions. The benefit for globbing is starting to yield results without internal caching. The benefit for os.walk() is smaller, because in any case it yields files and directories not one-by-one, but collected into lists.
> Is there a potential hybrid strategy, where for small directories we close
> the FD but for large directories we keep it open?
May be with more complicated code. I think this needs explicit close() method, so it can go only in 3.6. For now I think we have to commit one of my patches (the difference only stylistic).
|
msg258116 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-01-12 18:36 |
I like them both, if I had to choose I'd pick patch 2.
But yes, we need to add a close() method to the scandir iterator object.
In the meantime, I am still worried about what would happen if somehow the loop got interrupted and the frame got kept alive and the iterator wasn't closed by its dealloc until much later. This kind of thing was common in asyncio and we had to resort to similar tricks to break some cycles. Maybe you can add a try/finally that *deletes* scandir_it to force it to close itself (at least in CPython)? That can go into 3.5.
|
msg258238 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-01-14 22:26 |
Yes, OSError, MemoryError and KeyboardInterrupt can be raised during iterating scandir object or calling is_dir() or is_symlink() methods of the result. They stop iterating and left file descriptor open. If close file descriptor on error during iteration (issue26117), patch 1 becomes more safe, because it minimizes the lifetime of opened descriptor.
|
msg260089 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-02-11 11:32 |
New changeset e951d76f1945 by Serhiy Storchaka in branch '3.5':
Issue #25995: os.walk() no longer uses FDs proportional to the tree depth.
https://hg.python.org/cpython/rev/e951d76f1945
New changeset 6197a09a56b1 by Serhiy Storchaka in branch 'default':
Issue #25995: os.walk() no longer uses FDs proportional to the tree depth.
https://hg.python.org/cpython/rev/6197a09a56b1
|
msg260095 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-11 11:43 |
walk_consume_fds_opt1.patch is applied in 3.5 (it is more reliable with the absence of close()) and walk_consume_fds_opt2.patch is applied in 3.6.
|
msg260784 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-02-24 10:40 |
The change seems to be causing some Windows buildbot failures <http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.5/builds/666/steps/test/logs/stdio>:
======================================================================
ERROR: test_walk_bad_dir (test.test_os.BytesWalkTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\buildbot.python.org\3.5.kloth-win64\build\lib\test\test_os.py", line 931, in test_walk_bad_dir
root, dirs, files = next(walk_it)
File "C:\buildbot.python.org\3.5.kloth-win64\build\lib\test\test_os.py", line 1027, in walk
for broot, bdirs, bfiles in os.walk(os.fsencode(top), **kwargs):
File "C:\buildbot.python.org\3.5.kloth-win64\build\lib\os.py", line 372, in walk
entries = list(scandir(top))
TypeError: os.scandir() doesn't support bytes path on Windows, use Unicode instead
======================================================================
ERROR: test_walk_bottom_up (test.test_os.BytesWalkTests)
ERROR: test_walk_prune (test.test_os.BytesWalkTests)
Was this line
entries = list(scandir(top))
meant to be
entries = list(scandir_it)
|
msg260788 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-02-24 11:05 |
New changeset 9bffe39e8273 by Serhiy Storchaka in branch '3.5':
Fixed a bug in os.walk() with bytes path on Windows caused by merging fixes
https://hg.python.org/cpython/rev/9bffe39e8273
|
msg260789 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-24 11:05 |
Thanks Martin!
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:25 | admin | set | github: 70183 |
2016-02-24 11:05:58 | serhiy.storchaka | set | status: open -> closed
messages:
+ msg260789 |
2016-02-24 11:05:05 | python-dev | set | messages:
+ msg260788 |
2016-02-24 10:40:27 | martin.panter | set | status: closed -> open nosy:
+ martin.panter messages:
+ msg260784
|
2016-02-11 11:43:04 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages:
+ msg260095
stage: patch review -> resolved |
2016-02-11 11:32:19 | python-dev | set | nosy:
+ python-dev messages:
+ msg260089
|
2016-02-08 15:59:59 | serhiy.storchaka | set | assignee: serhiy.storchaka |
2016-01-14 22:26:51 | serhiy.storchaka | set | messages:
+ msg258238 |
2016-01-12 18:36:17 | gvanrossum | set | messages:
+ msg258116 |
2016-01-12 10:06:48 | serhiy.storchaka | set | messages:
+ msg258096 |
2016-01-11 21:42:12 | gvanrossum | set | messages:
+ msg258025 |
2016-01-11 21:35:22 | serhiy.storchaka | set | messages:
+ msg258022 |
2016-01-11 21:10:38 | gvanrossum | set | nosy:
+ gvanrossum messages:
+ msg258018
|
2016-01-03 08:49:03 | serhiy.storchaka | set | files:
+ walk_consume_fds_opt2.patch |
2016-01-03 08:48:40 | serhiy.storchaka | set | files:
+ walk_consume_fds_opt1.patch keywords:
+ patch messages:
+ msg257404
stage: patch review |
2016-01-02 17:57:05 | serhiy.storchaka | create | |