msg257351 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-01-02 17:48 |
os.scandir() opens a file descriptor and closes it only in its destructor. This can causes file descriptor leaks in Python implementations without reference counting and if the scandir iterator becomes a part of reference loop or long living object. Since the number of open file descriptors is limited, this can leads to problems.
We need to add the close() method to the scandir iterator (as in files and generators). It would be useful also to make it a context manager.
In 3.5 we have to add a warning about this behavior.
|
msg257377 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-01-02 22:26 |
Hi,
If I recall correctly, this issue was discussed in the long review of os.scandir(): issue #22524.
"os.scandir() opens a file descriptor and closes it only in its destructor."
Hopefully, it's incorrect: it's closed when the iterator is exhausted. See how ScandirIterator_close() is used.
"This can causes file descriptor leaks in Python implementations without reference counting"
Destructors are part of the Python language. Are you aware of a Python implementation *never* calls destructors? I know that PyPy can call destructors "later", not necessary when the last reference to the object is removed. Do you think that we may reach the file descriptor limit because of that?
"We need to add the close() method to the scandir iterator (as in files and generators)."
Is it part of the iterator protocol? Or do you mean to explicitly call close()?
"It would be useful also to make it a context manager."
If we decide to add a close() method, it like the idea of also supporting the context manager protocol.
"In 3.5 we have to add a warning about this behavior."
Yeah, maybe we can elaborate the doc to explain how the file descriptor / Windows handle is used.
|
msg257391 - (view) |
Author: Ben Hoyt (benhoyt) * |
Date: 2016-01-03 01:24 |
I'm not sure this is actually a leak, because (looking at the code) ScandirIterator_close() is called not just in the destructor, but also at the end of iterating, just before raising StopIteration. Is the issue that if an exception is raised or you stop iterating before the end, then it's less determined when the destructor/close is called?
I think we could fairly easily add an explicit close method to the iterator and make it a context manager (add __enter__ and __exit__ methods to the iterator). Is this what you're thinking of:
# manual close
it = scandir.scandir(path)
first_entry = next(it)
it.close()
with scandir.scandir(path) as it:
first_entry = next(it)
|
msg257398 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-01-03 06:31 |
Yes, it's what I mean. Add methods close, __enter__ and __exit__ to the iterator. The scandir iterator is not just iterator, it is like file object.
And as in file object, we perhaps have to emit a ResourceWarning in the destructor if close() or __exit__() were not called.
|
msg258017 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-01-11 21:08 |
Note there's also a nasty corner case related to generators and GC. If a generator contains a with-block or finally-clause, and the generator is not run until its end because the caller hit an exception on one of the items returned, and the generator object is somehow kept alive (either because it's stored in some longer-living state or because the Python implementation doesn't use reference counting) then the close() call in the finally-clause or in the __exit__ method doesn't run until the generator object is garbage-collected.
IOW even making it a context manager doesn't completely solve this issue (though it can certainly help).
|
msg258040 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-01-11 23:52 |
Guido are you saying in the following code, the “finally” message is not guaranteed to be printed out? Or just that you cannot limit a ResourceWarning to garbage collection?
def g():
try:
yield "item"
finally:
# Run at exhaustion, close(), and garbage collection
print("finally")
gi = g()
try:
item = next(gi)
print(item / 2) # Oops, TypeError
finally:
# Should be run as the exception passes through. GeneratorExit is raised inside the generator, causing the other “finally” block to execute. All before the original exception is caught and a traceback is printed.
gi.close()
But as I understand it, os.scandir() is not a generator, so none of these problems would apply.
|
msg258045 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-01-12 00:40 |
It was a bit more subtle. I think like this:
def f():
with some_lock:
yield 0
yield 1
def g():
with another_lock:
it = f()
for i in it:
raise
We determined that another_lock was freed *before* some_lock. This is because the local variable 'it' keeps the generator object returned by f() alive until after the with-block in g() has released another_lock.
I think this does not strictly require f to be a generator -- it's any kind of closable resource that only gets closed by its destructor.
The solution is to add a try/finally that calls it.close() before leaving the with-block in g().
Hope this helps.
|
msg258047 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-01-12 01:30 |
Okay I understand. You have to explicitly call the close() method if you want a generator to be cleaned up properly, which parallels how Serhiy’s proposal would be used.
|
msg258233 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-01-14 22:14 |
I definitely support adding a close() method, and a ResourceWarning if the iterator is not exhausted when it is deallocated. Adding context manager support is probably worthwhile as well, though there is one disadvantage: it would be hard to implement scandir() as a plain generator, because generators don’t support the context manager protocol. But C Python’s implementation is in C, and even in native Python you could implement it as a custom iterator class.
I suggest using this issue for adding the new API(s), presumably to 3.6 only, and Issue 26111 for adding a warning to the existing 3.5 documentation.
|
msg258239 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-01-14 22:34 |
If scandir() is implemented as native Python generator (see for example issue25911), it could easily be converted to context manager:
def scandir(path):
return contextlib.closing(native_scandir(path))
def native_scandir(path):
...
yield ...
|
msg258242 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-01-14 22:39 |
Since the close() method can be added only in 3.6, but we have leaks in 3.5 too, I suggest to closed file descriptor if error happened during iteration (issue26117). This will allow to write safe code even in 3.5.
|
msg258244 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-01-14 22:49 |
Contextlib.closing() cannot be used in general, because it doesn’t inherit the iterator protocol from the wrapped generator. So I think you really need a class that implements __exit__(), __iter__(), etc at the same time.
|
msg259867 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-08 17:41 |
Proposed patch adds the close() methon and the support of the context manager protocol for the os.scandir class.
|
msg259891 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-08 22:14 |
Context manager protocol, close() method: it looks more and more like a file. In this case, I suggest to emit a ResourceWarning in the destructor if it's not closed explicitly. It mean that the scandir() must always be used with "with" or at least that close() is always closed.
We probably need to update the code to be more explicitly on that.
|
msg259901 - (view) |
Author: Josh Rosenberg (josh.r) * |
Date: 2016-02-09 00:01 |
Adding a ResourceWarning even if the generator is run to completion? That seems... dev hostile. I mean, yes, probably best to document it as best practice to use with with statement, but something simple like `files = sorted(os.scandir('.'), key=lambda x: x.stat().st_mtime)` to get files ordered by modification time (which cleanly runs the generator to exhaustion immediately) should not be raising ResourceWarning in 3.6 when it didn't do so in 3.5, and has minimal risk of leaking in any event.
|
msg259904 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-02-09 01:45 |
I would be in favour of adding a ResourceWarning in 3.6 if the iterator is garbage collected without being exhausted. But as Josh says, it might be overkill emitting a warning when we already know the iterator has been finished and cleaned up.
|
msg259920 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-09 09:18 |
Josh Rosenberg added the comment:
> Adding a ResourceWarning even if the generator is run to completion?
I'm ok to only emit the warning is the generator is not exhausted.
The warning would be emited in the destructor if the generator is not
closed. In practice, the generator is *already* closed on the last
file, so the warning would not be emited if the generator is
exhausted.
|
msg259937 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-09 15:14 |
Updated patch adds a resource warning, tests, improves the documentation, and addresses other comments.
|
msg259939 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-09 15:50 |
> Updated patch adds a resource warning, tests, improves the documentation, and addresses other comments.
Review sent.
|
msg259947 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-09 18:07 |
Updated patch addresses new Victor's comments.
|
msg259968 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-02-10 02:46 |
About testing that list(iterator) is empty after the iterator is closed, IMO this is an implementation detail. It would be equally (or more) sensible to raise an exception, like proposed for “async def” coroutines in Issue 25887. I suppose the main purpose of those tests is to ensure there is no leakage or warning, so maybe add some more filterwarnings() checks in those tests instead?
I left some other review comments.
|
msg259996 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-10 09:03 |
Updated patch addresses Martins comments.
|
msg259998 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-10 09:25 |
scandir_close_4.patch LGTM, I just added a comment on test_os.py, you may factorize the code.
|
msg260041 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-10 21:57 |
> I just added a comment on test_os.py, you may factorize the code.
Ah, Serhiy created the issue #26325 for that. Nice.
It's up to you to commit this change, or #26325 first, both are good now ;)
|
msg260087 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-02-11 11:21 |
New changeset f98ed0616d07 by Serhiy Storchaka in branch 'default':
Issue #25994: Added the close() method and the support of the context manager
https://hg.python.org/cpython/rev/f98ed0616d07
|
msg260094 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-11 11:41 |
Committed with using the helper from issue26325 for testing.
Thank you all for your reviews, especially for help with the documentation.
|
msg260096 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-11 11:53 |
> Committed with using the helper from issue26325 for testing.
Great! Good job.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:25 | admin | set | github: 70182 |
2016-02-11 11:53:27 | vstinner | set | messages:
+ msg260096 |
2016-02-11 11:41:09 | serhiy.storchaka | set | status: open -> closed messages:
+ msg260094
assignee: serhiy.storchaka resolution: fixed stage: patch review -> resolved |
2016-02-11 11:21:54 | python-dev | set | nosy:
+ python-dev messages:
+ msg260087
|
2016-02-10 21:57:49 | vstinner | set | messages:
+ msg260041 |
2016-02-10 09:25:30 | vstinner | set | messages:
+ msg259998 |
2016-02-10 09:03:30 | serhiy.storchaka | set | files:
+ scandir_close_4.patch
messages:
+ msg259996 |
2016-02-10 02:46:25 | martin.panter | set | messages:
+ msg259968 |
2016-02-09 18:07:17 | serhiy.storchaka | set | files:
+ scandir_close_3.patch
messages:
+ msg259947 |
2016-02-09 15:50:47 | vstinner | set | messages:
+ msg259939 |
2016-02-09 15:14:03 | serhiy.storchaka | set | files:
+ scandir_close_2.patch
messages:
+ msg259937 |
2016-02-09 09:18:36 | vstinner | set | messages:
+ msg259920 |
2016-02-09 01:45:36 | martin.panter | set | messages:
+ msg259904 |
2016-02-09 00:01:11 | josh.r | set | nosy:
+ josh.r messages:
+ msg259901
|
2016-02-08 22:14:22 | vstinner | set | messages:
+ msg259891 |
2016-02-08 17:41:33 | serhiy.storchaka | set | files:
+ scandir_close.patch versions:
- Python 3.5 messages:
+ msg259867
keywords:
+ patch stage: patch review |
2016-01-14 22:49:22 | martin.panter | set | messages:
+ msg258244 |
2016-01-14 22:39:37 | serhiy.storchaka | set | messages:
+ msg258242 |
2016-01-14 22:34:59 | serhiy.storchaka | set | messages:
+ msg258239 |
2016-01-14 22:14:16 | martin.panter | set | messages:
+ msg258233 |
2016-01-14 21:35:09 | martin.panter | link | issue26111 dependencies |
2016-01-12 01:30:53 | martin.panter | set | messages:
+ msg258047 |
2016-01-12 00:40:11 | gvanrossum | set | messages:
+ msg258045 |
2016-01-11 23:52:45 | martin.panter | set | nosy:
+ martin.panter messages:
+ msg258040
|
2016-01-11 21:08:07 | gvanrossum | set | nosy:
+ gvanrossum messages:
+ msg258017
|
2016-01-07 12:46:07 | serhiy.storchaka | link | issue26032 dependencies |
2016-01-03 08:23:29 | serhiy.storchaka | link | issue25596 dependencies |
2016-01-03 06:31:51 | serhiy.storchaka | set | messages:
+ msg257398 |
2016-01-03 01:24:46 | benhoyt | set | messages:
+ msg257391 |
2016-01-02 22:26:20 | vstinner | set | messages:
+ msg257377 |
2016-01-02 17:48:16 | serhiy.storchaka | create | |