classification
Title: File descriptor leaks in os.scandir()
Type: resource usage Stage: resolved
Components: Extension Modules Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: benhoyt, gvanrossum, haypo, josh.r, martin.panter, python-dev, serhiy.storchaka
Priority: high Keywords: patch

Created on 2016-01-02 17:48 by serhiy.storchaka, last changed 2016-02-11 11:53 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
scandir_close.patch serhiy.storchaka, 2016-02-08 17:41 review
scandir_close_2.patch serhiy.storchaka, 2016-02-09 15:14 review
scandir_close_3.patch serhiy.storchaka, 2016-02-09 18:07 review
scandir_close_4.patch serhiy.storchaka, 2016-02-10 09:03 review
Messages (27)
msg257351 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-02 17:48
os.scandir() opens a file descriptor and closes it only in its destructor. This can causes file descriptor leaks in Python implementations without reference counting and if the scandir iterator becomes a part of reference loop or long living object. Since the number of open file descriptors is limited, this can leads to problems.

We need to add the close() method to the scandir iterator (as in files and generators). It would be useful also to make it a context manager.

In 3.5 we have to add a warning about this behavior.
msg257377 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-01-02 22:26
Hi,

If I recall correctly, this issue was discussed in the long review of os.scandir(): issue #22524.


"os.scandir() opens a file descriptor and closes it only in its destructor."

Hopefully, it's incorrect: it's closed when the iterator is exhausted. See how ScandirIterator_close() is used.


"This can causes file descriptor leaks in Python implementations without reference counting"

Destructors are part of the Python language. Are you aware of a Python implementation *never* calls destructors? I know that PyPy can call destructors "later", not necessary when the last reference to the object is removed. Do you think that we may reach the file descriptor limit because of that?


"We need to add the close() method to the scandir iterator (as in files and generators)."

Is it part of the iterator protocol? Or do you mean to explicitly call close()?


"It would be useful also to make it a context manager."

If we decide to add a close() method, it like the idea of also supporting the context manager protocol.


"In 3.5 we have to add a warning about this behavior."

Yeah, maybe we can elaborate the doc to explain how the file descriptor / Windows handle is used.
msg257391 - (view) Author: Ben Hoyt (benhoyt) * Date: 2016-01-03 01:24
I'm not sure this is actually a leak, because (looking at the code) ScandirIterator_close() is called not just in the destructor, but also at the end of iterating, just before raising StopIteration. Is the issue that if an exception is raised or you stop iterating before the end, then it's less determined when the destructor/close is called?

I think we could fairly easily add an explicit close method to the iterator and make it a context manager (add __enter__ and __exit__ methods to the iterator). Is this what you're thinking of:

    # manual close
    it = scandir.scandir(path)
    first_entry = next(it)
    it.close()

    with scandir.scandir(path) as it:
        first_entry = next(it)
msg257398 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-03 06:31
Yes, it's what I mean. Add methods close, __enter__ and __exit__ to the iterator. The scandir iterator is not just iterator, it is like file object.

And as in file object, we perhaps have to emit a ResourceWarning in the destructor if close() or __exit__() were not called.
msg258017 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-11 21:08
Note there's also a nasty corner case related to generators and GC. If a generator contains a with-block or finally-clause, and the generator is not run until its end because the caller hit an exception on one of the items returned, and the generator object is somehow kept alive (either because it's stored in some longer-living state or because the Python implementation doesn't use reference counting) then the close() call in the finally-clause or in the __exit__ method doesn't run until the generator object is garbage-collected.

IOW even making it a context manager doesn't completely solve this issue (though it can certainly help).
msg258040 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-11 23:52
Guido are you saying in the following code, the “finally” message is not guaranteed to be printed out? Or just that you cannot limit a ResourceWarning to garbage collection?

def g():
    try:
        yield "item"
    finally:
        # Run at exhaustion, close(), and garbage collection
        print("finally")

gi = g()
try:
    item = next(gi)
    print(item / 2)  # Oops, TypeError
finally:
    # Should be run as the exception passes through. GeneratorExit is raised inside the generator, causing the other “finally” block to execute. All before the original exception is caught and a traceback is printed.
    gi.close()

But as I understand it, os.scandir() is not a generator, so none of these problems would apply.
msg258045 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-12 00:40
It was a bit more subtle. I think like this:

def f():
    with some_lock:
        yield 0
        yield 1

def g():
    with another_lock:
        it = f()
        for i in it:
            raise

We determined that another_lock was freed *before* some_lock.  This is because the local variable 'it' keeps the generator object returned by f() alive until after the with-block in g() has released another_lock.

I think this does not strictly require f to be a generator -- it's any kind of closable resource that only gets closed by its destructor.

The solution is to add a try/finally that calls it.close() before leaving the with-block in g().

Hope this helps.
msg258047 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-12 01:30
Okay I understand. You have to explicitly call the close() method if you want a generator to be cleaned up properly, which parallels how Serhiy’s proposal would be used.
msg258233 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-14 22:14
I definitely support adding a close() method, and a ResourceWarning if the iterator is not exhausted when it is deallocated. Adding context manager support is probably worthwhile as well, though there is one disadvantage: it would be hard to implement scandir() as a plain generator, because generators don’t support the context manager protocol. But C Python’s implementation is in C, and even in native Python you could implement it as a custom iterator class.

I suggest using this issue for adding the new API(s), presumably to 3.6 only, and Issue 26111 for adding a warning to the existing 3.5 documentation.
msg258239 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-14 22:34
If scandir() is implemented as native Python generator (see for example issue25911), it could easily be converted to context manager:

def scandir(path):
    return contextlib.closing(native_scandir(path))

def native_scandir(path):
    ...
    yield ...
msg258242 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-14 22:39
Since the close() method can be added only in 3.6, but we have leaks in 3.5 too, I suggest to closed file descriptor if error happened during iteration (issue26117). This will allow to write safe code even in 3.5.
msg258244 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-14 22:49
Contextlib.closing() cannot be used in general, because it doesn’t inherit the iterator protocol from the wrapped generator. So I think you really need a class that implements __exit__(), __iter__(), etc at the same time.
msg259867 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-08 17:41
Proposed patch adds the close() methon and the support of the context manager protocol for the os.scandir class.
msg259891 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-02-08 22:14
Context manager protocol, close() method: it looks more and more like a file. In this case, I suggest to emit a ResourceWarning in the destructor if it's not closed explicitly. It mean that the scandir() must always be used with "with" or at least that close() is always closed.

We probably need to update the code to be more explicitly on that.
msg259901 - (view) Author: Josh Rosenberg (josh.r) * Date: 2016-02-09 00:01
Adding a ResourceWarning even if the generator is run to completion? That seems... dev hostile. I mean, yes, probably best to document it as best practice to use with with statement, but something simple like `files = sorted(os.scandir('.'), key=lambda x: x.stat().st_mtime)` to get files ordered by modification time (which cleanly runs the generator to exhaustion immediately) should not be raising ResourceWarning in 3.6 when it didn't do so in 3.5, and has minimal risk of leaking in any event.
msg259904 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-09 01:45
I would be in favour of adding a ResourceWarning in 3.6 if the iterator is garbage collected without being exhausted. But as Josh says, it might be overkill emitting a warning when we already know the iterator has been finished and cleaned up.
msg259920 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-02-09 09:18
Josh Rosenberg added the comment:
> Adding a ResourceWarning even if the generator is run to completion?

I'm ok to only emit the warning is the generator is not exhausted.

The warning would be emited in the destructor if the generator is not
closed. In practice, the generator is *already* closed on the last
file, so the warning would not be emited if the generator is
exhausted.
msg259937 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-09 15:14
Updated patch adds a resource warning, tests, improves the documentation, and addresses other comments.
msg259939 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-02-09 15:50
> Updated patch adds a resource warning, tests, improves the documentation, and addresses other comments.

Review sent.
msg259947 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-09 18:07
Updated patch addresses new Victor's comments.
msg259968 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-10 02:46
About testing that list(iterator) is empty after the iterator is closed, IMO this is an implementation detail. It would be equally (or more) sensible to raise an exception, like proposed for “async def” coroutines in Issue 25887. I suppose the main purpose of those tests is to ensure there is no leakage or warning, so maybe add some more filterwarnings() checks in those tests instead?

I left some other review comments.
msg259996 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-10 09:03
Updated patch addresses Martins comments.
msg259998 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-02-10 09:25
scandir_close_4.patch LGTM, I just added a comment on test_os.py, you may factorize the code.
msg260041 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-02-10 21:57
> I just added a comment on test_os.py, you may factorize the code.

Ah, Serhiy created the issue #26325 for that. Nice.

It's up to you to commit this change, or #26325 first, both are good now ;)
msg260087 - (view) Author: Roundup Robot (python-dev) Date: 2016-02-11 11:21
New changeset f98ed0616d07 by Serhiy Storchaka in branch 'default':
Issue #25994: Added the close() method and the support of the context manager
https://hg.python.org/cpython/rev/f98ed0616d07
msg260094 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-11 11:41
Committed with using the helper from issue26325 for testing.

Thank you all for your reviews, especially for help with the documentation.
msg260096 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-02-11 11:53
> Committed with using the helper from issue26325 for testing.

Great! Good job.
History
Date User Action Args
2016-02-11 11:53:27hayposetmessages: + msg260096
2016-02-11 11:41:09serhiy.storchakasetstatus: open -> closed
messages: + msg260094

assignee: serhiy.storchaka
resolution: fixed
stage: patch review -> resolved
2016-02-11 11:21:54python-devsetnosy: + python-dev
messages: + msg260087
2016-02-10 21:57:49hayposetmessages: + msg260041
2016-02-10 09:25:30hayposetmessages: + msg259998
2016-02-10 09:03:30serhiy.storchakasetfiles: + scandir_close_4.patch

messages: + msg259996
2016-02-10 02:46:25martin.pantersetmessages: + msg259968
2016-02-09 18:07:17serhiy.storchakasetfiles: + scandir_close_3.patch

messages: + msg259947
2016-02-09 15:50:47hayposetmessages: + msg259939
2016-02-09 15:14:03serhiy.storchakasetfiles: + scandir_close_2.patch

messages: + msg259937
2016-02-09 09:18:36hayposetmessages: + msg259920
2016-02-09 01:45:36martin.pantersetmessages: + msg259904
2016-02-09 00:01:11josh.rsetnosy: + josh.r
messages: + msg259901
2016-02-08 22:14:22hayposetmessages: + msg259891
2016-02-08 17:41:33serhiy.storchakasetfiles: + scandir_close.patch
versions: - Python 3.5
messages: + msg259867

keywords: + patch
stage: patch review
2016-01-14 22:49:22martin.pantersetmessages: + msg258244
2016-01-14 22:39:37serhiy.storchakasetmessages: + msg258242
2016-01-14 22:34:59serhiy.storchakasetmessages: + msg258239
2016-01-14 22:14:16martin.pantersetmessages: + msg258233
2016-01-14 21:35:09martin.panterlinkissue26111 dependencies
2016-01-12 01:30:53martin.pantersetmessages: + msg258047
2016-01-12 00:40:11gvanrossumsetmessages: + msg258045
2016-01-11 23:52:45martin.pantersetnosy: + martin.panter
messages: + msg258040
2016-01-11 21:08:07gvanrossumsetnosy: + gvanrossum
messages: + msg258017
2016-01-07 12:46:07serhiy.storchakalinkissue26032 dependencies
2016-01-03 08:23:29serhiy.storchakalinkissue25596 dependencies
2016-01-03 06:31:51serhiy.storchakasetmessages: + msg257398
2016-01-03 01:24:46benhoytsetmessages: + msg257391
2016-01-02 22:26:20hayposetmessages: + msg257377
2016-01-02 17:48:16serhiy.storchakacreate