classification
Title: Possible resource leeak in glob in non-refcount implementations
Type: resource usage Stage: patch review
Components: Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Guido.van.Rossum, gvanrossum, miss-islington, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2021-06-22 05:40 by serhiy.storchaka, last changed 2021-06-27 11:28 by serhiy.storchaka.

Pull Requests
URL Status Linked Edit
PR 26843 merged serhiy.storchaka, 2021-06-22 05:44
PR 26872 merged miss-islington, 2021-06-23 09:53
PR 26916 merged serhiy.storchaka, 2021-06-26 18:24
Messages (9)
msg396306 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-22 05:40
There is possible resource leak in glob on alternate Python implementation witch do not use reference counting to immediate releasing resources.

It is in the code

    names = list(_iterdir(dirname, dir_fd, dironly))

where _iterdir() is a generator function which calls os.scandir() and yields entry names after some filtering. If an exception happens inside _iterdir(), the scandir iterator will be immediately closed, because of using the with statement. But an exception can happens outside of _iterdir(), in the list constructor (MemoryError). In this case the generator will be closed immediately in CPython because of reference counting (newly created generator has only one reference), but on other implementation it can be deferred on arbitrary time. And even in CPython we rely on garbage collector if there reference loops in the generator frame. This issue has very small chance to occur but still.

The right way is to close the generator explicitly:

    it = _iterdir(dirname, dir_fd, dironly)
    try:
        names = list(it)
    finally:
        it.close()

or

    with contextlib.closing(_iterdir(dirname, dir_fd, dironly)) as it:
        names = list(it)

We should analyze all other generator functions which acquire some resources and ensure that they are always properly closed.
msg396358 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-06-22 19:10
So this is a problem because the generator owns a resource that it will only release once its .close() method is called, right? And we have no control over when that happens -- we can't make it the responsibility of list() to close the iterator passed into it. It is indeed a painful thing. I wonder if we should put a warning somewhere in the docs and tutorials for generators?

The PR looks good.
msg396397 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-23 09:53
New changeset 5c7940257e1f611e7284fd504887bd29a63d0a94 by Serhiy Storchaka in branch 'main':
bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations (GH-26843)
https://github.com/python/cpython/commit/5c7940257e1f611e7284fd504887bd29a63d0a94
msg396402 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-23 10:07
Yes. I added you Guido to the nosy list to attract attention to a general issue. If the generator owns resources, it should be guaranteed closed. Since generator do not support the context manager protocol, we need to use closing().

Using generators is common in the code on my work, so I am going to revise all uses of resource-owning generators in my work code, in the stdlib and in common libraries like aiohttp.

May be we will add a decorator for generator functions which would add support of the context manager protocol. Or make generator objects supporting the context manager protocol by default. Or make the with statement fall back to the close() method by default. Or even add special syntax shortcut for combining "with" and "for" if this idiom will be common enough.
msg396406 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-23 10:28
New changeset 38e021ab90b595945f15c59273c788f2f24053dc by Miss Islington (bot) in branch '3.10':
bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations (GH-26843) (GH-26872)
https://github.com/python/cpython/commit/38e021ab90b595945f15c59273c788f2f24053dc
msg396423 - (view) Author: Guido van Rossum (Guido.van.Rossum) Date: 2021-06-23 14:31
I think this may be worth bringing up on python-dev. The solution currently is rather verbose. I like adding the with-protocol to generators.
msg396553 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-06-26 14:54
Nick Coughlin explained on issue 13814 msg151763 why he thought that making generators be context managers could/should not be done.

"Generators deliberately don't support the context management protocol. This is so that they raise an explicit TypeError or AttributeError (pointing out that __exit__ is missing) if you leave out the @contextmanager decorator when you're using a generator to write an actual context manager.

Generators supporting the context management protocol natively would turn that into a far more subtle (and confusing) error: your code would silently fail to invoke the generator body.

Ensuring this common error remains easy to detect is far more important than making it easier to invoke close() on a generator object (particularly when contextlib.closing() already makes that very easy)."

The new entry in the Design FAQ is this:

"Why don’t generators support the with statement?

For technical reasons, a generator used directly as a context manager would not work correctly. When, as is most common, a generator is used as an iterator run to completion, no closing is needed. When it is, wrap it as “contextlib.closing(generator)” in the ‘with’ statment."

If Nick's explanation is not currently correct, the above should be removed.  If it is, perhaps further doc is needed.
msg396554 - (view) Author: Guido van Rossum (Guido.van.Rossum) Date: 2021-06-26 18:09
That explanation is still valid, I just hadn’t thought of it that way.
msg396573 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-27 11:28
New changeset 4861fdaf25f246eb9ee4e8161c15dad26efe895d by Serhiy Storchaka in branch '3.9':
[3.9] bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations (GH-26843). (GH-26916)
https://github.com/python/cpython/commit/4861fdaf25f246eb9ee4e8161c15dad26efe895d
History
Date User Action Args
2021-06-27 11:28:49serhiy.storchakasetmessages: + msg396573
2021-06-26 18:24:56serhiy.storchakasetpull_requests: + pull_request25489
2021-06-26 18:09:37Guido.van.Rossumsetmessages: + msg396554
2021-06-26 14:54:06terry.reedysetnosy: + terry.reedy
messages: + msg396553
2021-06-23 14:31:58Guido.van.Rossumsetnosy: + Guido.van.Rossum
messages: + msg396423
2021-06-23 10:28:12serhiy.storchakasetmessages: + msg396406
2021-06-23 10:07:39serhiy.storchakasetmessages: + msg396402
2021-06-23 09:53:48miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request25447
2021-06-23 09:53:40serhiy.storchakasetmessages: + msg396397
2021-06-22 19:10:18gvanrossumsetmessages: + msg396358
2021-06-22 05:44:50serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request25424
2021-06-22 05:40:18serhiy.storchakacreate