Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible resource leak in glob in non-refcount implementations #88648

Closed
serhiy-storchaka opened this issue Jun 22, 2021 · 10 comments
Closed

Possible resource leak in glob in non-refcount implementations #88648

serhiy-storchaka opened this issue Jun 22, 2021 · 10 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@serhiy-storchaka
Copy link
Member

BPO 44482
Nosy @gvanrossum, @terryjreedy, @serhiy-storchaka, @gvanrossum, @miss-islington
PRs
  • bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations #26843
  • [3.10] bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations (GH-26843) #26872
  • [3.9] bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations (GH-26843). #26916
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-06-22.05:40:18.277>
    labels = ['3.11', 'library', '3.9', '3.10', 'performance']
    title = 'Possible resource leeak in glob in non-refcount implementations'
    updated_at = <Date 2021-06-27.11:28:49.507>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2021-06-27.11:28:49.507>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2021-06-22.05:40:18.277>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 44482
    keywords = ['patch']
    message_count = 9.0
    messages = ['396306', '396358', '396397', '396402', '396406', '396423', '396553', '396554', '396573']
    nosy_count = 5.0
    nosy_names = ['gvanrossum', 'terry.reedy', 'serhiy.storchaka', 'Guido.van.Rossum', 'miss-islington']
    pr_nums = ['26843', '26872', '26916']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue44482'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @serhiy-storchaka
    Copy link
    Member Author

    There is possible resource leak in glob on alternate Python implementation witch do not use reference counting to immediate releasing resources.

    It is in the code

        names = list(_iterdir(dirname, dir_fd, dironly))

    where _iterdir() is a generator function which calls os.scandir() and yields entry names after some filtering. If an exception happens inside _iterdir(), the scandir iterator will be immediately closed, because of using the with statement. But an exception can happens outside of _iterdir(), in the list constructor (MemoryError). In this case the generator will be closed immediately in CPython because of reference counting (newly created generator has only one reference), but on other implementation it can be deferred on arbitrary time. And even in CPython we rely on garbage collector if there reference loops in the generator frame. This issue has very small chance to occur but still.

    The right way is to close the generator explicitly:

        it = _iterdir(dirname, dir_fd, dironly)
        try:
            names = list(it)
        finally:
            it.close()

    or

        with contextlib.closing(_iterdir(dirname, dir_fd, dironly)) as it:
            names = list(it)

    We should analyze all other generator functions which acquire some resources and ensure that they are always properly closed.

    @serhiy-storchaka serhiy-storchaka added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir performance Performance or resource usage labels Jun 22, 2021
    @gvanrossum
    Copy link
    Member

    So this is a problem because the generator owns a resource that it will only release once its .close() method is called, right? And we have no control over when that happens -- we can't make it the responsibility of list() to close the iterator passed into it. It is indeed a painful thing. I wonder if we should put a warning somewhere in the docs and tutorials for generators?

    The PR looks good.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 5c79402 by Serhiy Storchaka in branch 'main':
    bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations (GH-26843)
    5c79402

    @serhiy-storchaka
    Copy link
    Member Author

    Yes. I added you Guido to the nosy list to attract attention to a general issue. If the generator owns resources, it should be guaranteed closed. Since generator do not support the context manager protocol, we need to use closing().

    Using generators is common in the code on my work, so I am going to revise all uses of resource-owning generators in my work code, in the stdlib and in common libraries like aiohttp.

    May be we will add a decorator for generator functions which would add support of the context manager protocol. Or make generator objects supporting the context manager protocol by default. Or make the with statement fall back to the close() method by default. Or even add special syntax shortcut for combining "with" and "for" if this idiom will be common enough.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 38e021a by Miss Islington (bot) in branch '3.10':
    bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations (GH-26843) (GH-26872)
    38e021a

    @gvanrossum
    Copy link
    Member

    I think this may be worth bringing up on python-dev. The solution currently is rather verbose. I like adding the with-protocol to generators.

    @terryjreedy
    Copy link
    Member

    Nick Coughlin explained on bpo-13814 msg151763 why he thought that making generators be context managers could/should not be done.

    "Generators deliberately don't support the context management protocol. This is so that they raise an explicit TypeError or AttributeError (pointing out that __exit__ is missing) if you leave out the @contextmanager decorator when you're using a generator to write an actual context manager.

    Generators supporting the context management protocol natively would turn that into a far more subtle (and confusing) error: your code would silently fail to invoke the generator body.

    Ensuring this common error remains easy to detect is far more important than making it easier to invoke close() on a generator object (particularly when contextlib.closing() already makes that very easy)."

    The new entry in the Design FAQ is this:

    "Why don’t generators support the with statement?

    For technical reasons, a generator used directly as a context manager would not work correctly. When, as is most common, a generator is used as an iterator run to completion, no closing is needed. When it is, wrap it as “contextlib.closing(generator)” in the ‘with’ statment."

    If Nick's explanation is not currently correct, the above should be removed. If it is, perhaps further doc is needed.

    @gvanrossum
    Copy link
    Member

    That explanation is still valid, I just hadn’t thought of it that way.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 4861fda by Serhiy Storchaka in branch '3.9':
    [3.9] bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations (GH-26843). (GH-26916)
    4861fda

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @kumaraditya303
    Copy link
    Contributor

    Closing as the PR was merged.

    @serhiy-storchaka serhiy-storchaka changed the title Possible resource leeak in glob in non-refcount implementations Possible resource leak in glob in non-refcount implementations Jun 27, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    Development

    No branches or pull requests

    4 participants