Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importlib: module.__spec__ leaks importlib namespaces (test_importlib leaked xxx references) #84231

Closed
vstinner opened this issue Mar 23, 2020 · 16 comments
Labels
3.9 only security fixes tests Tests in the Lib/test dir

Comments

@vstinner
Copy link
Member

BPO 40050
Nosy @brettcannon, @ncoghlan, @vstinner, @ericsnowcurrently, @corona10, @shihai1991
PRs
  • Revert "bpo-1635741: Port _weakref extension module to multiphase initialization (PEP 489) (GH-19084)" #19128
  • bpo-40050: Fix importlib._bootstrap_external #19135
  • bpo-40050: Rephrase NEWS entry #19148
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-03-24.17:15:43.538>
    created_at = <Date 2020-03-23.22:41:03.438>
    labels = ['tests', '3.9']
    title = 'importlib: module.__spec__ leaks  importlib namespaces (test_importlib leaked xxx references)'
    updated_at = <Date 2020-03-25.17:32:17.760>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-03-25.17:32:17.760>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-03-24.17:15:43.538>
    closer = 'vstinner'
    components = ['Tests']
    creation = <Date 2020-03-23.22:41:03.438>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40050
    keywords = ['patch']
    message_count = 16.0
    messages = ['364905', '364910', '364911', '364914', '364924', '364925', '364926', '364929', '364930', '364931', '364932', '364946', '364947', '364949', '364953', '365005']
    nosy_count = 6.0
    nosy_names = ['brett.cannon', 'ncoghlan', 'vstinner', 'eric.snow', 'corona10', 'shihai1991']
    pr_nums = ['19128', '19135', '19148']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue40050'
    versions = ['Python 3.9']

    @vstinner
    Copy link
    Member Author

    https://buildbot.python.org/all/#/builders/206/builds/119

    test_importlib leaked [6303, 6299, 6303] references, sum=18905
    test_importlib leaked [2022, 2020, 2022] memory blocks, sum=6064

    Issue reported at: https://bugs.python.org/issue1635741#msg364845

    It seems like the regression was introduced by the following change:

    commit 8334f30 (HEAD)
    Author: Hai Shi <shihai1992@gmail.com>
    Date: Fri Mar 20 16:16:45 2020 +0800

    bpo-1635741: Port _weakref extension module to multiphase initialization (PEP-489) (GH-19084)
    

    @vstinner vstinner added 3.9 only security fixes tests Tests in the Lib/test dir labels Mar 23, 2020
    @vstinner
    Copy link
    Member Author

    New changeset 188078c by Victor Stinner in branch 'master':
    Revert "bpo-1635741: Port _weakref extension module to multiphase initialization (PEP-489) (GH-19084)" (bpo-19128)
    188078c

    @vstinner
    Copy link
    Member Author

    I can reproduce the leak with the following test, stored as Lib/test/test_leak.py:
    ---

    from test import support
    import unittest
    
    class Tests(unittest.TestCase):
        def test_import_fresh(self):
            support.import_fresh_module('importlib.machinery',
                fresh=('importlib',),
                blocked=('_frozen_importlib', '_frozen_importlib_external'))

    Extract of "./python -m test -R 3:3 test_leak" output:
    ---
    test_leak leaked [6303, 6299, 6303] references, sum=18905
    test_leak leaked [2022, 2020, 2022] memory blocks, sum=6064
    ---

    @vstinner
    Copy link
    Member Author

    Before commit 8334f30, at the first _imp.create_builtin() calls, it calls _PyImport_FixupExtensionObject() which stores the created module in an internal "extensions" dictionary. PyInit__weakref() returns a module object. Next calls to _imp.create_builtin("_weakref") simply returned the cached _weakref module.

    At commit 8334f30, _imp.create_builtin() doesn't store it in the internal "extensions" dictionary, but calls PyModule_FromDefAndSpec() instead. PyInit__weakref() returns a module definition (PyModuleDef_Type: "moduledef"). Each _imp.create_builtin("_weakref") creates a new module.

    @vstinner
    Copy link
    Member Author

    I can reproduce the leak with command:

    ./python -m test -R 3:3 test_leak

    File 1:
    ---

    import unittest
    import sys
    from importlib import _bootstrap as BOOTSTRAP
    
    
    def _save_and_remove_module(name, orig_modules):
        """Helper function to save and remove a module from sys.modules
    Raise ImportError if the module can't be imported.
    """
    # try to import the module and raise an error if it can't be imported
    if name not in sys.modules:
        __import__(name)
        del sys.modules[name]
    for modname in list(sys.modules):
        if modname == name or modname.startswith(name + '.'):
            orig_modules[modname] = sys.modules[modname]
            del sys.modules[modname]
    
    def _save_and_block_module(name, orig_modules):
        """Helper function to save and block a module in sys.modules
    Return True if the module was in sys.modules, False otherwise.
    """
    saved = True
    try:
        orig_modules[name] = sys.modules[name]
    except KeyError:
        saved = False
    sys.modules[name] = None
    return saved
    
    def import_fresh_module():
        name = 'importlib3'
        fresh = ('importlib',)
        blocked = ('_frozen_importlib', '_frozen_importlib_external')
        orig_modules = {}
        names_to_remove = []
        _save_and_remove_module(name, orig_modules)
        try:
            for fresh_name in fresh:
                _save_and_remove_module(fresh_name, orig_modules)
            for blocked_name in blocked:
                if not _save_and_block_module(blocked_name, orig_modules):
                    names_to_remove.append(blocked_name)
    
            with BOOTSTRAP._ModuleLockManager(name):
                spec = BOOTSTRAP._find_spec(name, None)
                module = BOOTSTRAP.module_from_spec(spec)
                sys.modules[spec.name] = module
                spec.loader.exec_module(module)
                del sys.modules[spec.name]
    finally:
        for orig_name, module in orig_modules.items():
            sys.modules[orig_name] = module
        for name_to_remove in names_to_remove:
            del sys.modules[name_to_remove]
    
    class Tests(unittest.TestCase):
        def test_import_fresh(self):
            import_fresh_module()

    File 2: Lib/importlib3.py
    ---

    import _imp
    import sys

    try:
    import _frozen_importlib as _bootstrap
    case = 1
    except ImportError:
    case = 2

    if case == 2:
        _bootstrap = type(sys)("_bootstrap")
        _bootstrap._weakref = sys.modules['_weakref']
    
        class ModuleSpec:
            @staticmethod
            def method():
                pass
    
        spec = ModuleSpec()
        spec.name = "_weakref"
    
        module = _imp.create_builtin(spec)
        module.__spec__ = spec
    sys.modules["_weakref"] = module
    

    ---

    @vstinner
    Copy link
    Member Author

    With the latest Lib/importlib3.py, there are less leaked references, but there are still leaked references:

    test_leak leaked [139, 139, 139] references, sum=417
    test_leak leaked [42, 42, 42] memory blocks, sum=126

    @vstinner
    Copy link
    Member Author

    File 1:

    Oops, you should read:

    File 1: Lib/test/test_leak.py

    @vstinner
    Copy link
    Member Author

    Code extremely simplified:
    ---

    import unittest
    import sys
    import _imp
    
    class ModuleSpec:
        pass
    
    class Tests(unittest.TestCase):
        def test_import_fresh(self):
            spec = ModuleSpec()
            spec.sys_weakref = sys.modules["_weakref"]
            spec.name = "_weakref"
            module = _imp.create_builtin(spec)
            module.__spec__ = spec
            sys.modules["_weakref"] = module

    I still get a leak:
    ---
    test_leak leaked [34, 34, 34] references, sum=102
    test_leak leaked [11, 11, 11] memory blocks, sum=33
    ---

    I understand that sys.modules["_weakref"] is overriden by the test with a new module, and the new module holds a reference somehow to the previous _weakref module. The old and the new modules remain alive.

    @vstinner
    Copy link
    Member Author

    Code extremely simplified: (...)

    Relationship between origin unmodified code and the extremely simplified code:

    • Lib/test/test_importlib/init.py: load_tests() is called at each "test -R 3:3" iteration

    • Lib/test/test_importlib/test_windows.py: at each iteartion:

      • machinery = test.test_importlib.util.import_importlib('importlib.machinery') is called: it reloads the whole importlib package, with _frozen_importlib and _frozen_importlib_external blocked (force to reload pure Python importlib._bootstrap and importlib._bootstrap_external modules)
      • support.import_module('winreg', required_on=['win']) is called with the fresh new importlib package: in short, it tries to import winreg which fails with ModuleNotFoundError
    • Lib/importlib/init.py: each time the module is reloaded

      • importlib._bootstrap._setup() is called: it copies '_thread', '_warnings' and '_weakref' from sys.modules into importlib._bootstrap namespace (module globals) using setattr(self_module, ...)
      • importlib._bootstrap_external._setup() is called: it calls _bootstrap._builtin_from_name('_weakref')

    When importlib._bootstrap_external._setup() calls _bootstrap._builtin_from_name('_weakref'), _init_module_attrs() copies the "spec" into module.__spec__. But the spec contains a reference to importlib._bootstrap namespace.

    Before _weakref is converted to multiphase initialization:

    $ ./python
    >>> import _weakref
    >>> id(_weakref)
    140119879580176
    >>> id(_weakref.__spec__.__init__.__globals__['_weakref'])
    140119879580176

    => same module

    After the change:

    $ ./python
    >>> import _weakref
    >>> id(_weakref)
    140312826159952
    >>> id(_weakref.__spec__.__init__.__globals__['_weakref'])
    140312826366288

    => two different objects

    The problem is that module.__spec__ pulls the whole importlib package which contains tons of objects.

    @vstinner
    Copy link
    Member Author

    Quick & dirty workaround:

    diff --git a/Lib/importlib/_bootstrap_external.py b/Lib/importlib/_bootstrap_external.py
    index 7353bf9a78..d988552f2d 100644
    --- a/Lib/importlib/_bootstrap_external.py
    +++ b/Lib/importlib/_bootstrap_external.py
    @@ -1620,7 +1620,10 @@ def _setup(_bootstrap_module):
         setattr(self_module, '_thread', thread_module)
     
         # Directly load the _weakref module (needed during bootstrap).
    -    weakref_module = _bootstrap._builtin_from_name('_weakref')
    +    if '_weakref' not in sys.modules:
    +        weakref_module = _bootstrap._builtin_from_name('_weakref')
    +    else:
    +        weakref_module = sys.modules['_weakref']
         setattr(self_module, '_weakref', weakref_module)
     
         # Directly load the winreg module (needed during bootstrap).

    But I think that the issue is larger than just _weakref.

    • Maybe the test_importlib should before save/restore the the "Python state" rather than modifying modules
    • Maybe module.__spec__ should leak less importlib internals: explicitly clear namespaces? Use static methods? I'm not sure how to do that.
    • Remove module.__spec__? ... that would mean rejecting PEP-451 implemented in Python 3.4, that sounds like a major regression :-(

    @vstinner vstinner changed the title test_importlib leaked [6303, 6299, 6303] references importlib: module.__spec__ leaks importlib namespaces (test_importlib leaked xxx references) Mar 24, 2020
    @vstinner vstinner changed the title test_importlib leaked [6303, 6299, 6303] references importlib: module.__spec__ leaks importlib namespaces (test_importlib leaked xxx references) Mar 24, 2020
    @vstinner
    Copy link
    Member Author

    Before _weakref is converted to multiphase initialization:
    (...)
    => same module

    That's because modules which don't use the multiphase initialization are cached in _PyImport_FixupExtensionObject(): see msg364914 for details. Next calls to _imp.create_builtin("_weakref") simply returned the cached _weakref module. It's loaded exactly once.

    @vstinner
    Copy link
    Member Author

    New changeset 83d46e0 by Victor Stinner in branch 'master':
    bpo-40050: Fix importlib._bootstrap_external (GH-19135)
    83d46e0

    @brettcannon
    Copy link
    Member

    From Victor:

    • Maybe the test_importlib should before save/restore the the "Python state" rather than modifying modules

    You will have to be more specific than that as there is an import_state() context manager to control import state via the sys module.

    • Maybe module.__spec__ should leak less importlib internals: explicitly clear namespaces? Use static methods? I'm not sure how to do that.

    Are you saying change everything on __spec__ objects to be static methods? That won't work because the whole point of __spec__ objects is to essentially be data classes to store details of a module.

    • Remove module.__spec__? ... that would mean rejecting PEP-451 implemented in Python 3.4, that sounds like a major regression :-(

    No. You would break the world and undo years of work to clean up import semantics with no good way to go back to the old way.

    Can I ask why you suddenly want to throw __spec__ objects out due to a leak tied back to _weakref switching to multi-phase initialization? Also note that the use of weakrefs by importlib isn't tied to __spec__ objects but to per-module import locks in importlib._bootstrap.

    @vstinner
    Copy link
    Member Author

    I close the issue.

    I pushed commit 83d46e0 which should not be controversial. In short, the fix is to remove two unused imports :-D

    The fix doesn't remove module.__spec__ nor avoid usage of _weakref, it only fix this issue (reference leak) by copying code from _bootstrap.py to _bootstrap_external.py. In fact, modules like _io were already correctly imported by _bootstrap_external._setup(). Only _thread, _weakref and winreg imports caused this bug.

    I don't think that it's worth it to backport the change to stable branches 3.7 and 3.8, since stable branches don't use multiphase initialization for _weakref. The two unused imports don't cause any harm in importlib._bootstrap_external.

    Can I ask why you suddenly want to throw __spec__ objects out due to a leak tied back to _weakref switching to multi-phase initialization?

    I was thinking just aloud to try to find a solution for this reference leak.

    @vstinner
    Copy link
    Member Author

    Since this reference leak is fixed, I reapplied Hai Shi's change for _weakref in bpo-1635741:

    New changeset 93460d0 by Victor Stinner in branch 'master':
    bpo-1635741: Port _weakref extension module to multiphase initialization (PEP-489) (GH-19140)
    93460d0

    @vstinner
    Copy link
    Member Author

    New changeset ace018c by Victor Stinner in branch 'master':
    bpo-40050: Rephrase NEWS entry (GH-19148)
    ace018c

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes tests Tests in the Lib/test dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants