classification
Title: importlib: module.__spec__ leaks importlib namespaces (test_importlib leaked xxx references)
Type: Stage: resolved
Components: Tests Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, corona10, eric.snow, ncoghlan, shihai1991, vstinner
Priority: normal Keywords: patch

Created on 2020-03-23 22:41 by vstinner, last changed 2020-03-25 17:32 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 19128 merged vstinner, 2020-03-23 22:45
PR 19135 merged vstinner, 2020-03-24 15:19
PR 19148 merged vstinner, 2020-03-24 21:55
Messages (16)
msg364905 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-23 22:41
https://buildbot.python.org/all/#/builders/206/builds/119

test_importlib leaked [6303, 6299, 6303] references, sum=18905
test_importlib leaked [2022, 2020, 2022] memory blocks, sum=6064

Issue reported at: https://bugs.python.org/issue1635741#msg364845

It seems like the regression was introduced by the following change:

commit 8334f30a74abcf7e469b901afc307887aa85a888 (HEAD)
Author: Hai Shi <shihai1992@gmail.com>
Date:   Fri Mar 20 16:16:45 2020 +0800

    bpo-1635741: Port _weakref extension module to multiphase initialization (PEP 489) (GH-19084)
msg364910 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-23 23:48
New changeset 188078c39dec24aa5b3f2073bdc9a68ebaae42de by Victor Stinner in branch 'master':
Revert "bpo-1635741: Port _weakref extension module to multiphase initialization (PEP 489) (GH-19084)" (#19128)
https://github.com/python/cpython/commit/188078c39dec24aa5b3f2073bdc9a68ebaae42de
msg364911 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 00:23
I can reproduce the leak with the following test, stored as Lib/test/test_leak.py:
---
from test import support
import unittest

class Tests(unittest.TestCase):
    def test_import_fresh(self):
        support.import_fresh_module('importlib.machinery',
            fresh=('importlib',),
            blocked=('_frozen_importlib', '_frozen_importlib_external'))
---

Extract of "./python -m test -R 3:3 test_leak" output:
---
test_leak leaked [6303, 6299, 6303] references, sum=18905
test_leak leaked [2022, 2020, 2022] memory blocks, sum=6064
---
msg364914 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 02:01
Before commit 8334f30a74abcf7e469b901afc307887aa85a888, at the first _imp.create_builtin() calls, it calls _PyImport_FixupExtensionObject() which stores the created module in an internal "extensions" dictionary. PyInit__weakref() returns a module object. Next calls to _imp.create_builtin("_weakref") simply returned the cached _weakref module.

At commit 8334f30a74abcf7e469b901afc307887aa85a888, _imp.create_builtin() doesn't store it in the internal "extensions" dictionary, but calls PyModule_FromDefAndSpec() instead. PyInit__weakref() returns a module definition (PyModuleDef_Type: "moduledef"). Each _imp.create_builtin("_weakref") creates a new module.
msg364924 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 11:07
I can reproduce the leak with command:

./python -m test -R 3:3 test_leak

File 1:
---
import unittest
import sys
from importlib import _bootstrap as BOOTSTRAP


def _save_and_remove_module(name, orig_modules):
    """Helper function to save and remove a module from sys.modules

    Raise ImportError if the module can't be imported.
    """
    # try to import the module and raise an error if it can't be imported
    if name not in sys.modules:
        __import__(name)
        del sys.modules[name]
    for modname in list(sys.modules):
        if modname == name or modname.startswith(name + '.'):
            orig_modules[modname] = sys.modules[modname]
            del sys.modules[modname]

def _save_and_block_module(name, orig_modules):
    """Helper function to save and block a module in sys.modules

    Return True if the module was in sys.modules, False otherwise.
    """
    saved = True
    try:
        orig_modules[name] = sys.modules[name]
    except KeyError:
        saved = False
    sys.modules[name] = None
    return saved



def import_fresh_module():
    name = 'importlib3'
    fresh = ('importlib',)
    blocked = ('_frozen_importlib', '_frozen_importlib_external')
    orig_modules = {}
    names_to_remove = []
    _save_and_remove_module(name, orig_modules)
    try:
        for fresh_name in fresh:
            _save_and_remove_module(fresh_name, orig_modules)
        for blocked_name in blocked:
            if not _save_and_block_module(blocked_name, orig_modules):
                names_to_remove.append(blocked_name)

        with BOOTSTRAP._ModuleLockManager(name):
            spec = BOOTSTRAP._find_spec(name, None)
            module = BOOTSTRAP.module_from_spec(spec)
            sys.modules[spec.name] = module
            spec.loader.exec_module(module)
            del sys.modules[spec.name]


    finally:
        for orig_name, module in orig_modules.items():
            sys.modules[orig_name] = module
        for name_to_remove in names_to_remove:
            del sys.modules[name_to_remove]

class Tests(unittest.TestCase):
    def test_import_fresh(self):
        import_fresh_module()
---

File 2: Lib/importlib3.py
---
import _imp
import sys

try:
    import _frozen_importlib as _bootstrap
    case = 1
except ImportError:
    case = 2

if case == 2:
    _bootstrap = type(sys)("_bootstrap")
    _bootstrap._weakref = sys.modules['_weakref']

    class ModuleSpec:
        @staticmethod
        def method():
            pass

    spec = ModuleSpec()
    spec.name = "_weakref"

    module = _imp.create_builtin(spec)
    module.__spec__ = spec

    sys.modules["_weakref"] = module
---
msg364925 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 11:10
With the latest Lib/importlib3.py, there are less leaked references, but there are still leaked references:

test_leak leaked [139, 139, 139] references, sum=417
test_leak leaked [42, 42, 42] memory blocks, sum=126
msg364926 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 11:11
> File 1:

Oops, you should read:

File 1: Lib/test/test_leak.py
msg364929 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 13:26
Code extremely simplified:
---
import unittest
import sys
import _imp

class ModuleSpec:
    pass

class Tests(unittest.TestCase):
    def test_import_fresh(self):
        spec = ModuleSpec()
        spec.sys_weakref = sys.modules["_weakref"]
        spec.name = "_weakref"
        module = _imp.create_builtin(spec)
        module.__spec__ = spec
        sys.modules["_weakref"] = module
---

I still get a leak:
---
test_leak leaked [34, 34, 34] references, sum=102
test_leak leaked [11, 11, 11] memory blocks, sum=33
---

I understand that sys.modules["_weakref"] is overriden by the test with a new module, and the new module holds a reference somehow to the previous _weakref module. The old and the new modules remain alive.
msg364930 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 13:52
> Code extremely simplified: (...)

Relationship between origin unmodified code and the extremely simplified code:

* Lib/test/test_importlib/__init__.py: load_tests() is called at each "test -R 3:3" iteration
* Lib/test/test_importlib/test_windows.py: at each iteartion:

  * machinery = test.test_importlib.util.import_importlib('importlib.machinery') is called: it reloads the whole importlib package, with _frozen_importlib and _frozen_importlib_external blocked (force to reload pure Python importlib._bootstrap and importlib._bootstrap_external modules)
  * support.import_module('winreg', required_on=['win']) is called with the fresh new importlib package: in short, it tries to import winreg which fails with ModuleNotFoundError

* Lib/importlib/__init__.py: each time the module is reloaded

  * importlib._bootstrap._setup() is called: it copies '_thread', '_warnings' and '_weakref' from sys.modules into importlib._bootstrap namespace (module globals) using setattr(self_module, ...)
  * importlib._bootstrap_external._setup() is called: it calls _bootstrap._builtin_from_name('_weakref')


When importlib._bootstrap_external._setup() calls _bootstrap._builtin_from_name('_weakref'), _init_module_attrs() copies the "spec" into module.__spec__. But the spec contains a reference to importlib._bootstrap namespace.


Before _weakref is converted to multiphase initialization:

$ ./python
>>> import _weakref
>>> id(_weakref)
140119879580176
>>> id(_weakref.__spec__.__init__.__globals__['_weakref'])
140119879580176

=> same module


After the change:

$ ./python
>>> import _weakref
>>> id(_weakref)
140312826159952
>>> id(_weakref.__spec__.__init__.__globals__['_weakref'])
140312826366288

=> two different objects


The problem is that module.__spec__ pulls the whole importlib package which contains tons of objects.
msg364931 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 13:58
Quick & dirty workaround:

diff --git a/Lib/importlib/_bootstrap_external.py b/Lib/importlib/_bootstrap_external.py
index 7353bf9a78..d988552f2d 100644
--- a/Lib/importlib/_bootstrap_external.py
+++ b/Lib/importlib/_bootstrap_external.py
@@ -1620,7 +1620,10 @@ def _setup(_bootstrap_module):
     setattr(self_module, '_thread', thread_module)
 
     # Directly load the _weakref module (needed during bootstrap).
-    weakref_module = _bootstrap._builtin_from_name('_weakref')
+    if '_weakref' not in sys.modules:
+        weakref_module = _bootstrap._builtin_from_name('_weakref')
+    else:
+        weakref_module = sys.modules['_weakref']
     setattr(self_module, '_weakref', weakref_module)
 
     # Directly load the winreg module (needed during bootstrap).


But I think that the issue is larger than just _weakref.

* Maybe the test_importlib should before save/restore the the "Python state" rather than modifying modules
* Maybe module.__spec__ should leak less importlib internals: explicitly clear namespaces? Use static methods? I'm not sure how to do that.
* Remove module.__spec__? ... that would mean rejecting PEP 451 implemented in Python 3.4, that sounds like a major regression :-(
msg364932 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 13:59
> Before _weakref is converted to multiphase initialization:
> (...)
> => same module

That's because modules which don't use the multiphase initialization are cached in _PyImport_FixupExtensionObject(): see msg364914 for details. Next calls to _imp.create_builtin("_weakref") simply returned the cached _weakref module. It's loaded exactly once.
msg364946 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 17:03
New changeset 83d46e0622d2efdf5f3bf8bf8904d0dcb55fc322 by Victor Stinner in branch 'master':
bpo-40050: Fix importlib._bootstrap_external (GH-19135)
https://github.com/python/cpython/commit/83d46e0622d2efdf5f3bf8bf8904d0dcb55fc322
msg364947 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2020-03-24 17:09
From Victor:
* Maybe the test_importlib should before save/restore the the "Python state" rather than modifying modules

You will have to be more specific than that as there is an import_state() context manager to control import state via the sys module.


* Maybe module.__spec__ should leak less importlib internals: explicitly clear namespaces? Use static methods? I'm not sure how to do that.

Are you saying change everything on __spec__ objects to be static methods? That won't work because the whole point of __spec__ objects is to essentially be data classes to store details of a module.

* Remove module.__spec__? ... that would mean rejecting PEP 451 implemented in Python 3.4, that sounds like a major regression :-(

No. You would break the world and undo years of work to clean up import semantics with no good way to go back to the old way.

Can I ask why you suddenly want to throw __spec__ objects out due to a leak tied back to _weakref switching to multi-phase initialization? Also note that the use of weakrefs by importlib isn't tied to __spec__ objects but to per-module import locks in importlib._bootstrap.
msg364949 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 17:15
I close the issue.

I pushed commit 83d46e0622d2efdf5f3bf8bf8904d0dcb55fc322 which should not be controversial. In short, the fix is to remove two unused imports :-D

The fix doesn't remove module.__spec__ nor avoid usage of _weakref, it only fix this issue (reference leak) by copying code from _bootstrap.py to _bootstrap_external.py. In fact, modules like _io were already correctly imported by _bootstrap_external._setup(). Only _thread, _weakref and winreg imports caused this bug.

I don't think that it's worth it to backport the change to stable branches 3.7 and 3.8, since stable branches don't use multiphase initialization for _weakref. The two unused imports don't cause any harm in importlib._bootstrap_external.


> Can I ask why you suddenly want to throw __spec__ objects out due to a leak tied back to _weakref switching to multi-phase initialization?

I was thinking just aloud to try to find a solution for this reference leak.
msg364953 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-24 17:33
Since this reference leak is fixed, I reapplied Hai Shi's change for _weakref in bpo-1635741:

New changeset 93460d097f50db0870161a63911d61ce3c5f4583 by Victor Stinner in branch 'master':
bpo-1635741: Port _weakref extension module to multiphase initialization (PEP 489) (GH-19140)
https://github.com/python/cpython/commit/93460d097f50db0870161a63911d61ce3c5f4583
msg365005 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-25 17:32
New changeset ace018ca47c03ca699603341b12781b5329d2eaa by Victor Stinner in branch 'master':
bpo-40050: Rephrase NEWS entry (GH-19148)
https://github.com/python/cpython/commit/ace018ca47c03ca699603341b12781b5329d2eaa
History
Date User Action Args
2020-03-25 17:32:17vstinnersetmessages: + msg365005
2020-03-24 21:55:30vstinnersetpull_requests: + pull_request18509
2020-03-24 17:33:29vstinnersetmessages: + msg364953
2020-03-24 17:15:43vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg364949

stage: patch review -> resolved
2020-03-24 17:09:11brett.cannonsetmessages: + msg364947
2020-03-24 17:03:41vstinnersetmessages: + msg364946
2020-03-24 15:19:34vstinnersetpull_requests: + pull_request18496
2020-03-24 15:16:57corona10setnosy: + corona10
2020-03-24 13:59:53vstinnersetmessages: + msg364932
2020-03-24 13:58:13vstinnersetnosy: + brett.cannon, ncoghlan, eric.snow

messages: + msg364931
title: test_importlib leaked [6303, 6299, 6303] references -> importlib: module.__spec__ leaks importlib namespaces (test_importlib leaked xxx references)
2020-03-24 13:52:23vstinnersetmessages: + msg364930
2020-03-24 13:26:06vstinnersetmessages: + msg364929
2020-03-24 11:11:48vstinnersetmessages: + msg364926
2020-03-24 11:10:22vstinnersetmessages: + msg364925
2020-03-24 11:07:37vstinnersetmessages: + msg364924
2020-03-24 04:41:34shihai1991setnosy: + shihai1991
2020-03-24 02:01:34vstinnersetmessages: + msg364914
2020-03-24 00:23:37vstinnersetmessages: + msg364911
2020-03-23 23:48:06vstinnersetmessages: + msg364910
2020-03-23 22:45:33vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request18490
2020-03-23 22:41:03vstinnercreate