Issue40050
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2020-03-23 22:41 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 19128 | merged | vstinner, 2020-03-23 22:45 | |
PR 19135 | merged | vstinner, 2020-03-24 15:19 | |
PR 19148 | merged | vstinner, 2020-03-24 21:55 |
Messages (16) | |||
---|---|---|---|
msg364905 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-23 22:41 | |
https://buildbot.python.org/all/#/builders/206/builds/119 test_importlib leaked [6303, 6299, 6303] references, sum=18905 test_importlib leaked [2022, 2020, 2022] memory blocks, sum=6064 Issue reported at: https://bugs.python.org/issue1635741#msg364845 It seems like the regression was introduced by the following change: commit 8334f30a74abcf7e469b901afc307887aa85a888 (HEAD) Author: Hai Shi <shihai1992@gmail.com> Date: Fri Mar 20 16:16:45 2020 +0800 bpo-1635741: Port _weakref extension module to multiphase initialization (PEP 489) (GH-19084) |
|||
msg364910 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-23 23:48 | |
New changeset 188078c39dec24aa5b3f2073bdc9a68ebaae42de by Victor Stinner in branch 'master': Revert "bpo-1635741: Port _weakref extension module to multiphase initialization (PEP 489) (GH-19084)" (#19128) https://github.com/python/cpython/commit/188078c39dec24aa5b3f2073bdc9a68ebaae42de |
|||
msg364911 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 00:23 | |
I can reproduce the leak with the following test, stored as Lib/test/test_leak.py: --- from test import support import unittest class Tests(unittest.TestCase): def test_import_fresh(self): support.import_fresh_module('importlib.machinery', fresh=('importlib',), blocked=('_frozen_importlib', '_frozen_importlib_external')) --- Extract of "./python -m test -R 3:3 test_leak" output: --- test_leak leaked [6303, 6299, 6303] references, sum=18905 test_leak leaked [2022, 2020, 2022] memory blocks, sum=6064 --- |
|||
msg364914 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 02:01 | |
Before commit 8334f30a74abcf7e469b901afc307887aa85a888, at the first _imp.create_builtin() calls, it calls _PyImport_FixupExtensionObject() which stores the created module in an internal "extensions" dictionary. PyInit__weakref() returns a module object. Next calls to _imp.create_builtin("_weakref") simply returned the cached _weakref module. At commit 8334f30a74abcf7e469b901afc307887aa85a888, _imp.create_builtin() doesn't store it in the internal "extensions" dictionary, but calls PyModule_FromDefAndSpec() instead. PyInit__weakref() returns a module definition (PyModuleDef_Type: "moduledef"). Each _imp.create_builtin("_weakref") creates a new module. |
|||
msg364924 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 11:07 | |
I can reproduce the leak with command: ./python -m test -R 3:3 test_leak File 1: --- import unittest import sys from importlib import _bootstrap as BOOTSTRAP def _save_and_remove_module(name, orig_modules): """Helper function to save and remove a module from sys.modules Raise ImportError if the module can't be imported. """ # try to import the module and raise an error if it can't be imported if name not in sys.modules: __import__(name) del sys.modules[name] for modname in list(sys.modules): if modname == name or modname.startswith(name + '.'): orig_modules[modname] = sys.modules[modname] del sys.modules[modname] def _save_and_block_module(name, orig_modules): """Helper function to save and block a module in sys.modules Return True if the module was in sys.modules, False otherwise. """ saved = True try: orig_modules[name] = sys.modules[name] except KeyError: saved = False sys.modules[name] = None return saved def import_fresh_module(): name = 'importlib3' fresh = ('importlib',) blocked = ('_frozen_importlib', '_frozen_importlib_external') orig_modules = {} names_to_remove = [] _save_and_remove_module(name, orig_modules) try: for fresh_name in fresh: _save_and_remove_module(fresh_name, orig_modules) for blocked_name in blocked: if not _save_and_block_module(blocked_name, orig_modules): names_to_remove.append(blocked_name) with BOOTSTRAP._ModuleLockManager(name): spec = BOOTSTRAP._find_spec(name, None) module = BOOTSTRAP.module_from_spec(spec) sys.modules[spec.name] = module spec.loader.exec_module(module) del sys.modules[spec.name] finally: for orig_name, module in orig_modules.items(): sys.modules[orig_name] = module for name_to_remove in names_to_remove: del sys.modules[name_to_remove] class Tests(unittest.TestCase): def test_import_fresh(self): import_fresh_module() --- File 2: Lib/importlib3.py --- import _imp import sys try: import _frozen_importlib as _bootstrap case = 1 except ImportError: case = 2 if case == 2: _bootstrap = type(sys)("_bootstrap") _bootstrap._weakref = sys.modules['_weakref'] class ModuleSpec: @staticmethod def method(): pass spec = ModuleSpec() spec.name = "_weakref" module = _imp.create_builtin(spec) module.__spec__ = spec sys.modules["_weakref"] = module --- |
|||
msg364925 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 11:10 | |
With the latest Lib/importlib3.py, there are less leaked references, but there are still leaked references: test_leak leaked [139, 139, 139] references, sum=417 test_leak leaked [42, 42, 42] memory blocks, sum=126 |
|||
msg364926 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 11:11 | |
> File 1: Oops, you should read: File 1: Lib/test/test_leak.py |
|||
msg364929 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 13:26 | |
Code extremely simplified: --- import unittest import sys import _imp class ModuleSpec: pass class Tests(unittest.TestCase): def test_import_fresh(self): spec = ModuleSpec() spec.sys_weakref = sys.modules["_weakref"] spec.name = "_weakref" module = _imp.create_builtin(spec) module.__spec__ = spec sys.modules["_weakref"] = module --- I still get a leak: --- test_leak leaked [34, 34, 34] references, sum=102 test_leak leaked [11, 11, 11] memory blocks, sum=33 --- I understand that sys.modules["_weakref"] is overriden by the test with a new module, and the new module holds a reference somehow to the previous _weakref module. The old and the new modules remain alive. |
|||
msg364930 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 13:52 | |
> Code extremely simplified: (...) Relationship between origin unmodified code and the extremely simplified code: * Lib/test/test_importlib/__init__.py: load_tests() is called at each "test -R 3:3" iteration * Lib/test/test_importlib/test_windows.py: at each iteartion: * machinery = test.test_importlib.util.import_importlib('importlib.machinery') is called: it reloads the whole importlib package, with _frozen_importlib and _frozen_importlib_external blocked (force to reload pure Python importlib._bootstrap and importlib._bootstrap_external modules) * support.import_module('winreg', required_on=['win']) is called with the fresh new importlib package: in short, it tries to import winreg which fails with ModuleNotFoundError * Lib/importlib/__init__.py: each time the module is reloaded * importlib._bootstrap._setup() is called: it copies '_thread', '_warnings' and '_weakref' from sys.modules into importlib._bootstrap namespace (module globals) using setattr(self_module, ...) * importlib._bootstrap_external._setup() is called: it calls _bootstrap._builtin_from_name('_weakref') When importlib._bootstrap_external._setup() calls _bootstrap._builtin_from_name('_weakref'), _init_module_attrs() copies the "spec" into module.__spec__. But the spec contains a reference to importlib._bootstrap namespace. Before _weakref is converted to multiphase initialization: $ ./python >>> import _weakref >>> id(_weakref) 140119879580176 >>> id(_weakref.__spec__.__init__.__globals__['_weakref']) 140119879580176 => same module After the change: $ ./python >>> import _weakref >>> id(_weakref) 140312826159952 >>> id(_weakref.__spec__.__init__.__globals__['_weakref']) 140312826366288 => two different objects The problem is that module.__spec__ pulls the whole importlib package which contains tons of objects. |
|||
msg364931 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 13:58 | |
Quick & dirty workaround: diff --git a/Lib/importlib/_bootstrap_external.py b/Lib/importlib/_bootstrap_external.py index 7353bf9a78..d988552f2d 100644 --- a/Lib/importlib/_bootstrap_external.py +++ b/Lib/importlib/_bootstrap_external.py @@ -1620,7 +1620,10 @@ def _setup(_bootstrap_module): setattr(self_module, '_thread', thread_module) # Directly load the _weakref module (needed during bootstrap). - weakref_module = _bootstrap._builtin_from_name('_weakref') + if '_weakref' not in sys.modules: + weakref_module = _bootstrap._builtin_from_name('_weakref') + else: + weakref_module = sys.modules['_weakref'] setattr(self_module, '_weakref', weakref_module) # Directly load the winreg module (needed during bootstrap). But I think that the issue is larger than just _weakref. * Maybe the test_importlib should before save/restore the the "Python state" rather than modifying modules * Maybe module.__spec__ should leak less importlib internals: explicitly clear namespaces? Use static methods? I'm not sure how to do that. * Remove module.__spec__? ... that would mean rejecting PEP 451 implemented in Python 3.4, that sounds like a major regression :-( |
|||
msg364932 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 13:59 | |
> Before _weakref is converted to multiphase initialization: > (...) > => same module That's because modules which don't use the multiphase initialization are cached in _PyImport_FixupExtensionObject(): see msg364914 for details. Next calls to _imp.create_builtin("_weakref") simply returned the cached _weakref module. It's loaded exactly once. |
|||
msg364946 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 17:03 | |
New changeset 83d46e0622d2efdf5f3bf8bf8904d0dcb55fc322 by Victor Stinner in branch 'master': bpo-40050: Fix importlib._bootstrap_external (GH-19135) https://github.com/python/cpython/commit/83d46e0622d2efdf5f3bf8bf8904d0dcb55fc322 |
|||
msg364947 - (view) | Author: Brett Cannon (brett.cannon) * | Date: 2020-03-24 17:09 | |
From Victor: * Maybe the test_importlib should before save/restore the the "Python state" rather than modifying modules You will have to be more specific than that as there is an import_state() context manager to control import state via the sys module. * Maybe module.__spec__ should leak less importlib internals: explicitly clear namespaces? Use static methods? I'm not sure how to do that. Are you saying change everything on __spec__ objects to be static methods? That won't work because the whole point of __spec__ objects is to essentially be data classes to store details of a module. * Remove module.__spec__? ... that would mean rejecting PEP 451 implemented in Python 3.4, that sounds like a major regression :-( No. You would break the world and undo years of work to clean up import semantics with no good way to go back to the old way. Can I ask why you suddenly want to throw __spec__ objects out due to a leak tied back to _weakref switching to multi-phase initialization? Also note that the use of weakrefs by importlib isn't tied to __spec__ objects but to per-module import locks in importlib._bootstrap. |
|||
msg364949 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 17:15 | |
I close the issue. I pushed commit 83d46e0622d2efdf5f3bf8bf8904d0dcb55fc322 which should not be controversial. In short, the fix is to remove two unused imports :-D The fix doesn't remove module.__spec__ nor avoid usage of _weakref, it only fix this issue (reference leak) by copying code from _bootstrap.py to _bootstrap_external.py. In fact, modules like _io were already correctly imported by _bootstrap_external._setup(). Only _thread, _weakref and winreg imports caused this bug. I don't think that it's worth it to backport the change to stable branches 3.7 and 3.8, since stable branches don't use multiphase initialization for _weakref. The two unused imports don't cause any harm in importlib._bootstrap_external. > Can I ask why you suddenly want to throw __spec__ objects out due to a leak tied back to _weakref switching to multi-phase initialization? I was thinking just aloud to try to find a solution for this reference leak. |
|||
msg364953 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-24 17:33 | |
Since this reference leak is fixed, I reapplied Hai Shi's change for _weakref in bpo-1635741: New changeset 93460d097f50db0870161a63911d61ce3c5f4583 by Victor Stinner in branch 'master': bpo-1635741: Port _weakref extension module to multiphase initialization (PEP 489) (GH-19140) https://github.com/python/cpython/commit/93460d097f50db0870161a63911d61ce3c5f4583 |
|||
msg365005 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-25 17:32 | |
New changeset ace018ca47c03ca699603341b12781b5329d2eaa by Victor Stinner in branch 'master': bpo-40050: Rephrase NEWS entry (GH-19148) https://github.com/python/cpython/commit/ace018ca47c03ca699603341b12781b5329d2eaa |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:28 | admin | set | github: 84231 |
2020-03-25 17:32:17 | vstinner | set | messages: + msg365005 |
2020-03-24 21:55:30 | vstinner | set | pull_requests: + pull_request18509 |
2020-03-24 17:33:29 | vstinner | set | messages: + msg364953 |
2020-03-24 17:15:43 | vstinner | set | status: open -> closed resolution: fixed messages: + msg364949 stage: patch review -> resolved |
2020-03-24 17:09:11 | brett.cannon | set | messages: + msg364947 |
2020-03-24 17:03:41 | vstinner | set | messages: + msg364946 |
2020-03-24 15:19:34 | vstinner | set | pull_requests: + pull_request18496 |
2020-03-24 15:16:57 | corona10 | set | nosy:
+ corona10 |
2020-03-24 13:59:53 | vstinner | set | messages: + msg364932 |
2020-03-24 13:58:13 | vstinner | set | nosy:
+ brett.cannon, ncoghlan, eric.snow messages: + msg364931 title: test_importlib leaked [6303, 6299, 6303] references -> importlib: module.__spec__ leaks importlib namespaces (test_importlib leaked xxx references) |
2020-03-24 13:52:23 | vstinner | set | messages: + msg364930 |
2020-03-24 13:26:06 | vstinner | set | messages: + msg364929 |
2020-03-24 11:11:48 | vstinner | set | messages: + msg364926 |
2020-03-24 11:10:22 | vstinner | set | messages: + msg364925 |
2020-03-24 11:07:37 | vstinner | set | messages: + msg364924 |
2020-03-24 04:41:34 | shihai1991 | set | nosy:
+ shihai1991 |
2020-03-24 02:01:34 | vstinner | set | messages: + msg364914 |
2020-03-24 00:23:37 | vstinner | set | messages: + msg364911 |
2020-03-23 23:48:06 | vstinner | set | messages: + msg364910 |
2020-03-23 22:45:33 | vstinner | set | keywords:
+ patch stage: patch review pull_requests: + pull_request18490 |
2020-03-23 22:41:03 | vstinner | create |