classification
Title: test.regrtest has way too many imports
Type: Stage: resolved
Components: Tests Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: pablogsal, serhiy.storchaka, shihai1991, terry.reedy, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2020-09-04 15:36 by vstinner, last changed 2021-03-23 19:24 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 22089 closed vstinner, 2020-09-04 15:59
PR 24934 merged vstinner, 2021-03-19 17:42
PR 24980 merged vstinner, 2021-03-22 22:57
PR 24981 closed vstinner, 2021-03-22 23:01
PR 24982 merged vstinner, 2021-03-22 23:35
PR 24983 merged vstinner, 2021-03-22 23:48
PR 24985 merged vstinner, 2021-03-23 00:17
PR 24987 merged vstinner, 2021-03-23 00:26
PR 24996 merged vstinner, 2021-03-23 17:01
PR 24998 merged vstinner, 2021-03-23 18:49
Messages (19)
msg376374 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-04 15:36
Follow-up of bpo-40275.

While investigating a crash on AIX (bpo-40068), I noticed that test_threading crashed because the test imports the logging module, and the logging module has a bug on AIX on fork. I created an issue to reduce the number of imports made by "import test.support":

https://bugs.python.org/issue40275
I would prefer to better isolate tests: test_threading should only test the threading module, not the logging module.

Thanks to the hard work of Hai Shi, "import test.support" now imports only 37 modules instead of 171! He split the 3200 lines of Lib/test/support/__init__.py into new helper submodules: bytecode, import, threading, socket, etc. For example, TESTFN now comes from test.support.os_helper.
Sadly, test.libregrtest.save_env still imports asyncio and multiprocessing, and so in practice, running any test using "python -m test (...)" still imports around 233 modules :-(

I measured the number of imports done in practice using the following file, Lib/test/test_sys_modules.py:
----
import unittest
from test import support
import sys

class Tests(unittest.TestCase):
    def test_bug(self):
        modules = sorted(sys.modules)
        print("sys.modules:")
        print("")
        import pprint
        pprint.pprint(modules)
        print("")
        print("len(sys.modules):", len(modules))

def test_main():
    support.run_unittest(Tests)

if __name__ == "__main__":
    test_main()
----

master:

* ./python -m test test_sys_modules: 233 modules (multiprocessing, asyncio, etc.)
* ./python Lib/test/test_sys_modules.py: 95 modules

3.9:

* ./python -m test test_sys_modules: 232
* ./python Lib/test/test_sys_modules.py: 117

3.5:

* ./python -m test test_sys_modules: 167
* ./python Lib/test/test_sys_modules.py: 151

2.7:

* ./python -m test test_sys_modules: 170
* ./python Lib/test/test_sys_modules.py: 122
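
The same kind of measurement can be approximated without adding a test file. The following is a minimal sketch, not part of the original report: the helper name count_modules() is hypothetical, and it only counts sys.modules entries in a fresh interpreter after one statement has run, so its numbers will not match the regrtest figures above.

```python
import subprocess
import sys


def count_modules(statement=""):
    """Return len(sys.modules) in a fresh interpreter after running statement.

    count_modules() is a hypothetical helper written for this illustration.
    """
    code = f"import sys\n{statement}\nprint(len(sys.modules))"
    proc = subprocess.run(
        [sys.executable, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    # The count is the last line printed by the child interpreter.
    return int(proc.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    baseline = count_modules()
    for stmt in ("import unittest", "import asyncio", "import multiprocessing"):
        print(f"{stmt}: +{count_modules(stmt) - baseline} modules")
```

Running each statement in its own subprocess keeps the counts independent of whatever the current process has already imported.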
msg376376 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-04 15:38
If I hack test.libregrtest.runtest to not import test.libregrtest.save_env, test_sys_modules imports only 148 instead of 233 modules.
msg376379 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-04 16:04
In general, it's nice to have the following 4 checks:

* multiprocessing.process._dangling
* asyncio.events._event_loop_policy
* urllib.requests._url_tempfiles
* urllib.requests._opener

The problem is that because of these checks, **any** unit test file of the 424 Python test files imports asyncio, multiprocessing and urllib. As a result, **any** unit test starts with around 233 imported modules. We are far from a "unit" test, since many modules have side effects.

I wrote PR 22089 to remove these checks. "import test.libregrtest" is reduced from 233 to only 149 imports (on Linux), which is way better.
msg376380 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-04 16:06
With PR 22089, test_sys_modules.py of msg376374 imports 152 modules rather than 233. It's better than Python 2.7 which imports 170 modules!
msg376420 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-09-05 00:24
On Windows with current master, the baseline for running anything with 1 import (">>>  import sys; len(sys.modules)") is 35 imported modules.  Adding "import unittest" increases this to 80.  What slightly puzzles me is that running 
---
import unittest
import sys

class Tests(unittest.TestCase):
    def test_bug(self):
        print("len(sys.modules):", len(sys.modules))

if __name__ == "__main__":
    unittest.main()
---
increases the number to 90.  Perhaps unittest has delayed imports.

The current startup number for IDLE is 162, which can result in a cold startup of several seconds.  I am thinking of trying to reduce this by delaying imports of modules that are not immediately used and might never be used.

For tests, I gather that side-effect issues are more important than startup time.
msg376426 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-05 07:38
You could save/restore this data only when the corresponding modules were imported, as was done in clear_caches() in refleak.py. For example:

    # Same for Process objects
    def get_multiprocessing_process__dangling(self):
        multiprocessing_process = sys.modules.get('multiprocessing.process')
        if not multiprocessing_process:
            return set()
        # Unjoined process objects can survive after process exits
        multiprocessing_process._cleanup()
        # This copies the weakrefs without making any strong reference
        return multiprocessing_process._dangling.copy()
    def restore_multiprocessing_process__dangling(self, saved):
        multiprocessing_process = sys.modules.get('multiprocessing.process')
        if not multiprocessing_process:
            return
        multiprocessing_process._dangling.clear()
        multiprocessing_process._dangling.update(saved)
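
The two methods above are fragments of a larger class. As a self-contained illustration of the same idea, here is a minimal sketch that can be run on its own. The names saved_copy(), restore() and demo_leaky_module are hypothetical stand-ins, not regrtest code; the key point is that sys.modules.get() only looks a module up and never imports it.

```python
import sys
import types


def saved_copy(module_name, attr_name):
    """Copy module.attr if the module is already loaded, else return set().

    sys.modules.get() performs a plain dictionary lookup, so calling this
    never triggers an import of module_name.
    """
    module = sys.modules.get(module_name)
    if module is None:
        return set()
    return getattr(module, attr_name).copy()


def restore(module_name, attr_name, saved):
    """Restore module.attr in place, but only if the module is loaded."""
    module = sys.modules.get(module_name)
    if module is None:
        return
    current = getattr(module, attr_name)
    current.clear()
    current.update(saved)


# Stand-in for a module with global state (hypothetical, for the demo only).
demo = types.ModuleType("demo_leaky_module")
demo._dangling = {"a"}
sys.modules[demo.__name__] = demo

before = saved_copy("demo_leaky_module", "_dangling")
demo._dangling.add("b")  # simulated side effect of a test
changed = saved_copy("demo_leaky_module", "_dangling") != before
restore("demo_leaky_module", "_dangling", before)
```

Returning an empty set rather than None for a module that is not loaded means a "before" snapshot taken prior to the import can still be compared with an "after" snapshot taken once the module exists.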
msg376462 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-06 10:12
> You could save/restore this data only when the corresponding modules were imported, as was done in clear_caches() in refleak.py.

It was my first idea, but some large modules like multiprocessing and asyncio are only imported when the test file is imported, whereas save_environment() is called (__enter__) before the import in libregrtest.runtest.
msg376463 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-06 12:48
Yes, but they should be imported after running the test. Note that the default value in the example was changed from None to set().
msg389105 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-19 17:45
Serhiy: "You could save/restore this data only when the corresponding modules were imported, as was done in clear_caches() in refleak.py."

That's a very good idea! I implemented it in PR 24934. But I modified runtest() to use *two* saved_test_environment instances: one before the test module is imported, one after. The first is needed to check whether the import itself has side effects, for example if the module body has side effects. The second checks whether running the tests has side effects. The second one is more likely to see the modules as already imported. The first one may miss some bugs, but IMO it's an acceptable trade-off.
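
The two-phase check can be sketched as follows. This is only an illustration under stated assumptions, not the code of PR 24934: check_environment(), run_test_module() and the getters mapping are hypothetical names, and the checked resources (current directory, os.environ) are chosen only because they are easy to snapshot.

```python
import contextlib
import importlib
import os


@contextlib.contextmanager
def check_environment(label, getters, report):
    """Snapshot each resource on entry; record the ones that differ on exit."""
    before = {name: get() for name, get in getters.items()}
    try:
        yield
    finally:
        for name, get in getters.items():
            if get() != before[name]:
                report.append((label, name))


def run_test_module(module_name, getters, report):
    # First check: side effects of importing the test module itself.
    with check_environment("import", getters, report):
        module = importlib.import_module(module_name)
    # Second check: side effects of running the tests. Getters based on
    # sys.modules.get() are more likely to find their module loaded here.
    with check_environment("run", getters, report):
        test_main = getattr(module, "test_main", None)
        if test_main is not None:
            test_main()
    return module


getters = {"cwd": os.getcwd, "environ": lambda: dict(os.environ)}
report = []
# colorsys is used only as a harmless importable module for the demo.
run_test_module("colorsys", getters, report)
```

Each context manager takes its own snapshot, so a change made during import is attributed to the "import" phase and does not hide or duplicate a change made while the tests run.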
msg389347 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-22 22:52
New changeset 532e063fc2bf9e6e80550670ddc5dc5d2b1d2450 by Victor Stinner in branch 'master':
bpo-41718: regrtest saved_test_environment avoids imports (GH-24934)
https://github.com/python/cpython/commit/532e063fc2bf9e6e80550670ddc5dc5d2b1d2450
msg389350 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-22 23:17
New changeset 10417dd15f135c179cf4234d1abe506915d802ff by Victor Stinner in branch 'master':
bpo-41718: Reduce libregrtest runtest imports (GH-24980)
https://github.com/python/cpython/commit/10417dd15f135c179cf4234d1abe506915d802ff
msg389355 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-23 00:08
New changeset 0473fb222956063814b6beb5fd401f9eeaa8a56a by Victor Stinner in branch 'master':
bpo-41718: libregrtest runtest avoids import_helper (GH-24983)
https://github.com/python/cpython/commit/0473fb222956063814b6beb5fd401f9eeaa8a56a
msg389356 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-23 00:11
New changeset 30793e81bd90f3346e962435d49073bc588f067c by Victor Stinner in branch 'master':
bpo-41718: Disable support.testresult XML output by default (GH-24982)
https://github.com/python/cpython/commit/30793e81bd90f3346e962435d49073bc588f067c
msg389358 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-23 00:40
New changeset 9feae41c4f04ca27fd2c865807a5caeb50bf4fc4 by Victor Stinner in branch 'master':
bpo-41718: libregrtest avoids importing datetime (GH-24985)
https://github.com/python/cpython/commit/9feae41c4f04ca27fd2c865807a5caeb50bf4fc4
msg389360 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-23 00:47
len(sys.modules) of msg376374 test_sys_modules:

* Python 3.6: 184
* Python 3.7: 183
* Python 3.8: 221
* Python 3.9: 233
* master: 131

The master branch imports 102 fewer modules than Python 3.9 (233 => 131)! Almost half.

asyncio, logging, multiprocessing, etc. are no longer always imported by default.
msg389395 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-23 16:42
New changeset d72e8d487553c103bf2742e229f8266b515fd951 by Victor Stinner in branch 'master':
bpo-41718: subprocess imports grp and pwd on demand (GH-24987)
https://github.com/python/cpython/commit/d72e8d487553c103bf2742e229f8266b515fd951
msg389401 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-23 18:23
New changeset bd9154a495434464283f74b660160f89930cd791 by Victor Stinner in branch 'master':
bpo-41718: runpy now imports pkgutil in functions (GH-24996)
https://github.com/python/cpython/commit/bd9154a495434464283f74b660160f89930cd791
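
As their titles describe, GH-24987 and GH-24996 defer imports until they are needed. The general technique can be sketched as below; this is not the code from those changesets. The function rgb_to_hls_lazy() is hypothetical, and colorsys is used only because it is a small, portable stdlib module.

```python
import sys


def rgb_to_hls_lazy(r, g, b):
    # Deferred import: colorsys is loaded on the first call, not when the
    # enclosing module is imported. Later calls reuse sys.modules.
    import colorsys
    return colorsys.rgb_to_hls(r, g, b)


loaded_at_definition = "colorsys" in sys.modules
result = rgb_to_hls_lazy(1.0, 0.0, 0.0)
loaded_after_call = "colorsys" in sys.modules
```

The trade-off is that callers who never use the function do not pay for the import, while the first call becomes slightly slower.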
msg389404 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-23 19:22
New changeset cd27af70d58161c59072e27a10e0e63dcbf0bccb by Victor Stinner in branch 'master':
bpo-41718: Update runpy startup time What's New (GH-24998)
https://github.com/python/cpython/commit/cd27af70d58161c59072e27a10e0e63dcbf0bccb
msg389406 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-23 19:24
OK, the most important changes have been merged. Thanks to everyone who helped with this large project! See also my summary email on python-dev:
https://mail.python.org/archives/list/python-dev@python.org/thread/I3OQTA3F66NQUN7CH2NHC5XZTO24QCIK/