classification
Title: test.regrtest has way too many imports
Type: Stage: patch review
Components: Tests Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: pablogsal, serhiy.storchaka, shihai1991, terry.reedy, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2020-09-04 15:36 by vstinner, last changed 2020-09-06 12:48 by serhiy.storchaka.

Pull Requests
URL Status Linked Edit
PR 22089 open vstinner, 2020-09-04 15:59
Messages (8)
msg376374 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-04 15:36
Follow-up of bpo-40275.

While investigating a crash on AIX (bpo-40068), I noticed that test_threading crashed because the test imports the logging module, and the logging module has a bug on AIX on fork. I created an issue to reduce the number of imports made by "import test.support":

https://bugs.python.org/issue40275
I would prefer to better isolate tests: test_threading should only test the threading module, not the logging module.

Thanks to the hard work of Hai Shi, "import test.support" now imports only 37 modules instead of 171! He split the 3200 lines of Lib/test/support/__init__.py into new helper submodules: bytecode, import, threading, socket, etc. For example, TESTFN now comes from test.support.os_helper.
Sadly, test.regrtest.save_env still imports asyncio and multiprocessing, and so in practice, running any test using "python -m test (...)" still imports around 233 modules :-(

I measured the number of imports done in practice using the following file, Lib/test/test_sys_modules.py:
----
import unittest
from test import support
import sys

class Tests(unittest.TestCase):
    def test_bug(self):
        modules = sorted(sys.modules)
        print("sys.modules:")
        print("")
        import pprint
        pprint.pprint(modules)
        print("")
        print("len(sys.modules):", len(modules))

def test_main():
    support.run_unittest(Tests)

if __name__ == "__main__":
    test_main()
----

master:

* ./python -m test test_sys_modules: 233 modules (multiprocessing, asyncio, etc.)
* ./python Lib/test/test_sys_modules.py: 95 modules

3.9:

* ./python -m test test_sys_modules: 232
* ./python Lib/test/test_sys_modules.py: 117

3.5:

* ./python -m test test_sys_modules: 167
* ./python Lib/test/test_sys_modules.py: 151

2.7:

* ./python -m test test_sys_modules: 170
* ./python Lib/test/test_sys_modules.py: 122
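
A similar measurement can be approximated without adding a test file, by counting sys.modules in a fresh interpreter. The following is only a minimal sketch: the helper name count_modules is hypothetical (it is not part of regrtest), and the numbers it reports depend on the Python version and platform, so they will not necessarily match the figures above.

```python
import subprocess
import sys


def count_modules(statement=""):
    """Run `statement` in a fresh interpreter and return len(sys.modules)."""
    code = f"{statement}\nimport sys\nprint(len(sys.modules))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Baseline: only the modules loaded at interpreter startup.
baseline = count_modules()
# Same measurement after importing unittest.
with_unittest = count_modules("import unittest")

print("baseline:", baseline)
print("after 'import unittest':", with_unittest)
```

Running each measurement in a subprocess keeps the count independent of whatever the calling process has already imported.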
msg376376 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-04 15:38
If I hack test.libregrtest.runtest to not import test.libregrtest.save_env, test_sys_modules imports only 148 instead of 233 modules.
msg376379 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-04 16:04
In general, it's nice to have the following 4 checks:

* multiprocessing.process._dangling
* asyncio.events._event_loop_policy
* urllib.request._url_tempfiles
* urllib.request._opener

The problem is that because of these checks, **any** of the 424 Python test files imports asyncio, multiprocessing and urllib. As a result, **any** unit test starts with around 233 imported modules. We are far from a "unit" test, since many modules have side effects.

I wrote PR 22089 to remove these checks. "import test.libregrtest" is reduced from 233 to only 149 imports (on Linux), which is way better.
msg376380 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-04 16:06
With PR 22089, test_sys_modules.py of msg376374 imports 152 modules rather than 233. It's better than Python 2.7 which imports 170 modules!
msg376420 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-09-05 00:24
On Windows with current master, the baseline for running anything with 1 import (">>>  import sys; len(sys.modules)") is 35 imported modules.  Adding "import unittest" increases this to 80.  What slightly puzzles me is that running 
---
import unittest
import sys

class Tests(unittest.TestCase):
    def test_bug(self):
        print("len(sys.modules):", len(sys.modules))

if __name__ == "__main__":
    unittest.main()
---
increases the number to 90.  Perhaps unittest has delayed imports.
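
One way to check for delayed imports is to list the modules that appear in sys.modules only after a given action. This is a rough sketch under stated assumptions: modules_added_by is a hypothetical helper, the action must not write to stdout, and the result may well be an empty list on some Python versions.

```python
import subprocess
import sys


def modules_added_by(action, setup=""):
    """Return the sorted names of modules that `action` adds to sys.modules.

    `setup` runs first in a fresh interpreter and is not counted.
    """
    code = "\n".join([
        "import sys",
        setup,
        "before = set(sys.modules)",
        action,
        "print('\\n'.join(sorted(set(sys.modules) - before)))",
    ])
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


# Modules imported only once a (trivial) test is actually run.
delayed = modules_added_by(
    "unittest.TextTestRunner(stream=io.StringIO())"
    ".run(unittest.FunctionTestCase(lambda: None))",
    setup="import io, unittest",
)
print("imported while running a test:", delayed)
```

The runner output goes to an in-memory stream so that only the module names reach stdout.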

The current startup number for IDLE is 162, which can result in a cold startup of several seconds.  I am thinking of trying to reduce this by delaying imports of modules that are not immediately used and might never be used.

For tests, I gather that side-effect issues are more important than startup time.
msg376426 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-05 07:38
You could save/restore this data only when the corresponding modules have been imported, as is done in clear_caches() in refleak.py. For example:

    # Same for Process objects
    def get_multiprocessing_process__dangling(self):
        multiprocessing_process = sys.modules.get('multiprocessing.process')
        if not multiprocessing_process:
            return set()
        # Unjoined process objects can survive after process exits
        multiprocessing_process._cleanup()
        # This copies the weakrefs without making any strong reference
        return multiprocessing_process._dangling.copy()
    def restore_multiprocessing_process__dangling(self, saved):
        multiprocessing_process = sys.modules.get('multiprocessing.process')
        if not multiprocessing_process:
            return
        multiprocessing_process._dangling.clear()
        multiprocessing_process._dangling.update(saved)
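
The pattern can be tried in isolation with a stand-in module. This is a sketch only: demo_process and its _dangling attribute are hypothetical placeholders for multiprocessing.process, and the surrounding save/compare/restore steps are simplified compared to what save_env does. It shows why returning an empty set (rather than None) matters when the module is imported only after the state was saved: the comparison made after the test still detects the change.

```python
import sys
import types

MODULE_NAME = "demo_process"  # hypothetical stand-in for multiprocessing.process


def get_dangling():
    """Return a copy of the module's _dangling set, or set() if not imported."""
    module = sys.modules.get(MODULE_NAME)
    if module is None:
        return set()
    return module._dangling.copy()


def restore_dangling(saved):
    """Reset the module's _dangling set to `saved`, if the module is imported."""
    module = sys.modules.get(MODULE_NAME)
    if module is None:
        return
    module._dangling.clear()
    module._dangling.update(saved)


# 1. Save the state before the "test": the module is not imported yet.
saved = get_dangling()

# 2. Simulate a test that imports the module and leaves an entry behind.
fake = types.ModuleType(MODULE_NAME)
fake._dangling = {"leaked-process"}
sys.modules[MODULE_NAME] = fake

# 3. After the "test", the module is present, so the change is visible.
changed = get_dangling() != saved

# 4. Restore the saved state and clean up the stand-in module.
restore_dangling(saved)
after_restore = get_dangling()
del sys.modules[MODULE_NAME]

print("saved:", saved, "changed:", changed, "after restore:", after_restore)
```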
msg376462 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-06 10:12
> You could save/restore this data only when the corresponding modules have been imported, as is done in clear_caches() in refleak.py.

It was my first idea, but some large modules like multiprocessing and asyncio are only imported when the test file itself is imported, whereas save_environment() is called (__enter__) before that import in libregrtest.runtest.
msg376463 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-06 12:48
>
> Yes, but they should be imported after running the test. Note that the
> default value in the example was changed from None to set().
History
Date User Action Args
2020-09-06 12:48:40 serhiy.storchaka set messages: + msg376463
2020-09-06 10:12:46 vstinner set messages: + msg376462
2020-09-05 07:38:26 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg376426
2020-09-05 00:24:03 terry.reedy set nosy: + terry.reedy; messages: + msg376420
2020-09-04 16:06:48 vstinner set messages: + msg376380
2020-09-04 16:04:54 vstinner set nosy: + zach.ware, pablogsal
2020-09-04 16:04:42 vstinner set messages: + msg376379
2020-09-04 15:59:16 vstinner set keywords: + patch; stage: patch review; pull_requests: + pull_request21175
2020-09-04 15:38:56 vstinner set messages: + msg376376
2020-09-04 15:36:12 vstinner create