Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test.regrtest has way too many imports #85884

Closed
vstinner opened this issue Sep 4, 2020 · 19 comments
Closed

test.regrtest has way too many imports #85884

vstinner opened this issue Sep 4, 2020 · 19 comments
Labels
3.10 only security fixes tests Tests in the Lib/test dir

Comments

@vstinner
Copy link
Member

vstinner commented Sep 4, 2020

BPO 41718
Nosy @terryjreedy, @vstinner, @zware, @serhiy-storchaka, @pablogsal, @shihai1991
PRs
  • bpo-41718: Reduce test.libregrtest imports #22089
  • bpo-41718: regrtest saved_test_environment avoids imports #24934
  • bpo-41718: Reduce libregrtest runtest imports #24980
  • bpo-41718: importlib no longer imports warnings #24981
  • bpo-41718: Disable support.testresult XML output by default #24982
  • bpo-41718: libregrtest runtest avoids import_helper #24983
  • bpo-41718: libregrtest avoids importing datetime #24985
  • bpo-41718: subprocess imports grp and pwd on demand #24987
  • bpo-41718: runpy now imports pkgutil in functions #24996
  • bpo-41718: Update runpy startup time What's New #24998
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-03-23.19:24:20.978>
    created_at = <Date 2020-09-04.15:36:12.920>
    labels = ['tests', '3.10']
    title = 'test.regrtest has way too many imports'
    updated_at = <Date 2021-03-23.19:24:20.978>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2021-03-23.19:24:20.978>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-03-23.19:24:20.978>
    closer = 'vstinner'
    components = ['Tests']
    creation = <Date 2020-09-04.15:36:12.920>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41718
    keywords = ['patch']
    message_count = 19.0
    messages = ['376374', '376376', '376379', '376380', '376420', '376426', '376462', '376463', '389105', '389347', '389350', '389355', '389356', '389358', '389360', '389395', '389401', '389404', '389406']
    nosy_count = 6.0
    nosy_names = ['terry.reedy', 'vstinner', 'zach.ware', 'serhiy.storchaka', 'pablogsal', 'shihai1991']
    pr_nums = ['22089', '24934', '24980', '24981', '24982', '24983', '24985', '24987', '24996', '24998']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue41718'
    versions = ['Python 3.10']

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 4, 2020

    Follow-up of bpo-40275.

    While investigating a crash on AIX (bpo-40068), I noticed that test_threading crashed because the test imports the logging module, and the logging has a bug on AIX on fork. I created an issue to reduce the number of imports made by "import test.support":

    https://bugs.python.org/issue40275
    I would prefer to better isolate tests: test_threading should only test the threading module, not the logging module.

    Thanks to the hard work of Hai Shi, "import test.support" now imports only 37 modules instead of 171! He split the 3200 lines of Lib/test/support/init.py into new helper submodules: bytecode, import, threading, socket, etc. For example, TESTFN now comes from test.support.os_helper.
    Sadly, test.regrtest.save_env still imports asyncio and multiprocessing, and so in practice, running any test using "python -m test (...)" still imports around 233 modules :-(

    I measured the number of imports done in practice using the following file, Lib/test/test_sys_modules.py:
    ----

    import unittest
    from test import support
    import sys
    
    class Tests(unittest.TestCase):
        def test_bug(self):
            modules = sorted(sys.modules)
            print("sys.modules:")
            print("")
            import pprint
            pprint.pprint(modules)
            print("")
            print("len(sys.modules):", len(modules))
    
    def test_main():
        support.run_unittest(Tests)
    
    if __name__ == "__main__":
        test_main()

    master:

    3.9:

    3.5:

    2.7:

    @vstinner vstinner added 3.10 only security fixes tests Tests in the Lib/test dir labels Sep 4, 2020
    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 4, 2020

    If I hack test.libregrtest.runtest to not import test.libregrtest.save_env, test_sys_modules imports only 148 instead of 233 modules.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 4, 2020

    In general, it's nice to have the following 4 checks:

    • multiprocessing.process._dangling
    • asyncio.events._event_loop_policy
    • urllib.requests._url_tempfiles
    • urllib.requests._opener

    The problem is that because of these checks, **any** unit test file of the 424 Python test files import asyncio, multiprocessing and urllib. As a result, **any** unit test starts with around 233 imported modules. We are far from an "unit" test, since many modules have side effects.

    I wrote PR 22089 to remove these checks. "import test.libregrtest" is reduces from 233 to only 149 imports (on Linux), which is way better.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 4, 2020

    With PR 22089, test_sys_modules.py of msg376374 imports 152 imports rather than 233. It's better than Python 2.7 which imports 170 modules!

    @terryjreedy
    Copy link
    Member

    On Windows with current master, the baseline for running anything with 1 import (">>>  import sys; len(sys.modules)") is 35 imported modules.  Adding "import unittest" increases this to 80.  What slightly puzzles me is that running 

    import unittest
    import sys
    
    class Tests(unittest.TestCase):
        def test_bug(self):
            print("len(sys.modules):", len(sys.modules))
    
    if __name__ == "__main__":
        unittest.main()

    increases the number to 90. Perhaps unittest has delayed imports.

    The current startup number for IDLE is 162, which can result in a cold startup of several seconds. I am thinking of trying to reduce this by delaying imports of modules that are not immediately used and might never be used.

    For tests, I gather that side-effect issues are more important than startup time.

    @serhiy-storchaka
    Copy link
    Member

    You could save/restore this data only when corresponded modules was imported, like it was done in clear_caches() in refleak.py. For example:

        # Same for Process objects
        def get_multiprocessing_process__dangling(self):
            multiprocessing_process = sys.modules.get('multiprocessing.process')
            if not multiprocessing_process:
                return set()
            # Unjoined process objects can survive after process exits
            multiprocessing_process._cleanup()
            # This copies the weakrefs without making any strong reference
            return multiprocessing_process._dangling.copy()
        def restore_multiprocessing_process__dangling(self, saved):
            multiprocessing_process = sys.modules.get('multiprocessing.process')
            if not multiprocessing_process:
                return
            multiprocessing_process._dangling.clear()
            multiprocessing_process._dangling.update(saved)

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 6, 2020

    You could save/restore this data only when corresponded modules was imported, like it was done in clear_caches() in refleak.py.

    It was my first idea, but some large modules like multiprocessing and asyncio are only imported by tested when the test file is imported, whereas save_environment() is called (enter) before the import in libregrtest.runtest.

    @serhiy-storchaka
    Copy link
    Member

    Yes, but they should be imported after running the test. Note that the
    default value in the example was changed from None to set().

    @vstinner
    Copy link
    Member Author

    Serhiy: "You could save/restore this data only when corresponded modules was imported, like it was done in clear_caches() in refleak.py."

    That's a very good idea! I implemented it in PR 24934. But I modified runtest() to use *two* saved_test_environment instance. One before the test module is imported, one after. The one before is needed to check if the import itself has side effect, for example if the module body has side effect. The second is to check if running tests has side effect. The second one is more likely to have modules imported. The first one may miss some bugs, but IMO it's an acceptable trade-off.

    @vstinner
    Copy link
    Member Author

    New changeset 532e063 by Victor Stinner in branch 'master':
    bpo-41718: regrtest saved_test_environment avoids imports (GH-24934)
    532e063

    @vstinner
    Copy link
    Member Author

    New changeset 10417dd by Victor Stinner in branch 'master':
    bpo-41718: Reduce libregrtest runtest imports (GH-24980)
    10417dd

    @vstinner
    Copy link
    Member Author

    New changeset 0473fb2 by Victor Stinner in branch 'master':
    bpo-41718: libregrtest runtest avoids import_helper (GH-24983)
    0473fb2

    @vstinner
    Copy link
    Member Author

    New changeset 30793e8 by Victor Stinner in branch 'master':
    bpo-41718: Disable support.testresult XML output by default (GH-24982)
    30793e8

    @vstinner
    Copy link
    Member Author

    New changeset 9feae41 by Victor Stinner in branch 'master':
    bpo-41718: libregrtest avoids importing datetime (GH-24985)
    9feae41

    @vstinner
    Copy link
    Member Author

    len(sys.modules) of msg376374 test_sys_modules:

    • Python 3.6: 184
    • Python 3.7: 183
    • Python 3.8: 221
    • Python 3.9: 233
    • master: 131

    The master branch imports 102 less modules than Python 3.9 (233 => 131)! Almost the half.

    asyncio, logging, multiprocessing, etc. are no longer always imported by default.

    @vstinner
    Copy link
    Member Author

    New changeset d72e8d4 by Victor Stinner in branch 'master':
    bpo-41718: subprocess imports grp and pwd on demand (GH-24987)
    d72e8d4

    @vstinner
    Copy link
    Member Author

    New changeset bd9154a by Victor Stinner in branch 'master':
    bpo-41718: runpy now imports pkgutil in functions (GH-24996)
    bd9154a

    @vstinner
    Copy link
    Member Author

    New changeset cd27af7 by Victor Stinner in branch 'master':
    bpo-41718: Update runpy startup time What's New (GH-24998)
    cd27af7

    @vstinner
    Copy link
    Member Author

    Ok, the most important changes have been merged. Thanks everyone who helped on this large project! See also my summary email on python-dev:
    https://mail.python.org/archives/list/python-dev@python.org/thread/I3OQTA3F66NQUN7CH2NHC5XZTO24QCIK/

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes tests Tests in the Lib/test dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants