Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regrtest: use process groups #82683

Closed
vstinner opened this issue Oct 16, 2019 · 12 comments
Closed

regrtest: use process groups #82683

vstinner opened this issue Oct 16, 2019 · 12 comments
Labels
3.9 only security fixes tests Tests in the Lib/test dir

Comments

@vstinner
Copy link
Member

BPO 38502
Nosy @db3l, @vstinner
PRs
  • bpo-38502: regrtest uses process groups if available #16829
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-01-07.12:18:19.602>
    created_at = <Date 2019-10-16.22:46:46.844>
    labels = ['tests', '3.9']
    title = 'regrtest: use process groups'
    updated_at = <Date 2020-04-14.23:05:24.084>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-04-14.23:05:24.084>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-01-07.12:18:19.602>
    closer = 'vstinner'
    components = ['Tests']
    creation = <Date 2019-10-16.22:46:46.844>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38502
    keywords = ['patch']
    message_count = 12.0
    messages = ['354818', '354819', '354821', '354822', '354899', '354900', '354923', '354926', '355060', '359504', '366409', '366458']
    nosy_count = 2.0
    nosy_names = ['db3l', 'vstinner']
    pr_nums = ['16829']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue38502'
    versions = ['Python 3.9']

    @vstinner
    Copy link
    Member Author

    Problem.

    When regrtest is interrupted by CTRL+c and regrtest multiprocessing code (-jN command line option) is used, regrtest randomly fails to stop come TestWorkerProcess threads.

    The problem is that only the direct child process is kill by SIGKILL. If the test spawns a grandchild process, this one will not be killed.

    Moreover, the grandchild process inherits the child process stdout and stderr. But regrtest mltiprocessing uses pipes for the child process stdout and stderr.

    At the end, Popen.communicate() hangs randomly in the main regrtest process (in a TestWorkerProcess thread) until the child process *and* the grandchild process completes. Except that the main regrtest does not directly kill the grandchild process.

    => see bpo-38207

    Solution.

    I propose to:

    (1) use Popen() with start_new_session=True: a worker process calls setsid() to create a new process group.
    (2) Don't kill worker processes, but kill the process groups of worker processes.

    Attached PR implements this solution.

    @vstinner vstinner added 3.9 only security fixes tests Tests in the Lib/test dir labels Oct 16, 2019
    @vstinner
    Copy link
    Member Author

    Example of the problem: sometimes, interrupting a multiprocessing test hangs. regrtest fails to interrupt a TestWorkerProcess thread.

    $ ./python -m test test_multiprocessing_fork -j10 -F --timeout=60 --slowest
    0:00:00 load avg: 1.09 Run tests in parallel using 10 child processes
    ^C
    Kill <TestWorkerProcess #1 running test=test_multiprocessing_fork pid=4504 time=2.3 sec>
    Kill <TestWorkerProcess #2 running test=test_multiprocessing_fork pid=4503 time=2.3 sec>
    Kill <TestWorkerProcess #3 running test=test_multiprocessing_fork pid=4506 time=2.3 sec>
    Kill <TestWorkerProcess #4 running test=test_multiprocessing_fork pid=4505 time=2.3 sec>
    Kill <TestWorkerProcess #5 running test=test_multiprocessing_fork pid=4510 time=2.3 sec>
    Kill <TestWorkerProcess #6 running test=test_multiprocessing_fork pid=4509 time=2.3 sec>
    Kill <TestWorkerProcess #7 running test=test_multiprocessing_fork pid=4513 time=2.3 sec>
    Kill <TestWorkerProcess #8 running test=test_multiprocessing_fork pid=4514 time=2.3 sec>
    Kill <TestWorkerProcess #9 running test=test_multiprocessing_fork pid=4516 time=2.3 sec>
    Kill <TestWorkerProcess #10 running test=test_multiprocessing_fork pid=4517 time=2.3 sec>
    0:00:03 load avg: 1.88 Waiting for <TestWorkerProcess #1 running test=test_multiprocessing_fork pid=4504 time=3.3 sec> thread for 1.0 sec
    0:00:04 load avg: 1.88 Waiting for <TestWorkerProcess #1 running test=test_multiprocessing_fork pid=4504 time=4.3 sec> thread for 2.0 sec
    0:00:05 load avg: 1.73 Waiting for <TestWorkerProcess #1 running test=test_multiprocessing_fork pid=4504 time=5.3 sec> thread for 3.0 sec
    (...)

    With my PR, I cannot reproduce the "Waiting for ..." issue anymore.

    @vstinner
    Copy link
    Member Author

    See also bpo-18969: "test suite: enable faulthandler timeout in assert_python". This issue proposes to use killpg() to send a signal to all Python processes spawned by regrtest: trigger faulthandler to dump the Python traceback of all Python threads.

    @vstinner
    Copy link
    Member Author

    See also bpo-12413 "make faulthandler dump traceback of child processes" which looks like a duplicate of bpo-18969.

    @vstinner
    Copy link
    Member Author

    New changeset ecb035c by Victor Stinner in branch 'master':
    bpo-38502: regrtest uses process groups if available (GH-16829)
    ecb035c

    @vstinner
    Copy link
    Member Author

    I will wait at least one day to see how buildbots like this change before backporting it to 3.7 and 3.8.

    @db3l
    Copy link
    Contributor

    db3l commented Oct 18, 2019

    I don't know for sure that this is the cause but both 3.x builds following this commit on my bolen-ubuntu worker (Ubuntu 18.04.3) have had test_pty crash in the first attempt, with the retry succeeding. For example https://buildbot.python.org/all/#/builders/141/builds/2679

    In spot checking, the same behavior seems to be occurring on many other non-Windows builders as well, presumably those for which it gets used.

    There doesn't seem to be anything useful produced by the first crash. Just an entry for "test_pty crashed (Exit code -1)" in the test log.

    @db3l
    Copy link
    Contributor

    db3l commented Oct 18, 2019

    I can recreate this manually by running regrtest.py against test_pty. Crashes with any "-j#" option, but fine when run sequentially. Removing the process group change avoids the crash.

    With the process group change in place, the trigger point appears to be the final "os.close(master_fd)" in PtyTest.test_basic. Commenting that out avoids the crash.

    @vstinner
    Copy link
    Member Author

    I don't know for sure that this is the cause but both 3.x builds following this commit on my bolen-ubuntu worker (Ubuntu 18.04.3) have had test_pty crash in the first attempt, with the retry succeeding. For example https://buildbot.python.org/all/#/builders/141/builds/2679

    I created bpo-38547: "test_pty fails when using setsid()".

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 7, 2020

    The test_pty has been fixed.

    regrtest now calls setsid() when using multiprocessing (-jN), but only in the main branch. It's a nice feature, but I'm not comfortable about backporting it yet. Maybe if tests become too unstable on other branches, we may backport the feature to 3.7 and 3.8.

    @vstinner vstinner closed this as completed Jan 7, 2020
    @vstinner
    Copy link
    Member Author

    New changeset 67b8a1f by Victor Stinner in branch '3.8':
    [3.8] Update libregrtest from master (GH-19516)
    67b8a1f

    @vstinner
    Copy link
    Member Author

    New changeset b894b66 by Victor Stinner in branch '3.7':
    Update libregrtest from master (GH-19517)
    b894b66

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes tests Tests in the Lib/test dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants