This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients vstinner
Date 2022-02-10.23:09:16
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1644534557.28.0.000333882929265.issue46716@roundup.psfhosted.org>
In-reply-to
Content
regrtest was run with --timeout 900 on AMD64 Windows11 3.x: timeout confirmed by "(timeout: 15 min, worker timeout: 20 min)" log. But then test_subprocss was only stopped after "4 hour 55 min".

If the regrtest main process is able to display an update 2x per minutes (every 30 sec), it should be able to stop the test worker process (test_subprocess) after 20 minutes. How is it possible that the process took so long?

There are multiple guards:

* (1) in the worker process: _runtest() calls faulthandler.dump_traceback_later(ns.timeout, exit=True)
* (2) libregrtest/runtest_mp.py: TestWorkerProcess._run_process() thread uses popen.communicate(timeout=self.timeout)
* (3) faulthandler.dump_traceback_later(MAIN_PROCESS_TIMEOUT, exit=True): kill the parent process if it is blocked for longer than 5 minutes

Guards (1) and (2) didn't work.

Maybe the parent process should implement a 4th guard using the 20 minute timeout: almost 5 hours is way longer than 20 minutes!


C:\buildbot\3.x.kloth-win11\build>"C:\buildbot\3.x.kloth-win11\build\PCbuild\amd64\python_d.exe"  -u -Wd -E -bb -m test  -uall -rwW --slowest --timeout 1200 --fail-env-changed -j1 -j2 --junit-xml test-results.xml -j40 --timeout 900
== CPython 3.11.0a5+ (main, Feb 10 2022, 04:03:24) [MSC v.1930 64 bit (AMD64)]
== Windows-10-10.0.22000-SP0 little-endian
== cwd: C:\buildbot\3.x.kloth-win11\build\build\test_python_5732�
== CPU count: 32
== encodings: locale=cp1252, FS=utf-8
Using random seed 6320493
0:00:00 Run tests in parallel using 40 child processes (timeout: 15 min, worker timeout: 20 min)
(...)
0:03:13 load avg: 0.76 [431/432] test_multiprocessing_spawn passed (3 min 13 sec) -- running: test_subprocess (3 min 11 sec)
0:03:43 load avg: 0.46 running: test_subprocess (3 min 41 sec)
(...)
4:53:17 load avg: 0.00 running: test_subprocess (4 hour 53 min)
4:53:47 load avg: 0.00 running: test_subprocess (4 hour 53 min)
4:54:17 load avg: 0.09 running: test_subprocess (4 hour 54 min)
4:54:47 load avg: 0.35 running: test_subprocess (4 hour 54 min)
4:55:17 load avg: 0.48 running: test_subprocess (4 hour 55 min)
4:55:46 load avg: 0.50 [432/432/1] test_subprocess timed out (4 hour 55 min) (4 hour 55 min)

== Tests result: FAILURE ==

397 tests OK.

10 slowest tests:
- test_subprocess: 4 hour 55 min
- test_multiprocessing_spawn: 3 min 13 sec
- test_concurrent_futures: 2 min 46 sec
- test_peg_generator: 2 min 32 sec
- test_compileall: 1 min 34 sec
- test_unparse: 1 min 31 sec
- test_distutils: 1 min 23 sec
- test_asyncio: 1 min 22 sec
- test_tokenize: 1 min 8 sec
- test_io: 1 min 5 sec

1 test failed:
    test_subprocess
History
Date User Action Args
2022-02-10 23:09:17vstinnersetrecipients: + vstinner
2022-02-10 23:09:17vstinnersetmessageid: <1644534557.28.0.000333882929265.issue46716@roundup.psfhosted.org>
2022-02-10 23:09:17vstinnerlinkissue46716 messages
2022-02-10 23:09:16vstinnercreate