Issue37531
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2019-07-09 19:34 by nanjekyejoannah, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
kill_timeout.py | vstinner, 2019-08-22 10:44 | |||
test_kill.py | vstinner, 2019-08-22 10:48 |
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 14679 | closed | nanjekyejoannah, 2019-07-10 12:16 | |
PR 15072 | merged | vstinner, 2019-08-14 10:46 | |
PR 15279 | merged | vstinner, 2019-08-14 13:56 | |
PR 15280 | merged | vstinner, 2019-08-14 14:04 | |
PR 15345 | merged | vstinner, 2019-08-20 17:21 | |
PR 15419 | merged | vstinner, 2019-08-23 10:00 | |
PR 15871 | merged | vstinner, 2019-09-10 15:07 | |
PR 15874 | merged | vstinner, 2019-09-10 15:11 | |
PR 16220 | merged | vstinner, 2019-09-17 07:22 | |
PR 16223 | merged | miss-islington, 2019-09-17 10:15 | |
PR 16224 | merged | miss-islington, 2019-09-17 10:15 | |
PR 16247 | merged | vstinner, 2019-09-17 23:35 | |
PR 16282 | merged | vstinner, 2019-09-19 10:06 | |
PR 16285 | merged | vstinner, 2019-09-19 14:41 | |
PR 16289 | merged | vstinner, 2019-09-19 16:19 | |
PR 16659 | merged | vstinner, 2019-10-08 14:44 | |
PR 16676 | merged | miss-islington, 2019-10-09 13:33 | |
PR 16677 | merged | miss-islington, 2019-10-09 13:34 | |
PR 16827 | merged | vstinner, 2019-10-16 22:00 |
Messages (51) | |||
---|---|---|---|
msg347576 - (view) | Author: Joannah Nanjekye (nanjekyejoannah) * ![]() |
Date: 2019-07-09 19:34 | |
Reported by Victor Stinner. regrtest has a --timeout parameter which calls faulthandler.dump_traceback_later(timeout), but sometimes it's not enough to kill a test. regrtest should kill a worker process if it runs longer than --timeout seconds. * https://bugs.python.org/issue37313 : test_concurrent_futures ran for 25 hours whereas regrtest was run using --timeout 900 * regrtest ran for 4 days whereas it was run with --timeout=900: https://mail.python.org/pipermail/python-buildbots/2019-June/000285.html |
|||
msg349689 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-14 12:18 | |
New changeset b0c8369c603633f445ccbb5ca7a8742145ff9eec by Victor Stinner in branch 'master': bpo-37531: Fix regrtest timeout for subprocesses (GH-15072) https://github.com/python/cpython/commit/b0c8369c603633f445ccbb5ca7a8742145ff9eec |
|||
msg349703 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-14 14:30 | |
New changeset 93bee6a8d878ff952e245614c567c7c6bb7a0398 by Victor Stinner in branch '3.7': [3.7] bpo-37531: Fix regrtest timeout for subprocesses (GH-15072) (GH-15280) https://github.com/python/cpython/commit/93bee6a8d878ff952e245614c567c7c6bb7a0398 |
|||
msg349704 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-14 14:31 | |
New changeset d85c5670ff1611202a25c9e0967148e72c114de9 by Victor Stinner in branch '3.8': [3.8] bpo-37531: Fix regrtest timeout for subprocesses (GH-15072) (GH-15279) https://github.com/python/cpython/commit/d85c5670ff1611202a25c9e0967148e72c114de9 |
|||
msg349708 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-14 14:44 | |
Thanks Joannah! That's a nice enhancement which should help to stabilize buildbots even more. The next step is bpo-37711: "regrtest: re-run failed tests in subprocesses". |
|||
msg350009 - (view) | Author: Jeremy Kloth (jkloth) * | Date: 2019-08-20 14:57 | |
It seems that an issue still exists. https://buildbot.python.org/all/#/builders/130/builds/1050 has been running for nearly 9 hours at this point. I can leave it "stuck" if there is some diagnostics that would be beneficial. But I would prefer not to have this impeding other testing for too long. |
|||
msg350012 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-20 15:33 | |
I just tested the 3.7 branch manually: regrtest works as expected. I'm using this local changes: diff --git a/Lib/test/libregrtest/runtest.py b/Lib/test/libregrtest/runtest.py index e7dce180cb..7d996d6918 100644 --- a/Lib/test/libregrtest/runtest.py +++ b/Lib/test/libregrtest/runtest.py @@ -119,7 +119,7 @@ def _runtest(ns, test_name): use_timeout = (ns.timeout is not None) if use_timeout: - faulthandler.dump_traceback_later(ns.timeout, exit=True) + pass #faulthandler.dump_traceback_later(ns.timeout, exit=True) start_time = time.perf_counter() try: diff --git a/Lib/test/test_os.py b/Lib/test/test_os.py index 8032da0530..e21399c840 100644 --- a/Lib/test/test_os.py +++ b/Lib/test/test_os.py @@ -89,6 +89,7 @@ class FileTests(unittest.TestCase): tearDown = setUp def test_access(self): + import time; time.sleep(3600) f = os.open(support.TESTFN, os.O_CREAT|os.O_RDWR) os.close(f) self.assertTrue(os.access(support.TESTFN, os.W_OK)) Then I ran: vstinner@apu$ ./python -m test -j2 test_os -m test_access --timeout=3 Run tests in parallel using 2 child processes 0:00:04 load avg: 0.59 [1/1/1] test_os timed out (4 sec 512 ms) == Tests result: FAILURE == 1 test failed: test_os Total duration: 4 sec 545 ms Tests result: FAILURE As you can see, test_os is killed after 4.5 seconds, as expected (it uses timeout x 1.5, since faulthandler is supposed to be first watchdog with timeout seconds). -- Maybe something is wrong with test_concurrent_futures on Windows. Maybe subprocess.Popen.kill() is unable to kill the process immediately for an unknown reason? Or maybe subprocess.Popen.communicate(timeout=self.timeout) didn't fail with TimeoutExpired? Sadly, libregrtest is quite when it kills a worker process because of timeout, since the following output should be enough to understand what happened. But here the worker was not killed, or failed to be killed, or something else... |
|||
msg350014 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-20 15:39 | |
> I can leave it "stuck" if there is some diagnostics that would be beneficial. Can you check the process hierarchy? I would like to know how many worker processes are still running under the main regrtest process. I expect to see exactly one. I don't know how to investigate more on such issue on Windows. |
|||
msg350017 - (view) | Author: Jeremy Kloth (jeremy.kloth) | Date: 2019-08-20 15:55 | |
> Can you check the process hierarchy? I would like to know how many worker processes are still running under the main regrtest process. I expect to see exactly one. I don't know how to investigate more on such issue on Windows. There are no child processes of the main test process. There are 5 orphaned processes from multiprocessing. Their parent process is gone (assuming the parent_pid argument is correct). There is also an orphaned typeperf.exe process, probably also from the killed process. |
|||
msg350020 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-20 16:04 | |
(I reopen the issue.) "There are no child processes of the main test process. There are 5 orphaned processes from multiprocessing. Their parent process is gone (assuming the parent_pid argument is correct)." Oh wow, that's really strange. I would expect that regrtest main process has one worker process: "test_concurrent_futures". https://buildbot.python.org/all/#/builders/130/builds/1050 C:\buildbot.python.org\3.7.kloth-win64\build>"C:\buildbot.python.org\3.7.kloth-win64\build\PCbuild\amd64\python_d.exe" -u -Wd -E -bb -m test -uall -rwW --slowest --timeout 1200 --fail-env-changed -j1 -j2 -j4 --timeout 900 == CPython 3.7.4+ (heads/3.7:02c1457, Aug 20 2019, 00:03:57) [MSC v.1900 64 bit (AMD64)] == Windows-7-6.1.7601-SP1 little-endian == cwd: C:\buildbot.python.org\3.7.kloth-win64\build\build\test_python_5524 == CPU count: 4 == encodings: locale=cp1252, FS=utf-8 Using random seed 1058030 Run tests in parallel using 4 child processes 0:00:01 load avg: 0.00 [ 1/416] test_call passed 0:00:02 load avg: 0.00 [ 2/416] test_genexps passed ... 0:12:46 load avg: 2.30 [412/416] test_threaded_import passed -- running: test_concurrent_futures (8 min 42 sec) 0:12:47 load avg: 2.30 [413/416] test_email passed -- running: test_concurrent_futures (8 min 43 sec) 0:12:50 load avg: 2.19 [414/416] test_bytes passed -- running: test_concurrent_futures (8 min 46 sec) 0:12:55 load avg: 2.02 [415/416] test_timeout passed -- running: test_concurrent_futures (8 min 52 sec) running: test_concurrent_futures (9 min 22 sec) running: test_concurrent_futures (9 min 52 sec) running: test_concurrent_futures (10 min 22 sec) ... running: test_concurrent_futures (9 hour 51 min) running: test_concurrent_futures (9 hour 51 min) running: test_concurrent_futures (9 hour 52 min) running: test_concurrent_futures (9 hour 52 min) running: test_concurrent_futures (9 hour 53 min) |
|||
msg350023 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-20 17:26 | |
I wrote PR 15345 to enhance regrtest timeout: more logs, more timeouts. |
|||
msg350044 - (view) | Author: Jeremy Kloth (jkloth) * | Date: 2019-08-21 02:39 | |
Additional logging on failures is always welcome. Might I suggest that, in this case, regrtest treats this action as a hard fail so as to not got lost in the other transient failures (test_asyncio). By that I mean, either to not re-run, or to still be "failed" even after a successful re-run as this test_concurrent mishap is rare enough that I doubt it would happen twice in one build. Now for a brain dump. The process tree at this time: python.exe (buildbot) . cmd.exe (test step) .. python_d.exe (regrtest main) ... typepref.exe (load) ... <already terminated process> .... typepref.exe (load for test_concurrent_futures) .... python_d.exe (multiprocessing pool) .... python_d.exe (multiprocessing pool) .... python_d.exe (multiprocessing pool) .... python_d.exe (multiprocessing pool) .... python_d.exe (multiprocessing pool) 1. To see how process termination played out, I killed the main regrtest process. This did not let the buildbot complete, but just sit there without any output. Also the typepref.exe process (and the failed multiprocessing pool processes) still lingered. 2. I then terminated the children* of the already terminated process. No change to the buildbot. [*] Windows doesn't really have the concept of a process tree 3. I then terminated the final typepref.exe process. The buildbot finally finished its test step. Given that typepref.exe also prevents completion, I believe the problem is in how we are spawning subprocesses. Maybe a process handle is being inherited by the spawned processes preventing it from fully terminating? I mention the process handle as it was listed as an open handle in Process Explorer for the regrtest process even though it had been terminated prior. |
|||
msg350057 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-21 09:57 | |
> .... typepref.exe (load for test_concurrent_futures) I don't understand why you get 2 typepref.exe process: only the main process should spawn one. Worker processes don't need it and should not spawn it. |
|||
msg350058 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-21 09:59 | |
New changeset de2d9eed8bc628533e1628b843cc4c7a5010f6e5 by Victor Stinner in branch 'master': bpo-37531: Enhance regrtest multiprocess timeout (GH-15345) https://github.com/python/cpython/commit/de2d9eed8bc628533e1628b843cc4c7a5010f6e5 |
|||
msg350060 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-21 10:21 | |
Oh, interesting: test_regrtest.test_multiprocessing_timeout() failed with a timeout on x86 Windows7 3.x. This build tested commit d33e46d17d33f9b918846982c02ddc17d897c9bc which is older than my "bpo-37531: Enhance regrtest multiprocess timeout (GH-15345)" change (older than commit de2d9eed8bc628533e1628b843cc4c7a5010f6e5). https://buildbot.python.org/all/#/builders/58/builds/2889 running: test_regrtest (32 min 8 sec) running: test_regrtest (32 min 38 sec) running: test_regrtest (33 min 8 sec) running: test_regrtest (33 min 38 sec) running: test_regrtest (34 min 8 sec) running: test_regrtest (34 min 38 sec) 1:19:18 load avg: 1.19 [419/419/1] test_regrtest crashed (Exit code 1) Timeout (0:35:00)! Thread 0x00000ddc (most recent call first): File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\subprocess.py", line 1353 in _readerthread File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\threading.py", line 876 in run File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\threading.py", line 938 in _bootstrap_inner File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\threading.py", line 896 in _bootstrap Thread 0x00000e68 (most recent call first): File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\threading.py", line 1033 in _wait_for_tstate_lock File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\threading.py", line 1017 in join File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\subprocess.py", line 1382 in _communicate File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\subprocess.py", line 1015 in communicate File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\subprocess.py", line 491 in run File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_regrtest.py", line 504 in run_command File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_regrtest.py", line 529 in run_python File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_regrtest.py", line 680 in run_tests File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_regrtest.py", line 1178 in test_multiprocessing_timeout File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\unittest\case.py", line 611 in _callTestMethod File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\unittest\case.py", line 654 in run File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\unittest\case.py", line 714 in __call__ File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\unittest\suite.py", line 122 in run File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\unittest\suite.py", line 84 in __call__ File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\unittest\suite.py", line 122 in run File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\unittest\suite.py", line 84 in __call__ File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\unittest\suite.py", line 122 in run File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\unittest\suite.py", line 84 in __call__ File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\unittest\runner.py", line 176 in run File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\support\__init__.py", line 1996 in _run_suite File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\support\__init__.py", line 2092 in run_unittest File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\libregrtest\runtest.py", line 209 in _test_module File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\libregrtest\runtest.py", line 234 in _runtest_inner2 File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\libregrtest\runtest.py", line 270 in _runtest_inner File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\libregrtest\runtest.py", line 140 in _runtest File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\libregrtest\runtest.py", line 193 in runtest File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\libregrtest\runtest_mp.py", line 66 in run_tests_worker File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\libregrtest\main.py", line 642 in _main File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\libregrtest\main.py", line 628 in main File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\libregrtest\main.py", line 695 in main File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\regrtest.py", line 43 in _main File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\regrtest.py", line 47 in <module> File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\runpy.py", line 85 in _run_code == Tests result: FAILURE == |
|||
msg350061 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-21 10:24 | |
> x86 Windows7 3.x > https://buildbot.python.org/all/#/builders/58/builds/2889 This buildbot looks slow: the 3 latest builds (2886, 2887, 2888) took around 1 hour. Example: 10 slowest tests: - test_multiprocessing_spawn: 15 min 31 sec - test_zipfile: 8 min 45 sec - test_concurrent_futures: 6 min 32 sec - test_pickle: 4 min 7 sec - test_regrtest: 3 min 47 sec - test_lzma: 3 min 13 sec - test_tarfile: 3 min 9 sec - test_capi: 2 min 54 sec - test_asyncio: 2 min 49 sec - test_venv: 1 min 57 sec Total duration: 1 hour 4 min |
|||
msg350132 - (view) | Author: Jeremy Kloth (jkloth) * | Date: 2019-08-21 23:22 | |
I happened to catch a stuck build prior to the process being killed: https://buildbot.python.org/all/#/builders//builds/2887 In short, the PR doesn't change the problem. The regrtest main will wait indefinitely on the successfully killed process. I have some ideas to try, but creating a reproducer is not exactly easy... |
|||
msg350181 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-22 10:44 | |
Oh, I found an issue with regrtest and Windows processes. On Windows, when a process dies (crash or whatever), Popen.communicate() continues to hang in the parent if the died process has at least one child process which is still alive. Test attached kill_timeout.py: --- vstinner@WIN C:\vstinner\python\master>python x.py Running Debug|x64 interpreter... parent pid 1600 child1 pid 2184 child2 pid 4516 communicate(): timeout (1) child1 killed wait() returned: returncode 1 communicate() timeout (2) --- Calling Popen.communicate() hangs even if the child process has been killed, whereas wait() completes immediately. On Windows, Popen.communicate() is implemented with threads calling this function on each pipe: def _readerthread(self, fh, buffer): buffer.append(fh.read()) fh.close() It seems like self.stdout.read() hangs even after the child process has been killed. |
|||
msg350183 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-22 10:48 | |
The issue can be reproduced with regrtest: * Copy attached test_kill.py to Lib/test/ * Run: python -m test test_kill -v -j2 --timeout=5 I also applied the following changes to regrtest: diff --git a/Lib/test/libregrtest/runtest.py b/Lib/test/libregrtest/runtest.py index e7dce180cb..7d996d6918 100644 --- a/Lib/test/libregrtest/runtest.py +++ b/Lib/test/libregrtest/runtest.py @@ -119,7 +119,7 @@ def _runtest(ns, test_name): use_timeout = (ns.timeout is not None) if use_timeout: - faulthandler.dump_traceback_later(ns.timeout, exit=True) + pass #faulthandler.dump_traceback_later(ns.timeout, exit=True) start_time = time.perf_counter() try: diff --git a/Lib/test/libregrtest/runtest_mp.py b/Lib/test/libregrtest/runtest_mp.py index c22479b797..51352497ae 100644 --- a/Lib/test/libregrtest/runtest_mp.py +++ b/Lib/test/libregrtest/runtest_mp.py @@ -19,10 +19,10 @@ from test.libregrtest.utils import format_duration # Display the running tests if nothing happened last N seconds -PROGRESS_UPDATE = 30.0 # seconds +PROGRESS_UPDATE = 1.0 # seconds # Time to wait until a worker completes: should be immediate -JOIN_TIMEOUT = 30.0 # seconds +JOIN_TIMEOUT = 5.0 # seconds def must_stop(result, ns): @@ -305,7 +305,7 @@ class MultiprocessRunner: self.pending = MultiprocessIterator(self.regrtest.tests) if self.ns.timeout is not None: self.worker_timeout = self.ns.timeout * 1.5 - self.main_timeout = self.ns.timeout * 2.0 + self.main_timeout = self.ns.timeout * 10.0 else: self.worker_timeout = None self.main_timeout = None |
|||
msg350191 - (view) | Author: Eryk Sun (eryksun) * ![]() |
Date: 2019-08-22 15:00 | |
> It seems like self.stdout.read() hangs even after the child process > has been killed. This is an issue when the standard handles are inherited or duplicated to a grandchild process, and so on. In the case of Popen(sys.executable), the system is duplicating the standard handles implicitly because sys.executable is a console process (assuming it's python[_d].exe), among other criteria. An additional reference on the write side prevents the pipe from closing. fh.read() in the grandparent won't return as long as there's a writer that could potentially write more data. |
|||
msg350233 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-22 22:12 | |
> This is an issue when the standard handles are inherited or duplicated to a grandchild process, and so on. In the case of Popen(sys.executable), the system is duplicating the standard handles implicitly because sys.executable is a console process (assuming it's python[_d].exe), among other criteria. An additional reference on the write side prevents the pipe from closing. fh.read() in the grandparent won't return as long as there's a writer that could potentially write more data. Is there a way to workaround that? |
|||
msg350246 - (view) | Author: Eryk Sun (eryksun) * ![]() |
Date: 2019-08-23 03:04 | |
> Is there a way to workaround that? For Windows, subprocess could have a _read_all(file) method that special cases a pipe. The read loop for a pipe would check whether the child has exited. Then call _winapi.PeekNamedPipe on the handle (from get_osfhandle), and do a raw read of the available bytes. If the child has exited or PeekNamedPipe fails (EPIPE), then break, join the partial reads, decode and translate newlines if it's text mode, and return. |
|||
msg350272 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-23 10:27 | |
New changeset 767434c39c8f3c6a8af1b3282d8382ccf809fe21 by Victor Stinner in branch 'master': bpo-37531: Fix regrtest _timedout() function on timeout (GH-15419) https://github.com/python/cpython/commit/767434c39c8f3c6a8af1b3282d8382ccf809fe21 |
|||
msg350329 - (view) | Author: Eryk Sun (eryksun) * ![]() |
Date: 2019-08-23 19:19 | |
Alternatively, instead of special casing the file type and spinning on PeekNamedPipe, the workaround could be based on a multiple-object wait that includes the child process handle. In this case, communicate() would always call _communicate in Windows, regardless of the timeout or number of pipes -- because simplistically calling either self.stdout.read() or self.stderr.read() could hang. The implementation would be moderately complicated. If we stop waiting on the reader threads because the process exited, we can give the threads a short time to finish reading and close the files -- maybe 250 ms is enough. But if they haven't exited at this time, we can't simply raise a TimeoutExpired exception if the call hasn't actually timed out. To finish the _communicate call, we would have to cancel the pending reads and join() the threads. We can force a reader thread to exit by canceling the read via WINAPI CancelIoEx. However, _readerthread has to be modified to support this. It could be implemented as a loop that calls _winapi.ReadFile to read the output in chunks that get appended to the buffer list. The read loop would break for either ERROR_BROKEN_PIPE or ERROR_OPERATION_ABORTED (canceled). The final step in _communicate would be to concatenate the partial reads. If it's text mode, it would also have to decode the bytes and translate newlines. The POSIX implementation of _communicate has to do this, so we already have a _translate_newlines method for this case. Note that _winapi.WaitForMultipleObjects is interruptible in the main thread via Ctrl+C, which is a bonus improvement since Thread.join() can't be interrupted in Windows. |
|||
msg350578 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-08-26 22:13 | |
Eryk Sun > Alternatively, instead of special casing the file type and spinning on PeekNamedPipe, the workaround could be based on a multiple-object wait that includes the child process handle. (...) Well, maybe we could do something to enhance regrtest, but the initial bug described in this issue should be fixed by my commit de2d9eed8bc628533e1628b843cc4c7a5010f6e5 which added a timeout to the second Popen.communicate() call, but also to the Popen.wait() call (moreover, "with popen:" is no longer used). Both functions use a timeout of 30 seconds which should be short enough to prevent the main regrtest process to block for hours. When a process is killed and stdout.read() never completes, the stdout.read() thread will likely never completes. But I consider that it's a separated bug that should be addressed in a separated issue. Jeremy Kloth: > In short, the PR doesn't change the problem. The regrtest main will wait indefinitely on the successfully killed process. Again, my commit de2d9eed8bc628533e1628b843cc4c7a5010f6e5 should fix this behavior. -- I'm not sure why test_regrtest.test_multiprocessing_timeout() failed with a timeout on x86 Windows7 3.x; see msg350060. Maybe my commit 767434c39c8f3c6a8af1b3282d8382ccf809fe21 will give more information if the test fails again on this buildbot. I will backport regrtest changes to other 3.7 and 3.8 branches. I wait a few more days to see if buildbots are stable with latest regrtest changes. |
|||
msg351399 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-09 10:01 | |
Crap, test_regrtest.test_multiprocessing_timeout() hangs sometimes (randomly), and then support.temp_cwd() fails on rmtree() with "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process". running: test_regrtest (14 min 24 sec) running: test_regrtest (14 min 54 sec) 0:24:35 load avg: 2.02 [419/419/1] test_regrtest crashed (Exit code 1) Timeout (0:15:00)! Thread 0x000017d0 (most recent call first): File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\subprocess.py", line 1353 in _readerthread File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\threading.py", line 882 in run File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\threading.py", line 944 in _bootstrap_inner File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\threading.py", line 902 in _bootstrap Thread 0x00000a30 (most recent call first): File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\threading.py", line 1039 in _wait_for_tstate_lock File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\threading.py", line 1023 in join File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\subprocess.py", line 1382 in _communicate File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\subprocess.py", line 1015 in communicate File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\subprocess.py", line 491 in run File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_regrtest.py", line 504 in run_command File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_regrtest.py", line 529 in run_python File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_regrtest.py", line 680 in run_tests File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_regrtest.py", line 1178 in test_multiprocessing_timeout (...) 1 re-run test: test_regrtest Total duration: 25 min 20 sec Tests result: FAILURE then SUCCESS Traceback (most recent call last): File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\support\__init__.py", line 1013, in temp_dir yield path File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\support\__init__.py", line 1065, in temp_cwd yield cwd_dir File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\libregrtest\main.py", line 628, in main self._main(tests, kwargs) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\libregrtest\main.py", line 690, in _main sys.exit(0) SystemExit: 0 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\support\__init__.py", line 330, in _force_run return func(*args) PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\buildbot.python.org\\3.x.kloth-win64\\build\\build\\test_python_4376\\test_python_worker_5136' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\runpy.py", line 192, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\runpy.py", line 85, in _run_code exec(code, run_globals) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\__main__.py", line 2, in <module> main() File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\libregrtest\main.py", line 695, in main Regrtest().main(tests=tests, **kwargs) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\libregrtest\main.py", line 628, in main self._main(tests, kwargs) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\contextlib.py", line 131, in __exit__ self.gen.throw(type, value, traceback) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\support\__init__.py", line 1065, in temp_cwd yield cwd_dir File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\contextlib.py", line 131, in __exit__ self.gen.throw(type, value, traceback) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\support\__init__.py", line 1018, in temp_dir rmtree(path) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\support\__init__.py", line 452, in rmtree _rmtree(path) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\support\__init__.py", line 393, in _rmtree _waitfor(_rmtree_inner, path, waitall=True) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\support\__init__.py", line 341, in _waitfor func(pathname) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\support\__init__.py", line 390, in _rmtree_inner _force_run(fullname, os.rmdir, fullname) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\support\__init__.py", line 336, in _force_run return func(*args) PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\buildbot.python.org\\3.x.kloth-win64\\build\\build\\test_python_4376\\test_python_worker_5136' |
|||
msg351517 - (view) | Author: Jeremy Kloth (jkloth) * | Date: 2019-09-09 16:18 | |
Well, the kill timeout doesn't seem to be working, at least completely: https://buildbot.python.org/all/#/builders/40/builds/3012 The worker process has been killed (line 562), but regrtest is still waiting. |
|||
msg351713 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-10 15:54 | |
New changeset 494b61aeec8e583755c8ff3a147a58657353bbc3 by Victor Stinner in branch '3.7': [3.7] bpo-37531: Enhance regrtest multiprocess timeout (GH-15345) (GH-15874) https://github.com/python/cpython/commit/494b61aeec8e583755c8ff3a147a58657353bbc3 |
|||
msg351714 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-10 15:54 | |
New changeset d42a4fdc630c54352701a466a9e512bee68b5c48 by Victor Stinner in branch '3.8': bpo-37531: Enhance regrtest multiprocess timeout (GH-15345) (GH-15871) https://github.com/python/cpython/commit/d42a4fdc630c54352701a466a9e512bee68b5c48 |
|||
msg351716 - (view) | Author: Jeremy Kloth (jkloth) * | Date: 2019-09-10 15:56 | |
Another day, another stuck test_concurrent_futures... https://buildbot.python.org/all/#/builders/40/builds/3030 The test process is again killed (line 568) but the processes from the multiprocess pool are still alive. Once I manually kill those pool processes, regrtest resumes as it should. |
|||
msg351815 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-11 10:55 | |
See also bpo-37424. |
|||
msg352273 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-13 09:54 | |
https://buildbot.python.org/all/#/builders/58/builds/2987 1:06:55 load avg: 3.39 [343/419/1] test_regrtest crashed (Exit code 1) Timeout (0:35:00)! Thread 0x00000cdc (most recent call first): File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\subprocess.py", line 1366 in _readerthread File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\threading.py", line 882 in run File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\threading.py", line 944 in _bootstrap_inner File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\threading.py", line 902 in _bootstrap Thread 0x00000b24 (most recent call first): File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\threading.py", line 1039 in _wait_for_tstate_lock File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\threading.py", line 1023 in join File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\subprocess.py", line 1395 in _communicate File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\subprocess.py", line 1024 in communicate File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\subprocess.py", line 491 in run File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_regrtest.py", line 504 in run_command File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_regrtest.py", line 529 in run_python File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_regrtest.py", line 680 in run_tests File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_regrtest.py", line 1178 in test_multiprocessing_timeout ... |
|||
msg352619 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-17 08:08 | |
New changeset 46b0b81220a23bc4aee5ba3ba67e8cf1b5df7960 by Victor Stinner in branch 'master': bpo-37531: regrtest main process uses shorter timeout (GH-16220) https://github.com/python/cpython/commit/46b0b81220a23bc4aee5ba3ba67e8cf1b5df7960 |
|||
msg352633 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-17 12:34 | |
New changeset 6591b4bbb1c0b9c26b99e4b2dba1e5cc8546732d by Victor Stinner (Miss Islington (bot)) in branch '3.7': bpo-37531: regrtest main process uses shorter timeout (GH-16220) (GH-16223) https://github.com/python/cpython/commit/6591b4bbb1c0b9c26b99e4b2dba1e5cc8546732d |
|||
msg352632 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-17 12:34 | |
New changeset 5f1590d5e679f4dd0b611ef54ae512f137e78f1b by Victor Stinner (Miss Islington (bot)) in branch '3.8': bpo-37531: regrtest main process uses shorter timeout (GH-16220) (GH-16224) https://github.com/python/cpython/commit/5f1590d5e679f4dd0b611ef54ae512f137e78f1b |
|||
msg352674 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-17 23:31 | |
I created bpo-38207 "subprocess: On Windows, Popen.kill() + Popen.communicate() is blocking until all processes using the pipe close the pipe". |
|||
msg352679 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-17 23:58 | |
New changeset b9877cd2cc47b6f3512c171814c4f630286279b9 by Victor Stinner in branch 'master': bpo-37531: Skip test_regrtest.test_multiprocessing_timeout() on Windows (GH-16247) https://github.com/python/cpython/commit/b9877cd2cc47b6f3512c171814c4f630286279b9 |
|||
msg352726 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-18 13:30 | |
Oh wait, test_regrtest.test_multiprocessing_timeout() now also hangs on FreeBSD? :-( AMD64 FreeBSD CURRENT Shared 3.8: https://buildbot.python.org/all/#/builders/212/builds/226 running: test_regrtest (24 min 52 sec) 0:32:54 load avg: 0.26 [423/423/1] test_regrtest crashed (Exit code 1) Timeout (0:25:00)! Thread 0x0000000800ac0000 (most recent call first): File "/usr/home/buildbot/python/3.8.koobs-freebsd-current/build/Lib/selectors.py", line 415 in select File "/usr/home/buildbot/python/3.8.koobs-freebsd-current/build/Lib/subprocess.py", line 1866 in _communicate File "/usr/home/buildbot/python/3.8.koobs-freebsd-current/build/Lib/subprocess.py", line 1024 in communicate File "/usr/home/buildbot/python/3.8.koobs-freebsd-current/build/Lib/subprocess.py", line 491 in run File "/usr/home/buildbot/python/3.8.koobs-freebsd-current/build/Lib/test/test_regrtest.py", line 504 in run_command File "/usr/home/buildbot/python/3.8.koobs-freebsd-current/build/Lib/test/test_regrtest.py", line 529 in run_python File "/usr/home/buildbot/python/3.8.koobs-freebsd-current/build/Lib/test/test_regrtest.py", line 676 in run_tests File "/usr/home/buildbot/python/3.8.koobs-freebsd-current/build/Lib/test/test_regrtest.py", line 1174 in test_multiprocessing_timeout File "/usr/home/buildbot/python/3.8.koobs-freebsd-current/build/Lib/unittest/case.py", line 633 in _callTestMethod |
|||
msg352730 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-18 14:26 | |
It seems like the new regrtest design doesn't work as I expected. Calling popen.kill() + popen.stdout.close() + popen.stderr.close() from a thread B does not always interrupt popen.communicate() in thread A. See https://bugs.python.org/issue38207#msg352729 for an example on Linux where communicate() still blocks until all child processes complete. I see different options: * Revert changes to restore regrtest old design which didn't have these new issues * Find a way to fix bpo-38207 on all platforms: be able to interrupt communicate() as soon as the process is killed and/or when all pipes are closed. * Call communicate() with shorter timeout to workaround the blocking communicate() issue (bpo-38207). * Maybe experiment asyncio which supports asynchronous subprocess. asyncio subprocess uses overlapped operations which can be cancelled. So maybe it isn't affected by bpo-38207. |
|||
msg352796 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-19 14:38 | |
New changeset b2dd2dd6e9212ba6b03885b998736a180ef283d6 by Victor Stinner in branch 'master': bpo-37531: Skip test_regrtest.test_multiprocessing_timeout() on all platforms (GH-16282) https://github.com/python/cpython/commit/b2dd2dd6e9212ba6b03885b998736a180ef283d6 |
|||
msg352804 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-19 16:05 | |
New changeset fb7746d5d10ec4a34198da672018ba15f5667079 by Victor Stinner in branch '3.8': bpo-37531: sync regrtest with master branch (GH-16285) https://github.com/python/cpython/commit/fb7746d5d10ec4a34198da672018ba15f5667079 |
|||
msg352806 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-09-19 16:40 | |
New changeset 5e1400a6bcbb3350a6665176980a2b8343075c63 by Victor Stinner in branch '3.7': bpo-37531: sync regrtest with master branch (GH-16285) (GH-16289) https://github.com/python/cpython/commit/5e1400a6bcbb3350a6665176980a2b8343075c63 |
|||
msg354078 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-10-07 11:02 | |
Oh, regrtest main process was killed by faulthandler watchdog: it was blocked in _get_result() for longer than 1 minute. Extract of _get_result(): use_faulthandler = (self.ns.timeout is not None) timeout = PROGRESS_UPDATE while True: if use_faulthandler: faulthandler.dump_traceback_later(timeout * 2.0, exit=True) # wait for a thread try: return self.output.get(timeout=timeout) except queue.Empty: pass # display progress running = get_running(self.workers) if running and not self.ns.pgo: self.log('running: %s' % ', '.join(running)) self.ns.timeout is supposed to be 1500. I don't understand why regrtest got a timeout of 60 seconds? https://buildbot.python.org/all/#/builders/168/builds/1583 LD_LIBRARY_PATH=/usr/home/buildbot/python/3.x.koobs-freebsd-current/build ./python ./Tools/scripts/run_tests.py -j 1 -u all -W --slowest --fail-env-changed --timeout=1500 -j2 -j4 == CPython 3.9.0a0 (heads/master:ed8efd8e2c, Oct 7 2019, 21:38:27) [Clang 8.0.1 (tags/RELEASE_801/final 366581)] == FreeBSD-13.0-CURRENT-amd64-64bit-ELF little-endian == cwd: /usr/home/buildbot/python/3.x.koobs-freebsd-current/build/build/test_python_67537 == CPU count: 4 == encodings: locale=UTF-8, FS=utf-8 Using random seed 1307598 0:00:00 load avg: 3.41 Run tests in parallel using 4 child processes (...) 0:00:11 load avg: 3.55 [ 12/419] test_extcall passed 0:00:13 load avg: 3.55 [ 13/419] test_pydoc passed 0:00:14 load avg: 3.55 [ 14/419] test_devpoll skipped test_devpoll skipped -- test works only on Solaris OS family 0:00:15 load avg: 3.55 [ 15/419] test_class passed 0:00:17 load avg: 3.50 [ 16/419] test_ntpath passed Timeout (0:01:00)! Thread 0x0000000802d30c00 (most recent call first): File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/selectors.py", line 415 in select File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/subprocess.py", line 1964 in _communicate File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/subprocess.py", line 1113 in communicate File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 199 in _run_process File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 228 in _runtest File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 265 in run File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 944 in _bootstrap_inner File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 902 in _bootstrap Thread 0x0000000802d2fd00 (most recent call first): File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/selectors.py", line 415 in select File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/subprocess.py", line 1964 in _communicate File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/subprocess.py", line 1113 in communicate File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 199 in _run_process File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 228 in _runtest File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 265 in run File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 944 in _bootstrap_inner File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 902 in _bootstrap Thread 0x0000000802d2f800 (most recent call first): File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/selectors.py", line 415 in select File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/subprocess.py", line 1964 in _communicate File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/subprocess.py", line 1113 in communicate File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 199 in _run_process File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 228 in _runtest File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 265 in run File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 944 in _bootstrap_inner File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 902 in _bootstrap Thread 0x0000000802d30200 (most recent call first): File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/selectors.py", line 415 in select File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/subprocess.py", line 1964 in _communicate File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/subprocess.py", line 1113 in communicate File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 199 in _run_process File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 228 in _runtest File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 265 in run File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 944 in _bootstrap_inner File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 902 in _bootstrap Thread 0x0000000800acc000 (most recent call first): File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 307 in wait File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/queue.py", line 179 in get File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 362 in _get_result File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 413 in run_tests File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest_mp.py", line 434 in run_tests_multiprocess File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/main.py", line 513 in run_tests File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/main.py", line 687 in _main File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/main.py", line 634 in main File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/main.py", line 712 in main File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/__main__.py", line 2 in <module> File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/runpy.py", line 85 in _run_code File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/runpy.py", line 192 in _run_module_as_main *** Error code 1 |
|||
msg354088 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-10-07 12:13 | |
> I don't understand why regrtest got a timeout of 60 seconds? Oh, that comes from "timeout = PROGRESS_UPDATE" with "PROGRESS_UPDATE = 30.0". But it means that the main process was stuck for longer than 60 seconds!? The main process is supposed to write an update "running: ..." every 30 seconds. This buildbot is very slow. Maybe 30 and 60 seconds limits are too low? |
|||
msg354223 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-10-08 16:45 | |
New changeset 0ec618af98ac250a91ee9c91f8569e6df6772758 by Victor Stinner in branch 'master': bpo-37531: regrtest ignores output on timeout (GH-16659) https://github.com/python/cpython/commit/0ec618af98ac250a91ee9c91f8569e6df6772758 |
|||
msg354270 - (view) | Author: miss-islington (miss-islington) | Date: 2019-10-09 13:52 | |
New changeset 598bfa4d457d61431142ad99ecbb9bd10cf314e6 by Miss Islington (bot) in branch '3.7': bpo-37531: regrtest ignores output on timeout (GH-16659) https://github.com/python/cpython/commit/598bfa4d457d61431142ad99ecbb9bd10cf314e6 |
|||
msg354272 - (view) | Author: miss-islington (miss-islington) | Date: 2019-10-09 14:02 | |
New changeset 359a1975cbca488ccd5ea13bd7268d7e88664078 by Miss Islington (bot) in branch '3.8': bpo-37531: regrtest ignores output on timeout (GH-16659) https://github.com/python/cpython/commit/359a1975cbca488ccd5ea13bd7268d7e88664078 |
|||
msg354277 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-10-09 15:06 | |
The initial issue has been fixed in branches 3.7, 3.8 and master. I close the issue. Thanks to everybody who help to make these changes possible and helped to debug regressions. It's not perfect, regrtest has to workaround bpo-38207 bug, but at least regrtest should no longer runs for several hours or days. > regrtest should kill a worker process if it runs longer than --timeout seconds. The main regrtest process now kills a worker process running longer than timeout seconds. In practice, it's timeout x 1.5, to give time to the faulthandler watchdog to kill the process. If regrtest hits bpo-38207 bug, regrtest takes 2 min 30 sec to exit in the worst case. > But it means that the main process was stuck for longer than 60 seconds!? The main process is supposed to write an update "running: ..." every 30 seconds. This buildbot is very slow. Maybe 30 and 60 seconds limits are too low? I changed this timeout from 1 minute to 5 minutes. > Calling popen.kill() + popen.stdout.close() + popen.stderr.close() from a thread B does not always interrupt popen.communicate() in thread A. I propose to continue the discussion in bpo-38207 "subprocess: On Windows, Popen.kill() + Popen.communicate() is blocking until all processes using the pipe close the pipe". I implemented workarounds in regrtest so regrtest doesn't block *forever*: it only waits 30 seconds until a worker thread completes, and I added a timeout of 2 minutes on Python exit of the main regrtest process to prevent blocking in threading._shutdown(), because of the blocked threads. > I don't understand why you get 2 typeperf.exe process: only the main process should spawn one. Worker processes don't need it and should not spawn it. I didn't check on my Windows VM if I have the issue. If someone sees more than one typeperf.exe process, please open a separated issue. Note: typeperf.exe is the correct spelling (typepref.exe is a typo ;-)). |
|||
msg354817 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2019-10-16 22:29 | |
New changeset a661392f8fb5ac4fc095aa1845d1eb7a25c4e9be by Victor Stinner in branch 'master': bpo-37531: regrtest now catchs ProcessLookupError (GH-16827) https://github.com/python/cpython/commit/a661392f8fb5ac4fc095aa1845d1eb7a25c4e9be |
|||
msg366410 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-04-14 17:51 | |
New changeset 67b8a1f0f0f78ec38b8626fa9f5b2f5a55c17e15 by Victor Stinner in branch '3.8': [3.8] Update libregrtest from master (GH-19516) https://github.com/python/cpython/commit/67b8a1f0f0f78ec38b8626fa9f5b2f5a55c17e15 |
|||
msg366459 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-04-14 23:05 | |
New changeset b894b669c98cc365b84cbb8d20f531f1d0686f59 by Victor Stinner in branch '3.7': Update libregrtest from master (GH-19517) https://github.com/python/cpython/commit/b894b669c98cc365b84cbb8d20f531f1d0686f59 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:17 | admin | set | github: 81712 |
2020-04-14 23:05:24 | vstinner | set | messages: + msg366459 |
2020-04-14 17:51:25 | vstinner | set | messages: + msg366410 |
2019-10-16 22:29:18 | vstinner | set | messages: + msg354817 |
2019-10-16 22:00:51 | vstinner | set | pull_requests: + pull_request16377 |
2019-10-09 15:06:10 | vstinner | set | status: open -> closed resolution: fixed messages: + msg354277 stage: patch review -> resolved |
2019-10-09 14:02:18 | miss-islington | set | messages: + msg354272 |
2019-10-09 13:52:38 | miss-islington | set | nosy:
+ miss-islington messages: + msg354270 |
2019-10-09 13:34:02 | miss-islington | set | pull_requests: + pull_request16259 |
2019-10-09 13:33:55 | miss-islington | set | pull_requests: + pull_request16257 |
2019-10-08 16:45:53 | vstinner | set | messages: + msg354223 |
2019-10-08 14:44:39 | vstinner | set | pull_requests: + pull_request16242 |
2019-10-08 00:35:11 | koobs | set | nosy:
+ koobs |
2019-10-07 12:13:12 | vstinner | set | messages: + msg354088 |
2019-10-07 11:02:41 | vstinner | set | messages: + msg354078 |
2019-09-19 16:40:50 | vstinner | set | messages: + msg352806 |
2019-09-19 16:19:03 | vstinner | set | pull_requests: + pull_request15874 |
2019-09-19 16:05:12 | vstinner | set | messages: + msg352804 |
2019-09-19 14:41:15 | vstinner | set | pull_requests: + pull_request15870 |
2019-09-19 14:38:23 | vstinner | set | messages: + msg352796 |
2019-09-19 10:06:00 | vstinner | set | pull_requests: + pull_request15867 |
2019-09-18 14:26:01 | vstinner | set | messages: + msg352730 |
2019-09-18 13:30:40 | vstinner | set | messages: + msg352726 |
2019-09-17 23:58:32 | vstinner | set | messages: + msg352679 |
2019-09-17 23:35:04 | vstinner | set | pull_requests: + pull_request15843 |
2019-09-17 23:31:22 | vstinner | set | messages: + msg352674 |
2019-09-17 12:34:17 | vstinner | set | messages: + msg352632 |
2019-09-17 12:34:17 | vstinner | set | messages: + msg352633 |
2019-09-17 10:15:43 | miss-islington | set | pull_requests: + pull_request15824 |
2019-09-17 10:15:36 | miss-islington | set | pull_requests: + pull_request15823 |
2019-09-17 08:08:30 | vstinner | set | messages: + msg352619 |
2019-09-17 07:22:02 | vstinner | set | pull_requests: + pull_request15821 |
2019-09-13 09:54:35 | vstinner | set | messages: + msg352273 |
2019-09-11 10:55:34 | vstinner | set | messages: + msg351815 |
2019-09-10 15:56:56 | jkloth | set | messages: + msg351716 |
2019-09-10 15:54:54 | vstinner | set | messages: + msg351714 |
2019-09-10 15:54:43 | vstinner | set | messages: + msg351713 |
2019-09-10 15:11:07 | vstinner | set | pull_requests: + pull_request15514 |
2019-09-10 15:07:08 | vstinner | set | pull_requests: + pull_request15513 |
2019-09-09 16:18:18 | jkloth | set | messages: + msg351517 |
2019-09-09 10:01:14 | vstinner | set | messages: + msg351399 |
2019-08-26 22:13:36 | vstinner | set | messages: + msg350578 |
2019-08-23 19:19:06 | eryksun | set | messages: + msg350329 |
2019-08-23 10:27:27 | vstinner | set | messages: + msg350272 |
2019-08-23 10:00:53 | vstinner | set | pull_requests: + pull_request15119 |
2019-08-23 03:04:25 | eryksun | set | messages: + msg350246 |
2019-08-22 22:12:36 | vstinner | set | messages: + msg350233 |
2019-08-22 15:00:41 | eryksun | set | messages: + msg350191 |
2019-08-22 10:48:57 | vstinner | set | files:
+ test_kill.py messages: + msg350183 |
2019-08-22 10:44:50 | vstinner | set | files:
+ kill_timeout.py messages: + msg350181 |
2019-08-21 23:22:52 | jkloth | set | messages: + msg350132 |
2019-08-21 10:24:01 | vstinner | set | messages: + msg350061 |
2019-08-21 10:21:13 | vstinner | set | messages: + msg350060 |
2019-08-21 09:59:23 | vstinner | set | messages: + msg350058 |
2019-08-21 09:57:15 | vstinner | set | messages: + msg350057 |
2019-08-21 02:39:36 | jkloth | set | nosy:
+ eryksun messages: + msg350044 |
2019-08-20 17:26:08 | vstinner | set | messages: + msg350023 |
2019-08-20 17:21:28 | vstinner | set | stage: resolved -> patch review pull_requests: + pull_request15062 |
2019-08-20 16:04:04 | vstinner | set | status: closed -> open resolution: fixed -> (no value) messages: + msg350020 |
2019-08-20 15:55:09 | jeremy.kloth | set | nosy:
+ jeremy.kloth messages: + msg350017 |
2019-08-20 15:39:55 | vstinner | set | messages: + msg350014 |
2019-08-20 15:33:04 | vstinner | set | messages: + msg350012 |
2019-08-20 14:57:04 | jkloth | set | messages: + msg350009 |
2019-08-14 14:44:39 | vstinner | set | status: open -> closed stage: patch review -> resolved resolution: fixed components: + Tests versions: + Python 3.7, Python 3.8 |
2019-08-14 14:44:09 | vstinner | set | messages: + msg349708 |
2019-08-14 14:31:35 | vstinner | set | messages: + msg349704 |
2019-08-14 14:30:59 | vstinner | set | messages: + msg349703 |
2019-08-14 14:04:08 | vstinner | set | pull_requests: + pull_request15004 |
2019-08-14 13:56:14 | vstinner | set | pull_requests: + pull_request15002 |
2019-08-14 12:18:57 | vstinner | set | messages: + msg349689 |
2019-08-14 10:46:31 | vstinner | set | pull_requests: + pull_request14996 |
2019-07-30 01:20:12 | jkloth | set | nosy:
+ jkloth |
2019-07-29 23:48:28 | vstinner | link | issue37313 superseder |
2019-07-10 12:16:31 | nanjekyejoannah | set | keywords:
+ patch stage: needs patch -> patch review pull_requests: + pull_request14491 |
2019-07-09 19:34:22 | nanjekyejoannah | create |