classification
Title: test_concurrent_futures: reap_children() warnings on RHEL7 and RHEL8 buildbots
Type: Stage: resolved
Components: Tests Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: miss-islington, pablogsal, vstinner
Priority: normal Keywords: patch

Created on 2019-10-21 11:20 by vstinner, last changed 2020-04-23 22:44 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 17640 closed vstinner, 2019-12-17 13:36
PR 17641 merged vstinner, 2019-12-17 14:04
PR 17647 merged vstinner, 2019-12-17 22:21
PR 17652 merged vstinner, 2019-12-18 13:54
PR 17653 closed miss-islington, 2019-12-18 14:50
PR 17654 closed miss-islington, 2019-12-18 14:50
PR 17655 merged vstinner, 2019-12-18 16:37
PR 17656 merged miss-islington, 2019-12-18 20:31
PR 19689 merged vstinner, 2020-04-23 21:35
Messages (19)
msg355051 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-21 11:20
https://buildbot.python.org/all/#/builders/254/builds/162

test_no_stale_references (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.56s ok
test_ressources_gced_in_workers (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 1.90s ok
test_shutdown_race_issue12456 (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.54s ok
test_submit (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.52s ok
test_submit_keyword (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.57s ok
test_traceback (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.48s ok
test_context_manager_shutdown (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... 0.03s ok
test_del_shutdown (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... 0.02s ok
test_hang_issue12364 (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... 1.45s ok
test_interpreter_shutdown (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... 1.62s Warning -- reap_children() reaped child process 28936
ok
msg355053 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-21 11:20
It looks similar to bpo-38448: "test_concurrent_futures: reap_children() reaped child process 26487 on AMD64 RHEL8 Refleaks 3.x"
msg355079 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-21 15:47
I ran "./python -m test --fail-env-changed -w -j50 -F test_concurrent_futures" on my laptop for 5 minutes: system load of 220,57 with a peak of 403 processes... but I failed to reproduce the issue :-(
msg355107 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-21 22:33
The system load was at 10.02 when the bug occurred:

> https://buildbot.python.org/all/#/builders/254/builds/162

...
0:04:11 load avg: 10.02 [230/419] test_pkgutil passed -- running: test_io (2 min 46 sec), test_capi (1 min 12 sec), test_concurrent_futures (2 min 43 sec), test_mailbox (2 min 40 sec), test_tokenize (30.6 sec)
0:04:12 load avg: 10.02 [231/419] test_getpass passed -- running: test_io (2 min 47 sec), test_capi (1 min 13 sec), test_concurrent_futures (2 min 43 sec), test_mailbox (2 min 41 sec), test_tokenize (31.1 sec)
0:04:12 load avg: 10.02 [232/419] test_sys passed -- running: test_io (2 min 47 sec), test_capi (1 min 13 sec), test_concurrent_futures (2 min 43 sec), test_mailbox (2 min 41 sec), test_tokenize (31.3 sec)
0:04:12 load avg: 10.02 [233/419] test_winconsoleio skipped -- running: test_io (2 min 48 sec), test_capi (1 min 13 sec), test_concurrent_futures (2 min 44 sec), test_mailbox (2 min 41 sec), test_tokenize (31.7 sec)
test_winconsoleio skipped -- test only relevant on win32
0:04:13 load avg: 10.02 [234/419/1] test_concurrent_futures failed (env changed) (2 min 44 sec) -- running: test_io (2 min 48 sec), test_capi (1 min 14 sec), test_mailbox (2 min 42 sec), test_tokenize (32.0 sec)
...
test_crash (test.test_concurrent_futures.ProcessPoolSpawnExecutorDeadlockTest) ... 4.22s ok
test_shutdown_deadlock (test.test_concurrent_futures.ProcessPoolSpawnExecutorDeadlockTest) ... 1.03s ok
test_initializer (test.test_concurrent_futures.ProcessPoolSpawnFailingInitializerTest) ... 0.51s ok
test_initializer (test.test_concurrent_futures.ProcessPoolSpawnInitializerTest) ... 0.54s ok
test_free_reference (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.67s ok
test_killed_child (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.49s ok
test_map (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.56s ok
test_map_chunksize (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.46s ok
test_map_exception (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.46s ok
test_map_timeout (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 6.48s ok
test_max_workers_negative (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.57s ok
test_max_workers_too_large (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... skipped 'Windows-only process limit'
test_no_stale_references (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.56s ok
test_ressources_gced_in_workers (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 1.90s ok
test_shutdown_race_issue12456 (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.54s ok
test_submit (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.52s ok
test_submit_keyword (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.57s ok
test_traceback (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.48s ok
test_context_manager_shutdown (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... 0.03s ok
test_del_shutdown (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... 0.02s ok
test_hang_issue12364 (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... 1.45s ok
test_interpreter_shutdown (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... 1.62s 
Warning -- reap_children() reaped child process 28936
ok
test_processes_terminate (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... 0.47s ok
test_run_after_shutdown (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... 0.00s ok
msg355310 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-24 09:18
New warning on AMD64 RHEL8 LTO + PGO 3.x:
https://buildbot.python.org/all/#/builders/284/builds/171

...
test_killed_child (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.26s ok
test_map (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.29s ok
test_map_chunksize (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.31s ok
test_map_exception (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.30s ok
test_map_timeout (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 6.29s ok
test_max_workers_negative (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.32s ok
test_max_workers_too_large (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... skipped 'Windows-only process limit'
test_no_stale_references (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.29s ok
test_ressources_gced_in_workers (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 1.58s ok
test_shutdown_race_issue12456 (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.30s Warning -- reap_children() reaped child process 10340
ok
test_submit (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.29s ok
test_submit_keyword (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.31s ok
...
msg355312 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-24 09:19
I closed bpo-38448 as duplicate of this issue. Copy of its only message:

AMD64 RHEL8 Refleaks 3.x:
https://buildbot.python.org/all/#/builders/272/builds/23

0:27:13 load avg: 4.88 [416/419/1] test_concurrent_futures failed (env changed) (17 min 11 sec) -- running: test_capi (7 min 28 sec), test_gdb (8 min 49 sec), test_asyncio (23 min 23 sec)
beginning 6 repetitions
123456
.Warning -- reap_children() reaped child process 26487
.....
Warning -- multiprocessing.process._dangling was modified by test_concurrent_futures
  Before: set()
  After:  {<weakref at 0x7fdc08f44e30; to 'SpawnProcess' at 0x7fdc0a467c30>}
msg357095 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-11-20 18:23
AMD64 Debian PGO 3.x:
https://buildbot.python.org/all/#/builders/47/builds/3851

test_submit_keyword (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 0.62s Warning -- reap_children() reaped child process 32627
Warning -- multiprocessing.process._dangling was modified by test_concurrent_futures
msg357404 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-11-24 17:58
Ran 168 tests in 160.415s
OK (skipped=3)
Warning -- multiprocessing.process._dangling was modified by test_concurrent_futures
  Before: set()
  After:  {<weakref at 0x7fd136d66540; to 'ForkProcess' at 0x7fd136cd4a90>}

Another failure in AMD64 RHEL8 LTO + PGO 3.x

https://buildbot.python.org/all/#/builders/284/builds/300/steps/5/logs/stdio
msg357929 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-06 17:38
New failure on AMD64 RHEL7 LTO 3.x:
https://buildbot.python.org/all/#/builders/103/builds/17

test_interpreter_shutdown
(test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest)
... 1.61s Warning -- reap_children() reaped child process 9934
msg358547 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-17 15:07
I managed to reproduce the issue on the RHEL8 worker using:

* terminal 1: ./python -u -m test --fail-env-changed test_concurrent_futures -v -m ProcessPoolSpawnProcessPoolExecutorTest -F -j 15
* terminal 2: ./python -m test -j4 -u all -F -r

I can still reproduce the issue with PR 17641 fix, so it's not enough.

I also applied PR 17640 and 2 extra changes:

diff --git a/Lib/multiprocessing/managers.py b/Lib/multiprocessing/managers.py
index 1f9c2daa25..dbbf84b3d2 100644
--- a/Lib/multiprocessing/managers.py
+++ b/Lib/multiprocessing/managers.py
@@ -661,7 +661,7 @@ class BaseManager(object):
             except Exception:
                 pass
 
-            process.join(timeout=1.0)
+            process.join(timeout=0.1)
             if process.is_alive():
                 util.info('manager still alive')
                 if hasattr(process, 'terminate'):
diff --git a/Lib/test/support/__init__.py b/Lib/test/support/__init__.py
index 215bab8131..110c7f945e 100644
--- a/Lib/test/support/__init__.py
+++ b/Lib/test/support/__init__.py
@@ -2399,6 +2399,7 @@ def reap_children():
 
     # Reap all our dead child processes so we don't leave zombies around.
     # These hog resources and might be causing some of the buildbots to die.
+    time.sleep(0.5)
     while True:
         try:
             # Read the exit status of any child process which already completed
msg358548 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-17 15:15
Example of failure, with additional debug prints:

                                                                                                                                                                
0:01:02 load avg: 42.13 [ 38/1] test_concurrent_futures failed (env changed)                                                                                                                                                                  
test_no_stale_references (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 2.00s reap_children                                                                                                                       
ok                                                                                                                                                                                                                                            
test_ressources_gced_in_workers (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 3.86s reap_children                                                                                                                
ok                                                                                                                                                                                                                                            
test_shutdown_race_issue12456 (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 2.22s reap_children                                                                                                                  
ok                                                                                                                                                                                                                                            
test_submit (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 2.26s reap_children                                                                                                                                    
ok                                                                                                                                                                                                                                            
test_submit_keyword (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 1.59s reap_children                                                                                                                            
ok                                                                                                                                                                                                                                            
test_traceback (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest) ... 1.60s reap_children                                                                                                                                 
ok                                                                                                                                                                                                                                           
MP cleanup: cleanup process                                                                                                   
MP cleanup: stop forkserver                                                                                                   
MP cleanup: stop res track                                                                                                  
MP cleanup: run finalizers                                                                                                  
reap_children                                                                                                                 
                                                                                                                              
----------------------------------------------------------------------                                                        
                                                                                                                            
Ran 6 tests in 17.210s                                                                                                      
                                                                                                                              
OK                                                                                                                            
reap_children                                                                                                                 
Warning -- reap_children() reaped child process 4992                                                                        
Warning -- multiprocessing.process._dangling was modified by test_concurrent_futures                                        
  Before: set()                                                                                                               
  After:  {<weakref at 0x7fedc87bc7d0; to 'SpawnProcess' at 0x7fedc7abcd70>}
msg358549 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-17 15:20
I can reproduce the issue with the following match file:

test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest.test_no_stale_references
test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest.test_ressources_gced_in_workers
test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest.test_shutdown_race_issue12456
msg358560 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-17 17:37
New changeset 9707e8e22d80ca97bf7a9812816701cecde6d226 by Victor Stinner in branch 'master':
bpo-38546: multiprocessing tests stop the resource tracker (GH-17641)
https://github.com/python/cpython/commit/9707e8e22d80ca97bf7a9812816701cecde6d226
msg358617 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-18 09:11
New changeset 35acb3597208e10a101140474adec86859d57f61 by Victor Stinner in branch '3.8':
bpo-38546: multiprocessing tests stop the resource tracker (GH-17641) (GH-17647)
https://github.com/python/cpython/commit/35acb3597208e10a101140474adec86859d57f61
msg358627 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-18 14:50
New changeset 673c39331f844a80c465efd7cff88ac55c432bfb by Victor Stinner in branch 'master':
bpo-38546: Fix concurrent.futures test_ressources_gced_in_workers() (GH-17652)
https://github.com/python/cpython/commit/673c39331f844a80c465efd7cff88ac55c432bfb
msg358630 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-18 17:05
> bpo-38546: Fix concurrent.futures test_ressources_gced_in_workers() (GH-17652)

I tested manually on the RHEL8 worker and I confirm that this change prevents the reap_children() warning.

I close the issue. The change will be shortly backported to 3.7 and 3.8.

--

I'm not sure if PR 17640 is useful, let's discuss it on the PR directly.
msg358644 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-18 20:31
New changeset b0eb046cbd0dbb7b17f16aad6de20fac5305f387 by Victor Stinner in branch '3.8':
bpo-38546: Fix concurrent.futures test_ressources_gced_in_workers() (GH-17652) (GH-17655)
https://github.com/python/cpython/commit/b0eb046cbd0dbb7b17f16aad6de20fac5305f387
msg358646 - (view) Author: miss-islington (miss-islington) Date: 2019-12-18 20:51
New changeset b8bbdf049b0472b8edc4298bfa61e62e3a584e98 by Miss Islington (bot) in branch '3.7':
bpo-38546: Fix concurrent.futures test_ressources_gced_in_workers() (GH-17652) (GH-17655)
https://github.com/python/cpython/commit/b8bbdf049b0472b8edc4298bfa61e62e3a584e98
msg367156 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-23 22:44
New changeset fd32a0e2ee445dd7156323d216627112e66b0a69 by Victor Stinner in branch '3.7':
[3.7] bpo-38546: Backport multiprocessing tests fixes from master (GH-19689)
https://github.com/python/cpython/commit/fd32a0e2ee445dd7156323d216627112e66b0a69
History
Date User Action Args
2020-04-23 22:44:12vstinnersetmessages: + msg367156
2020-04-23 21:35:26vstinnersetpull_requests: + pull_request19009
2019-12-18 20:51:16miss-islingtonsetnosy: + miss-islington
messages: + msg358646
2019-12-18 20:31:01vstinnersetmessages: + msg358644
2019-12-18 20:31:01miss-islingtonsetpull_requests: + pull_request17123
2019-12-18 17:05:24vstinnersetversions: + Python 3.7, Python 3.8
2019-12-18 17:05:09vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg358630

stage: patch review -> resolved
2019-12-18 16:37:39vstinnersetpull_requests: + pull_request17122
2019-12-18 14:50:42miss-islingtonsetpull_requests: + pull_request17121
2019-12-18 14:50:35miss-islingtonsetpull_requests: + pull_request17120
2019-12-18 14:50:12vstinnersetmessages: + msg358627
2019-12-18 13:54:08vstinnersetpull_requests: + pull_request17119
2019-12-18 09:11:21vstinnersetmessages: + msg358617
2019-12-17 22:21:52vstinnersetpull_requests: + pull_request17115
2019-12-17 17:37:45vstinnersetmessages: + msg358560
2019-12-17 15:20:40vstinnersetmessages: + msg358549
2019-12-17 15:15:31vstinnersetmessages: + msg358548
2019-12-17 15:07:13vstinnersetmessages: + msg358547
2019-12-17 14:04:41vstinnersetpull_requests: + pull_request17109
2019-12-17 13:36:00vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request17108
2019-12-06 17:38:16vstinnersetmessages: + msg357929
2019-11-24 17:58:51pablogsalsetnosy: + pablogsal
messages: + msg357404
2019-11-20 18:23:00vstinnersetmessages: + msg357095
2019-10-24 09:19:40vstinnersetmessages: + msg355312
title: test_concurrent_futures: test_interpreter_shutdown() warning "reap_children() reaped child process" on AMD64 RHEL7 3.x -> test_concurrent_futures: reap_children() warnings on RHEL7 and RHEL8 buildbots
2019-10-24 09:18:56vstinnerlinkissue38448 superseder
2019-10-24 09:18:32vstinnersetmessages: + msg355310
2019-10-21 22:33:42vstinnersetmessages: + msg355107
2019-10-21 15:47:12vstinnersetmessages: + msg355079
2019-10-21 11:20:37vstinnersetmessages: + msg355053
2019-10-21 11:20:00vstinnercreate