classification
Title: test_multiprocessing_spawn leaves processes running in background
Type: behavior Stage: resolved
Components: Tests Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: davin, haypo, martin.panter, pitrou, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-04-15 04:04 by martin.panter, last changed 2017-08-10 15:36 by haypo. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 2841 merged haypo, 2017-07-24 10:31
PR 2842 merged haypo, 2017-07-24 10:49
PR 2850 merged haypo, 2017-07-24 17:52
PR 2855 merged haypo, 2017-07-24 23:53
PR 2879 merged haypo, 2017-07-26 02:27
PR 3064 merged haypo, 2017-08-10 15:08
Messages (21)
msg263442 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-15 04:04
I noticed that this test leaves processes running in the background for a moment after the parent Python process exits. They disappear pretty quickly, but even so, it seems like a bad design. However I am not familiar with the multiprocessing module, so maybe this is unavoidable.

$ ps
  PID TTY          TIME CMD
  597 pts/2    00:00:01 bash
13423 pts/2    00:00:00 ps
$ python3.5 -m test test_multiprocessing_spawn; ps
[1/1] test_multiprocessing_spawn
1 test OK.
  PID TTY          TIME CMD
  597 pts/2    00:00:01 bash
13429 pts/2    00:00:00 python3.5
13475 pts/2    00:00:00 python3.5
15066 pts/2    00:00:00 ps
msg263461 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-04-15 09:06
Using this patch, I can see the list of "python" processes slowly growing:

diff -r ad5b079565ad Lib/unittest/case.py
--- a/Lib/unittest/case.py      Tue Apr 12 23:15:44 2016 -0700
+++ b/Lib/unittest/case.py      Fri Apr 15 11:06:07 2016 +0200
@@ -10,6 +10,7 @@ import warnings
 import collections
 import contextlib
 import traceback
+import os
 
 from . import result
 from .util import (strclass, safe_repr, _count_diff_all_purpose,
@@ -590,6 +591,7 @@ class TestCase(object):
         expecting_failure = expecting_failure_class or expecting_failure_method
         outcome = _Outcome(result)
         try:
+            os.system("ps")
             self._outcome = outcome
 
             with outcome.testPartExecutor(self):
@@ -614,6 +616,7 @@ class TestCase(object):
                         self._addUnexpectedSuccess(result)
                 else:
                     result.addSuccess(self)
+            os.system("ps")
             return result
         finally:
             result.stopTest(self)
msg298877 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-22 22:01
I'm not able to reproduce.  Can you try again with latest master?
msg298952 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 10:43
I'm not sure that the issue is really fixed. I applied my msg263461 patch and I still see the list of "python" processes growing but then also a little bit decreasing, and some zombi <defunct> processes.

The problem is that it's hard to guess if a growing "ps" list is a bug or not, since we only cleanup things in tearDownClass(), not in tearDown(). So it's ok to keep some child processes between two unit tests.

I wrote a first change to help debugging the issue. My change adds more warnings and calls support.reap_children():
https://github.com/python/cpython/pull/2841

With this PR, the following test fails:
---
haypo@selma$ ./python -u -m test -v --fail-env-changed --match=test.test_multiprocessing_spawn.WithThreadsTestPool.test_release_task_refs --match=test.test_multiprocessing_spawn.WithProcessesTestLogging.test_level test_multiprocessing_spawn 
== CPython 3.7.0a0 (heads/master:b364d9f, Jul 24 2017, 11:06:33) [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)]
== Linux-4.11.9-200.fc25.x86_64-x86_64-with-fedora-25-Twenty_Five little-endian
== hash algorithm: siphash24 64bit
== cwd: /home/haypo/prog/python/master/build/test_python_24433
== CPU count: 4
== encodings: locale=UTF-8, FS=utf-8
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
Run tests sequentially
0:00:00 load avg: 0.27 [1/1] test_multiprocessing_spawn
test_level (test.test_multiprocessing_spawn.WithProcessesTestLogging) ... ok
Warning -- reap_children() reaped child process 24437
test_release_task_refs (test.test_multiprocessing_spawn.WithThreadsTestPool) ... ok
Warning -- reap_children() reaped child process 24438
Dangling processes: {<Process(Process-2, started daemon)>, <Process(Process-1, started daemon)>}

----------------------------------------------------------------------
Ran 2 tests in 1.369s

OK
Warning -- multiprocessing.process._dangling was modified by test_multiprocessing_spawn
  Before: <_weakrefset.WeakSet object at 0x7fdc24f70058>
  After:  <_weakrefset.WeakSet object at 0x7fdc2a666260> 
test_multiprocessing_spawn failed (env changed)

1 test altered the execution environment:
    test_multiprocessing_spawn

Total duration: 2 sec
Tests result: ENV CHANGED
---
msg298954 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 10:52
> With this PR, the following test fails:

Ah, it seems like test_level() alone is enough to reproduce the bug and the following PR fixes it:
https://github.com/python/cpython/pull/2842
msg298955 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 11:00
With PR 2841 + PR 2842, there is still at least one test which leaks processes:

---
$ ./python -m test --match=test_notify_all -v test_multiprocessing_spawn
== CPython 3.7.0a0 (heads/master:b364d9f, Jul 24 2017, 11:06:33) [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)]
== Linux-4.11.9-200.fc25.x86_64-x86_64-with-fedora-25-Twenty_Five little-endian
== hash algorithm: siphash24 64bit
== cwd: /home/haypo/prog/python/master/build/test_python_28469
== CPU count: 4
== encodings: locale=UTF-8, FS=utf-8
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
Run tests sequentially
0:00:00 load avg: 0.44 [1/1] test_multiprocessing_spawn
test_notify_all (test.test_multiprocessing_spawn.WithManagerTestCondition) ... ok
test_notify_all (test.test_multiprocessing_spawn.WithProcessesTestCondition) ... ok
Warning -- reap_children() reaped child process 28553
Warning -- reap_children() reaped child process 28557
Warning -- reap_children() reaped child process 28559
Warning -- reap_children() reaped child process 28561
test_notify_all (test.test_multiprocessing_spawn.WithThreadsTestCondition) ... ok
Dangling processes: {<Process(Process-11, started daemon)>, <Process(Process-12, started daemon)>, <Process(Process-13, started daemon)>, <Process(Process-9, started daemon)>}

----------------------------------------------------------------------
Ran 3 tests in 3.772s

OK
Warning -- multiprocessing.process._dangling was modified by test_multiprocessing_spawn
  Before: <_weakrefset.WeakSet object at 0x7f2385200400>
  After:  <_weakrefset.WeakSet object at 0x7f238a8ff260> 
test_multiprocessing_spawn failed (env changed)

1 test altered the execution environment:
    test_multiprocessing_spawn

Total duration: 4 sec
Tests result: SUCCESS
---

I looked at the code of the test: it's full of "p.daemon = True" and "t.daemon = True", whereas these processes and threads are not tracked in a list to be cleared manually, so I'm not surprised of the warnings.
msg298957 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 11:02
New changeset 06634950c553f8df83330ed468c11483b857b7dc by Victor Stinner in branch 'master':
bpo-26762: Avoid daemon process in _test_multiprocessing (#2842)
https://github.com/python/cpython/commit/06634950c553f8df83330ed468c11483b857b7dc
msg298959 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 11:13
If you want to use --fail-env-changed with PR 2841, for example to use test.bisect, you may want this extra patch:

diff --git a/Lib/test/_test_multiprocessing.py b/Lib/test/_test_multiprocessing.py
index f2342c3..5f9e5c3 100644
--- a/Lib/test/_test_multiprocessing.py
+++ b/Lib/test/_test_multiprocessing.py
@@ -4352,6 +4352,7 @@ class ManagerMixin(object):
                       "children after %s seconds"
                       % (multiprocessing.active_children(), dt),
                       file=sys.stderr)
+                test.support.environment_altered = True
                 break
 
         gc.collect()                       # do garbage collection
@@ -4359,6 +4360,7 @@ class ManagerMixin(object):
             # This is not really an error since some tests do not
             # ensure that all processes which hold a reference to a
             # managed object have been joined.
+            test.support.environment_altered = True
             print('Warning -- Shared objects which still exist at manager '
                   'shutdown:')
             print(cls.manager._debug_info())
msg298988 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 16:23
Antoine Pitrou: "I'm not able to reproduce.  Can you try again with latest master?"

Oh, maybe I found why... There is a tearDownModule() function which hides bugs. Try following change:

diff --git a/Lib/test/_test_multiprocessing.py b/Lib/test/_test_multiprocessing.py
index 88e2eb3..ae77468 100644
--- a/Lib/test/_test_multiprocessing.py
+++ b/Lib/test/_test_multiprocessing.py
@@ -4432,8 +4432,8 @@ def install_tests_in_module_dict(remote_globs, start_method):
     def tearDownModule():
         multiprocessing.set_start_method(old_start_method[0], force=True)
         # pause a bit so we don't get warning about dangling threads/processes
-        time.sleep(0.5)
-        multiprocessing.process._cleanup()
+        # time.sleep(0.5)
+        # multiprocessing.process._cleanup()
         gc.collect()
         tmp = set(multiprocessing.process._dangling) - set(dangling[0])
         if tmp:
msg299004 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 17:46
I just created bpo-31019 "multiprocessing.Pool should join "dead" processes" to fix dangling processes related to pool, like test.test_multiprocessing_spawn.WithProcessesTestPool.test_context.
msg299008 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 18:06
I rewrote PR 2841 to detect more bugs.
msg299009 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 18:09
PR 2850 + PR 2849 (of bpo-31019) fix most dangling process/thread, but one specific test still leaks:

test.test_multiprocessing_spawn.WithProcessesTestPool.test_traceback

I will try to fix it once other bugs are fixed.
msg299010 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 18:10
Using PR 2841, I wrote a tool to run each test in subprocess to check if a test method leaks. I found these methods:

 ./python -m test --fail-env-changed test_multiprocessing_spawn -v --match test.test_multiprocessing_spawn.WithProcessesTestBarrier.test_thousand
 ./python -m test --fail-env-changed test_multiprocessing_spawn -v --match test.test_multiprocessing_spawn.WithProcessesTestCondition.test_notify
 ./python -m test --fail-env-changed test_multiprocessing_spawn -v --match test.test_multiprocessing_spawn.WithProcessesTestEvent.test_event
 ./python -m test --fail-env-changed test_multiprocessing_spawn -v --match test.test_multiprocessing_spawn.WithProcessesTestPool.test_context
 ./python -m test --fail-env-changed test_multiprocessing_spawn -v --match test.test_multiprocessing_spawn.WithProcessesTestProcess.test_close
 ./python -m test --fail-env-changed test_multiprocessing_spawn -v --match test.test_multiprocessing_spawn.WithProcessesTestQueue.test_fork
 ./python -m test --fail-env-changed test_multiprocessing_spawn -v --match test.test_multiprocessing_spawn.WithProcessesTestQueue.test_qsize
 ./python -m test --fail-env-changed test_multiprocessing_spawn -v --match test.test_multiprocessing_spawn.WithProcessesTestQueue.test_queue_feeder_donot_stop_onexc
 ./python -m test --fail-env-changed test_multiprocessing_spawn -v --match test.test_multiprocessing_spawn.WithProcessesTestPool.test_traceback
msg299024 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 22:33
New changeset d7e64d9934d86aa6173229de5af5fe908662a33a by Victor Stinner in branch 'master':
test_multiprocessing: Fix dangling process/thread (#2850)
https://github.com/python/cpython/commit/d7e64d9934d86aa6173229de5af5fe908662a33a
msg299033 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 23:34
> test.test_multiprocessing_spawn.WithProcessesTestPool.test_traceback

Oh, this test only "leaks" dangling processes and threads because PR 2841 was too strict. The problem is that multiprocessing.Pool has thread attributes which hold references to the pool: reference cycle. I don't see any easy way to break these cycles; since I don't understand if a poll is still supposed to be usable after terminate() or not.

Anyway, my intent here was to do a first cleanup. So I modified my PR 2841 to call support.gc_collect() to break reference cycles.

If someone wants to break reference cycles, please open a new issue. IMHO this issue already became enough complex :-)
msg299034 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-24 23:55
New changeset ffb49408f0780ae80a553208aa133bc5bb3ba129 by Victor Stinner in branch 'master':
test_multiprocessing detects dangling per test case (#2841)
https://github.com/python/cpython/commit/ffb49408f0780ae80a553208aa133bc5bb3ba129
msg299036 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-25 00:40
New changeset b4c52966c810b5c5e088fceff403247f610b7d13 by Victor Stinner in branch 'master':
bpo-26762: test_multiprocessing close more queues (#2855)
https://github.com/python/cpython/commit/b4c52966c810b5c5e088fceff403247f610b7d13
msg299037 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-25 01:05
Ok, I pushed 3 changes. Only one test still has a minor issue:

test.test_multiprocessing_spawn.WithProcessesTestPool.test_context

The bpo-31019 fixes it.

Later, we will be able to modify ManagerMixin.tearDownClass() to emit a warning immediately if a manager leaks a process, and not have this kind loop to sleep until all processes completed.
msg299188 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-26 02:48
New changeset d0adfb25c5082046133a18fd185375508c1c334f by Victor Stinner in branch '3.6':
[3.6] bpo-26762, bpo-31019: Backport multiprocessing fixes from master to 3.6 (#2879)
https://github.com/python/cpython/commit/d0adfb25c5082046133a18fd185375508c1c334f
msg299190 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-07-26 02:53
Ok, now test_multiprocessing_spawn should not "leak" processes nor threads anymore. I fixed Python 3.6 and master. I close the issue.

I don't think that it's worth it to fix tests in 2.7, and 3.5 is entering security-only mode.

Note: I will backport bpo-31019 to Python 2.7, since this issue changes the code, not tests.
msg300104 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-08-10 15:36
New changeset 957d0e9b59bd27ca7c473560634d8b5dbe66338c by Victor Stinner in branch 'master':
bpo-26762: _test_multiprocessing reports dangling (#3064)
https://github.com/python/cpython/commit/957d0e9b59bd27ca7c473560634d8b5dbe66338c
History
Date User Action Args
2017-08-10 15:36:53hayposetmessages: + msg300104
2017-08-10 15:08:53hayposetpull_requests: + pull_request3101
2017-07-26 02:53:22hayposetstatus: open -> closed
resolution: fixed
messages: + msg299190

stage: resolved
2017-07-26 02:48:58hayposetmessages: + msg299188
2017-07-26 02:27:15hayposetpull_requests: + pull_request2930
2017-07-25 01:05:43hayposetmessages: + msg299037
2017-07-25 00:40:59hayposetmessages: + msg299036
2017-07-24 23:55:56hayposetmessages: + msg299034
2017-07-24 23:53:41hayposetpull_requests: + pull_request2906
2017-07-24 23:34:27hayposetmessages: + msg299033
2017-07-24 22:33:58hayposetmessages: + msg299024
2017-07-24 18:10:20hayposetmessages: + msg299010
2017-07-24 18:09:12hayposetmessages: + msg299009
2017-07-24 18:06:06hayposetmessages: + msg299008
2017-07-24 17:52:59hayposetpull_requests: + pull_request2899
2017-07-24 17:46:57hayposetmessages: + msg299004
2017-07-24 16:23:50hayposetmessages: + msg298988
2017-07-24 11:13:06hayposetmessages: + msg298959
2017-07-24 11:02:23hayposetmessages: + msg298957
2017-07-24 11:00:19hayposetmessages: + msg298955
2017-07-24 10:52:45hayposetmessages: + msg298954
2017-07-24 10:49:37hayposetpull_requests: + pull_request2886
2017-07-24 10:43:31hayposetmessages: + msg298952
2017-07-24 10:31:26hayposetpull_requests: + pull_request2885
2017-07-22 22:01:46pitrousetnosy: + pitrou
messages: + msg298877
2016-04-15 09:06:37hayposetmessages: + msg263461
2016-04-15 05:48:25serhiy.storchakasetnosy: + haypo, serhiy.storchaka
2016-04-15 04:06:05ned.deilysetnosy: + davin
2016-04-15 04:04:25martin.pantercreate