Classification
Title: Make semaphore_tracker track other system resources
Type: resource usage          Stage: resolved
Components: Library (Lib)     Versions: Python 3.8

Process
Status: closed                Resolution: fixed
Dependencies:                 Superseder:
Assigned To:                  Nosy List: Olivier.Grisel, pablogsal, pierreglaser, pitrou, vstinner, xtreak
Priority: normal              Keywords: patch

Created on 2019-05-09 17:36 by pierreglaser, last changed 2019-06-12 11:47 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 13222 merged pierreglaser, 2019-05-09 18:12
PR 13276 merged pierreglaser, 2019-05-13 11:27
PR 13290 merged pitrou, 2019-05-13 17:09
PR 13292 merged pierreglaser, 2019-05-13 17:25
Messages (15)
msg341987 - (view) Author: Pierre Glaser (pierreglaser) * Date: 2019-05-09 17:36
Hi all,

Olivier Grisel, Thomas Moreau and I are currently working on widening the
scope of the semaphore_tracker in Python.

multiprocessing.semaphore_tracker is a little-known module that launches a
server process used to track the life cycle of semaphores created in a Python
session, and to clean up those semaphores once all Python processes of the
session have terminated. Normally, Python processes clean up the semaphores
they create. This is, however, not guaranteed if the processes are violently
interrupted (for example, with the bash command "killall python").

A note on why the semaphore_tracker was introduced: cleaning up semaphores
after termination is important because the system supports only a limited
number of named semaphores, and they are not automatically removed until the
next reboot.

Now, Python 3.8 introduces the creation of shared memory segments. Shared
memory is another sensitive, global system resource. Currently, unexpected
termination of processes that created memory segments results in leaking
those segments. This can be problematic for large compute clusters with many
users that are rarely rebooted.

For this reason, we expanded the semaphore_tracker to also track shared memory
segments, and renamed it resource_tracker. Shared memory segments get
automatically tracked by the resource tracker when they are created. This is a
first, self-contained fix. (1)
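
For illustration, here is a minimal usage sketch of the Python 3.8 shared
memory API (the size and contents are made up; per the change described
above, creation implicitly registers the segment with the tracker, and, on
POSIX, unlinking unregisters it):

    from multiprocessing import shared_memory

    # Creating the segment implicitly registers it with the resource_tracker.
    shm = shared_memory.SharedMemory(create=True, size=1024)
    shm.buf[:5] = b"hello"   # the segment is usable as a regular buffer
    shm.close()              # close this process's view of the segment
    shm.unlink()             # destroy the segment and unregister it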

Additionally, supporting shared memory tracking led to a more generic design
for the resource_tracker: it can now easily be extended to track arbitrary
resource types.
A public API could potentially be exposed for users wishing to track other
resource types. For example, one may want to add tracking for temporary
folders created during Python sessions. Another use case is joblib, a widely
used parallel-computing package that is also the backend of scikit-learn.
Joblib relies heavily on memory mapping (memmapping). A public API could
extend the resource_tracker to track memmapped objects with very little code.

Therefore, this issue serves two purposes:
- referencing the semaphore_tracker enhancement mentioned in (1)
- discussing a potential public resource_tracker API.
msg342133 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-05-10 20:59
New changeset f22cc69b012f52882d434a5c44a004bc3aa5c33c by Antoine Pitrou (Pierre Glaser) in branch 'master':
bpo-36867: Make semaphore_tracker track other system resources (GH-13222)
https://github.com/python/cpython/commit/f22cc69b012f52882d434a5c44a004bc3aa5c33c
msg342201 - (view) Author: Pierre Glaser (pierreglaser) * Date: 2019-05-11 15:58
Shared memory segments are now tracked by the brand new resource_tracker!
Thanks Antoine for the review.

Does anyone have an opinion on introducing a public API for users to make the
resource_tracker track resources of their choice?

What we have in mind is:
- making the existing resource_tracker.register/unregister functions public
- adding a new function, resource_tracker.make_trackable(resource_type,
  cleanup_func), where:

    * resource_type is a string identifier that will be passed each time a
      resource of type resource_type needs tracking, via the call
      resource_tracker.register(resource_name, resource_type)
    * cleanup_func must be a callable taking a single string as argument (the
      name of the resource to clean up). This function will be called after
      the end of a process for each resource of type resource_type that the
      process did not clean up properly

Under the hood, make_trackable simply populates resource_tracker._CLEANUP_FUNCS
with new items.
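
A minimal sketch of what make_trackable itself could look like (this function
is the proposal, not an existing API; _CLEANUP_FUNCS is the internal registry
that resource_tracker already uses for semaphores and shared memory):

    def make_trackable(resource_type, cleanup_func):
        """Register a cleanup function for a new type of tracked resource."""
        if not callable(cleanup_func):
            raise TypeError("cleanup_func must be a callable")
        if resource_type in _CLEANUP_FUNCS:
            raise ValueError(
                "a cleanup function is already registered for %r"
                % resource_type)
        _CLEANUP_FUNCS[resource_type] = cleanup_func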

Here is a simple example:
    import os
    import shutil
    from multiprocessing import resource_tracker, util

    class ClassCreatingAFolder:
        """Class where each instance creates a temporary folder.

        Each temporary folder is supposed to exist for the duration of the
        instance that created it.
        """
        def __init__(self, folder_name):
            self.folder_name = folder_name
            os.mkdir(folder_name)

            # Any instance that is garbage-collected normally should remove
            # its folder, and notify the resource_tracker that its folder
            # was correctly removed.
            util.Finalize(self, ClassCreatingAFolder.cleanup,
                          args=(folder_name,))

            # If this session quits abruptly, the finalizer will not be
            # called for the instances of ClassCreatingAFolder that were
            # still alive before the shutdown. The resource_tracker comes
            # into play and removes the folders associated with each of
            # these resources.
            resource_tracker.register(
                folder_name,  # argument to shutil.rmtree
                "ClassCreatingAFolder")

        @staticmethod
        def cleanup(folder_name):
            resource_tracker.unregister(folder_name, "ClassCreatingAFolder")
            shutil.rmtree(folder_name)

    # Tell the resource_tracker how to clean up resources created by
    # ClassCreatingAFolder instances
    resource_tracker.make_trackable("ClassCreatingAFolder", shutil.rmtree)


Typical resources that can be made trackable include memmapped objects and
temporary folders. Our use case is joblib, which has its own mmap type that
we would like to track using the resource_tracker.
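
As a hypothetical sketch of the joblib use case with the proposed API (the
"joblib_mmap" type name, the path and make_trackable itself are all
illustrative, not existing APIs):

    import os
    from multiprocessing import resource_tracker

    # Deleting the backing file is all the cleanup a leaked memmap needs.
    resource_tracker.make_trackable("joblib_mmap", os.unlink)

    mmap_path = "/tmp/example.mmap"  # hypothetical memmap backing file
    resource_tracker.register(mmap_path, "joblib_mmap")
    # ... use the memmap; then, on normal cleanup:
    resource_tracker.unregister(mmap_path, "joblib_mmap")
    os.unlink(mmap_path)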

Any thoughts?
msg342246 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-05-12 08:27
Since the commit, there is a warning in CI and locally while running the tests.

Travis CI : 

* https://travis-ci.org/python/cpython/jobs/531304357#L2099 (test_multiprocessing_forkserver)
* https://travis-ci.org/python/cpython/jobs/531304357#L2296 (test_multiprocessing_fork)

Before commit : 

./python.exe -X tracemalloc -m test --fail-env-changed test_multiprocessing_forkserver
Run tests sequentially
0:00:00 load avg: 4.71 [1/1] test_multiprocessing_forkserver
test_multiprocessing_forkserver passed in 2 min 21 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 2 min 21 sec
Tests result: SUCCESS

After commit f22cc69b012f52882d434a5c44a004bc3aa5c33c : 

./python.exe -X tracemalloc -m test --fail-env-changed test_multiprocessing_forkserver
Run tests sequentially
0:00:00 load avg: 4.08 [1/1] test_multiprocessing_forkserver
/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/multiprocessing/resource_tracker.py:198: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/unittest/case.py:683: ResourceWarning: unclosed file <_io.BufferedReader name=7>
  testMethod()
Object allocated at (most recent call last):
  File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/subprocess.py", lineno 819
    self.stdout = io.open(c2pread, 'rb', bufsize)
test_multiprocessing_forkserver passed in 2 min 20 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 2 min 20 sec
Tests result: SUCCESS

msg342307 - (view) Author: Pierre Glaser (pierreglaser) * Date: 2019-05-13 10:42
Yes, one test I wrote in an unrelated commit does not unlink a memory segment. Now the ResourceTracker complains. Fixing it now.
msg342308 - (view) Author: Pierre Glaser (pierreglaser) * Date: 2019-05-13 10:47
Actually, I was properly unlinking the shared_memory segments. The warning messages are due to bad interactions between the ResourceTracker and the SharedMemoryManager object. In this particular case, it is easy to slightly change the problematic test to avoid the warnings. I will focus on fixing those bad interactions right after.
msg342309 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-13 10:58
test_shared_memory_cleaned_after_process_termination() uses time as a weak synchronization primitive:


        # killing abruptly processes holding reference to a shared memory
        # segment should not leak the given memory segment.
        p.terminate()
        p.wait()
        time.sleep(1.0)  # wait for the OS to collect the segment

        with self.assertRaises(FileNotFoundError):
            smm = shared_memory.SharedMemory(name, create=False)

Would it be possible to use a more reliable synchronization? Such tests usually fail randomly.
https://pythondev.readthedocs.io/unstable_tests.html
msg342319 - (view) Author: Olivier Grisel (Olivier.Grisel) * Date: 2019-05-13 12:50
As Victor said, the `time.sleep(1.0)` might lead to Heisen failures. I am not sure how to write proper strong synchronization in this case, but we could instead go for something intermediate, such as the following pattern:


        ...
        p.terminate()
        p.wait()
        for i in range(60):
            try:
                shared_memory.SharedMemory(name, create=False)
            except FileNotFoundError:
                # the OS successfully collected the segment as expected
                break

            time.sleep(1.0)  # wait for the OS to collect the segment

        else:
            raise AssertionError(
                f"Failed to collect shared_memory segment {name}")


What do you think?
msg342321 - (view) Author: Pierre Glaser (pierreglaser) * Date: 2019-05-13 12:54
We can do that, or maybe we can try to wait on the `resource_tracker`'s pid?
msg342326 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-13 13:09
I like Olivier's pattern. Maybe we can increase the sleep progressively, so that the loop stops quickly if the resource goes away quickly.

deadline = time.monotonic() + 60.0
sleep = 0.010
while ...:
   if ...: break
   if time.monotonic() > deadline: ... assert error ...
   time.sleep(sleep)
   sleep = min(sleep * 2, 5.0)

It's kind of a common pattern. Maybe it should be a helper in the test.support module.
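
A sketch of what such a helper could look like (wait_until is a hypothetical
name, not an existing test.support function):

    import time

    def wait_until(predicate, timeout=60.0, first_sleep=0.010, max_sleep=5.0):
        """Call predicate() until it returns true, sleeping longer each time."""
        deadline = time.monotonic() + timeout
        sleep = first_sleep
        while not predicate():
            if time.monotonic() > deadline:
                raise AssertionError("condition not met within timeout")
            time.sleep(sleep)
            sleep = min(sleep * 2, max_sleep)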


> We can do that, or maybe we can try to wait on the `resource_tracker's` pid?

I prefer to make sure that the resource goes away without inspecting multiprocessing internals.
msg342365 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-05-13 17:20
New changeset 50466c66509de556a8579172f82d1abb1560d8e4 by Antoine Pitrou (Pierre Glaser) in branch 'master':
bpo-36867: DOC update multiprocessing.rst (GH-13289)
https://github.com/python/cpython/commit/50466c66509de556a8579172f82d1abb1560d8e4
msg342374 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-05-13 19:15
New changeset b1dfcad6f0d3a52c9ac31fb9763fc7962a84b27c by Antoine Pitrou (Pierre Glaser) in branch 'master':
bpo-36867: Create the resource_tracker before launching SharedMemoryManagers (GH-13276)
https://github.com/python/cpython/commit/b1dfcad6f0d3a52c9ac31fb9763fc7962a84b27c
msg342749 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-17 18:20
New changeset cbe72d842646ded2454784679231e3d1e6252e72 by Victor Stinner (Pierre Glaser) in branch 'master':
bpo-36867: _test_multiprocessing: avoid weak sync primitive (GH-13292)
https://github.com/python/cpython/commit/cbe72d842646ded2454784679231e3d1e6252e72
msg342752 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-17 18:22
It seems like all known bugs are fixed, so I am closing the issue again. Thanks!
msg345322 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-12 11:47
The new test is not reliable, see: bpo-37244 "test_multiprocessing_forkserver: test_resource_tracker() failed on  x86 Gentoo Refleaks 3.8".
History
Date                 User            Action  Args
2019-06-12 11:47:54  vstinner        set     messages: + msg345322
2019-05-17 18:22:48  vstinner        set     status: open -> closed
                                             resolution: fixed
                                             messages: + msg342752
                                             stage: patch review -> resolved
2019-05-17 18:20:30  vstinner        set     messages: + msg342749
2019-05-13 19:15:51  pitrou          set     messages: + msg342374
2019-05-13 17:25:47  pierreglaser    set     pull_requests: + pull_request13203
2019-05-13 17:20:53  pitrou          set     messages: + msg342365
2019-05-13 17:09:06  pitrou          set     pull_requests: + pull_request13200
2019-05-13 13:09:26  vstinner        set     messages: + msg342326
2019-05-13 12:54:01  pierreglaser    set     messages: + msg342321
2019-05-13 12:50:35  Olivier.Grisel  set     nosy: + Olivier.Grisel
                                             messages: + msg342319
2019-05-13 11:27:10  pierreglaser    set     pull_requests: + pull_request13182
2019-05-13 10:58:46  vstinner        set     nosy: + vstinner
                                             messages: + msg342309
2019-05-13 10:47:12  pierreglaser    set     messages: + msg342308
2019-05-13 10:42:26  pierreglaser    set     messages: + msg342307
2019-05-12 08:27:47  xtreak          set     nosy: + xtreak
                                             messages: + msg342246
2019-05-11 15:58:49  pierreglaser    set     messages: + msg342201
2019-05-10 20:59:12  pitrou          set     messages: + msg342133
2019-05-09 18:12:28  pierreglaser    set     keywords: + patch
                                             stage: patch review
                                             pull_requests: + pull_request13132
2019-05-09 17:36:01  pierreglaser    create