Message 341987 - Python tracker

Author	pierreglaser
Recipients	pablogsal, pierreglaser, pitrou
Date	2019-05-09.17:36:00
Message-id	<1557423361.05.0.0948606859067.issue36867@roundup.psfhosted.org>
In-reply-to

Content

Hi all,

Olivier Grisel, Thomas Moreau and I are currently working on extending the
scope of the semaphore_tracker in Python.

multiprocessing.semaphore_tracker is a little-known module that launches a
server process used to track the life cycle of semaphores created in a Python
session, and to clean up those semaphores after all Python processes of the
session have terminated. Normally, Python processes clean up the semaphores
they create. This is however not guaranteed if the processes get violently
interrupted (for example with the bash command "killall python").

A note on why the semaphore_tracker was introduced: cleaning up semaphores
after termination is important because the system only supports a limited
number of named semaphores, and they will not be removed automatically until
the next reboot.

Now, Python 3.8 introduces the creation of shared memory segments. Shared
memory is another sensitive global system resource. Currently, unexpected
termination of processes that created memory segments will result in leaking
those memory segments. This can be problematic for large compute clusters that
have many users and are rarely rebooted.

For this reason, we expanded the semaphore_tracker to also track shared memory
segments, and renamed it resource_tracker. Shared memory segments get
automatically tracked by the resource tracker when they are created. This is a
first, self-contained fix. (1)
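
To illustrate the normal (non-leaking) life cycle of a shared memory segment,
here is a minimal sketch using multiprocessing.shared_memory from Python 3.8.
The helper name `roundtrip` is made up for this example. The segment is
released explicitly with close() and unlink(); the tracker is only meant to
act as a safety net when a process dies before reaching that point.

```python
from multiprocessing import shared_memory


def roundtrip(payload: bytes) -> bytes:
    """Write payload into a fresh shared memory segment and read it back.

    Hypothetical helper for illustration; payload must be non-empty.
    """
    # Creating the segment allocates a named, system-wide resource.
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        # The segment may be larger than requested on some platforms,
        # so always slice with the payload length.
        shm.buf[:len(payload)] = payload
        return bytes(shm.buf[:len(payload)])
    finally:
        # Normal cleanup: release this process's mapping, then remove
        # the segment itself. If the process is killed before this
        # point, the segment would otherwise stay around.
        shm.close()
        shm.unlink()
```

Calling `roundtrip(b"hello")` returns the same bytes and leaves no segment
behind.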

Additionally, supporting shared memory tracking led to a more generic design
for the resource_tracker. The resource_tracker can now be easily extended
to track arbitrary resource types.
A public API could potentially be exposed for users willing to track other
types. One may for example want to add tracking for temporary folders created
during Python sessions. Another use case is joblib, which is a widely used
parallel-computing package, and also the backend of scikit-learn. Joblib
relies heavily on memmapping. A public API could extend the resource_tracker
to track memmap-ed objects with very little code.
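
As a rough idea of what such a generic design could look like, here is a toy,
in-process sketch. It is not the actual resource_tracker implementation (which
runs in a separate server process), and every name in it (`ToyResourceTracker`,
`add_resource_type`, `cleanup_leaked`, `demo`) is hypothetical. It only shows
the principle: each resource type maps to a cleanup function, and whatever is
still registered at the end gets cleaned up.

```python
import os
import shutil
import tempfile


class ToyResourceTracker:
    """Hypothetical in-process stand-in for a generic resource tracker."""

    def __init__(self):
        self._cleanup_funcs = {}  # resource type -> cleanup callable
        self._live = {}           # resource type -> set of tracked names

    def add_resource_type(self, rtype, cleanup):
        # Teach the tracker how to destroy resources of a new type.
        self._cleanup_funcs[rtype] = cleanup
        self._live[rtype] = set()

    def register(self, name, rtype):
        self._live[rtype].add(name)

    def unregister(self, name, rtype):
        self._live[rtype].discard(name)

    def cleanup_leaked(self):
        # Destroy everything that was registered but never unregistered.
        leaked = []
        for rtype, names in self._live.items():
            for name in sorted(names):
                self._cleanup_funcs[rtype](name)
                leaked.append((rtype, name))
            names.clear()
        return leaked


def demo():
    """Track two temporary folders; only one is cleaned up by its owner."""
    tracker = ToyResourceTracker()
    tracker.add_resource_type("folder", shutil.rmtree)

    kept = tempfile.mkdtemp()
    leaked = tempfile.mkdtemp()
    tracker.register(kept, "folder")
    tracker.register(leaked, "folder")

    # Well-behaved path: the owner removes the folder and unregisters it.
    shutil.rmtree(kept)
    tracker.unregister(kept, "folder")

    # The second folder is "forgotten"; the tracker removes it instead.
    cleaned = tracker.cleanup_leaked()
    return leaked, cleaned
```

In this sketch, adding support for another resource type (memmap-ed files,
say) would only require one more `add_resource_type` call with a suitable
cleanup function.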

Therefore, this issue serves two purposes:
- referencing the semaphore_tracker enhancement mentioned in (1)
- discussing a potentially public resource_tracker API.
