Author pierreglaser
Recipients pablogsal, pierreglaser, pitrou
Date 2019-05-09.17:36:00
Message-id <1557423361.05.0.0948606859067.issue36867@roundup.psfhosted.org>
In-reply-to
Content
Hi all,

Olivier Grisel, Thomas Moreau and I are currently working on extending the
scope of the semaphore_tracker in Python.

multiprocessing.semaphore_tracker is a little-known module that launches a
server process used to track the life cycle of semaphores created in a Python
session, and potentially clean up those semaphores after all Python processes of
the session have terminated. Normally, Python processes clean up the semaphores
they create. This is however not guaranteed if the processes get violently
interrupted (for example by the bash command "killall python").
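To make the tracking idea concrete, here is a minimal single-file sketch, loosely modeled on the tracker design described above: a tracker reads REGISTER/UNREGISTER commands from a pipe until EOF, and EOF means every client holding the write end is gone. The `COMMAND:name` line format and the `track` helper are assumptions for illustration, not the module's actual wire protocol or API.

```python
import os


def track(read_fd: int) -> set:
    """Hypothetical tracker loop: return the names still registered at EOF."""
    registered = set()
    with open(read_fd, "rb") as f:
        # Iteration stops at EOF, i.e. once every writer has closed the pipe,
        # which is what happens when the client processes die.
        for line in f:
            cmd, name = line.decode("ascii").strip().split(":", 1)
            if cmd == "REGISTER":
                registered.add(name)
            elif cmd == "UNREGISTER":
                registered.discard(name)
            else:
                raise ValueError(f"unknown command {cmd!r}")
    # A real tracker would unlink these leaked resources here.
    return registered


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"REGISTER:/mp-sem-a\nREGISTER:/mp-sem-b\nUNREGISTER:/mp-sem-a\n")
    os.close(w)  # simulates a client killed before it could clean up
    print(sorted(track(r)))  # ['/mp-sem-b']
```

Relying on EOF rather than on an explicit "shutdown" message is what makes this robust to `killall`: the kernel closes the dead processes' pipe ends for them.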

A note on why the semaphore_tracker was introduced: cleaning up semaphores
after termination is important because the system only supports a limited
number of named semaphores, and leaked ones are not automatically removed until
the next reboot.

Now, Python 3.8 introduces the creation of shared memory segments
(multiprocessing.shared_memory). Shared memory is another sensitive global
system resource. Currently, unexpected termination of processes that created
shared memory segments results in leaking those segments. This can be
problematic for large compute clusters that have many users and are rarely
rebooted.

For this reason, we extended the semaphore_tracker to also track shared memory
segments, and renamed it resource_tracker. Shared memory segments are
automatically registered with the resource tracker when they are created. This
is a first, self-contained fix. (1)
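The life cycle that the tracker has to cover can be illustrated with the `multiprocessing.shared_memory` module as it shipped in Python 3.8. This is a sketch assuming a POSIX system, where segment creation is what triggers registration with the tracker; the `roundtrip` helper is hypothetical and only exists to exercise create, attach, close and unlink.

```python
from multiprocessing import shared_memory


def roundtrip(payload: bytes) -> bytes:
    """Hypothetical helper: write payload to a new segment and read it back."""
    # Creating the segment is the step after which it would leak if this
    # process were killed before reaching unlink().
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        shm.buf[:len(payload)] = payload
        # Attach a second handle by name, as another process would.
        other = shared_memory.SharedMemory(name=shm.name)
        try:
            # The buffer may be larger than requested (page rounding),
            # so only read back the bytes that were written.
            return bytes(other.buf[:len(payload)])
        finally:
            other.close()  # closes this mapping only; the segment survives
    finally:
        shm.close()
        shm.unlink()  # destroys the segment; skipping this is what leaks it


if __name__ == "__main__":
    print(roundtrip(b"hello"))
```

Note that `close()` only unmaps the segment for the current handle; it is `unlink()` that frees the system resource, and `unlink()` is precisely the call a violently interrupted process never gets to make.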

Additionally, supporting shared memory tracking led to a more generic design
for the resource_tracker. The resource_tracker can now easily be extended
to track arbitrary resource types.
A public API could potentially be exposed for users willing to track other
types. For example, one may want to add tracking for temporary folders created
during Python sessions. Another use case is joblib, a widely used
parallel-computing package that is also the backend of scikit-learn. Joblib
relies heavily on memmapping; a public API would let it extend the
resource_tracker to track memmapped objects with very little code.
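One possible shape for such an extension point is a registry mapping each resource type to the function that destroys a leaked instance of it. The sketch below is purely hypothetical: `CLEANUP_FUNCS`, `register_resource_type` and `cleanup_leaked` are illustrative names, not an existing or proposed multiprocessing API, and the "folder" and "memmap" entries simply mirror the two use cases above.

```python
import os
import shutil
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical registry: resource type -> function destroying a leaked resource.
CLEANUP_FUNCS: Dict[str, Callable[[str], None]] = {
    "folder": shutil.rmtree,  # temporary folders created during a session
    "memmap": os.unlink,      # file backing a memmapped array
}


def register_resource_type(rtype: str, cleanup: Callable[[str], None]) -> None:
    """Hypothetical public hook letting users track extra resource types."""
    CLEANUP_FUNCS[rtype] = cleanup


def cleanup_leaked(leaked: Iterable[Tuple[str, str]]) -> List[tuple]:
    """Run the cleanup function of each (rtype, name) pair; return failures."""
    failures = []
    for rtype, name in leaked:
        try:
            CLEANUP_FUNCS[rtype](name)  # KeyError if the type is unknown
        except Exception as exc:
            # Keep going: one failed cleanup must not block the others.
            failures.append((rtype, name, exc))
    return failures


if __name__ == "__main__":
    import tempfile

    folder = tempfile.mkdtemp()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    print(cleanup_leaked([("folder", folder), ("memmap", path)]))  # []
```

Keeping the per-type logic in a plain dictionary means adding a new resource type touches only the registry, not the tracker loop itself.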

Therefore, this issue serves two purposes:
- referencing the semaphore_tracker enhancement mentioned in (1)
- discussing a potentially public resource_tracker API.
History
Date | User | Action | Args
2019-05-09 17:36:01 | pierreglaser | set | recipients: + pierreglaser, pitrou, pablogsal
2019-05-09 17:36:01 | pierreglaser | set | messageid: <1557423361.05.0.0948606859067.issue36867@roundup.psfhosted.org>
2019-05-09 17:36:00 | pierreglaser | link | issue36867 messages
2019-05-09 17:36:00 | pierreglaser | create |