Classification
Title: multiprocessing.Pool._worker_handler(): use SIGCHLD to be notified on worker exit
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.8
Process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: davin, pablogsal, pitrou, vstinner
Priority: normal Keywords: patch

Created on 2018-12-14 11:27 by vstinner, last changed 2019-01-09 23:40 by pablogsal.

Pull Requests
URL Status Linked Edit
PR 11488 open pablogsal, 2019-01-09 23:40
Messages (9)
msg331797 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:27
Currently, multiprocessing.Pool._worker_handler() checks every 100 ms whether a worker exited, using time.sleep(0.1). This adds latency when workers exit frequently and the pool has to execute a large number of tasks.

Worst case:
---
import multiprocessing
import time
CONCURRENCY = 1
NTASK = 100
def noop():
    pass
with multiprocessing.Pool(CONCURRENCY, maxtasksperchild=1) as pool:
    start_time = time.monotonic()
    results = [pool.apply_async(noop, ()) for _ in range(NTASK)]
    for result in results:
        result.get()
    dt = time.monotonic() - start_time
    pool.terminate()
    pool.join()
print("Total: %.1f sec" % dt)
---

Output:
---
Total: 10.2 sec
---

The worst case is a pool of 1 process where each worker executes only a single task and the task does nothing (minimizing task execution time): the latency is 100 ms per task, which adds up to 10 seconds for 100 tasks.

Using the SIGCHLD signal to be notified when a worker completes would avoid polling: it would reduce the latency and the CPU usage (the thread would no longer have to wake up every 100 ms).
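
For illustration, a minimal sketch of the idea on UNIX (the event and handler names are hypothetical, not part of an actual patch): a SIGCHLD handler sets a threading.Event that _worker_handler() could wait on instead of calling time.sleep(0.1).
---
import signal
import threading

# Hypothetical names, for illustration only.
worker_exited = threading.Event()

def _on_sigchld(signum, frame):
    # A child process changed state: wake up the worker-handler thread.
    worker_exited.set()

# Signal handlers can only be installed from the main thread.
signal.signal(signal.SIGCHLD, _on_sigchld)

# In _worker_handler(), instead of time.sleep(0.1):
#     worker_exited.wait()
#     worker_exited.clear()
#     ... reap exited workers and repopulate the pool ...
---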
msg331799 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:33
asyncio uses the SIGCHLD signal to be notified when a child process completes. SafeChildWatcher calls os.waitpid(pid, os.WNOHANG) on each child process, whereas FastChildWatcher uses os.waitpid(-1, os.WNOHANG).
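
To illustrate the difference between the two strategies (a sketch, not asyncio's actual code; the short-lived child is spawned only to make the example self-contained and POSIX-only):
---
import os
import time

# Spawn a short-lived child so the example is self-contained.
child_pid = os.fork()
if child_pid == 0:
    os._exit(0)      # child: exit immediately
time.sleep(0.1)      # give the child time to exit

# SafeChildWatcher-style: poll one specific child pid.
pid, status = os.waitpid(child_pid, os.WNOHANG)
print("reaped pid %s, status %s" % (pid, status))

# FastChildWatcher-style would instead reap any child of this process:
#     pid, status = os.waitpid(-1, os.WNOHANG)   # pid == 0 if no child has exited yet
---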
msg331801 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:33
See also bpo-35479: multiprocessing.Pool.join() always takes at least 100 ms.
msg331803 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-12-14 11:39
How do you use SIGCHLD on Windows?

There is actually a portable (and robust) solution: use Process.sentinel
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.sentinel

There is another issue: Pool is currently subclassed by ThreadPool. You'll probably have to make the two implementations diverge a bit.
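
For reference, a minimal portable sketch of the sentinel approach, using multiprocessing.connection.wait() to block until at least one worker exits (the task function is just a placeholder):
---
import multiprocessing
from multiprocessing.connection import wait

def task():
    pass  # placeholder work

if __name__ == "__main__":
    workers = [multiprocessing.Process(target=task) for _ in range(4)]
    for w in workers:
        w.start()

    # Blocks until at least one worker exits; no polling, works on Windows and POSIX.
    ready = wait([w.sentinel for w in workers])
    print("%d worker(s) exited" % len(ready))

    for w in workers:
        w.join()
---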
msg331804 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:48
> How do you use SIGCHLD on Windows?

I'm only proposing to use the signal where it's available, i.e. on UNIX. So there would be multiple implementations of the function, depending on the platform's ability to get notified of worker completion without polling.

On Windows, maybe we could use a dedicated thread to set an event once WaitForSingleObject/WaitForMultipleObjects completes?

The design of my bpo-35479 change is to replace polling with one or more events. Maybe we can use an event to wake up _worker_handler() when something happens, but have different ways to signal this event.

I have to investigate how Process.sentinel can be used here.

It might be interesting to use asyncio internally, but I'm not sure if it's possible ;-)
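
A possible shape for that event-based design (a hypothetical sketch, not a patch): one threading.Event that _worker_handler() waits on, set either by a SIGCHLD handler on UNIX or by a helper thread blocked on the worker sentinels elsewhere.
---
import threading
from multiprocessing.connection import wait

# Hypothetical names, for illustration only.
change_notifier = threading.Event()

def _sentinel_waiter(sentinels):
    # Portable waker: on Windows, wait() is built on WaitForMultipleObjects.
    wait(sentinels)
    change_notifier.set()

# On UNIX, a SIGCHLD handler could simply call change_notifier.set().
# _worker_handler() would then do:
#     change_notifier.wait()
#     change_notifier.clear()
#     ... maintain the pool of workers ...
---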
msg331807 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-12-14 12:05
Using asyncio internally would be an interesting long-term goal, at least for the process pool version.

Perhaps a first step is to find out how to await a multiprocessing Connection or Queue, or make async versions of these classes.
msg331810 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-12-14 12:08
> I have to investigate how Process.sentinel can be used here.

Look how concurrent.futures uses it:
https://github.com/python/cpython/blob/master/Lib/concurrent/futures/process.py#L348

This also means:
1) we could redirect people to ProcessPoolExecutor instead of trying to backport all its features into multiprocessing.Pool
2) we could try to refactor the ProcessPoolExecutor implementation into a common backend for both ProcessPoolExecutor and multiprocessing.Pool
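
Roughly, the pattern there is to pass the result-queue reader and the worker sentinels to a single multiprocessing.connection.wait() call, so the management thread wakes up either for a new result or for a dead worker. A simplified, hypothetical sketch of that idea (the names are not the actual concurrent.futures ones):
---
from multiprocessing.connection import wait

def wait_for_result_or_dead_worker(result_reader, processes):
    # result_reader: Connection end of the result queue.
    # processes: dict mapping pid -> multiprocessing.Process.
    sentinels = [p.sentinel for p in processes.values()]
    ready = wait([result_reader] + sentinels)
    if result_reader in ready:
        return result_reader.recv()   # a completed work item
    return None                       # a worker process exited unexpectedly
---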
msg333348 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-01-09 22:13
@Antoine Do you think we should start planning one of these long-term solutions, or should we start trying to use Process.sentinel as a short-term solution for this particular issue?
msg333350 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-01-09 22:20
If using Process.sentinel looks easy enough, then go for it.  Existing users of multiprocessing.Pool will benefit.
History
Date                 User       Action  Args
2019-01-09 23:40:36  pablogsal  set     keywords: + patch; stage: patch review; pull_requests: + pull_request11003
2019-01-09 23:40:27  pablogsal  set     keywords: + patch; stage: (no value); pull_requests: + pull_request11002
2019-01-09 23:40:19  pablogsal  set     keywords: + patch; stage: (no value); pull_requests: + pull_request11001
2019-01-09 22:20:29  pitrou     set     messages: + msg333350
2019-01-09 22:13:08  pablogsal  set     messages: + msg333348
2018-12-14 12:08:59  pitrou     set     messages: + msg331810
2018-12-14 12:05:47  pitrou     set     messages: + msg331807
2018-12-14 11:56:25  vstinner   link    issue35479 dependencies
2018-12-14 11:48:02  vstinner   set     messages: + msg331804
2018-12-14 11:39:03  pitrou     set     messages: + msg331803
2018-12-14 11:33:41  vstinner   set     messages: + msg331801
2018-12-14 11:33:11  vstinner   set     messages: + msg331799
2018-12-14 11:27:54  vstinner   create