This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: multiprocessing.Pool._worker_handler(): use SIGCHLD to be notified on worker exit
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: arekm, davin, pablogsal, pitrou, stefanor, vstinner
Priority: normal Keywords: patch, patch, patch

Created on 2018-12-14 11:27 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 11488 merged pablogsal, 2019-01-09 23:40
PR 11488 merged pablogsal, 2019-01-09 23:40
PR 11488 merged pablogsal, 2019-01-09 23:40
Messages (12)
msg331797 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:27
Currently, multiprocessing.Pool._worker_handler() checks every 100 ms if a worker exited using time.sleep(0.1). It causes a latency if worker exit frequently and the pool has to execute a large number of tasks.

Worst case:
---
import multiprocessing
import time
CONCURRENCY = 1
NTASK = 100
def noop():
    pass
with multiprocessing.Pool(CONCURRENCY, maxtasksperchild=1) as pool:
    start_time = time.monotonic()
    results = [pool.apply_async(noop, ()) for _ in range(NTASK)]
    for result in results:
        result.get()
    dt = time.monotonic() - start_time
    pool.terminate()
    pool.join()
print("Total: %.1f sec" % dt)
---

Output:
---
Total: 10.2 sec
---

The worst case is a pool of 1 process, each worker only executes a single task and the task does nothing (minimize task execution time): the latency is 100 ms per task, which means 10 seconds for 100 tasks.

Using SIGCHLD signal to be notified when a worker completes would allow to avoid polling: reduce the latency and reduce CPU usage (the thread doesn't have to be awaken every 100 ms anymore).
msg331799 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:33
asyncio uses SIGCHLD signal to be notified when a child process completes. SafeChildWatcher calls os.waitpid(pid, os.WNOHANG) on each child process, whereas FastChildWatcher() uses os.waitpid(-1, os.WNOHANG).
msg331801 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:33
See also bpo-35479: multiprocessing.Pool.join() always takes at least 100 ms.
msg331803 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-12-14 11:39
How do you use SIGCHLD on Windows?

There is actually a portable (and robust) solution: use Process.sentinel
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.sentinel

There is another issue: Pool is currently subclassed by ThreadPool. You'll probably have to make the two implementations diverge a bit.
msg331804 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:48
> How do you use SIGCHLD on Windows?

I'm only proposing to use a signal when it's available, on UNIX. So have multiple implementations of the function, depending on the ability to get notified on completion without polling.

On Windows, maybe we could use a dedicated thread to set an event once WaitForSingleObject/WaitForMultipleObjects completes?

The design of my bpo-35479 change is to replace polling with one or multiple events. Maybe we can use an event to wakeup _worker_handler() when something happens, but have different wants to signal this event.

I have to investigate how Process.sentinel can be used here.

I might be interesting to use asyncio internally, but I'm not sure if it's possible ;-)
msg331807 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-12-14 12:05
Using asyncio internally would be an interesting long-term goal, at least for the process pool version.

Perhaps a first step is to find out how to await a multiprocessing Connection or Queue, or make async versions of these classes.
msg331810 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-12-14 12:08
> I have to investigate how Process.sentinel can be used here.

Look how concurrent.futures uses it:
https://github.com/python/cpython/blob/master/Lib/concurrent/futures/process.py#L348

This also means:
1) we could redirect people to ProcessPoolExecutor instead of trying to backport all its features into multiprocessing.Pool
2) we could try to refactor the ProcessPoolExecutor implementation into a common backend for both ProcessPoolExecutor and multiprocessing.Pool
msg333348 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-01-09 22:13
@Antoine Do you think we should start planning one of these long term solutions or we should start trying to use Process.sentinel as a short term solution for this particular issue?
msg333350 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-01-09 22:20
If using Process.sentinel looks easy enough, then go for it.  Existing users of multiprocessing.Pool will benefit.
msg338106 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-03-16 22:34
New changeset 7c994549dcffd0d9d3bb37475e6374f356e7240e by Pablo Galindo in branch 'master':
bpo-35493: Use Process.sentinel instead of sleeping for polling worker status in multiprocessing.Pool (#11488)
https://github.com/python/cpython/commit/7c994549dcffd0d9d3bb37475e6374f356e7240e
msg361465 - (view) Author: Stefano Rivera (stefanor) * Date: 2020-02-06 01:20
This change seems to be causing a deadlock in multiprocessing shut-down: bpo-38501
msg364179 - (view) Author: Arkadiusz Miśkiewicz (arekm) Date: 2020-03-14 15:01
And also https://bugs.python.org/issue38744 on Linux and FreeBSD
History
Date User Action Args
2022-04-11 14:59:09adminsetgithub: 79674
2021-09-21 22:26:17vstinnerunlinkissue35479 dependencies
2020-03-14 15:01:39arekmsetnosy: + arekm
messages: + msg364179
2020-02-06 01:20:27stefanorsetnosy: + stefanor
messages: + msg361465
2019-03-16 22:36:14pablogsalsetkeywords: patch, patch, patch
status: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-03-16 22:34:26pablogsalsetmessages: + msg338106
2019-01-09 23:40:36pablogsalsetkeywords: + patch
stage: patch review
pull_requests: + pull_request11003
2019-01-09 23:40:27pablogsalsetkeywords: + patch
stage: (no value)
pull_requests: + pull_request11002
2019-01-09 23:40:19pablogsalsetkeywords: + patch
stage: (no value)
pull_requests: + pull_request11001
2019-01-09 22:20:29pitrousetmessages: + msg333350
2019-01-09 22:13:08pablogsalsetmessages: + msg333348
2018-12-14 12:08:59pitrousetmessages: + msg331810
2018-12-14 12:05:47pitrousetmessages: + msg331807
2018-12-14 11:56:25vstinnerlinkissue35479 dependencies
2018-12-14 11:48:02vstinnersetmessages: + msg331804
2018-12-14 11:39:03pitrousetmessages: + msg331803
2018-12-14 11:33:41vstinnersetmessages: + msg331801
2018-12-14 11:33:11vstinnersetmessages: + msg331799
2018-12-14 11:27:54vstinnercreate