classification
Title: multiprocessing.Pool.join() always takes at least 100 ms
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: 35478 35493 Superseder:
Assigned To: Nosy List: vstinner
Priority: normal Keywords: patch

Created on 2018-12-13 00:36 by vstinner, last changed 2019-09-14 05:40 by thatiparthy.

Pull Requests
URL Status Linked Edit
PR 11136 closed vstinner, 2018-12-13 00:53
PR 10564 closed thatiparthy, 2019-09-14 05:40
Messages (5)
msg331726 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-13 00:36
The join() method of multiprocessing.Pool calls self._worker_handler.join(): it's a thread running _handle_workers(). The core of this thread function is:

        while thread._state == RUN or (pool._cache and thread._state != TERMINATE):
            pool._maintain_pool()
            time.sleep(0.1)

I understand that the delay of 100 ms is used to check regularly the stop condition changed. This sleep causes a mandatory delay of 100 ms on Pool.join().
msg331727 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-13 01:14
Attached PR 11136 modify _worker_handler() loop to wait on threading.Event events, so Pool.join() completes as soon as possible.

Example:
---
import multiprocessing
import time

def the_test():
    start_time = time.monotonic()
    pool = multiprocessing.Pool(1)
    res = pool.apply_async(int, ("1",))
    pool.close()
    #pool.terminate()
    pool.join()
    dt = time.monotonic() - start_time
    print("%.3f sec" % dt)

the_test()
---

Minimum timing with _handle_results() using:

* current code (time.sleep(0.1)): min 0.132 sec
* time.sleep(1.0): min 1.033 sec
* my PR using events (wait(0.1)): min 0.033 sec

Currently, join() minimum timing depends on _handle_results() sleep() duration (100 ms).

With my PR, it completes as soon as possible: when state change and/or when a result is set.

My PR still requires an hardcoded delay of 100 ms to workaround bpo-35478 bug: results are never set if the pool is terminated.
msg331794 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:06
My PR 11136 doesn't work: _maintain_pool() should be called frequently to check when a worker completed. Polling worker exit status seems inefficient :-(

asyncio uses SIGCHLD signal to be notified when a child process completes. SafeChildWatcher calls os.waitpid(pid, os.WNOHANG) on each child process, whereas FastChildWatcher() uses os.waitpid(-1, os.WNOHANG).
msg331800 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:33
> My PR 11136 doesn't work: _maintain_pool() should be called frequently to check when a worker completed. Polling worker exit status seems inefficient :-(

I created bpo-35493: "multiprocessing.Pool._worker_handler(): use SIGCHLD to be notified on worker exit".
msg331805 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 11:56
_worker_handler has two issues:

* It polls the worker status every status every 100 ms: I created bpo-35493 to investigate how to avoid that
* After close() or terminate() has been called, it loops until self._cache is empty. I would like to use result.wait(), but a result never completes after terminate(): I created bpo-35478 to see if tasks can be unblocked in that case (to ensure that result.wait() completes after terminate().
History
Date User Action Args
2019-09-14 05:40:45thatiparthysetpull_requests: + pull_request15744
2018-12-14 11:56:25vstinnersetdependencies: + multiprocessing: ApplyResult.get() hangs if the pool is terminated, multiprocessing.Pool._worker_handler(): use SIGCHLD to be notified on worker exit
messages: + msg331805
2018-12-14 11:33:27vstinnersetmessages: + msg331800
2018-12-14 11:06:45vstinnersetmessages: + msg331794
2018-12-13 01:14:17vstinnersetmessages: + msg331727
2018-12-13 00:53:48vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request10368
2018-12-13 00:36:54vstinnercreate