multiprocessing.Pool._worker_handler(): use SIGCHLD to be notified on worker exit #79674

Closed
vstinner opened this issue Dec 14, 2018 · 12 comments
Labels
3.8 (only security fixes), stdlib (Python modules in the Lib dir)

Comments

@vstinner
Member

BPO 35493
Nosy @arekm, @pitrou, @vstinner, @stefanor, @applio, @pablogsal
PRs
  • bpo-35493: Use Process.sentinel instead of sleeping for polling worker status in multiprocessing.Pool #11488
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-03-16.22:36:14.592>
    created_at = <Date 2018-12-14.11:27:54.050>
    labels = ['3.8', 'library']
    title = 'multiprocessing.Pool._worker_handler(): use SIGCHLD to be notified on worker exit'
    updated_at = <Date 2020-03-14.15:01:39.718>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-03-14.15:01:39.718>
    actor = 'arekm'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-03-16.22:36:14.592>
    closer = 'pablogsal'
    components = ['Library (Lib)']
    creation = <Date 2018-12-14.11:27:54.050>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 35493
    keywords = ['patch', 'patch', 'patch']
    message_count = 12.0
    messages = ['331797', '331799', '331801', '331803', '331804', '331807', '331810', '333348', '333350', '338106', '361465', '364179']
    nosy_count = 6.0
    nosy_names = ['arekm', 'pitrou', 'vstinner', 'stefanor', 'davin', 'pablogsal']
    pr_nums = ['11488', '11488', '11488']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue35493'
    versions = ['Python 3.8']

    @vstinner
    Member Author

    Currently, multiprocessing.Pool._worker_handler() polls every 100 ms to check whether a worker has exited, sleeping with time.sleep(0.1) between checks. This adds latency when workers exit frequently and the pool has to execute a large number of tasks.

    Worst case:
    ---

    import multiprocessing
    import time
    CONCURRENCY = 1
    NTASK = 100
    def noop():
        pass
    with multiprocessing.Pool(CONCURRENCY, maxtasksperchild=1) as pool:
        start_time = time.monotonic()
        results = [pool.apply_async(noop, ()) for _ in range(NTASK)]
        for result in results:
            result.get()
        dt = time.monotonic() - start_time
        pool.terminate()
        pool.join()
    print("Total: %.1f sec" % dt)
    ---

    Output:
    ---
    Total: 10.2 sec
    ---

    This is the worst case: the pool has a single process, each worker executes only one task (maxtasksperchild=1), and the task does nothing, which minimizes task execution time. The latency is then 100 ms per task, or about 10 seconds for 100 tasks.

    Using the SIGCHLD signal to be notified when a worker exits would avoid polling, reducing both latency and CPU usage (the thread would no longer have to be woken up every 100 ms).

    @vstinner added the 3.8 (only security fixes) and stdlib (Python modules in the Lib dir) labels Dec 14, 2018
    @vstinner
    Member Author

    asyncio uses the SIGCHLD signal to be notified when a child process completes. SafeChildWatcher calls os.waitpid(pid, os.WNOHANG) on each child process, whereas FastChildWatcher uses os.waitpid(-1, os.WNOHANG).
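
The difference between the two reaping strategies can be sketched as follows. This is a minimal, Unix-only illustration and not asyncio's actual watcher code; `spawn_child`, `reap_known`, and `reap_any` are hypothetical helper names introduced here.

```python
import os
import time


def spawn_child(delay):
    """Fork a child that sleeps for `delay` seconds, then exits with status 0."""
    pid = os.fork()
    if pid == 0:
        # Child process: do the "work" and exit without running cleanup handlers.
        time.sleep(delay)
        os._exit(0)
    return pid


def reap_known(pids):
    """Per-PID strategy: poll each known child with WNOHANG.

    Returns the set of PIDs that have exited. Each PID must be a child
    that has not been reaped yet, otherwise ChildProcessError is raised.
    """
    exited = set()
    for pid in pids:
        done_pid, _status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:  # (0, 0) means the child is still running
            exited.add(pid)
    return exited


def reap_any():
    """Any-child strategy: collect every exited child of this process."""
    exited = set()
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        exited.add(pid)
    return exited
```

The any-child form needs a single call regardless of how many children exist, but it also reaps children started by unrelated code in the same process, so the per-PID form is the more conservative choice.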

    @vstinner
    Member Author

    See also bpo-35479: multiprocessing.Pool.join() always takes at least 100 ms.

    @pitrou
    Member

    pitrou commented Dec 14, 2018

    How do you use SIGCHLD on Windows?

    There is actually a portable (and robust) solution: use Process.sentinel
    https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.sentinel
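
A minimal sketch of waiting on Process.sentinel instead of sleeping, using multiprocessing.connection.wait(). It is only an illustration of the idea, not the change that was eventually merged; `wait_for_exited` is a hypothetical helper name, and the sleep durations are arbitrary.

```python
import multiprocessing
import time
from multiprocessing.connection import wait


def wait_for_exited(processes, timeout=None):
    """Block until at least one started process exits, or until `timeout`.

    Returns the processes whose sentinel became ready (empty list on timeout).
    """
    # Process.sentinel is only available once the process has been started.
    by_sentinel = {p.sentinel: p for p in processes}
    ready = wait(list(by_sentinel), timeout=timeout)
    return [by_sentinel[s] for s in ready]


if __name__ == "__main__":
    workers = [multiprocessing.Process(target=time.sleep, args=(delay,))
               for delay in (0.2, 5.0)]
    for worker in workers:
        worker.start()

    start = time.monotonic()
    exited = wait_for_exited(workers, timeout=10)
    # Wakes up as soon as the short-lived worker exits, with no fixed polling interval.
    print("%d worker(s) exited after %.2f sec" % (len(exited), time.monotonic() - start))

    for worker in workers:
        worker.terminate()
        worker.join()
```

Because the sentinel is a waitable handle on both Unix and Windows, the same code path works on either platform, which is the portability argument made above.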

    There is another issue: Pool is currently subclassed by ThreadPool. You'll probably have to make the two implementations diverge a bit.

    @vstinner
    Member Author

    How do you use SIGCHLD on Windows?

    I'm only proposing to use a signal where it's available, on UNIX. The idea is to have multiple implementations of the function, depending on whether the platform can notify us of worker completion without polling.

    On Windows, maybe we could use a dedicated thread to set an event once WaitForSingleObject/WaitForMultipleObjects completes?

    The design of my bpo-35479 change is to replace polling with one or multiple events. Maybe we can use an event to wake up _worker_handler() when something happens, but have different ways to signal this event.

    I have to investigate how Process.sentinel can be used here.

    It might be interesting to use asyncio internally, but I'm not sure if it's possible ;-)

    @pitrou
    Member

    pitrou commented Dec 14, 2018

    Using asyncio internally would be an interesting long-term goal, at least for the process pool version.

    Perhaps a first step is to find out how to await a multiprocessing Connection or Queue, or make async versions of these classes.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 14, 2018

    I have to investigate how Process.sentinel can be used here.

    Look how concurrent.futures uses it:
    https://github.com/python/cpython/blob/master/Lib/concurrent/futures/process.py#L348

    This also means:

    1. we could redirect people to ProcessPoolExecutor instead of trying to backport all its features into multiprocessing.Pool
    2. we could try to refactor the ProcessPoolExecutor implementation into a common backend for both ProcessPoolExecutor and multiprocessing.Pool
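
For the first option, a rough sketch of running a batch of tasks through ProcessPoolExecutor rather than multiprocessing.Pool. `run_tasks` is a hypothetical helper introduced here, and unlike the worst-case script earlier in the thread it reuses workers across tasks, so it does not reproduce that timing scenario.

```python
from concurrent.futures import ProcessPoolExecutor


def run_tasks(func, args_list, max_workers=1, mp_context=None):
    """Run func(*args) for each args tuple in a process pool.

    Results are returned in submission order. `mp_context` is passed
    through to ProcessPoolExecutor (None selects the default start method).
    """
    with ProcessPoolExecutor(max_workers=max_workers,
                             mp_context=mp_context) as executor:
        futures = [executor.submit(func, *args) for args in args_list]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # abs is a builtin, so it can be pickled and sent to the worker process.
    print(run_tasks(abs, [(-1,), (2,), (-3,)]))
```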

    @pablogsal
    Member

    @antoine Do you think we should start planning one of these long-term solutions, or should we start by trying Process.sentinel as a short-term solution for this particular issue?

    @pitrou
    Member

    pitrou commented Jan 9, 2019

    If using Process.sentinel looks easy enough, then go for it. Existing users of multiprocessing.Pool will benefit.

    @pablogsal
    Member

    New changeset 7c99454 by Pablo Galindo in branch 'master':
    bpo-35493: Use Process.sentinel instead of sleeping for polling worker status in multiprocessing.Pool (GH-11488)
    7c99454

    @stefanor
    Mannequin

    stefanor mannequin commented Feb 6, 2020

    This change seems to be causing a deadlock in multiprocessing shut-down: bpo-38501

    @arekm
    Mannequin

    arekm mannequin commented Mar 14, 2020

    And also https://bugs.python.org/issue38744 on Linux and FreeBSD

    @ezio-melotti transferred this issue from another repository Apr 10, 2022