Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concurrent.futures.ProcessPoolExecutor can deadlock when tcmalloc is used #90622

Closed
yilei mannequin opened this issue Jan 21, 2022 · 8 comments · Fixed by #91598, #92499, #92495 or #92497
Closed

concurrent.futures.ProcessPoolExecutor can deadlock when tcmalloc is used #90622

yilei mannequin opened this issue Jan 21, 2022 · 8 comments · Fixed by #91598, #92499, #92495 or #92497
Assignees
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@yilei
Copy link
Mannequin

yilei mannequin commented Jan 21, 2022

BPO 46464
Nosy @gpshead, @pitrou, @methane, @yilei
PRs
  • gh-90622: Prevent concurrent.futures.process deadlock due to fork() after spawning a thread. #30847
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = None
    created_at = <Date 2022-01-21.23:32:53.148>
    labels = ['3.11', 'library', '3.9', '3.10', 'performance']
    title = 'concurrent.futures.ProcessPoolExecutor can deadlock when tcmalloc is used'
    updated_at = <Date 2022-01-25.11:06:46.406>
    user = 'https://github.com/yilei'

    bugs.python.org fields:

    activity = <Date 2022-01-25.11:06:46.406>
    actor = 'pitrou'
    assignee = 'gregory.p.smith'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2022-01-21.23:32:53.148>
    creator = 'yilei'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46464
    keywords = ['patch', '3.9regression']
    message_count = 7.0
    messages = ['411208', '411213', '411451', '411543', '411544', '411545', '411574']
    nosy_count = 4.0
    nosy_names = ['gregory.p.smith', 'pitrou', 'methane', 'yilei']
    pr_nums = ['30847']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue46464'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @yilei
    Copy link
    Mannequin Author

    yilei mannequin commented Jan 21, 2022

    When Python is built and linked with tcmalloc, using ProcessPoolExecutor may deadlock. Here is a reproducible example:

    $ cat t.py
    from concurrent import futures
    import sys
    def work(iteration, item):
      sys.stdout.write(f'working: iteration={iteration}, item={item}\n')
      sys.stdout.flush()
    for i in range(0, 10000):
        with futures.ProcessPoolExecutor(max_workers=2) as executor:
            executor.submit(work, i, 1)
            executor.submit(work, i, 2)
    
    $ python t.py
    working: iteration=0, item=1
    working: iteration=0, item=2
    working: iteration=1, item=1
    working: iteration=1, item=2
    ...
    working: iteration=3631, item=1
    working: iteration=3631, item=2
    <hang here>

    The child process fails to finish. It's more likely to reproduce when the system is busy.

    With some bisect search internally, this commit 1ac6e37 "triggered" the deadlock threshold with tcmalloc.

    @yilei yilei mannequin added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir labels Jan 21, 2022
    @gpshead
    Copy link
    Member

    gpshead commented Jan 21, 2022

    For context, the fundamental problem is that os.fork() is unsafe to use in any process with threads. concurrent/futures/process.py violates this. So long as its design involves a thread, it can never be guaranteed not to deadlock.

    POSIX forbids execution of most code after fork() has been called. Returning to the CPython interpreter after os.fork() in the child process is never safe unless the parent process had no threads at fork time.

    The change that triggered the issue moved from doing all of the os.fork() calls for the workers up front *before the thread was spawned* to doing them on demand after the thread was running.

    The bugfix for this is simply a rollback to revert the bpo-39207 change. It was a performance enhancement, but it causes a new deadlock in code that didn't already have one when thread tuned malloc implementations are used.

    The only way to safely launch worker processes on demand is to spawn a worker launcher process spawned prior to any thread creation that remains idle, with a sole job of spawn new worker processes for us. That sounds complicated. That'd be a feature. Lets go with the bugfix first.

    @gpshead gpshead self-assigned this Jan 21, 2022
    @gpshead gpshead added the performance Performance or resource usage label Jan 21, 2022
    @gpshead gpshead self-assigned this Jan 21, 2022
    @gpshead gpshead added the performance Performance or resource usage label Jan 21, 2022
    @gpshead
    Copy link
    Member

    gpshead commented Jan 24, 2022

    https://bugs.python.org/issue44733 for 3.11 attempts to build upon the dynamic spawning ability and will need reverting unless a non-thread+fork implementation is provided.

    @methane
    Copy link
    Member

    methane commented Jan 25, 2022

    The only way to safely launch worker processes on demand is to spawn a worker launcher process spawned prior to any thread creation that remains idle, with a sole job of spawn new worker processes for us. That sounds complicated. That'd be a feature. Lets go with the bugfix first.

    fork is not the only way to launch worker process. We have spawn. And sapwn is the default for macOS since Python 3.8.

    Simple reverting seems not good for macOS users, since they need to pay cost for both of pre-spawning and spawn.
    Can't we just pre-spawn only when fork is used?

    @gpshead
    Copy link
    Member

    gpshead commented Jan 25, 2022

    It sounds like we need to introspect the mp_context= passed to ProcessPoolExecutor (and it's default when None) to raise an error when max_tasks_per_child is incompatible with it.

    @gpshead
    Copy link
    Member

    gpshead commented Jan 25, 2022

    and similarly, the dynamic spawning could be kept when the mp_context is spawn (and possibly forkserver).

    @pitrou
    Copy link
    Member

    pitrou commented Jan 25, 2022

    Indeed, it seems this should only be disabled when the "fork" model is used, especially as the optimization is mostly valuable when spawning a worker is expensive.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    gpshead added a commit to gpshead/cpython that referenced this issue Apr 16, 2022
    …spawn via fork.
    
    This avoids potential deadlocks in the child processes due to forking from
    a multithreaded process.
    gpshead added a commit that referenced this issue May 6, 2022
    …1587)
    
    Prevent `max_tasks_per_child` use with a "fork" mp_context to avoid deadlocks.
    
    Also defaults to "spawn" when no mp_context is supplied for safe convenience.
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 8, 2022
    …ethod. (pythonGH-91598)
    
    Do not spawn ProcessPool workers on demand when they spawn via fork.
    
    This avoids potential deadlocks in the child processes due to forking from
    a multithreaded process.
    (cherry picked from commit ebb37fc)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    gpshead added a commit that referenced this issue May 8, 2022
    …#91598)
    
    Do not spawn ProcessPool workers on demand when they spawn via fork.
    
    This avoids potential deadlocks in the child processes due to forking from
    a multithreaded process.
    gpshead added a commit to gpshead/cpython that referenced this issue May 8, 2022
    … fork method. (pythonGH-91598)
    
    Do not spawn ProcessPool workers on demand when they spawn via fork.
    
    This avoids potential deadlocks in the child processes due to forking from
    a multithreaded process..
    (cherry picked from commit ebb37fc)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    gpshead added a commit that referenced this issue May 8, 2022
    …method. (GH-91598) (#92497)
    
    Do not spawn ProcessPool workers on demand when they spawn via fork.
    
    This avoids potential deadlocks in the child processes due to forking from
    a multithreaded process..
    (cherry picked from commit ebb37fc)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 8, 2022
    … fork method. (pythonGH-91598) (pythonGH-92497)
    
    Do not spawn ProcessPool workers on demand when they spawn via fork.
    
    This avoids potential deadlocks in the child processes due to forking from
    a multithreaded process..
    (cherry picked from commit ebb37fc)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    (cherry picked from commit b795376)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    gpshead added a commit that referenced this issue May 8, 2022
    …GH-91598) (#92495)
    
    Do not spawn ProcessPool workers on demand when they spawn via fork.
    
    This avoids potential deadlocks in the child processes due to forking from
    a multithreaded process.
    (cherry picked from commit ebb37fc)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    @gpshead gpshead reopened this May 8, 2022
    gpshead added a commit that referenced this issue May 8, 2022
    …method. (GH-91598) (GH-92497) (#92499)
    
    Do not spawn ProcessPool workers on demand when they spawn via fork.
    
    This avoids potential deadlocks in the child processes due to forking from
    a multithreaded process..
    (cherry picked from commit ebb37fc)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    (cherry picked from commit b795376)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    @gpshead gpshead closed this as completed May 8, 2022
    @gpshead
    Copy link
    Member

    gpshead commented May 8, 2022

    Note that there is still one deadlock possibility that I left as a comment in the code. If the "fork" mp_context spawn method is used and one of the worker processes dies, a new one will be forked after multiprocessing has already started its executor manager thread. mp_context "forkserver" and "spawn" methods avoid that.

    hello-adam pushed a commit to hello-adam/cpython that referenced this issue Jun 2, 2022
    … fork method. (pythonGH-91598) (pythonGH-92497) (python#92499)
    
    Do not spawn ProcessPool workers on demand when they spawn via fork.
    
    This avoids potential deadlocks in the child processes due to forking from
    a multithreaded process..
    (cherry picked from commit ebb37fc)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    (cherry picked from commit b795376)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment