This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Classification
Title: multiprocessing.Pool hangs after re-spawning several worker processes
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.7, Python 3.6, Python 2.7

Process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Olivier.Grisel, davin, olarn, pitrou, tomMoral
Priority: normal
Keywords:

Created on 2017-10-27 15:10 by olarn, last changed 2022-04-11 14:58 by admin.

Messages (2)
msg305124 - (view) Author: olarn (olarn) Date: 2017-10-27 15:10
multiprocessing.Pool apparently attempts to repopulate the pool in the event of a sub-process worker crash. However, the pool seems to hang after roughly 4 × (number of workers) process re-spawns.

I've tracked the issue down to queue.get() stalling in multiprocessing/pool.py, line 102.

Is this a known issue? Are there any known workarounds?

To reproduce this issue:

import ctypes
import logging
import multiprocessing
import multiprocessing.util
import time

multiprocessing.util._logger = multiprococessing_logger = multiprocessing.util.log_to_stderr(logging.DEBUG)


def crash_py_interpreter():
    # Write past the end of a one-byte buffer until the interpreter
    # segfaults, simulating a hard crash of the worker process.
    print("attempting to crash the interpreter in", multiprocessing.current_process())
    i = ctypes.c_char(b'a')
    j = ctypes.pointer(i)
    c = 0
    while True:
        j[c] = b'a'
        c += 1


def test_fn(x):
    # The worker exits immediately; SystemExit is not caught by the pool's
    # worker loop, so the pool must re-spawn a replacement process.
    print("test_fn in", multiprocessing.current_process().name, x)
    exit(0)
    time.sleep(0.1)  # unreachable; kept from the original script


if __name__ == '__main__':

    # pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    pool = multiprocessing.Pool(processes=1)

    args_queue = list(range(20))

    # subprocess quits
    pool.map(test_fn, args_queue)

    # subprocess crashes
    # pool.map(crash_py_interpreter, args_queue)
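As a possible workaround (my own sketch, not something suggested in this report): concurrent.futures.ProcessPoolExecutor monitors its worker processes and raises BrokenProcessPool when one dies abruptly, rather than hanging. The `hard_exit_worker` helper below is hypothetical, using os._exit to simulate a crashed worker:

```python
# Sketch of a workaround: ProcessPoolExecutor surfaces a dead worker as
# BrokenProcessPool instead of hanging like multiprocessing.Pool here.
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def hard_exit_worker(x):
    # os._exit skips all cleanup, simulating a hard worker crash.
    os._exit(1)


def run_demo():
    try:
        with ProcessPoolExecutor(max_workers=1) as executor:
            list(executor.map(hard_exit_worker, range(4)))
    except BrokenProcessPool:
        return "broken"
    return "ok"


if __name__ == '__main__':
    print(run_demo())
```

Once an executor is broken it must be discarded and recreated; unlike Pool, it makes the failure visible instead of silently re-spawning workers.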
msg305185 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-29 10:59
Generally speaking, queues can remain in an inconsistent state after a process crash (because the process might have crashed just after acquiring a shared semaphore or sending part of a large message).  It's not obvious to me how we could make them safer, at least under Unix where there's no widely-available message-oriented communication facility that I know of.
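The first failure mode described above (a process crashing just after acquiring a shared semaphore) can be illustrated with a minimal sketch; this is my own example, not code from the thread:

```python
# Illustration: a process that dies while holding a shared lock leaves the
# lock held forever, one form of the inconsistent state described above.
import multiprocessing as mp
import os


def die_holding_lock(lock):
    lock.acquire()  # take the shared semaphore...
    os._exit(1)     # ...then crash without ever releasing it


def demo():
    lock = mp.Lock()
    p = mp.Process(target=die_holding_lock, args=(lock,))
    p.start()
    p.join()
    # The dead process still "owns" the lock; this acquire can only time out.
    return lock.acquire(timeout=0.5)


if __name__ == '__main__':
    print("lock acquired after crash:", demo())
```

Any other process blocked on that lock without a timeout, such as a pool's result-handler reading its queue, would hang indefinitely.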
History
Date                 User    Action  Args
2022-04-11 14:58:53  admin   set     github: 76067
2017-10-29 11:02:09  pitrou  set     nosy: + Olivier.Grisel, tomMoral
2017-10-29 10:59:45  pitrou  set     nosy: + pitrou, davin; messages: + msg305185; versions: + Python 3.7
2017-10-27 15:10:27  olarn   create