classification
Title: [Windows] multiprocessing handle leak when child process is killed during startup/unpickling
Type: behavior
Stage:
Components: Library (Lib), Windows
Versions: Python 3.10, Python 3.9, Python 3.8

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: dgrunwald, eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2021-01-19 18:24 by dgrunwald, last changed 2022-04-11 14:59 by admin.

Files
deadlock.py (uploaded by dgrunwald, 2021-01-19 18:24)

Messages (3)
msg385283 - Author: Daniel Grunwald (dgrunwald) Date: 2021-01-19 18:24
Running the attached script deadlocks.
Uncommenting the `time.sleep(1)` in the script makes the deadlock disappear.

For context: our application uses multiple child processes (multiprocessing.Process) and uses pipes (multiprocessing.Pipe) to communicate with them.
If one process fails with an error, the main process will kill all other child processes running concurrently.
We noticed that sometimes (non-deterministically), when an error occurs soon after startup, the main process ends up hanging.

We expect that when we pass the writing half of a connection to a child process and close that connection in the main process, we will receive EOFError if the child process terminates unexpectedly.
But sometimes the EOFError never comes and our application hangs.

I've reduced the problem to the script attached. With the reduced script, the deadlock happens reliably for me.

I've debugged this a bit, and I think this is happening because passing a connection to the process being started involves reduce_pipe_connection(), which creates a duplicate of the handle within the main process.
When the pickled data is unpickled in the child process, it uses DUPLICATE_CLOSE_SOURCE to close the copy in the main process.
But if the pickled data is never unpickled by the child process, the handle ends up being leaked.
Thus the main process itself holds the writing half of the connection open, causing the recv() call on the reading half to block forever.
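
The mechanism described above can be illustrated without multiprocessing at all. The following is a minimal sketch, not the attached deadlock.py: it assumes a POSIX system and uses os.pipe() and os.dup() as stand-ins for the Windows pipe handle and the temporary duplicate created during pickling. The names eof_visible and run_leak_demo are hypothetical.

```python
import os
import select


def eof_visible(read_fd, timeout=0.1):
    """Return True if a read on read_fd would return at once (data or EOF)."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    return bool(ready)


def run_leak_demo():
    read_fd, write_fd = os.pipe()

    # Stand-in for the temporary duplicate that pickling creates in the
    # sending process (an assumption made for illustration only).
    leaked_fd = os.dup(write_fd)

    # The main process closes "its" writing half, as the application does.
    os.close(write_fd)

    # The duplicate still keeps the write side open, so no EOF arrives;
    # a blocking read here would hang.
    before = eof_visible(read_fd)

    # Once the last write-side descriptor is closed, EOF becomes visible.
    os.close(leaked_fd)
    after = eof_visible(read_fd)
    data = os.read(read_fd, 1)

    os.close(read_fd)
    return before, after, data
```

In this sketch the reader only sees end-of-file after every copy of the write end has been closed, which is why a duplicate that nobody ever claims keeps recv() blocked.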

msg385284 - Author: Daniel Grunwald (dgrunwald) Date: 2021-01-19 18:27
Fix idea: get_spawning_popen().pid could be used to directly copy the handle into the child process, thus avoiding the temporary copy in the main process.
This would help at least in our case (where we pass all connections during startup).

I don't know if the general case is solvable -- in general we don't know which process will unpickle the data, and "child process is killed" isn't the only possible reason why the call to rebuild_pipe_connection() might not happen (e.g. exception while unpickling an earlier part of the same message).

msg385380 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-01-20 23:48
I'm not fond of the way reduction.DupHandle() expects the receiving process to steal the duplicated handle. I'd prefer using the resource_sharer module, like reduction.DupFd() does in POSIX. Spawning is a special case, though, for which reduction.DupHandle() can take advantage of the duplicate_for_child() method of the popen_spawn_win32.Popen instance.

With the resource sharer, the handle still needs to be duplicated in the sending process. But an important difference is the resource_sharer.stop() method, which at least allows closing any handles that no longer need to be shared.
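
To illustrate why a stop() method matters, here is a toy, single-process model of the register/claim/stop pattern. It is only a sketch under stated assumptions: ToySharer, claim and run_sharer_demo are hypothetical names, file descriptors from os.pipe() and os.dup() stand in for Windows handles, and the real resource_sharer serves requests over a connection from a background thread rather than through direct method calls.

```python
import os


class ToySharer:
    """Hypothetical stand-in for a registry of resources awaiting pickup."""

    def __init__(self):
        self._cache = {}
        self._next_key = 0

    def register(self, send, close):
        """Remember how to hand a resource over and how to release it."""
        key = self._next_key
        self._next_key += 1
        self._cache[key] = (send, close)
        return key

    def claim(self, key, receiver):
        """A receiver showed up: hand over a copy, then release ours."""
        send, close = self._cache.pop(key)
        try:
            send(receiver)
        finally:
            close()

    def stop(self):
        """Release every resource that was registered but never claimed."""
        for _send, close in self._cache.values():
            close()
        self._cache.clear()


def run_sharer_demo():
    read_fd, write_fd = os.pipe()
    dup_fd = os.dup(write_fd)  # duplicate kept by the sending side

    sharer = ToySharer()
    sharer.register(
        lambda receiver: receiver.append(os.dup(dup_fd)),  # copy for receiver
        lambda: os.close(dup_fd),                          # drop our copy
    )
    os.close(write_fd)

    # Nobody ever claims the resource (the "child" died during startup).
    pending_before = len(sharer._cache)
    sharer.stop()  # closes the unclaimed duplicate

    # All write ends are closed now, so the read reports EOF immediately.
    data = os.read(read_fd, 1)
    os.close(read_fd)
    return pending_before, len(sharer._cache), data
```

The design point is that the sending side keeps track of each duplicate it created, so an unclaimed one can still be closed explicitly instead of lingering until the process exits.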

---

Proposed Changes (untested)

resource_sharer.py:

    class DupHandle(object):
        '''Wrapper for a handle that can be used at any time.'''
        def __init__(self, handle):
            dh = reduction.duplicate(handle)
            def send(conn, pid):
                reduction.send_handle(conn, dh, pid)
            def close():
                _winapi.CloseHandle(dh)
            self._id = _resource_sharer.register(send, close)

        def detach(self):
            '''Get the handle. This should only be called once.'''
            with _resource_sharer.get_connection(self._id) as conn:
                return reduction.recv_handle(conn)


reduction.py:

    def send_handle(conn, handle, destination_pid):
        '''Send a handle over a local connection.'''
        proc = _winapi.OpenProcess(_winapi.PROCESS_DUP_HANDLE, False,
            destination_pid)
        try:
            dh = duplicate(handle, proc)
            conn.send(dh)
        finally:
            _winapi.CloseHandle(proc)

    def recv_handle(conn):
        '''Receive a handle over a local connection.'''
        return conn.recv()

    class _DupHandle:
        def __init__(self, handle):
            self.handle = handle
        def detach(self):
            return self.handle

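    def DupHandle(handle):
        '''Return a wrapper for a handle.'''
        popen_obj = context.get_spawning_popen()
        if popen_obj is not None:
            # While spawning a child, duplicate directly into that child,
            # so no temporary copy is left behind in this process.
            return _DupHandle(popen_obj.duplicate_for_child(handle))
        # Otherwise the receiver is unknown; go through the resource sharer.
        from . import resource_sharer
        return resource_sharer.DupHandle(handle)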


connection.py:

    def reduce_pipe_connection(conn):
        dh = reduction.DupHandle(conn.fileno())
        return rebuild_pipe_connection, (dh, conn.readable, conn.writable)

    def rebuild_pipe_connection(dh, readable, writable):
        return PipeConnection(dh.detach(), readable, writable)

    reduction.register(PipeConnection, reduce_pipe_connection)

History
Date                 User       Action  Args
2022-04-11 14:59:40  admin      set     github: 87134
2021-03-30 18:31:06  eryksun    set     title: multiprocessing handle leak on Windows when child process is killed during startup/unpickling -> [Windows] multiprocessing handle leak when child process is killed during startup/unpickling; versions: - Python 3.7
2021-01-20 23:48:39  eryksun    set     versions: + Python 3.10; nosy: + eryksun; messages: + msg385380; components: + Library (Lib)
2021-01-19 18:27:07  dgrunwald  set     messages: + msg385284
2021-01-19 18:24:16  dgrunwald  create