
classification
Title: Deadlock with multiprocessing.Queue()
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.6, Python 3.5

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: davin, eryksun, max, tim.peters
Priority: normal
Keywords:

Created on 2017-03-12 05:45 by max, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg289479 - Author: Max (max) Date: 2017-03-12 05:45
Using multiprocessing.Queue() with several processes writing to it very fast results in a deadlock, on both Windows and UNIX.

For example, this code:

from multiprocessing import Process, Queue
import time, sys

def simulate(q, n_results):
    for i in range(n_results):
        time.sleep(0.01)
        q.put(i)

def main():
    n_workers = int(sys.argv[1])
    n_results = int(sys.argv[2])

    q = Queue()
    proc_list = [Process(target=simulate, args=(q, n_results), daemon=True)
                 for _ in range(n_workers)]

    for proc in proc_list:
        proc.start()

    for i in range(5):
        time.sleep(1)
        print('current approximate queue size:', q.qsize())
        alive = [p.pid for p in proc_list if p.is_alive()]
        if alive:
            print(len(alive), 'processes alive; among them:', alive[:5])
        else:
            break

    for p in proc_list:
        p.join()  # hangs here: nothing ever reads q (see the diagnosis below)

    print('final approximate queue size:', q.qsize())


if __name__ == '__main__':
    main()


hangs on Windows 10 (Python 3.6) with 2 workers and 1000 results each, and on Ubuntu 16.04 (Python 3.5) with 100 workers and 100 results each. The printout shows that the queue has reached its full size, yet a number of processes are still alive. Presumably they somehow manage to deadlock even though they don't depend on each other (it must be in the implementation of Queue()):

current approximate queue size: 9984
47 processes alive; among them: [2238, 2241, 2242, 2244, 2247]
current approximate queue size: 10000
47 processes alive; among them: [2238, 2241, 2242, 2244, 2247]

The deadlock disappears once multiprocessing.Queue() is replaced with multiprocessing.Manager().Queue(); or at least I wasn't able to replicate it with a reasonable number of processes and results.
msg289480 - Author: Tim Peters (tim.peters) (Python committer) Date: 2017-03-12 06:12
I think this is expected.  Add this as the first line of `simulate()` and the problem should go away:

    q.cancel_join_thread()
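
That is, a sketch of the reporter's simulate() with the suggested line added (nothing else changes):

    def simulate(q, n_results):
        q.cancel_join_thread()  # let this process exit without waiting for the feeder thread
        for i in range(n_results):
            time.sleep(0.01)
            q.put(i)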

As the docs say, a Queue works with a background thread, which feeds incoming data from an internal buffer to an (interprocess) pipe.  By default, a process using the Queue attempts to join that thread when the process exits.  But since you never take anything off the queue, the thread waits forever, hoping for the pipe to drain so it can feed in the rest of its buffer.

But see the docs for why you don't really want to use .cancel_join_thread():  the process will just exit then, and the data in the internal buffer will most likely simply be lost.
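
If the buffered data actually matters, the usual alternative (a sketch against the main() above, not part of the original report) is to drain the queue in the parent before joining the workers:

    # Consume every item first, so each worker's feeder thread can flush
    # its buffer into the pipe; only then join the processes.
    results = [q.get() for _ in range(n_workers * n_results)]
    for p in proc_list:
        p.join()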

A Manager.Queue doesn't have this problem because it runs in its own (Manager) process:  q.put() sends the data to that process at once, without buffering anything.  So if you have write-only Queues, that's the way to go ;-)
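
In the script above that is a small change (a sketch; everything else stays the same):

    from multiprocessing import Manager

    m = Manager()  # starts a separate server process that holds the queue
    q = m.Queue()  # put() ships each item to that process immediately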
msg289490 - Author: Eryk Sun (eryksun) (Python triager) Date: 2017-03-12 09:47
On Windows the "QueueFeederThread" in each child process is blocked in WaitForMultipleObjects in PipeConnection._send_bytes. The pipe buffer size is 8 KiB, and each pickled int is 5-6 bytes. With 2 processes the pipe is full after sending (256 + 469) * 2 == 1450 int objects. Joining this feeder thread waits until all of the data is sent, however long that takes. This appears to be working correctly as designed.
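
For reference, the per-item size is easy to check (protocol 3 is the default pickle protocol in Python 3.6):

    >>> import pickle
    >>> [len(pickle.dumps(i, 3)) for i in (0, 255, 256, 999)]
    [5, 5, 6, 6]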
msg289492 - Author: Max (max) Date: 2017-03-12 10:25
Yes, this makes sense. My bad, I didn't realize processes might need to wait until the queue is consumed.

I don't think there's any need to update the docs either; nobody should have production code that never reads the queue (mine was a test of some other issue).
History
Date                 User        Action  Args
2022-04-11 14:58:44  admin       set     github: 73983
2017-03-12 15:52:23  tim.peters  set     status: open -> closed
                                         resolution: not a bug
                                         stage: resolved
2017-03-12 10:25:41  max         set     messages: + msg289492
2017-03-12 09:47:32  eryksun     set     nosy: + eryksun
                                         messages: + msg289490
2017-03-12 08:48:00  rhettinger  set     nosy: + davin
2017-03-12 06:12:36  tim.peters  set     nosy: + tim.peters
                                         messages: + msg289480
2017-03-12 05:45:09  max         create