This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Multiprocessing worker functions not terminating with a large number of processes and a manager
Type: behavior
Stage: resolved
Components: Library (Lib), macOS
Versions: Python 3.6
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: Ericg, andrei.avk, davin, pitrou, ronaldoussoren
Priority: normal
Keywords:

Created on 2018-02-24 14:17 by Ericg, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (9)
msg312718 - (view) Author: EricG (Ericg) Date: 2018-02-24 14:17
I have the following code:

    import multiprocessing
    from multiprocessing import Pool, Manager
    import time
    import random

    def worker_function( index, messages ):

        print( "%d: Entered" % index )
        time.sleep( random.randint( 3, 15 ) )
        messages.put( "From: %d" % index )
        print( "%d: Exited" % index )

    manager = Manager()
    messages = manager.Queue()

    with Pool( processes = None ) as pool:

        for x in range( 30 ):
            pool.apply_async( worker_function, [ x, messages ] )

        pool.close()
        pool.join()

It does not terminate -- all entered messages are printed, but not all exited messages are printed.

If I remove all the code related to the Manager and Queue, it will terminate properly with all messages printed.

If I assign processes explicitly, I can keep increasing the value and the script continues to work until I reach 20 or 21. With a value above 20 it fails every time, with a value of 20 it fails some of the time, and with a value below 20 it always succeeds.

multiprocessing.cpu_count() returns 24 for my MacPro.
msg312747 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-02-24 19:46
What happens if you add another process that calls get() on the queue?  You should not try to put data on a queue if you don't ever plan to consume it, as the queue's background thread will eventually block until something gets consumed.

For example, this blocks here on Linux:

$ ./python -c "import multiprocessing as mp; q = mp.Queue(); [q.put(None) for i in range(50000)]"
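
A minimal sketch of the suggestion above, assuming a dedicated consumer process calling get() and a None sentinel (the drain() helper is an illustrative name, not part of the report):

    import multiprocessing

    def drain(q):
        # Consume items until a None sentinel arrives, so puts never pile up.
        while True:
            item = q.get()
            if item is None:
                break
            print("consumed:", item)

    if __name__ == "__main__":
        q = multiprocessing.Queue()
        consumer = multiprocessing.Process(target=drain, args=(q,))
        consumer.start()
        for i in range(50000):  # the count that blocks when nothing consumes
            q.put(i)
        q.put(None)             # tell the consumer to stop
        consumer.join()
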
msg312748 - (view) Author: EricG (Ericg) Date: 2018-02-24 19:52
I do plan to consume the messages on the queue, but only after all worker functions are complete...after pool.join() returns. Is this not ok?

I can certainly spawn a thread on the main process which will consume the queue entries and insert them into a list or queue which can then be accessed after join returns. Is that the correct way this code should be written?
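
For illustration, a sketch of that consumer-thread pattern, with a trivial stand-in worker and a None sentinel (both illustrative, not from the report): the main process starts a thread that drains the manager queue into a list, then joins it after pool.join() returns.

    import threading
    from multiprocessing import Pool, Manager

    def worker_function(index, messages):
        # Trivial stand-in for the real worker.
        messages.put("From: %d" % index)

    def consume(queue, results):
        # Drain the manager queue into a local list until a None sentinel arrives.
        while True:
            item = queue.get()
            if item is None:
                break
            results.append(item)

    if __name__ == "__main__":
        manager = Manager()
        messages = manager.Queue()
        results = []

        consumer = threading.Thread(target=consume, args=(messages, results))
        consumer.start()

        with Pool() as pool:
            for x in range(30):
                pool.apply_async(worker_function, [x, messages])
            pool.close()
            pool.join()

        messages.put(None)  # stop the consumer thread
        consumer.join()
        print(len(results), "messages collected")
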
msg312751 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-02-24 20:04
I'm not sure... I cannot reproduce your problem on Linux, even with 50 processes and 10000 iterations, on Python 3.6.4.

Which exact version are you using?
What happens if you replace the manager Queue with a non-manager Queue?
msg312924 - (view) Author: EricG (Ericg) Date: 2018-02-26 13:19
If I do:

    from queue import Queue

    messages = Queue()

No messages are printed. I believe this is expected, as a regular Queue cannot be shared between processes; that is the problem the manager was designed to solve.

I am using a MacPro running 10.3.2
Python 3.6.4

It would not surprise me if this were an OS-specific issue. Reproducing it may require using a Mac with a high number of cores.

It is trivially reproducible for me.
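
As an aside, a plain (non-manager) multiprocessing.Queue would not help with a Pool either: it cannot be pickled as a task argument, and the failure only surfaces if the AsyncResult is checked. A minimal sketch with an illustrative worker:

    from multiprocessing import Pool, Queue

    def put_one(q):
        # Illustrative worker; it never actually runs because the task cannot be pickled.
        q.put("hello")

    if __name__ == "__main__":
        q = Queue()
        with Pool(processes=2) as pool:
            result = pool.apply_async(put_one, [q])
            # Checking the result surfaces the otherwise-silent error, something like:
            #   RuntimeError: Queue objects should only be shared between
            #   processes through inheritance
            result.get()
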
msg312950 - (view) Author: EricG (Ericg) Date: 2018-02-26 20:31
Making some further observations: when I set processes = 12, for example, I can see 12 separate python processes plus 4 additional processes also created, which I assume are set up for the manager and, perhaps, other purposes.

Now, what makes these 4 additional processes interesting is that for me, multiprocessing.cpu_count() returns 24. And, it is when I set processes = 20 that the python code will sometimes terminate successfully. 20 + 4 = 24...so I am using every single cpu in that situation.

However, as noted, when I set processes = 19, it will always terminate successfully. 19 + 4 < 24...there is at least one cpu not assigned any work.

Perhaps there is some kind of race condition, or swapping around of data structures, or something else that only happens on macOS when every cpu is in use by python for this purpose.
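
One way to see which helper processes multiprocessing itself starts (the pool workers plus the Manager's server process), as opposed to whatever else ps is counting, is multiprocessing.active_children(); a small sketch, not from the report:

    from multiprocessing import Pool, Manager, active_children

    if __name__ == "__main__":
        manager = Manager()
        with Pool(processes=12) as pool:
            # Typically shows 12 pool workers plus one SyncManager server process.
            for child in active_children():
                print(child.name, child.pid)
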
msg313108 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-03-01 19:19
I don't know, maybe someone with a Mac wants to try and debug it?
msg380058 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-10-31 12:44
This issue can probably be closed as out of date:

Using: Python 3.9, installer from python.org. On my laptop os.cpu_count() == 8.

python -c 'import repro; repro.main()' works for me (it runs for a while and exits when all workers have exited). The repro module contains the code below; this is the code from the original message with some changes because of changes to the multiprocessing launcher strategy on macOS.

# repro.py
import multiprocessing
from multiprocessing import Pool, Manager
import time
import random

def worker_function( index, messages ):

    print( "%d: Entered" % index )
    time.sleep( random.randint( 3, 15 ) )
    messages.put( "From: %d" % index )
    print( "%d: Exited" % index )


def main():
    manager = Manager()
    messages = manager.Queue()
    with Pool( processes = None ) as pool:

        for x in range( 30 ):
            pool.apply_async( worker_function, [ x, messages ] )

        pool.close()
        pool.join()
# EOF
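
The launcher change referred to above is that macOS switched the default start method from fork to spawn in Python 3.8; under spawn each child re-imports the module, so the Manager/Pool setup must live inside a function rather than at module level. A hypothetical addition (not in the file above) that lets repro.py be run directly as a script:

    # Hypothetical guard, not part of Ronald's repro.py: under "spawn",
    # process-creating code must not run at import time in child processes.
    if __name__ == "__main__":
        main()
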
msg398224 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-07-26 12:47
I have confirmed that the reproducer posted by Ronald also works for me, on 3.11 compiled from the dev branch, also with cpu count == 8, on the new M1 silicon (that shouldn't matter, though).

I agree this can be closed.
History
Date                 User            Action  Args
2022-04-11 14:58:58  admin           set     github: 77118
2021-08-16 23:22:57  ned.deily       set     status: open -> closed
2021-07-26 12:47:47  andrei.avk      set     status: pending -> open; nosy: + andrei.avk; messages: + msg398224
2020-10-31 12:44:15  ronaldoussoren  set     status: open -> pending; type: behavior; nosy: + ronaldoussoren; messages: + msg380058; resolution: out of date; stage: resolved
2018-03-01 19:19:31  pitrou          set     messages: + msg313108
2018-02-26 20:31:10  Ericg           set     messages: + msg312950
2018-02-26 13:19:20  Ericg           set     messages: + msg312924
2018-02-24 20:04:22  pitrou          set     messages: + msg312751
2018-02-24 19:52:16  Ericg           set     messages: + msg312748
2018-02-24 19:46:01  pitrou          set     messages: + msg312747
2018-02-24 19:03:00  ned.deily       set     nosy: + pitrou, davin, - ronaldoussoren, ned.deily
2018-02-24 14:17:03  Ericg           create