classification
Title: multiprocessor spawn
Type: crash
Stage:
Components: macOS
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: blalterman, davin, mouse07410, ned.deily, pitrou, ronaldoussoren
Priority: normal
Keywords:

Created on 2020-03-29 21:49 by mouse07410, last changed 2020-05-11 02:04 by blalterman.

Messages (2)
msg365279 - Author: Mouse (mouse07410) Date: 2020-03-29 21:49
macOS Catalina 10.15.3 and 10.15.4, Python 3.8.2 (also tested with 3.7.7, which confirmed that the problem lies in the fix described in https://bugs.python.org/issue33725).

Trying to use "multiprocessing" with Python 3.8 and the new default of `set_start_method('spawn')` is nothing but a disaster.
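
To confirm which start method is actually in effect on a given interpreter, something like the following can be run. This is only a minimal sketch; `describe_start_method` is a hypothetical helper name and not part of the original report.

```python
import multiprocessing as mp
import sys


def describe_start_method():
    """Return (platform, current start method, all available start methods).

    Hypothetical helper for illustration only. Note that calling
    mp.get_start_method() fixes the default context if it was not set yet.
    """
    return sys.platform, mp.get_start_method(), mp.get_all_start_methods()


if __name__ == '__main__':
    platform, current, available = describe_start_method()
    print(f'platform={platform} current={current} available={available}')
```

On the macOS / Python 3.8 setup described here, the reported method would be expected to be 'spawn'; on other platforms or versions it may differ.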

Not calling join() leads to consistent crashes, as described in https://bugs.python.org/issue33725#msg365249

Adding p.join() immediately after p.start() seems to work, but increases the total run time by a factor of two to four, user time by a factor of five, and system time by a factor of ten.

Occasionally, even with p.join(), some processes still crash as shown in https://bugs.python.org/issue33725#msg365249.

I found two workarounds:
1. Switch back to 'fork' by explicitly adding `set_start_method('fork')` to the `__main__` block.
2. Drop the messy "multiprocessing" package and use "multiprocess" instead, which turns out to be a good and reliable fork of "multiprocessing".
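
Workaround 1 can also be sketched without touching the global default, by using `multiprocessing.get_context('fork')`. This is an illustration under assumptions, not the code from the report: `square` and `run_with_fork` are hypothetical names, and 'fork' is only available on Unix-like systems.

```python
import multiprocessing as mp


def square(x, out):
    # Hypothetical worker, used only for illustration.
    out.put(x * x)


def run_with_fork(values):
    # get_context() returns a context bound to one start method and leaves
    # the process-wide default untouched. It raises ValueError on platforms
    # where 'fork' is not available.
    ctx = mp.get_context('fork')
    out = ctx.Queue()
    procs = [ctx.Process(target=square, args=(v, out)) for v in values]
    for p in procs:
        p.start()
    # Drain the queue before joining, so no worker blocks on a full pipe.
    results = sorted(out.get() for _ in procs)
    for p in procs:
        p.join()
    return results


if __name__ == '__main__':
    print(run_with_fork([1, 2, 3]))
```

Compared with `set_start_method('fork')`, which may be called only once per program unless `force=True` is passed, a context object keeps the choice local to the code that needs it.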

If anybody cares to dig deeper into this problem, I'd be happy to provide whatever information that could be helpful.

Here's the sample code (again):
```
#!/usr/bin/env python3
#
# Test "multiprocessing" package included with Python-3.6+
#
# Usage:
#    ./multi1.py [nElements [nProcesses [tSleep]]]
#
#        nElements  - total number of integers to put in the queue
#                     default: 100
#        nProcesses - total number of parallel worker processes
#                     default: number of CPUs reported by mp.cpu_count()
#        tSleep     - number of milliseconds for a worker to sleep
#                     after it retrieved an element from the queue
#                     default: 17
#
# Algorithm:
#   1. Creates a queue and adds nElements integers to it,
#   2. Creates nProcesses worker processes
#   3. Each worker extracts an element from the queue and sleeps for tSleep milliseconds
#

import sys, queue, time
import multiprocessing as mp


def getElements(q, tSleep, idx):
    l = []  # list of pulled numbers
    while True:
        try:
            l.append(q.get(True, .001))
            time.sleep(tSleep)
        except queue.Empty:
            if q.empty():
                print(f'worker {idx} done, got {len(l)} numbers')
                return


if __name__ == '__main__':
    nElements = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    nProcesses = int(sys.argv[2]) if len(sys.argv) > 2 else mp.cpu_count()
    tSleep = float(sys.argv[3]) if len(sys.argv) > 3 else 17

    # To make this sample code work reliably and fast, uncomment following line
    #mp.set_start_method('fork')

    # Fill the queue with numbers from 0 to nElements - 1
    q = mp.Queue()
    for k in range(nElements):
        q.put(k)

    # Keep track of worker processes
    workers = []

    # Start worker processes
    for m in range(nProcesses):
        p = mp.Process(target=getElements, args=(q, tSleep / 1000, m))
        workers.append(p)
        p.start()

    # Now do the joining
    for p in workers:
        p.join()
```

Here's the timing:
```
$ time python3 multi1.py
worker 9 done, got 5 numbers
worker 16 done, got 5 numbers
worker 6 done, got 5 numbers
worker 8 done, got 5 numbers
worker 17 done, got 5 numbers
worker 3 done, got 5 numbers
worker 14 done, got 5 numbers
worker 0 done, got 5 numbers
worker 15 done, got 4 numbers
worker 7 done, got 5 numbers
worker 5 done, got 5 numbers
worker 12 done, got 5 numbers
worker 4 done, got 5 numbers
worker 19 done, got 5 numbers
worker 18 done, got 5 numbers
worker 1 done, got 5 numbers
worker 10 done, got 5 numbers
worker 2 done, got 5 numbers
worker 11 done, got 6 numbers
worker 13 done, got 5 numbers

real	0m0.325s
user	0m1.375s
sys	0m0.692s
```

If I comment out the join() and uncomment set_start_method('fork'), the timing is:
```
$ time python3 multi1.py
worker 0 done, got 5 numbers
worker 3 done, got 5 numbers
worker 2 done, got 5 numbers
worker 1 done, got 5 numbers
worker 5 done, got 5 numbers
worker 10 done, got 5 numbers
worker 6 done, got 5 numbers
worker 4 done, got 5 numbers
worker 7 done, got 5 numbers
worker 9 done, got 5 numbers
worker 8 done, got 5 numbers
worker 14 done, got 5 numbers
worker 11 done, got 5 numbers
worker 12 done, got 5 numbers
worker 13 done, got 5 numbers
worker 16 done, got 5 numbers
worker 15 done, got 5 numbers
worker 17 done, got 5 numbers
worker 18 done, got 5 numbers
worker 19 done, got 5 numbers

real	0m0.175s
user	0m0.073s
sys	0m0.070s
```

You can observe the difference.

Here's the timing if I don't bother with either join() or set_start_method(), but import "multiprocess" instead:
```
$ time python3 multi2.py 
worker 0 done, got 5 numbers
worker 1 done, got 5 numbers
worker 2 done, got 5 numbers
worker 4 done, got 5 numbers
worker 3 done, got 5 numbers
worker 5 done, got 5 numbers
worker 6 done, got 5 numbers
worker 8 done, got 5 numbers
worker 9 done, got 5 numbers
worker 7 done, got 5 numbers
worker 14 done, got 5 numbers
worker 11 done, got 5 numbers
worker 13 done, got 5 numbers
worker 16 done, got 5 numbers
worker 12 done, got 5 numbers
worker 10 done, got 5 numbers
worker 15 done, got 5 numbers
worker 17 done, got 5 numbers
worker 18 done, got 5 numbers
worker 19 done, got 5 numbers

real	0m0.192s
user	0m0.089s
sys	0m0.076s
```

Also, on a weaker machine with only 4 cores (rather than the 20 that ran the above example), the instability of the "multiprocessing"-based code shows:
```
$ time python3.8 multi1.py 
worker 3 done, got 33 numbers
worker 2 done, got 33 numbers
worker 1 done, got 34 numbers
worker 0 done, got 0 numbers

real	0m5.448s
user	0m0.339s
sys	0m0.196s
```
Observe how one process out of four got nothing from the queue. With "multiprocess" the code runs like clockwork, and each process gets exactly 1/N of the queue:
```
$ time python3.8 multi2.py 
worker 0 done, got 25 numbers
worker 1 done, got 25 numbers
worker 2 done, got 25 numbers
worker 3 done, got 25 numbers

real	0m0.551s
user	0m0.082s
sys	0m0.044s
```

I think that the best course for "multiprocessing" would be to revert the default to 'fork'. It also looks like the best course for users would be to switch to "multiprocess".
msg368602 - Author: B. L. Alterman (blalterman) Date: 2020-05-11 02:04
@Mouse, using "multiprocess" instead of "multiprocessing" will not work if you're passing a class that inherits from ABC.

"dill" is one of the dependencies of "multiprocess", and "dill" can't pickle an _abc_data object (https://github.com/uqfoundation/dill/issues/332).
History
Date User Action Args
2020-05-11 02:04:31 blalterman set messages: + msg368602
2020-05-10 21:44:16 blalterman set nosy: + blalterman
2020-04-13 21:34:11 ned.deily set nosy: + pitrou, davin
2020-03-29 21:49:38 mouse07410 create