classification
Title: multiprocessing - Pool.map() slower about 5 times than map() on 2 cores machine
Type: performance Stage:
Components: Library (Lib) Versions: Python 2.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: 0x666, amaury.forgeotdarc, jnoller, pitrou
Priority: normal Keywords:

Created on 2009-01-19 13:50 by 0x666, last changed 2009-01-19 19:17 by 0x666. This issue is now closed.

Messages (6)
msg80168 - (view) Author: 0x666 (0x666) Date: 2009-01-19 13:50
I think something wrong with implementation of multiprocessing module.
I`ve run this very simple test on my machine (core 2, vista):
import multiprocessing as mul
from time import time

def f(x):
    return x*x

if __name__ == '__main__':
    print "-------- testing multiprocessing on ",mul.cpu_count(),"cores
----------"
    print ""

    elements = 100000

    pool = mul.Pool(processes=mul.cpu_count())
    t1 = time()
    res_par = pool.map(f, range(elements))
    t2 = time()
    res_seq = map(f, range(elements))
    t3 = time()
    res_app = [pool.apply_async(f,(x,)) for x in range(elements)]
    res_app = [result.get() for result in res_app]
    t4 = time()

    print len(res_seq),"elements","map() time",(t3-t2),"s"
    print len(res_par),"elements","pool.map() time",(t2-t1),"s"
    print len(res_app),"elements","pool.apply_async() time", (t4-t3),"s"
    
    print
    raw_input("press enter to exit...")
__________________________________________
Results:
-------- testing multiprocessing on  2 cores -----------

100000 elements map() time 0.0269 s
100000 elements pool.map() time 0.108 s
100000 elements pool.apply_async() time 10.567 s
--------------------------------------------------------

IMHO, execution on 2 cores should be 1.x - 2 times faster than compared
with non-parallel execution. (at least in simple cases).
If you dont believe in this, check http://www.parallelpython.com/
module (demo example sum_primes.py), which fits very well this idea.

So how it can be that parallel pool.map() method executes in about 5
times SLOWER, than ordinary map() function ?
So please correct multiprocessing package to work in more-less
perfomance predictable way (like parallelpython).
msg80173 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-19 15:08
You should try with something less trivial than your "f" function. For
such a short function, it seems expectable that the dispatch overhead
will dominate the actual computation time.
msg80180 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-01-19 15:28
The multiprocessing module indeed has some overhead:

- the processes are spawned when needed. Before you perform performance
timings, you should "warm up" the Pool with a line like
    pool.map(f, range(mul.cpu_count()))
(starting a process is a slowish operation specially on Windows)
This reduces timings by a factor of two.

- the dispatch overhead of multiprocessing is certainly greater than a
single multiplication. multiprocessing is for CPU-bound functions!
And do not forget that you have *tree* processes here: two from the
Pool, and your main program.

As Antoine said, try with this function instead:

def f(x):
    for i in range(10):
        x = x * x
    return x

And the timings are much better...
msg80181 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-01-19 15:32
My results don't match yours. (8 cores, Mac OS/X):

-------- testing multiprocessing on  8 cores ----------

100000 elements map() time 0.0444118976593 s
100000 elements pool.map() time 0.0366489887238 s
100000 elements pool.apply_async() time 24.3125801086 s

Now, this could be for a variety of reasons: More cores, different OS 
(which means different speed at which processes can be forked) and so 
on. As Antoine/Amaury point out you really need a use case that is large 
enough to offset the cost of forking the processes in the first place.

I also ran this on an 8 core Ubuntu box with kernel 2.6.22.19 and 
py2.6.1 and
16gb of ram:

-------- testing multiprocessing on  8 cores ----------

100000 elements map() time 0.0258889198303 s
100000 elements pool.map() time 0.0339770317078 s
100000 elements pool.apply_async() time 11.0373139381 s

OS/X is pretty snappy when it comes for forking. 

Now, if you cut the example you provided over to Amaury's example, you 
see a significant difference:

OS/X, 8 cores:

-------- testing multiprocessing on  8 cores ----------

100000 elements map() time 30.704061985 s
100000 elements pool.map() time 4.95880293846 s
100000 elements pool.apply_async() time 23.6090102196 s

Ubuntu, kernel 2.6.22.19 and py2.6.1:

-------- testing multiprocessing on  8 cores ----------

100000 elements map() time 38.3818569183 s
100000 elements pool.map() time 5.65878105164 s
100000 elements pool.apply_async() time 14.1757941246 s
msg80182 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-01-19 15:33
Closing as not an issue.
msg80205 - (view) Author: 0x666 (0x666) Date: 2009-01-19 19:17
Thanks for quick response and for informative answers, especially 
thanks to Antoine Pitrou/Amaury Forgeot.

p.s.
By seting bigger range - range(15) and elements = 1000, I was able to 
get speed-up factor up to 1.8. Wow, I`m amazed :-)
Good library, keep it up.
BTW, about warp-up,- forking first processes. User can forget 
to "warpup" pool. It would be cool if pool somehow warm-ups itself in 
initialization phase (in step mul.Pool(processes=mul.cpu_count())).
Maybe you can define another initialization parameter=function ? (if 
not None, first processes should be spawned, not waiting map function) 
or something like that. But it is only cosmetics :-)
History
Date User Action Args
2009-01-19 19:17:100x666setmessages: + msg80205
2009-01-19 15:34:52benjamin.petersonsetstatus: open -> closed
2009-01-19 15:33:20jnollersetresolution: not a bug
messages: + msg80182
2009-01-19 15:32:54jnollersetmessages: + msg80181
2009-01-19 15:28:19amaury.forgeotdarcsetassignee: jnoller ->
messages: + msg80180
nosy: + amaury.forgeotdarc
2009-01-19 15:19:04jnollersetassignee: jnoller
nosy: + jnoller
2009-01-19 15:08:13pitrousetnosy: + pitrou
messages: + msg80173
2009-01-19 13:50:510x666create