Title: multiprocessing - about 5 times slower than map() on a 2-core machine
Type: performance Stage:
Components: Library (Lib) Versions: Python 2.6
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: 0x666, amaury.forgeotdarc, jnoller, pitrou
Priority: normal Keywords:

Created on 2009-01-19 13:50 by 0x666, last changed 2009-01-19 19:17 by 0x666. This issue is now closed.

Messages (6)
msg80168 - (view) Author: 0x666 (0x666) Date: 2009-01-19 13:50
I think something is wrong with the implementation of the multiprocessing
module. I've run this very simple test on my machine (Core 2, Vista):
import multiprocessing as mul
from time import time

def f(x):
    return x*x

if __name__ == '__main__':
    print "-------- testing multiprocessing on ", mul.cpu_count(), "cores -----------"
    print ""

    elements = 100000

    pool = mul.Pool(processes=mul.cpu_count())
    t1 = time()
    res_par =, range(elements))
    t2 = time()
    res_seq = map(f, range(elements))
    t3 = time()
    res_app = [pool.apply_async(f,(x,)) for x in range(elements)]
    res_app = [result.get() for result in res_app]
    t4 = time()

    print len(res_seq),"elements","map() time",(t3-t2),"s"
    print len(res_par),"elements"," time",(t2-t1),"s"
    print len(res_app),"elements","pool.apply_async() time", (t4-t3),"s"
    raw_input("press enter to exit...")
-------- testing multiprocessing on  2 cores -----------

100000 elements map() time 0.0269 s
100000 elements time 0.108 s
100000 elements pool.apply_async() time 10.567 s

IMHO, execution on 2 cores should be 1.x-2 times faster than
non-parallel execution (at least in simple cases).
If you don't believe this, check the parallelpython
module ( its demo example fits this idea very well.

So how can it be that the parallel method executes about 5
times SLOWER than the ordinary map() function?
So please correct the multiprocessing package to work in a more or less
performance-predictable way (like parallelpython).
msg80173 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-19 15:08
You should try with something less trivial than your "f" function. For
such a short function, it is to be expected that the dispatch overhead
will dominate the actual computation time.
msg80180 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-01-19 15:28
The multiprocessing module indeed has some overhead:

- the processes are spawned when needed. Before you perform performance
timings, you should "warm up" the Pool with a line like
, range(mul.cpu_count()))
(starting a process is a slowish operation, especially on Windows).
This reduces the timings by a factor of two.

- the dispatch overhead of multiprocessing is certainly greater than a
single multiplication. multiprocessing is for CPU-bound functions!
And do not forget that you have *three* processes here: two from the
Pool, and your main program.

As Antoine said, try with this function instead:

def f(x):
    for i in range(10):
        x = x * x
    return x

And the timings are much better...
msg80181 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-01-19 15:32
My results don't match yours. (8 cores, Mac OS/X):

-------- testing multiprocessing on  8 cores ----------

100000 elements map() time 0.0444118976593 s
100000 elements time 0.0366489887238 s
100000 elements pool.apply_async() time 24.3125801086 s

Now, this could be for a variety of reasons: More cores, different OS 
(which means different speed at which processes can be forked) and so 
on. As Antoine/Amaury point out you really need a use case that is large 
enough to offset the cost of forking the processes in the first place.

I also ran this on an 8 core Ubuntu box with kernel and py2.6.1
and 16gb of ram:

-------- testing multiprocessing on  8 cores ----------

100000 elements map() time 0.0258889198303 s
100000 elements time 0.0339770317078 s
100000 elements pool.apply_async() time 11.0373139381 s

OS/X is pretty snappy when it comes to forking.

Now, if you switch the example you provided over to Amaury's function, you
see a significant difference:

OS/X, 8 cores:

-------- testing multiprocessing on  8 cores ----------

100000 elements map() time 30.704061985 s
100000 elements time 4.95880293846 s
100000 elements pool.apply_async() time 23.6090102196 s

Ubuntu, kernel and py2.6.1:

-------- testing multiprocessing on  8 cores ----------

100000 elements map() time 38.3818569183 s
100000 elements time 5.65878105164 s
100000 elements pool.apply_async() time 14.1757941246 s
msg80182 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-01-19 15:33
Closing as not an issue.
msg80205 - (view) Author: 0x666 (0x666) Date: 2009-01-19 19:17
Thanks for quick response and for informative answers, especially 
thanks to Antoine Pitrou/Amaury Forgeot.

By setting a bigger range - range(15) - and elements = 1000, I was able to
get a speed-up factor of up to 1.8. Wow, I'm amazed :-)
Good library, keep it up.
BTW, about the warm-up - forking the first processes. A user can forget
to "warm up" the pool. It would be cool if the pool somehow warmed itself
up in the initialization phase (in the mul.Pool(processes=mul.cpu_count())
step). Maybe you can define another initialization parameter=function? (if
not None, the first processes would be spawned without waiting for the map
function) or something like that. But it is only cosmetics :-)
Date                 User                Action  Args
2009-01-19 19:17:10  0x666               set     messages: + msg80205
2009-01-19 15:34:52  benjamin.peterson   set     status: open -> closed
2009-01-19 15:33:20  jnoller             set     resolution: not a bug; messages: + msg80182
2009-01-19 15:32:54  jnoller             set     messages: + msg80181
2009-01-19 15:28:19  amaury.forgeotdarc  set     assignee: jnoller -> ; messages: + msg80180; nosy: + amaury.forgeotdarc
2009-01-19 15:19:04  jnoller             set     assignee: jnoller; nosy: + jnoller
2009-01-19 15:08:13  pitrou              set     nosy: + pitrou; messages: + msg80173
2009-01-19 13:50:51  0x666               create