Issue 5000
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2009-01-19 13:50 by 0x666, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (6)
msg80168 - Author: 0x666 (0x666) - Date: 2009-01-19 13:50

I think something is wrong with the implementation of the multiprocessing module. I've run this very simple test on my machine (Core 2, Vista):

```python
import multiprocessing as mul
from time import time

def f(x):
    return x*x

if __name__ == '__main__':
    print "-------- testing multiprocessing on ", mul.cpu_count(), "cores ----------"
    print ""
    elements = 100000
    pool = mul.Pool(processes=mul.cpu_count())
    t1 = time()
    res_par = pool.map(f, range(elements))
    t2 = time()
    res_seq = map(f, range(elements))
    t3 = time()
    res_app = [pool.apply_async(f, (x,)) for x in range(elements)]
    res_app = [result.get() for result in res_app]
    t4 = time()
    print len(res_seq), "elements", "map() time", (t3-t2), "s"
    print len(res_par), "elements", "pool.map() time", (t2-t1), "s"
    print len(res_app), "elements", "pool.apply_async() time", (t4-t3), "s"
    print
    raw_input("press enter to exit...")
```

Results:

```
-------- testing multiprocessing on 2 cores -----------
100000 elements map() time 0.0269 s
100000 elements pool.map() time 0.108 s
100000 elements pool.apply_async() time 10.567 s
```

IMHO, execution on 2 cores should be 1.x to 2 times faster compared with non-parallel execution, at least in simple cases. If you don't believe this, check the http://www.parallelpython.com/ module (demo example sum_primes.py), which fits this idea very well. So how can it be that the parallel pool.map() method executes about 5 times SLOWER than the ordinary map() function? Please correct the multiprocessing package to work in a more or less performance-predictable way (like parallelpython).
msg80173 - Author: Antoine Pitrou (pitrou) - Date: 2009-01-19 15:08

You should try with something less trivial than your "f" function. For such a short function, it is to be expected that the dispatch overhead will dominate the actual computation time.
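To illustrate this point, here is a minimal Python 3 sketch (not from the original thread; the function name and workload sizes are illustrative). With a trivial worker, almost all per-task time is pickling and queue dispatch, and batching many items per task via pool.map()'s chunksize parameter shrinks that overhead:

```python
import multiprocessing as mp
import time

def square(x):
    # Trivial worker: the real computation is a single multiplication,
    # so per-task dispatch overhead dominates the runtime.
    return x * x

def timed_map(pool, n, chunksize):
    # Time one pool.map() call with the given batching factor.
    start = time.perf_counter()
    result = pool.map(square, range(n), chunksize=chunksize)
    return time.perf_counter() - start, result

if __name__ == "__main__":
    n = 20_000
    with mp.Pool() as pool:
        pool.map(square, range(mp.cpu_count()))  # warm up the workers
        t_small, r1 = timed_map(pool, n, chunksize=1)
        t_big, r2 = timed_map(pool, n, chunksize=2_000)
        assert r1 == r2
        # Larger chunks amortize the dispatch cost over many calls.
        print(f"chunksize=1:    {t_small:.3f} s")
        print(f"chunksize=2000: {t_big:.3f} s")
```

With chunksize=1 every item is a separate round-trip to a worker; with a large chunksize the same 20,000 calls travel in a handful of batches, so the dispatch cost is paid far fewer times.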
msg80180 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) - Date: 2009-01-19 15:28

The multiprocessing module indeed has some overhead:

- The processes are spawned when needed. Before you perform performance timings, you should "warm up" the Pool with a line like `pool.map(f, range(mul.cpu_count()))` (starting a process is a slowish operation, especially on Windows). This reduces timings by a factor of two.
- The dispatch overhead of multiprocessing is certainly greater than a single multiplication. multiprocessing is for CPU-bound functions! And do not forget that you have *three* processes here: two from the Pool, plus your main program.

As Antoine said, try with this function instead:

```python
def f(x):
    for i in range(10):
        x = x * x
    return x
```

and the timings are much better.
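Putting both suggestions together (warming up the pool, and using a genuinely CPU-bound function), a corrected benchmark might look like the following Python 3 sketch. The workload size n is an illustrative choice, scaled down from the original report; the numbers printed will of course vary by machine:

```python
import multiprocessing as mp
import time

def heavy(x):
    # Ten repeated squarings: the result grows into a large integer,
    # so each call does enough CPU work to outweigh dispatch overhead.
    for _ in range(10):
        x = x * x
    return x

if __name__ == "__main__":
    n = 2_000  # illustrative workload size
    with mp.Pool() as pool:
        # Warm up: force the worker processes to start before timing.
        pool.map(heavy, range(mp.cpu_count()))

        t0 = time.perf_counter()
        seq = list(map(heavy, range(n)))
        t1 = time.perf_counter()
        par = pool.map(heavy, range(n))
        t2 = time.perf_counter()

        assert seq == par  # both strategies compute the same values
        print(f"map():      {t1 - t0:.3f} s")
        print(f"pool.map(): {t2 - t1:.3f} s")
```

With a per-call cost this high, the pool has a realistic chance to beat the sequential map(), which is exactly the effect Jesse measures in the next message.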
msg80181 - Author: Jesse Noller (jnoller) - Date: 2009-01-19 15:32

My results don't match yours (8 cores, Mac OS X):

```
-------- testing multiprocessing on 8 cores ----------
100000 elements map() time 0.0444118976593 s
100000 elements pool.map() time 0.0366489887238 s
100000 elements pool.apply_async() time 24.3125801086 s
```

Now, this could be for a variety of reasons: more cores, a different OS (which means a different speed at which processes can be forked), and so on. As Antoine and Amaury point out, you really need a use case that is large enough to offset the cost of forking the processes in the first place.

I also ran this on an 8-core Ubuntu box with kernel 2.6.22.19, py2.6.1 and 16 GB of RAM:

```
-------- testing multiprocessing on 8 cores ----------
100000 elements map() time 0.0258889198303 s
100000 elements pool.map() time 0.0339770317078 s
100000 elements pool.apply_async() time 11.0373139381 s
```

OS X is pretty snappy when it comes to forking. Now, if you switch the example you provided over to Amaury's function, you see a significant difference.

OS X, 8 cores:

```
-------- testing multiprocessing on 8 cores ----------
100000 elements map() time 30.704061985 s
100000 elements pool.map() time 4.95880293846 s
100000 elements pool.apply_async() time 23.6090102196 s
```

Ubuntu, kernel 2.6.22.19 and py2.6.1:

```
-------- testing multiprocessing on 8 cores ----------
100000 elements map() time 38.3818569183 s
100000 elements pool.map() time 5.65878105164 s
100000 elements pool.apply_async() time 14.1757941246 s
```
msg80182 - Author: Jesse Noller (jnoller) - Date: 2009-01-19 15:33

Closing as not an issue.
msg80205 - Author: 0x666 (0x666) - Date: 2009-01-19 19:17

Thanks for the quick response and the informative answers; special thanks to Antoine Pitrou and Amaury Forgeot d'Arc.

P.S. By setting a bigger range, range(15), and elements = 1000, I was able to get a speed-up factor of up to 1.8. Wow, I'm amazed :-) Good library, keep it up.

BTW, about the warm-up (forking the first processes): a user can forget to warm up the pool. It would be cool if the pool somehow warmed itself up in the initialization phase (in the step mul.Pool(processes=mul.cpu_count())). Maybe you can define another initialization parameter, a function, such that if it is not None, the first processes are spawned immediately instead of waiting for the map call, or something like that. But it is only cosmetics :-)
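The suggestion above can be prototyped in user code without changing multiprocessing itself. Here is a hypothetical warmed_pool() helper (a minimal Python 3 sketch; the wrapper and its name are my invention, not part of the multiprocessing API):

```python
import multiprocessing as mp

def _noop(_):
    # Trivial task whose only purpose is to exercise the workers.
    return None

def square(x):
    return x * x

def warmed_pool(processes=None):
    # Hypothetical convenience wrapper: create a Pool and immediately
    # dispatch a round of no-op tasks, so worker startup cost is paid
    # at construction time rather than during the first real map().
    processes = processes or mp.cpu_count()
    pool = mp.Pool(processes=processes)
    pool.map(_noop, range(processes))
    return pool

if __name__ == "__main__":
    pool = warmed_pool()
    print(pool.map(square, range(5)))  # workers are already running here
    pool.close()
    pool.join()
```

Any later pool.map() or apply_async() call then runs against workers that are already fully started, which is essentially the "warm up" step Amaury recommended, just folded into pool construction.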
History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:56:44 | admin | set | github: 49250 |
2009-01-19 19:17:10 | 0x666 | set | messages: + msg80205 |
2009-01-19 15:34:52 | benjamin.peterson | set | status: open -> closed |
2009-01-19 15:33:20 | jnoller | set | resolution: not a bug messages: + msg80182 |
2009-01-19 15:32:54 | jnoller | set | messages: + msg80181 |
2009-01-19 15:28:19 | amaury.forgeotdarc | set | assignee: jnoller -> messages: + msg80180 nosy: + amaury.forgeotdarc |
2009-01-19 15:19:04 | jnoller | set | assignee: jnoller nosy: + jnoller |
2009-01-19 15:08:13 | pitrou | set | nosy: + pitrou messages: + msg80173 |
2009-01-19 13:50:51 | 0x666 | create |