Message241289
Hi,
I've seen some odd behavior with multiprocessing's Pool on Linux/macOS:
-----------------------------
import multiprocessing as mp
from sys import getsizeof
import numpy as np

def f_test(x):
    print('process has received argument %s' % x)
    r = x[:100]  # the return value is put on a queue by Pool; for objects > 4GB pickle complains
    return r

if __name__ == '__main__':
    # 2**28 runs ok, 2**29 or bigger breaks pickle
    big_param = np.random.random(2**29)

    # Process + big parameter OK:
    proc = mp.Process(target=f_test, args=(big_param,))
    res = proc.start()  # note: start() returns None, hence getsizeof(res) == 16 below
    proc.join()
    print('size of process result', getsizeof(res))

    # Pool + big parameter BREAKS:
    pool = mp.Pool(1)
    res = pool.map(f_test, (big_param,))
    print('size of Pool result', getsizeof(res))
-----------------------------
$ python bug_mp.py
process has received argument [ 0.65282086 0.34977429 0.64148342 ..., 0.79902495 0.31427761
0.02678803]
size of process result 16
Traceback (most recent call last):
  File "bug_mp.py", line 26, in <module>
    res = pool.map(f_test, (big_param,))
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 599, in get
    raise self._value
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 383, in _handle_tasks
    put(task)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
-----------------------------
There's another flavor of error seen in a similar scenario:
...
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
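(The struct.error variant appears to share the same root cause: somewhere along the serialization path a length is packed with struct's signed 32-bit 'i' format, so any size beyond 2**31 - 1 overflows it. A minimal sketch of that limit in isolation, not of the multiprocessing internals themselves:)

```python
import struct

# struct's 'i' format is a signed 32-bit int, so 2**31 - 1 is the ceiling.
ok = struct.pack('i', 2**31 - 1)   # packs fine into 4 bytes

try:
    struct.pack('i', 2**31)        # one past the limit
except struct.error as exc:
    print('struct.error:', exc)
```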
-----------------------------
Tested in:
Python 3.4.2 |Anaconda 2.1.0 (64-bit)| (default, Oct 21 2014, 17:16:37)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
And in:
Python 3.4.3 (default, Apr 9 2015, 16:03:56)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
-----------------------------
Pool.map creates a task queue to feed the workers, and I think that by doing so it forces any argument passed to a worker to be pickled.
Process works fine, since no queue is involved: it simply forks.
My expectation was that on POSIX, where we fork, we shouldn't have to worry about arguments being pickled; if this is expected behavior, it should be warned about and documented (I hope I haven't simply missed this in the docs).
For small arguments, pickling/unpickling may not be an issue, but for big ones it is (I am aware of the Array and shared-memory options).
Has anybody seen something similar? Is this a hard requirement of Pool.map, or am I missing the point altogether?
Date | User | Action | Args
2015-04-16 23:14:09 | kieleth | set | recipients: + kieleth
2015-04-16 23:14:09 | kieleth | set | messageid: <1429226049.08.0.139422751174.issue23979@psf.upfronthosting.co.za>
2015-04-16 23:14:09 | kieleth | link | issue23979 messages
2015-04-16 23:14:08 | kieleth | create |