classification
Title: Multiprocessing Pool starmap - struct.error: 'i' format requires -2e10<=n<=2e10
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder: issue17560 (problem using multiprocessing with really big objects?)
Assigned To: Nosy List: Justin Ting, serhiy.storchaka, tim.peters
Priority: normal Keywords:

Created on 2016-10-22 16:30 by Justin Ting, last changed 2017-10-23 11:17 by serhiy.storchaka. This issue is now closed.

Messages (6)
msg279200 - (view) Author: Justin Ting (Justin Ting) Date: 2016-10-22 16:30
Multiprocessing is throwing this error when dealing with large amounts of data (all floating-point and integer values), none of which exceed the numeric bounds in the error it throws:

  File "/root/anaconda3/lib/python3.5/multiprocessing/pool.py", line 268, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/root/anaconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/root/anaconda3/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
    put(task)
  File "/root/anaconda3/lib/python3.5/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/root/anaconda3/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
> /root/anaconda3/lib/python3.5/multiprocessing/connection.py(393)_send_bytes()
-> header = struct.pack("!i", n)

It works fine on any number of subsets of this data, but not when put together.
msg279201 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-10-22 16:48
This has nothing to do with the _values_ you're passing - it has to do with the length of the pickle string:

    def _send_bytes(self, buf):
        n = len(buf)
        # For wire compatibility with 3.2 and lower
        header = struct.pack("!i", n)  # <-- IT'S BLOWING UP HERE
        if n > 16384:
            ...
            self._send(header)
            self._send(buf)

where the traceback shows it's called here:

    self._send_bytes(ForkingPickler.dumps(obj))

Of course the less data you're passing, the smaller the pickle, and that's why it doesn't blow up if you pass subsets of the data.

I'd suggest rethinking how you're sharing data, as pushing two-gigabyte pickle strings around is bound to be the least efficient way possible even if it didn't blow up ;-)
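The 32-bit cap Tim describes is easy to see in isolation. A minimal sketch (not the multiprocessing code itself) reproducing the header failure; the eventual fix for issue17560 in later CPython releases switches to a 64-bit length header for oversized payloads, but on 3.5 the 32-bit header is a hard cap:

```python
import struct

# "!i" is a big-endian *signed 32-bit* int, so the 4-byte length header
# can only describe payloads up to 2**31 - 1 bytes (~2 GiB).
max_len = 2**31 - 1
assert struct.pack("!i", max_len) == b"\x7f\xff\xff\xff"

try:
    struct.pack("!i", max_len + 1)  # one byte past the limit
except struct.error as e:
    print(e)  # 'i' format requires -2147483648 <= number <= 2147483647
```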
msg279202 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-22 16:52
This looks like a duplicate of issue17560.
msg279203 - (view) Author: Justin Ting (Justin Ting) Date: 2016-10-22 16:53
Ah, I should have picked that up -- coding at 3:30am doesn't do wonders for keeping a clear head.

Thanks Tim, I'll keep that in mind!


msg279233 - (view) Author: Justin Ting (Justin Ting) Date: 2016-10-23 00:52
Actually, on further inspection, I seem to be having a slightly different problem that produces the same error I initially described.

Even after modifying my code so that each forked process was given only the following arguments:
args = [(None, models_shape, False, None, [start, end], 'data/qp_red_features.npy') for start, end in jobs]

where models_shape, start, and end are each single integers, the same error still comes up. Within each process, I'm reading in a (relatively small, only 12MB) .npy ndarray and taking the [start:end] slice.
msg304793 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-23 11:17
Closed as a duplicate of issue17560.
History
Date                 User              Action  Args
2017-10-23 11:17:07  serhiy.storchaka  set     status: open -> closed; messages: + msg304793; stage: resolved
2016-10-23 00:52:20  Justin Ting       set     messages: + msg279233
2016-10-22 16:53:24  Justin Ting       set     messages: + msg279203
2016-10-22 16:52:24  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg279202; resolution: duplicate; superseder: problem using multiprocessing with really big objects?
2016-10-22 16:48:33  tim.peters        set     nosy: + tim.peters; messages: + msg279201
2016-10-22 16:30:52  Justin Ting       create