classification
Title: Multiprocessing Pool starmap - struct.error: 'i' format requires -2e10<=n<=2e10
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder: issue17560 (problem using multiprocessing with really big objects?)
Assigned To: Nosy List: Justin Ting, serhiy.storchaka, tim.peters
Priority: normal Keywords:

Created on 2016-10-22 16:30 by Justin Ting, last changed 2017-10-23 11:17 by serhiy.storchaka. This issue is now closed.

Messages (6)
msg279200 - (view) Author: Justin Ting (Justin Ting) Date: 2016-10-22 16:30
Multiprocessing is throwing this error when dealing with large amounts of data (all floating-point and integer values), none of which exceed the numeric bounds in the error it throws:

  File "/root/anaconda3/lib/python3.5/multiprocessing/pool.py", line 268, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/root/anaconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/root/anaconda3/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
    put(task)
  File "/root/anaconda3/lib/python3.5/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/root/anaconda3/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
> /root/anaconda3/lib/python3.5/multiprocessing/connection.py(393)_send_bytes()
-> header = struct.pack("!i", n)

It works fine on any number of subsets of this data, but not when put together.
msg279201 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-10-22 16:48
This has nothing to do with the _values_ you're passing - it has to do with the length of the pickle string:

    def _send_bytes(self, buf):
        n = len(buf)
        # For wire compatibility with 3.2 and lower
        header = struct.pack("!i", n)  # <-- IT'S BLOWING UP HERE
        if n > 16384:
            ...
            self._send(header)
            self._send(buf)

where the traceback shows it's called here:

    self._send_bytes(ForkingPickler.dumps(obj))

Of course the less data you're passing, the smaller the pickle, and that's why it doesn't blow up if you pass subsets of the data.

I'd suggest rethinking how you're sharing data, as pushing two-gigabyte pickle strings around is bound to be the least efficient way possible even if it didn't blow up ;-)
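The 32-bit cap Tim describes is easy to see in isolation. A minimal sketch (not the multiprocessing code itself) reproducing the header failure; the eventual fix for issue17560 in later CPython releases switches to a 64-bit length header for oversized payloads, but on 3.5 the 32-bit header is a hard cap:

```python
import struct

# "!i" is a big-endian *signed 32-bit* int, so the 4-byte length header
# can only describe payloads up to 2**31 - 1 bytes (~2 GiB).
max_len = 2**31 - 1
assert struct.pack("!i", max_len) == b"\x7f\xff\xff\xff"

try:
    struct.pack("!i", max_len + 1)  # one byte past the limit
except struct.error as e:
    print(e)  # 'i' format requires -2147483648 <= number <= 2147483647
```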
msg279202 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-22 16:52
This looks like a duplicate of issue17560.
msg279203 - (view) Author: Justin Ting (Justin Ting) Date: 2016-10-22 16:53
Ah, I should have picked that up -- coding at 3:30am doesn't do wonders for keeping a clear head.

Thanks Tim, I'll keep that in mind!


msg279233 - (view) Author: Justin Ting (Justin Ting) Date: 2016-10-23 00:52
Actually, on further inspection, I seem to be having a slightly different problem that produces the same error I initially described.

Even after modifying my code so that each forked process was given only the following arguments:
args = [(None, models_shape, False, None, [start, end], 'data/qp_red_features.npy') for start, end in jobs]

where models_shape, start, and end are each single integers, the same error still comes up. Within each process, I'm reading in a (relatively small, only 12MB) .npy ndarray and taking the [start:end] slice.
msg304793 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-23 11:17
Closed as a duplicate of issue17560.
History
Date                 User              Action  Args
2017-10-23 11:17:07  serhiy.storchaka  set     status: open -> closed; messages: + msg304793; stage: resolved
2016-10-23 00:52:20  Justin Ting       set     messages: + msg279233
2016-10-22 16:53:24  Justin Ting       set     messages: + msg279203
2016-10-22 16:52:24  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg279202; resolution: duplicate; superseder: problem using multiprocessing with really big objects?
2016-10-22 16:48:33  tim.peters        set     nosy: + tim.peters; messages: + msg279201
2016-10-22 16:30:52  Justin Ting       create