This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: multiprocess error on large dataset
Type:                          Stage: resolved
Components: Library (Lib)      Versions: Python 3.7
process
Status: closed                 Resolution:
Dependencies:                  Superseder:
Assigned To:                   Nosy List: vishalraoanizer
Priority: normal               Keywords:

Created on 2020-09-04 23:14 by vishalraoanizer, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (1)
msg376415 - (view) Author: vishal rao (vishalraoanizer) Date: 2020-09-04 23:14
I am processing a large pandas DataFrame with the pathos framework, which internally uses the Python multiprocess package. When I run the code on a large dataset I get the error below; the issue does not occur on smaller datasets.
/opt/conda/lib/python3.7/site-packages/pathos/multiprocessing.py in map(self, f, *args, **kwds)
    135         AbstractWorkerPool._AbstractWorkerPool__map(self, f, *args, **kwds)
    136         _pool = self._serve()
--> 137         return _pool.map(star(f), zip(*args)) # chunksize
    138     map.__doc__ = AbstractWorkerPool.map.__doc__
    139     def imap(self, f, *args, **kwds):

/opt/conda/lib/python3.7/site-packages/multiprocess/pool.py in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269 
    270     def starmap(self, func, iterable, chunksize=None):

/opt/conda/lib/python3.7/site-packages/multiprocess/pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

/opt/conda/lib/python3.7/site-packages/multiprocess/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
    429                         break
    430                     try:
--> 431                         put(task)
    432                     except Exception as e:
    433                         job, idx = task[:2]

/opt/conda/lib/python3.7/site-packages/multiprocess/connection.py in send(self, obj)
    207         self._check_closed()
    208         self._check_writable()
--> 209         self._send_bytes(_ForkingPickler.dumps(obj))
    210 
    211     def recv_bytes(self, maxlength=None):

/opt/conda/lib/python3.7/site-packages/multiprocess/connection.py in _send_bytes(self, buf)
    394         n = len(buf)
    395         # For wire compatibility with 3.2 and lower
--> 396         header = struct.pack("!i", n)
    397         if n > 16384:
    398             # The payload is large so Nagle's algorithm won't be triggered

error: 'i' format requires -2147483648 <= number <= 2147483647

I ran the code in debug mode and saw that the value of n was 3140852627, which exceeds the 2147483647 maximum of the signed 32-bit header field.
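The limit comes from the wire header shown in the traceback: `_send_bytes` packs the pickled payload length with `struct.pack("!i", n)`, a signed 32-bit big-endian field. A minimal sketch reproducing the failure with the reported payload size (the variable names here are illustrative, not from the library):

```python
import struct

# multiprocess 3.7 writes the payload length as a signed 32-bit
# big-endian header ("!i"), so any pickle over 2**31 - 1 bytes fails
# before a single byte of data is sent.
n = 3140852627  # payload size observed in debug mode

try:
    struct.pack("!i", n)
except struct.error as exc:
    print(exc)  # 'i' format requires -2147483648 <= number <= 2147483647

# The largest payload the "!i" header can describe:
print(n > 2**31 - 1)  # True: the reported size overflows the header
```

CPython 3.8 reworked `multiprocessing.connection._send_bytes` to fall back to a 64-bit length header for payloads over 2 GiB (and multiprocess tracks CPython), so on 3.7 the practical workaround is to split the DataFrame so each pickled task stays under 2 GiB, or to upgrade.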
History
Date                 User             Action  Args
2022-04-11 14:59:35  admin            set     github: 85888
2020-09-05 02:31:13  vishalraoanizer  set     status: open -> closed
                                              stage: resolved
2020-09-04 23:14:43  vishalraoanizer  create