Python 3.8 added pickle protocol 5 (PEP 574), which supports out-of-band buffer collection[1]. The idea is that when pickling an object with a large amount of data attached to it (an array, a dataframe, etc.), one can collect that data alongside the normal pickled data without copying it. This matters in particular when serializing data for communication between two Python processes, so it is quite valuable when using a `multiprocessing.pool.Pool`[2] or a `concurrent.futures.ProcessPoolExecutor`[3]. However, AFAICT neither of these leverages this functionality[4][5]. To ensure zero-copy processing of large data, it would be helpful for both of these pools to use pickle protocol 5.
[1] https://docs.python.org/3/library/pickle.html#pickle-oob
[2] https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
[3] https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor
[4] https://github.com/python/cpython/blob/16b5bc68964c6126845f4cdd54b24996e71ae0ba/Lib/multiprocessing/queues.py#L372
[5] https://github.com/python/cpython/blob/16b5bc68964c6126845f4cdd54b24996e71ae0ba/Lib/multiprocessing/queues.py#L245
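
To illustrate what out-of-band collection means here, a minimal sketch using only the standard `pickle` module follows. It is not how either pool works today, only a demonstration of the protocol 5 feature described above. The names `payload`, `in_band`, `out_of_band`, and `buffers` are made up for this example, and a plain `bytearray` stands in for an array or dataframe.

```python
import pickle

# A large mutable buffer standing in for array or dataframe data (illustrative only).
payload = bytearray(b"x" * 1_000_000)

# Without a buffer_callback the buffer is serialized in-band,
# so the pickle stream contains a full copy of the data.
in_band = pickle.dumps(pickle.PickleBuffer(payload), protocol=5)

# With a buffer_callback the buffer is handed to the callback instead,
# and the pickle stream only records that a buffer belongs at this position.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(payload), protocol=5, buffer_callback=buffers.append
)

print(len(in_band), len(out_of_band), len(buffers))

# The consumer must supply the same buffers, in the same order, when loading.
restored = pickle.loads(out_of_band, buffers=buffers)

# Writing through the restored buffer changes the original bytearray,
# which shows that no copy was made along the way.
view = memoryview(restored)
view[0] = ord("y")
print(payload[:1])
```

Within a single process this round trip shares memory. How the collected buffers would actually be transported between a pool's processes is a separate design question that this sketch does not address.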