
classification
Title: Supporting out-of-band buffers (pickle protocol 5) in multiprocessing
Type: performance
Stage:
Components: IO, Library (Lib)
Versions: Python 3.11, Python 3.10, Python 3.9, Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: jakirkham
Priority: normal
Keywords:

Created on 2021-09-27 19:04 by jakirkham, last changed 2022-04-11 14:59 by admin.

Messages (1)
msg402736 - Author: jakirkham Date: 2021-09-27 19:04
Python 3.8 added pickle protocol 5 (PEP 574), which supports out-of-band buffer collection[1]. The idea is that when pickling an object with a large amount of data attached to it (like an array, dataframe, etc.), one can collect that data alongside the normal pickled data without causing a copy. This is particularly important when serializing data for communication between two Python instances. IOW this is quite valuable when using a `multiprocessing.pool.Pool`[2] or a `concurrent.futures.ProcessPoolExecutor`[3]. However, AFAICT neither of these leverages this functionality[4][5]. To ensure zero-copy processing of large data, it would be helpful for pickle protocol 5 to be used in both of these pools.
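
For illustration, here is a minimal sketch of the out-of-band mechanism described above, using only the standard `pickle` module. It does not involve `multiprocessing` at all; the `payload` name and its size are assumptions chosen for the example, and a `bytearray` stands in for a large array or dataframe.

```python
import pickle

# Stand-in for a large array; the name and the size are illustrative only.
payload = bytearray(b"x" * 1_000_000)

# In-band: the data is copied into the pickle stream itself.
copied = pickle.dumps(pickle.PickleBuffer(payload), protocol=5)

# Out-of-band: buffer_callback collects the buffer instead of copying it,
# so the pickle stream only records that a buffer is expected here.
buffers = []
in_band = pickle.dumps(
    pickle.PickleBuffer(payload),
    protocol=5,
    buffer_callback=buffers.append,
)

# The receiver must supply the same buffers, in the same order.
restored = pickle.loads(in_band, buffers=buffers)

# The restored object still refers to the original memory (no copy made).
with memoryview(restored) as view:
    same_memory = view.obj is payload

print(len(copied), len(in_band), len(buffers), same_memory)
```

In this sketch the out-of-band stream stays tiny while the data travels separately in `buffers`. How those buffers would actually be moved between processes is a separate question that the sketch does not address.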


[1] https://docs.python.org/3/library/pickle.html#pickle-oob
[2] https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
[3] https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor
[4] https://github.com/python/cpython/blob/16b5bc68964c6126845f4cdd54b24996e71ae0ba/Lib/multiprocessing/queues.py#L372
[5] https://github.com/python/cpython/blob/16b5bc68964c6126845f4cdd54b24996e71ae0ba/Lib/multiprocessing/queues.py#L245
History
Date User Action Args
2022-04-11 14:59:50 admin set github: 89467
2021-09-27 19:04:19 jakirkham create