This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Title: Info about the pickle protocol used by multiprocessing.Queue
Type: Stage: resolved
Components: Versions: Python 3.9, Python 3.8, Python 3.7
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: buhtz, eric.smith
Priority: normal Keywords:

Created on 2021-08-12 13:33 by buhtz, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg399447 - (view) Author: Christian Buhtz (buhtz) Date: 2021-08-12 13:33
I have read some of the PEPs about pickling, but I would not say that I understood everything.

Of course I checked the documentation about multiprocessing.Queue. It is currently not clear to me which pickle protocol is used by multiprocessing.Queue.
Maybe I missed something in the documentation, or maybe the documentation can be improved?

 - Is there a fixed default - maybe one that differs between Python versions?
 - Or is the pickle protocol version selected dynamically, depending on the kind/type/size of the data put() into the Queue?

Is there a way to find out at runtime which protocol version is used for a specific Queue instance with a specific piece of data?
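One way to check this at runtime (a sketch based on CPython internals: multiprocessing serializes via `multiprocessing.reduction.ForkingPickler`, a `pickle.Pickler` subclass that falls back to pickle's default protocol when none is given; this is an implementation detail, not a documented API) is to dump a sample object the same way and read the PROTO opcode at the start of the resulting stream:

```python
import pickle
import pickletools
from multiprocessing.reduction import ForkingPickler

# The default protocol differs by version: 3 on Python 3.0-3.7, 4 on 3.8+.
print(pickle.DEFAULT_PROTOCOL)

# multiprocessing serializes queue items with ForkingPickler; with no
# explicit protocol it uses pickle's default.
payload = ForkingPickler.dumps({"some": "data"})

# Every pickle stream of protocol >= 2 starts with a PROTO opcode whose
# argument is the protocol number actually used.
opcode, proto, _ = next(pickletools.genops(bytes(payload)))
print(opcode.name, proto)
```

So the protocol is a fixed per-version default, not something chosen per object or per Queue.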

I use Python 3.7 and 3.9 with Pandas 1.3.5.
I parallelize work with huge(?) pandas.DataFrame objects. I simply cut them into pieces (on the row axis), with the number of pieces limited to the machine's CPU cores (minus 1). The cutting happens several times in my scripts because for some things I need the data as one complete DataFrame.
Just as an example, here is one of these pieces, which is given to a worker as an argument and sent back via a Queue - 7 workers!

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 226687 entries, 0 to 226686
Data columns (total 38 columns):
 #   Column              Non-Null Count   Dtype
---  ------              --------------   -----
 0   HASH_ ....
 37  NAME_ORG            226687 non-null  object
dtypes: datetime64[ns](6), float64(1), int64(1), object(30)
memory usage: 65.7+ MB 
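The row-wise cutting described above can be sketched with the standard library alone (with pandas one would typically use `numpy.array_split(df, n)` instead; `split_rows` here is a hypothetical helper, not part of any library):

```python
import os

def split_rows(rows, n_chunks):
    """Cut a sequence of rows into n_chunks roughly equal pieces,
    similar in spirit to numpy.array_split on the row axis."""
    base, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

# One chunk per worker: CPU cores minus one, as described above.
n_workers = max(1, (os.cpu_count() or 2) - 1)

chunks = split_rows(list(range(10)), 3)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```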

I am a bit "scared" that Python is wasting my CPU time by doing some compression on that data. ;) I just want to get a better idea of what is done in the background.
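On the compression worry: pickle is a serialization format, not a compressor, which is easy to verify by pickling incompressible bytes and comparing sizes (compression would be a separate, opt-in step, e.g. zlib on top of the pickle):

```python
import os
import pickle
import zlib

# Incompressible random bytes come out of pickle essentially unchanged,
# plus only a small framing/opcode overhead - no compression happens.
raw = os.urandom(100_000)
pickled = pickle.dumps(raw)
print(len(raw), len(pickled))

# Compressing is an explicit extra step the caller would have to add:
compressed = zlib.compress(pickled)
print(len(compressed))
```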
msg399455 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-08-12 14:21
Because this is a usage question and not a bug report, you'll get more help on the python-list mailing list, Stack Overflow, or another Q&A forum.
Date User Action Args
2022-04-11 14:59:48 admin set github: 89064
2022-01-15 19:40:53 iritkatriel set status: open -> closed
resolution: not a bug
stage: resolved
2021-08-12 14:21:00 eric.smith set nosy: + eric.smith
messages: + msg399455
2021-08-12 13:33:02 buhtz create