Issue 44901: Info about used pickle protocol used by multiprocessing.Queue


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/89064

classification

Title:	Info about used pickle protocol used by multiprocessing.Queue
Type:		Stage:	resolved
Components:		Versions:	Python 3.9, Python 3.8, Python 3.7

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	buhtz, eric.smith
Priority:	normal	Keywords:

Created on 2021-08-12 13:33 by buhtz, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg399447	Author: Christian Buhtz (buhtz)	Date: 2021-08-12 13:33
I read some of the PEPs about pickling, but I would not say that I understood everything. Of course I checked the docs for multiprocessing.Queue. Currently it is not clear to me which pickle protocol is used by multiprocessing.Queue. Maybe I missed something in the docs, or the docs could be improved?

- Is there a fixed default, maybe different between Python versions?
- Or is the pickle protocol version selected dynamically depending on the kind/type/size of the data put() into the Queue?

Is there a way to find out at runtime which protocol version is used for a specific Queue instance with a specific piece of data?

Background: I use Python 3.7 and 3.9 with Pandas 1.3.5. I parallelize work with huge(?) pandas.DataFrame objects. I simply cut them into pieces (along the row axis), and the number of pieces is limited to the machine's CPU cores (minus 1). The cutting happens several times in my scripts because for some steps I need the data as one complete DataFrame. Just as an example, here is one of these pieces, which is given to a worker as an argument and sent back via the Queue (7 workers!):

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 226687 entries, 0 to 226686
    Data columns (total 38 columns):
     #   Column     Non-Null Count   Dtype
    ---  ------     --------------   -----
     0   HASH_      ....
     ....
     37  NAME_ORG   226687 non-null  object
    dtypes: datetime64[ns](6), float64(1), int64(1), object(30)
    memory usage: 65.7+ MB

I am a bit "scared" that Python is wasting my CPU time by doing some compression on that data. ;) I just want to get a better idea of what is done in the background.
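A minimal sketch for the runtime question, assuming CPython. Reading Lib/multiprocessing/queues.py, the feeder thread of multiprocessing.Queue serializes each object with multiprocessing.reduction.ForkingPickler.dumps(obj) and passes no protocol, so pickle.DEFAULT_PROTOCOL applies (3 on Python 3.7, 4 on 3.8 and 3.9), regardless of the type or size of the data. This is an implementation detail rather than documented API and may change between versions. The helper names protocol_of and queue_style_dumps are hypothetical; the protocol is read back from the PROTO opcode at the start of the stream using pickletools.genops. The last part addresses the compression worry: pickle stores raw bytes as-is, so even a trivially compressible payload does not shrink.

```python
import pickle
import pickletools
from multiprocessing.reduction import ForkingPickler


def protocol_of(data: bytes) -> int:
    """Return the protocol recorded in the leading PROTO opcode of a pickle.

    Protocols 2 and higher start with PROTO; protocols 0 and 1 do not, so
    this simplified (hypothetical) helper reports 0 for both of them.
    """
    opcode, arg, _pos = next(pickletools.genops(data))
    return arg if opcode.name == "PROTO" else 0


def queue_style_dumps(obj) -> bytes:
    # Hypothetical helper mirroring what CPython's multiprocessing.Queue feeder
    # thread does before writing to its pipe: ForkingPickler.dumps(obj) with no
    # protocol argument, i.e. pickle.DEFAULT_PROTOCOL (implementation detail).
    # ForkingPickler.dumps returns a memoryview, so convert it to bytes.
    return bytes(ForkingPickler.dumps(obj))


if __name__ == "__main__":
    print("pickle.DEFAULT_PROTOCOL:", pickle.DEFAULT_PROTOCOL)
    print("pickle.HIGHEST_PROTOCOL:", pickle.HIGHEST_PROTOCOL)

    # The protocol is fixed by the pickler, not chosen per payload:
    # small and large objects carry the same protocol number.
    for payload in (1, "chunk-0", list(range(100_000))):
        stream = queue_style_dumps(payload)
        print(type(payload).__name__, "-> protocol", protocol_of(stream))

    # pickle does not compress: 10 MB of zero bytes (trivially compressible)
    # still produces a pickle slightly larger than 10 MB.
    blob = b"\x00" * 10_000_000
    print("raw:", len(blob), "pickled:", len(queue_style_dumps(blob)))
```

If serialization cost is the real concern, timing queue_style_dumps on one DataFrame chunk gives a rough idea of the per-put() pickling overhead, which is spent on copying and encoding rather than on compression.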
msg399455	Author: Eric V. Smith (eric.smith) *	Date: 2021-08-12 14:21
Because this is a usage question and not a bug, you'll get more help using the python-list mailing list or Stack Overflow, or some other Q&A forum.

History
Date	User	Action	Args
2022-04-11 14:59:48	admin	set	github: 89064
2022-01-15 19:40:53	iritkatriel	set	status: open -> closed resolution: not a bug stage: resolved
2021-08-12 14:21:00	eric.smith	set	nosy: + eric.smith messages: + msg399455
2021-08-12 13:33:02	buhtz	create