classification
Title: The fact that multiprocess.Queue uses serialization should be documented.
Type: enhancement Stage: needs patch
Components: Documentation Versions: Python 3.7, Python 3.6, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Bernhard10, davin, docs@python, r.david.murray
Priority: normal Keywords:

Created on 2016-12-14 14:38 by Bernhard10, last changed 2016-12-14 16:33 by davin.

Files
File name Uploaded Description
mwe.py Bernhard10, 2016-12-14 14:38 Minimal working example to reproduce this bug/surprising behaviour.
Messages (6)
msg283192 - (view) Author: Bernhard10 (Bernhard10) Date: 2016-12-14 14:38
When I did some tests involving unittest.mock.sentinel and multiprocessing.Queue, I noticed that multiprocessing.Queue changes the id of the sentinel.

This behaviour is definitely surprising and not documented.
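
The attached mwe.py is not reproduced on this page. As a minimal sketch of the behaviour described (not the original file), the following puts an object on a multiprocessing.Queue and reads it back in the same process. It uses a plain list rather than unittest.mock.sentinel, and the helper name roundtrip_through_queue is made up for illustration.

```python
import multiprocessing


def roundtrip_through_queue(obj):
    """Put obj on a multiprocessing.Queue and read it back in the same process."""
    queue = multiprocessing.Queue()
    queue.put(obj)
    try:
        # get() blocks until the queue's background thread has written the
        # serialized data to the underlying pipe and it can be read back.
        return queue.get(timeout=10)
    finally:
        queue.close()
        queue.join_thread()


if __name__ == "__main__":
    original = ["some", "payload"]
    received = roundtrip_through_queue(original)
    print(received == original)   # same contents
    print(received is original)   # but not the same object
```

The object that comes out compares equal to the one that went in, but it is a new object with its own identity, even though no second process is involved.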
msg283193 - (view) Author: Bernhard10 (Bernhard10) Date: 2016-12-14 15:05
See http://stackoverflow.com/a/925241/5069869

Apparently multiprocessing.Queue uses pickle to serialize the objects in the queue, which explains the change of identity, but is absolutely unclear from the documentation.
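
The identity change can be seen with pickle alone, without any queue. This is only a sketch of the serialization step; the helper name pickle_roundtrip is hypothetical, and it does not claim to mirror how multiprocessing calls pickle internally.

```python
import pickle


def pickle_roundtrip(obj):
    """Serialize obj to bytes and rebuild a fresh object from those bytes."""
    data = pickle.dumps(obj)
    return pickle.loads(data)


if __name__ == "__main__":
    original = {"name": "done"}
    rebuilt = pickle_roundtrip(original)
    print(rebuilt == original)          # equal by value
    print(rebuilt is original)          # a different object
    print(id(rebuilt) == id(original))  # so id() differs while both are alive
```

Because the receiver only ever sees the bytes, it has to construct a new object, which is why checks based on `is` or `id()` cannot be relied on after transmission.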
msg283195 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-12-14 15:12
The fact that this is so is implicit in the name multi*process*ing and the documented restrictions of the id function.  That is, the purpose of the module is to manage computation across multiple processes.  Since different processes have distinct memory spaces, you cannot depend on object identity between processes, by the definition of object identity (it is constant only for the lifetime of the object in memory, and the different processes have different memory spaces, therefore the object id may be different in the different processes).  By construction this applies also to any multiprocessing mechanism that is used to transmit objects, even if the transmission turns out to be to the same process in a particular case.  You can't *depend* on the id in that case, because the transmission mechanism must be free to change the object identity in order to work in the general case.

Should we document this explicitly?  Perhaps so.  Maybe in the multiprocessing introduction?
msg283198 - (view) Author: Bernhard10 (Bernhard10) Date: 2016-12-14 15:18
My first thought was that Queue was implemented using shared memory.
I guess from the fact that the "Shared memory" section is separate in the multiprocessing documentation I should have known better, though.

So I guess some clarification in the documentation would be helpful.
msg283199 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-12-14 15:37
Yeah, that's why I said "in the general case".  Making it clear in the overview seems reasonable to me.
msg283206 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2016-12-14 16:33
All communication between processes in multiprocessing has consistently used pickle to serialize the data being communicated (this includes what is described in the "Shared memory" section of the docs).  The documentation has not done a great job of making this clear, instead only describing the requirement that data be pickleable in select places.  For example, in the section on Queues:
    Note: When an object is put on a queue, the object is pickled and a
    background thread later flushes the pickled data to an underlying pipe.
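
One practical consequence of that note is that anything put on a queue has to survive pickling. A rough way to check this up front, assuming the default pickle-based serialization, is sketched below; is_picklable is a hypothetical helper, not part of multiprocessing.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        # Different object types fail with different exception classes
        # (for example TypeError for a lock), so this catches broadly.
        return False
    return True


if __name__ == "__main__":
    print(is_picklable([1, 2, 3]))         # plain data can be pickled
    print(is_picklable(threading.Lock()))  # a thread lock cannot
```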

Though it only applies to 3.6+, issue28053 still needs its own documentation improvement to make clear that the mechanism for communicating data defaults to serialization by pickle but that this can be replaced by alternatives.

I agree that the documentation around the use of pickle in multiprocessing deserves improvement.
History
Date User Action Args
2016-12-14 16:33:49 davin set versions: + Python 3.6, Python 3.7
nosy: + davin
messages: + msg283206
type: behavior -> enhancement
stage: needs patch
2016-12-14 15:37:09 r.david.murray set messages: + msg283199
2016-12-14 15:18:24 Bernhard10 set messages: + msg283198
2016-12-14 15:12:31 r.david.murray set nosy: + r.david.murray
messages: + msg283195
2016-12-14 15:05:57 Bernhard10 set title: multiprocess.Queue changes objects identity -> The fact that multiprocess.Queue uses serialization should be documented.
nosy: + docs@python
messages: + msg283193
assignee: docs@python
components: + Documentation
2016-12-14 14:38:29 Bernhard10 create