classification
Title: Unhelpful backtrace for multiprocessing.Queue
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 2.7, Python 2.6
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: akuchling, neologix, pitrou, sbt, torsten
Priority: normal Keywords: patch

Created on 2011-01-11 11:16 by torsten, last changed 2015-04-23 00:05 by akuchling. This issue is now closed.

Files
File name Uploaded Description
mp_queue_pickle_in_main_thread.patch sbt, 2011-08-29 16:10
Messages (8)
msg125996 - Author: Torsten Landschoff (torsten) Date: 2011-01-11 11:16
When trying to send an object via a Queue that can't be pickled, one gets a quite unhelpful traceback:

Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 242, in _feed
    send(obj)
PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed

This gives no indication of where in my code the object is being sent. It would be helpful to get the call trace leading to the call to Queue.put.

My workaround was to create a Queue via this function MyQueue:

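from multiprocessing import Queue

def MyQueue():
    import cPickle
    def myput(obj, *args, **kwargs):
        # Pickle in the calling thread so any error carries the caller's traceback.
        cPickle.dumps(obj)
        return orig_put(obj, *args, **kwargs)

    q = Queue()
    # Replace the bound put method on this instance with the checking wrapper.
    orig_put, q.put = q.put, myput
    return q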

That way the pickle exception is raised in the caller of put, and I was able to find the offending code.
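For readers on Python 3, the same idea can be sketched without replacing the queue's method. This is only an illustration, not part of the original report: `checked_put` is a hypothetical helper name, and the sketch assumes that serializing with `multiprocessing.reduction.ForkingPickler` in the caller is a reasonable stand-in for what the queue does later in its background thread.

```python
import sys
from multiprocessing import Queue
from multiprocessing.reduction import ForkingPickler


def checked_put(q, obj, *args, **kwargs):
    """Hypothetical helper: pickle obj in the caller, then call q.put()."""
    # If obj cannot be pickled, the exception is raised here, so the
    # traceback points at the code that called checked_put().
    ForkingPickler.dumps(obj)
    return q.put(obj, *args, **kwargs)


if __name__ == "__main__":
    q = Queue()
    checked_put(q, {"answer": 42})
    print(q.get())
    try:
        checked_put(q, sys)  # module objects cannot be pickled
    except Exception as exc:
        print("rejected in caller:", type(exc).__name__)
```

The object is pickled twice with this approach (once for the check, once by the queue), which is the price of leaving the queue itself untouched.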
msg143154 - Author: Richard Oudkerk (sbt) (Python committer) Date: 2011-08-29 16:10
mp_queue_pickle_in_main_thread.patch (against the default branch) fixes the problem by doing the pickling in Queue.put(). It is a version of a patch for Issue 8037 (although I believe the behaviour complained about in Issue 8037 is not an actual bug).

The patch also has the advantage of ensuring that weakref callbacks and 
__del__ methods for objects put in the queue will not be run in the 
background thread.  (Bytes objects have trivial destructors.)  This 
potentially prevents inconsistent state caused by forking a process 
while the background thread is running -- see Issue 6721.
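
The division of labour described above can be illustrated with a toy queue. This is a rough sketch under stated assumptions, not the attached patch: `EagerPickleQueue` and all of its methods are hypothetical names, it uses plain `pickle` rather than multiprocessing's own pickler, and it omits everything a real queue needs (locks shared between processes, size limits, timeouts). The point it shows is that put() serializes in the calling thread, so the feeder thread only ever handles bytes.

```python
import collections
import pickle
import threading
from multiprocessing import Pipe


class EagerPickleQueue:
    """Toy queue: put() pickles in the caller; the feeder only ships bytes."""

    _STOP = object()  # sentinel telling the feeder thread to exit

    def __init__(self):
        # One-way pipe: the first end receives, the second end sends.
        self._reader, self._writer = Pipe(duplex=False)
        self._buffer = collections.deque()
        self._notempty = threading.Condition()
        self._feeder = threading.Thread(target=self._feed, daemon=True)
        self._feeder.start()

    def put(self, obj):
        # Pickling errors are raised here, with the caller's traceback.
        data = pickle.dumps(obj)
        with self._notempty:
            self._buffer.append(data)
            self._notempty.notify()

    def get(self):
        return pickle.loads(self._reader.recv_bytes())

    def close(self):
        with self._notempty:
            self._buffer.append(self._STOP)
            self._notempty.notify()
        self._feeder.join()

    def _feed(self):
        while True:
            with self._notempty:
                while not self._buffer:
                    self._notempty.wait()
                item = self._buffer.popleft()
            if item is self._STOP:
                return
            # Only bytes objects are held and released in this thread.
            self._writer.send_bytes(item)
```

Because the original object never enters the buffer, the background thread never holds the last reference to it.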
msg143161 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-08-29 16:45
This shouldn't be a problem in Python 3.3, where the Connection classes are reimplemented in pure Python.
msg143177 - Author: Richard Oudkerk (sbt) (Python committer) Date: 2011-08-29 20:10
> This shouldn't be a problem in Python 3.3, where the Connection classes
> are reimplemented in pure Python.

What should not be a problem?

Changes to the implementation of Connection won't affect whether Queue.put() raises an error immediately if it gets an unpicklable argument.  

Nor will they affect whether weakref callbacks or __del__ methods run in a background thread, causing fork-safety issues.
msg143179 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-08-29 20:52
> Changes to the implementation of Connection won't affect whether
> Queue.put() raises an error immediately if it gets an unpicklable
> argument.

Ah, right. Then indeed it won't make a difference.
msg182734 - Author: Charles-François Natali (neologix) (Python committer) Date: 2013-02-23 11:03
I'm closing, since issue #17025 proposes to do this as part of performance optimization.
msg183537 - Author: Charles-François Natali (neologix) (Python committer) Date: 2013-03-05 17:50
For the record, I'm posting these benchmark numbers here (originally from issue #17025):

"""
with patch:
$ ./python /tmp/multi_queue.py
took 0.7945001125335693 seconds with 1 workers
took 0.7428359985351562 seconds with 2 workers
took 0.7897098064422607 seconds with 3 workers
took 1.1860828399658203 seconds with 4 workers

I tried Richard's suggestion of serializing the data inside put(), but this reduces performance quite notably:
$ ./python /tmp/multi_queue.py
took 1.412883996963501 seconds with 1 workers
took 1.3212130069732666 seconds with 2 workers
took 1.2271699905395508 seconds with 3 workers
took 1.4817359447479248 seconds with 4 workers

Although I didn't analyse it further, I guess one reason could be that if the serializing is done in put(), the feeder thread has nothing to do but keep waiting for data to be available from the buffer, send it, and block until there's more to do: basically, it almost doesn't use its time-slice, and spends its time blocking and doing context switches.
"""

So serializing the data from put() seems to have a significant performance impact (other benchmarks are welcome); that's something to keep in mind.
msg241835 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2015-04-23 00:05
neologix: did you intend to re-open this ticket when you made your 2013-03-05 comment?  It seems to me that you didn't intend to -- your comment doesn't say 're-opening because <reason>'.  I'll close it again; if you want, please re-open it and just explain why.
History
Date User Action Args
2015-04-23 00:05:04 akuchling set status: open -> closed; nosy: + akuchling; messages: + msg241835; resolution: wont fix; stage: resolved
2013-03-05 17:50:39 neologix set messages: + msg183537
2013-03-05 13:31:23 neologix set status: closed -> open; superseder: reduce multiprocessing.Queue contention ->
2013-02-23 11:03:11 neologix set status: open -> closed; nosy: + neologix; messages: + msg182734; superseder: reduce multiprocessing.Queue contention
2011-10-06 20:20:39 neologix link issue8037 superseder
2011-08-29 20:52:21 pitrou set messages: + msg143179
2011-08-29 20:10:04 sbt set messages: + msg143177
2011-08-29 16:45:16 pitrou set nosy: + pitrou; messages: + msg143161
2011-08-29 16:10:54 sbt set files: + mp_queue_pickle_in_main_thread.patch; versions: + Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4; nosy: + sbt; messages: + msg143154; keywords: + patch
2011-01-11 11:16:52 torsten create