classification
Title: multiprocessing.JoinableQueue task_done() issue
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.0, Python 3.1, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jnoller Nosy List: aloga, ffernand, jnoller, merrellb
Priority: high Keywords:

Created on 2008-12-14 16:48 by merrellb, last changed 2009-08-06 02:10 by jnoller. This issue is now closed.

Files
File name Uploaded Description Edit
unnamed merrellb, 2009-03-30 20:25
unnamed merrellb, 2009-07-08 21:31
Messages (11)
msg77806 - (view) Author: Brian (merrellb) Date: 2008-12-14 16:48
Despite carefully matching my get() and task_done() statements I would 
often trigger "raise ValueError('task_done() called too many times')" in 
my multiprocessing.JoinableQueue (multiprocessing/queues.py)

Looking over the code (and a lot of debug logging), it appears that the 
issue arises from JoinableQueue.put() not being protected with a locking 
mechanism.  A preemption after the first line allows other processes to 
resume without releasing the _unfinished_tasks semaphore.

The simplest solution seems to be allowing task_done() to block while 
waiting to acquire the _unfinished_tasks semaphore.

Replacing:
if not self._unfinished_tasks.acquire(False):
  raise ValueError('task_done() called too many times')

With simply:
self._unfinished_tasks.acquire()

This would however remove the error checking provided (given the many 
far more subtler error that can be made, I might argue it is of limited 
value).  Alternately the JoinableQueue.put() method could be better 
protected.
msg78226 - (view) Author: Brian (merrellb) Date: 2008-12-23 06:40
Here are a few stabs at how this might be addressed.

1)  As originally suggested.  Allow task_done() to block waiting to 
acquire _unfinished_tasks.  This will allow the put() process to resume, 
release() _unfinished_tasks at which point task_done() will unblock.  No 
harm, no foul but you do lose some error checking (and maybe some 
performance?)

2)  One can't protect JoinableQueue.put() by simply acquiring _cond 
before calling Queue.put().  Fixed size queues will block if the queue 
is full, causing deadlock when task_done() can't acquire _cond.  The 
most obvious solution would seem to be reimplementing 
JoinableQueue.put() (not simply calling Queue.put()) and then inserting  self._unfinished_tasks.release() into a protected portion.  Perhaps:

    def put(self, obj, block=True, timeout=None):
        assert not self._closed
        if not self._sem.acquire(block, timeout):
            raise Full

        self._notempty.acquire()
        self._cond.acquire()
        try:
            if self._thread is None:
                self._start_thread()
            self._buffer.append(obj)
            self._unfinished_tasks.release()
            self._notempty.notify()
        finally:
            self._cond.release()
            self._notempty.release()

We may be able to get away with not acquiring _cond as _notempty would 
provide some protection.   However its relationship to get() isn't 
entirely clear to me so I am not sure if this would be sufficient.
msg84613 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-03-30 18:55
Hi Brian - do you have a chunk of code that exacerbates this? I'm having 
problems reproducing this, and need a test so I can prove out the fix.
msg84639 - (view) Author: Brian (merrellb) Date: 2009-03-30 20:25
Hey Jesse,
It was good meeting you at Pycon.  I don't have anything handy at the moment
although, if memory serves, the most trivial of example seemed to illustrate
the problem.  Basically any situation where a joinable queue would keep
bumping up against being empty (ie retiring items faster than they are being
fed), and does enough work between get() and task_done() to be preempted
would eventually break.  FWIW I was running on a Windows box.

I am afraid I am away from my computer until late tonight but I can try to
cook something up then (I presume you are sprinting today?).  Also I think
the issue becomes clear when you think about what happens if
joinablequeue.task_done() gets preempted between its few lines.

-brian

On Mon, Mar 30, 2009 at 2:55 PM, Jesse Noller <report@bugs.python.org>wrote:

>
> Jesse Noller <jnoller@gmail.com> added the comment:
>
> Hi Brian - do you have a chunk of code that exacerbates this? I'm having
> problems reproducing this, and need a test so I can prove out the fix.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue4660>
> _______________________________________
>
msg86044 - (view) Author: Brian (merrellb) Date: 2009-04-16 21:16
Jesse,

I am afraid my last post may have confused the issue.  As I mentioned in
my first post, the problem arises when JoinableQueue.put is preempted
between its two lines.  Perhaps the easiest way to illustrate this is to
exacerbate it by modifying JoinableQueue.put to force a preemption at
this inopportune time.

import time
def put(self, item, block=True, timeout=None):
    Queue.put(self, item, block, timeout)
    time.sleep(1)
    self._unfinished_tasks.release()

Almost any example will now fail.

from multiprocessing import JoinableQueue, Process

def printer(in_queue):
    while True:
        print in_queue.get()
        in_queue.task_done()

if __name__ == '__main__':
    jqueue = JoinableQueue()
    a = Process(target = printer, args=(jqueue,)).start()
    jqueue.put("blah")
msg89735 - (view) Author: Filipe Fernandes (ffernand) Date: 2009-06-26 18:53
I ran into the same problem and am greatful to Brian for reporting this
as I thought I was loosing my mind.

Brian noted that he was running windows and I can confirm that Brian's
test case is reproducable on my laptop running:

Ubuntu 9.04
python 2.6.2

Although I'm reluctant to try Brian's suggestions without additional
comments even if they do work.  I'll be using this in production.
msg89812 - (view) Author: Brian (merrellb) Date: 2009-06-29 05:55
Filipe,

Thanks for the confirmation.  While I think the second option (ie 
properly protecting JoinableQueue.put()) is best, the first option 
(simply removing the 'task_done() called too many times' check) should 
be safe (presuming your get() and put() calls actually match).  

Jesse, any luck sorting out the best fix for this?  I really think that 
JoinableQueue (in my opinion the most useful form of multiprocessing 
queues) can't be guaranteed to work on any system right now.

-brian
msg89828 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-06-29 11:46
I'm leaning towards the properly protecting JoinableQueue.put() fix, I'm 
not a terribly big fan of removing error checking. I'm trying to carve off 
time this week to beat on my bug queue, so I'm hoping to be able to commit 
something (once I have docs+tests) this week.
msg90283 - (view) Author: Brian (merrellb) Date: 2009-07-08 21:31
Cool., let me know if there is anything I can do to help.

On Mon, Jun 29, 2009 at 7:46 AM, Jesse Noller <report@bugs.python.org>wrote:

>
> Jesse Noller <jnoller@gmail.com> added the comment:
>
> I'm leaning towards the properly protecting JoinableQueue.put() fix, I'm
> not a terribly big fan of removing error checking. I'm trying to carve off
> time this week to beat on my bug queue, so I'm hoping to be able to commit
> something (once I have docs+tests) this week.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue4660>
> _______________________________________
>
msg91346 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-08-06 02:08
Fix checked into python trunk with r74326, 26 maint w/ r74327
msg91347 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-08-06 02:10
I used the protected JoinableQueue put method suggested by Brian.
History
Date User Action Args
2009-08-06 02:10:00jnollersetstatus: open -> closed
resolution: fixed
messages: + msg91347
2009-08-06 02:08:52jnollersetmessages: + msg91346
2009-07-08 21:31:44merrellbsetfiles: + unnamed

messages: + msg90283
2009-06-29 11:46:22jnollersetmessages: + msg89828
2009-06-29 05:55:05merrellbsetmessages: + msg89812
2009-06-26 18:53:16ffernandsetnosy: + ffernand
messages: + msg89735
2009-05-13 09:30:03alogasetnosy: + aloga
2009-04-16 21:16:24merrellbsetmessages: + msg86044
2009-03-30 20:25:32merrellbsetfiles: + unnamed

messages: + msg84639
2009-03-30 18:55:10jnollersetmessages: + msg84613
2009-01-22 19:51:44jnollersetpriority: high
2008-12-23 06:40:36merrellbsetmessages: + msg78226
2008-12-14 17:24:23benjamin.petersonsetassignee: jnoller
nosy: + jnoller
2008-12-14 16:54:21merrellbsettype: behavior
2008-12-14 16:48:02merrellbcreate