
classification
Title: multiprocessing deadlocks when sending large data through Queue with timeout
Components: Library (Lib)
Versions: Python 2.6

process
Status: closed
Resolution: not a bug
Assigned To: jnoller
Nosy List: DavidDecotigny, jnoller
Priority: normal

Created on 2008-09-05 22:35 by DavidDecotigny, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File: c.py (uploaded by DavidDecotigny, 2008-09-05 22:35)
Description: Example showing the bug ("Happy" never displayed)
Messages (6)
msg72640 - (view) Author: David Decotigny (DavidDecotigny) Date: 2008-09-05 22:35
With the attached script, calling demo() with, for example,
datasize=40*1024*1024 and timeout=1 will deadlock: the program never
terminates.

The bug appears on Linux (RHEL4) / Intel x86 with the "multiprocessing"
package shipped with Python 2.6b3, and I think it can easily be reproduced
on other Unices. It also appears with Python 2.5 and the standalone
processing package 0.52
(https://developer.berlios.de/bugs/?func=detailbug&bug_id=14453&group_id=9001).

After a quick investigation, it seems to be a deadlock between waitpid
in the parent process and a pipe::send in the "_feed" thread of the
child process. The problem seems to be that "_feed" is still sending
data (the data is large) to the pipe while the parent process has
already called waitpid (because of the "short" timeout): the pipe fills
up because no consumer is eating the data (the consumer is already in
waitpid), so the "_feed" thread in the child blocks forever. Since the
child process does a _feed.join() before exiting (after function f), it
never exits. Hence the waitpid in the parent process never returns,
because the child never exits.

This no longer happens if I use timeout=None or a larger timeout
(e.g. 10 seconds), because in both cases waitpid is called /after/ the
"_feed" thread in the child process has sent all of its data through
the pipe.
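
The attached c.py is not reproduced on this page; the following is a minimal
sketch of the scenario described above, written against the Python 2.6-era
multiprocessing API. It reuses names that appear elsewhere in this thread
(demo, f, q, worker), but its structure is an assumption, not the actual
attachment:

from multiprocessing import Process, Queue
from Queue import Empty

def f(datasize, q):
    # The queue's feeder thread keeps writing this large object into the
    # pipe even after the parent has stopped reading from it.
    q.put(range(datasize))

def demo(datasize=40*1024*1024, timeout=1):
    q = Queue()
    worker = Process(target=f, args=(datasize, q))
    worker.start()
    try:
        q.get(timeout=timeout)   # times out before the feeder thread has finished
    except Empty:
        pass
    worker.join()                # parent blocks in waitpid; the child never exits
    print "Happy"                # never reached with a short timeout

if __name__ == '__main__':
    demo()

With a short timeout the get() gives up, join() enters waitpid, nobody drains
the pipe, and the child's feeder thread blocks forever, matching the behaviour
reported above.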
msg72655 - (view) Author: David Decotigny (DavidDecotigny) Date: 2008-09-06 00:38
A quick fix in the user code, when we are sure we don't need the child
process if a timeout happens, is to call worker.terminate() in an except
Empty clause.
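
In terms of the sketch above (same assumed structure, not the actual attached
code), that workaround could look like:

def demo(datasize=40*1024*1024, timeout=1):
    q = Queue()
    worker = Process(target=f, args=(datasize, q))
    worker.start()
    try:
        q.get(timeout=timeout)
    except Empty:
        # The child's result is no longer needed, so kill the child rather
        # than wait for its feeder thread to push the large item through the pipe.
        worker.terminate()
    worker.join()
    print "Happy"

After terminate() the child is gone, so the join()/waitpid in the parent
returns and "Happy" is printed.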
msg72657 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2008-09-06 01:10
See http://docs.python.org/dev/library/multiprocessing.html#multiprocessing-programming

Specifically:
Joining processes that use queues

Bear in mind that a process that has put items in a queue will wait 
before terminating until all the buffered items are fed by the “feeder” 
thread to the underlying pipe. (The child process can call the 
Queue.cancel_join_thread() method of the queue to avoid this behaviour.)

This means that whenever you use a queue you need to make sure that all 
items which have been put on the queue will eventually be removed before 
the process is joined. Otherwise you cannot be sure that processes which 
have put items on the queue will terminate. Remember also that 
non-daemonic processes will automatically be joined.
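
In terms of the earlier sketch, the pattern this documentation asks for is to
drain the queue before joining; a minimal illustration (reusing the assumed
names from the sketch above, not code from this issue):

worker = Process(target=f, args=(datasize, q))
worker.start()
data = q.get()    # remove the item first; blocks until the feeder thread delivers it
worker.join()     # the child can now flush its queue and exit, so join() returns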
msg72658 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2008-09-06 01:16
In a later release, I'd like to massage this in such a way that you do not 
have to wait for a child queue to be drained prior to calling join.

One way to work around this, David, is to call Queue.cancel_join_thread():

def f(datasize, q):
    # Tell this process not to wait for the queue's feeder thread when it
    # exits, so it can terminate even if the pipe has not been drained.
    q.cancel_join_thread()
    q.put(range(datasize))
msg72659 - (view) Author: David Decotigny (DavidDecotigny) Date: 2008-09-06 01:45
Thank you Jesse. When I read this passage, I naively thought that a
timeout raised in a get() would not be harmful: that somehow the whole
get() request would be aborted. But now I realize that this would make
things rather complicated and dangerous: the data would get dropped and
never be recovered by subsequent get() calls.
So thank you for the hint; leaving things as they are is better.
msg72660 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2008-09-06 01:55
No problem David, you're the 4th person to ask me about this in the past 2 
months :)
History
Date                 User               Action  Args
2022-04-11 14:56:38  admin              set     github: 48039
2008-09-06 01:55:37  jnoller            set     messages: + msg72660
2008-09-06 01:45:24  DavidDecotigny     set     messages: + msg72659
2008-09-06 01:20:34  jnoller            set     status: open -> closed; resolution: not a bug
2008-09-06 01:16:24  jnoller            set     messages: + msg72658
2008-09-06 01:10:44  jnoller            set     messages: + msg72657
2008-09-06 00:38:34  DavidDecotigny     set     messages: + msg72655
2008-09-05 22:36:18  benjamin.peterson  set     assignee: jnoller; nosy: + jnoller
2008-09-05 22:35:25  DavidDecotigny     create