
classification
Title: Bug in multiprocessing.JoinableQueue() implementation on Ubuntu 11.04
Type: behavior
Stage: test needed
Components: Library (Lib)
Versions: Python 2.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: Michael.Hall, davin, jnoller, meador.inge
Priority: normal
Keywords:

Created on 2011-08-12 00:25 by Michael.Hall, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
test_case.zip Michael.Hall, 2011-08-12 18:11 Test Case
Messages (7)
msg141932 - (view) Author: Michael Hall (Michael.Hall) Date: 2011-08-12 00:25
I recently switched from OpenSUSE 11.4 to Ubuntu 11.04. A project I wrote a couple of days ago under OpenSUSE using the multiprocessing library now hangs when I run it under Ubuntu, although it ran fine under OpenSUSE.

Specifically, I am using two queues: work_queue, from which the children get jobs, and results_queue, where they place their results before calling JoinableQueue.task_done() and grabbing the next job. I use the "poison pill" technique to terminate the children: a None object is placed at the end of the queue for each child, and when a child gets one of these terminating objects it calls task_done() again (to account for the None object) and exits.

In the main process, after spawning all of the children (one per physical CPU), I join on the work_queue in order to wait for all of the children to finish.
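
For readers following along, the pattern described above can be sketched roughly as below. This is not the attached code: the names worker and run_jobs, the squaring "job", and the use of Python 3 with the "fork" start method (chosen to mirror how Python 2.7 starts children on Linux) are all assumptions made for illustration.

```python
import multiprocessing as mp

# Assumption: POSIX system; "fork" mirrors how Python 2.7 starts children on Linux.
ctx = mp.get_context("fork")


def worker(work_queue, results_queue):
    """Process jobs until a None poison pill is received."""
    while True:
        job = work_queue.get()
        if job is None:
            work_queue.task_done()  # account for the poison pill itself
            break
        results_queue.put(job * job)  # placeholder for the real computation
        work_queue.task_done()  # result is queued, so the job counts as done


def run_jobs(jobs, n_workers):
    """Hypothetical driver: feed jobs, join on the queue, collect results."""
    jobs = list(jobs)
    work_queue = ctx.JoinableQueue()
    results_queue = ctx.Queue()
    for job in jobs:
        work_queue.put(job)
    for _ in range(n_workers):
        work_queue.put(None)  # one poison pill per child

    children = [
        ctx.Process(target=worker, args=(work_queue, results_queue))
        for _ in range(n_workers)
    ]
    for child in children:
        child.start()

    # Blocks until task_done() has been called once for every item put(),
    # including the poison pills.
    work_queue.join()

    results = [results_queue.get() for _ in jobs]
    for child in children:
        child.join()  # reap the children so they do not linger as zombies
    return sorted(results)


if __name__ == "__main__":
    print(run_jobs(range(10), n_workers=4))
```

Note that work_queue.join() only returns once task_done() has been called for every item, so a child that dies or exits without calling it leaves the parent waiting indefinitely.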

This is pretty much a cookie-cutter multiprocessing implementation that I've used successfully for years under OpenSUSE, but for some odd reason the exact same code does not work under Ubuntu.

I would try porting to python 3.x, but the rest of my research team is still using 2.7, so that's not really an option right now.
msg141933 - (view) Author: Michael Hall (Michael.Hall) Date: 2011-08-12 00:30
Edit: Sorry, I should have been clearer. The hang occurs after the first child process exits. At that point all four children become zombies (none of the others exit; they just zombify immediately), and the main process sits there waiting forever for the rest of the children to clear out the queue, which of course never happens.
msg141972 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-08-12 16:50
Michael,

It is hard to tell from your description alone where the bug is.  Could you provide more detailed reproduction steps with a test case that exhibits the issue?
msg141982 - (view) Author: Michael Hall (Michael.Hall) Date: 2011-08-12 18:11
Okay, I have attached the code I've been using. Don't worry about what it does (it's a biology thing), but just follow these steps:

1. Make sure you have numpy and scipy installed.
2. Extract the zip file.
3. Run it with ./svm_main.py test_obligate.dat test_transient.dat

The method svm_main.grid_search and the module grid_search_process are probably the only things you need to pay attention to; everything else is problem-specific.
msg142090 - (view) Author: Michael Hall (Michael.Hall) Date: 2011-08-15 00:55
I tried switching from joining on the work_queue to just joining on the individual child processes, and it seems to work now. Weird. Anyway, it'd be nice to see the JoinableQueue fixed, but it's not pressing any more.
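
The workaround described here might look something like the following sketch. It is an illustration only, not the reporter's actual change: the names worker and run_jobs_joining_children, the squaring "job", and the use of Python 3 with the "fork" start method are assumptions. A plain Queue replaces the JoinableQueue, so no task_done() bookkeeping is needed.

```python
import multiprocessing as mp

# Assumption: POSIX system; "fork" mirrors how Python 2.7 starts children on Linux.
ctx = mp.get_context("fork")


def worker(work_queue, results_queue):
    """Process jobs until a None poison pill is received."""
    while True:
        job = work_queue.get()
        if job is None:
            break  # no task_done() needed with a plain Queue
        results_queue.put(job * job)  # placeholder for the real computation


def run_jobs_joining_children(jobs, n_workers):
    """Hypothetical driver that waits on the child processes, not the queue."""
    jobs = list(jobs)
    work_queue = ctx.Queue()
    results_queue = ctx.Queue()
    for job in jobs:
        work_queue.put(job)
    for _ in range(n_workers):
        work_queue.put(None)  # one poison pill per child

    children = [
        ctx.Process(target=worker, args=(work_queue, results_queue))
        for _ in range(n_workers)
    ]
    for child in children:
        child.start()

    # Drain the results before joining: a child that has put items on a
    # queue does not terminate until they are flushed to the pipe, so
    # joining first can deadlock when the results are large.
    results = [results_queue.get() for _ in jobs]

    for child in children:
        child.join()  # wait for each child individually
    return sorted(results)


if __name__ == "__main__":
    print(run_jobs_joining_children(range(10), n_workers=4))
```

Joining each Process directly means the parent is released as soon as the children have exited, whether or not every queue item was acknowledged.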
msg235625 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2015-02-09 17:53
Thank you for the test case. However, because it depends on compiled code (the libsvm.so.2 file you supplied), it:
(1) makes me wonder whether the problem comes from the supplied library itself (perhaps it was not rebuilt properly on your Ubuntu 11.04 system after migrating from OpenSUSE 11.4; the timestamp on the libsvm.so.2 file appears to support this suspicion);
(2) does not give us a reasonably concise test case to debug and begin to understand.

Would it be possible to supply a simpler demonstration of the issue that perhaps only involves Python code?

I realize this issue is quite stale now and that you (Michael) have already reported discovering a workaround.
msg240478 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2015-04-11 14:41
Closing this very stale issue as out of date: there has been no response from the OP since the request, months ago, for enough info to be able to proceed.
History
Date User Action Args
2022-04-11 14:57:20 admin set github: 56947
2015-04-11 14:41:57 davin set status: pending -> closed; resolution: out of date; messages: + msg240478
2015-02-10 14:51:10 davin set status: open -> pending
2015-02-09 17:53:03 davin set nosy: + davin; messages: + msg235625
2011-08-15 00:55:51 Michael.Hall set messages: + msg142090
2011-08-12 18:11:39 Michael.Hall set files: + test_case.zip; messages: + msg141982
2011-08-12 16:50:27 meador.inge set nosy: + meador.inge, jnoller; messages: + msg141972; stage: test needed
2011-08-12 00:30:41 Michael.Hall set type: behavior; messages: + msg141933
2011-08-12 00:25:42 Michael.Hall create