classification
Title: join method of multiprocessing Pool object hangs if iterable argument of pool.map is empty
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gkcn, haypo, jnoller, mouad, neologix, petri.lehtinen, python-dev, rosslagerwall, sbt, terry.reedy
Priority: normal Keywords: patch

Created on 2011-05-23 11:42 by gkcn, last changed 2012-06-07 22:18 by sbt. This issue is now closed.

Files
File name Uploaded Description
multi.py gkcn, 2011-05-23 11:42 Code to reproduce the bug
issue-12157.patch mouad, 2011-06-25 14:32 Remove the MapResult instance from the Pool cache when the iterable passed to map is empty.
issue-12157.patch mouad, 2011-06-25 16:46 Don't Try to use any fancy way to check if the join will hang, leave all the job to faulthandler.
Messages (9)
msg136613 - (view) Author: Gökcen Eraslan (gkcn) * Date: 2011-05-23 11:42
When I call the map method of a Pool object with an empty list and then call the close and join methods, join() hangs. I think this is not intended.

Code to reproduce the bug is attached. 

PS: A similar issue (using map method with an empty list argument) is reported here[1], but it was about the chunksize parameter and it's resolved.

[1] http://bugs.python.org/issue6433
msg137154 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-05-28 21:30
I ran this with 3.2 on Windows XP, with "if __name__ == '__main__':" added after the def statement (without this, the script spawned 150 processes before I got logged out) and ()s added to the prints. It hung on pool.join as the OP said. I could only stop it by closing the command window, as ^C was ignored. Any new test should have a timeout ;-).
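For readers without the attachment, here is a minimal sketch of the scenario. It is not the attached multi.py. As an assumption made for convenience, it uses ThreadPool, a subclass of Pool that inherits the same map/close/join methods, so that it runs without a __main__ guard. The join is performed in a daemon thread with a timeout so that an affected interpreter cannot hang the caller. The name join_finishes and the 10-second limit are illustrative only.

```python
import threading
from multiprocessing.pool import ThreadPool


def join_finishes(items, timeout=10.0):
    """Return True if pool.join() returns within `timeout` seconds.

    Hypothetical helper for illustration; not part of multiprocessing.
    """
    pool = ThreadPool(2)
    pool.map(abs, items)   # the reported case is items == []
    pool.close()

    # Join in a daemon thread so that a hang cannot block the caller.
    joiner = threading.Thread(target=pool.join)
    joiner.daemon = True
    joiner.start()
    joiner.join(timeout)
    return not joiner.is_alive()
```

On an interpreter that contains the fix from this issue, join_finishes([]) is expected to return True, just like join_finishes([-1, 2]).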
msg137157 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-28 22:10
When map is called, a MapResult object is created, which adds itself to the Pool's result cache.
When the pool is shut down, the result handler thread waits until the cache drains (while cache and thread._state != TERMINATE). But since no result is posted to the result queue (since the iterable is empty), the result handler never receives any task, and never gets to drain the cache. It thus waits forever on the recv on the result queue.
msg139064 - (view) Author: mouad (mouad) * Date: 2011-06-25 13:39
Hello,

This is my first patch to cpython, hope it will be accepted :)

My fix removes the MapResult instance from the pool cache when the iterable is empty.

In general, here is what happens: the "map" method creates a MapResult instance, which automatically adds itself to pool._cache. This MapResult instance is then used to communicate the result of the task that is created afterwards and added to "pool._taskqueue". With an empty iterable, however, no task is created, so we end up with a MapResult that has no task. When we then try to join the pool, it hangs waiting for a task to set the result in the MapResult instance.
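The mechanism can be illustrated with a toy model. This is a sketch only, not the code in multiprocessing/pool.py: the cache is a plain dict, the result queue is a queue.Queue, and the names handler_drains_cache, fixed and remaining are made up for the example. With fixed=True it mimics the idea of the patch by deleting the cache entry when there are no tasks.

```python
import queue
import threading


def handler_drains_cache(num_tasks, fixed, timeout=0.5):
    """Toy model of the result-handler shutdown loop (not the real Pool code).

    Returns True if the handler thread finishes within `timeout` seconds.
    """
    cache = {}
    results = queue.Queue()
    job = 0

    # map() registers a result object in the cache before any task runs.
    cache[job] = {"remaining": num_tasks}
    if fixed and num_tasks == 0:
        # Idea of the patch: nothing will ever be posted, so drop the entry.
        del cache[job]

    # Workers would post one result per task; none are posted for 0 tasks.
    for index in range(num_tasks):
        results.put((job, index))

    def handle_results():
        # Shutdown phase: keep reading results until the cache is empty.
        while cache:
            finished_job, _ = results.get()   # blocks if nothing is posted
            cache[finished_job]["remaining"] -= 1
            if cache[finished_job]["remaining"] == 0:
                del cache[finished_job]

    handler = threading.Thread(target=handle_results, daemon=True)
    handler.start()
    handler.join(timeout)
    return not handler.is_alive()
```

In this model, handler_drains_cache(0, fixed=False) returns False because the handler stays blocked on the queue, while handler_drains_cache(0, fixed=True) and handler_drains_cache(3, fixed=False) both return True.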

For the test, I created a new helper, `operation_timeout`, used as a context manager to make sure that the test will not hang forever. I don't know if it's useful; maybe just running the test without checking for any timeout is more *realistic*.
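The implementation of that helper is not shown in this issue. Purely as an illustration of the idea, a timeout context manager could be sketched with SIGALRM as below. This is an assumption, not the code from the patch: it works only on Unix and only in the main thread, and hits_timeout is a made-up demo function.

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def operation_timeout(seconds):
    """Raise TimeoutError if the with-block runs longer than `seconds`.

    Hypothetical sketch; Unix only, main thread only.
    """
    def on_alarm(signum, frame):
        raise TimeoutError("operation exceeded %s seconds" % seconds)

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def hits_timeout(limit, duration):
    """Demo: sleep for `duration` seconds under a `limit`-second timeout."""
    try:
        with operation_timeout(limit):
            time.sleep(duration)
    except TimeoutError:
        return True
    return False
```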
msg139078 - (view) Author: mouad (mouad) * Date: 2011-06-25 14:55
The test case uses a helper function in test/support.py that I proposed in issue #12410.

I'm dropping this comment here because I don't have the rights to edit the issue's dependencies.

cheers;
msg139101 - (view) Author: mouad (mouad) * Date: 2011-06-25 16:48
Here is a new patch which, unlike the first one, doesn't try to check whether pool.join() will hang, following a discussion with neologix in issue #12410.
msg139125 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-25 22:31
The patch to the multiprocessing code is trivial:
+            del cache[self._job]

The difference in tests is
+        with test.support.operation_timeout(5):
+            p.join()
versus
+        p.join()

Victor, do you agree with the simpler method, depending on faulthandler to catch a hang in the test and fail it? Or is the explicit timeout better?
msg139128 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-25 23:02
>Don't Try to use any fancy way to check if the join will hang,
> leave all the job to faulthandler.	

> Victor, do you agree with the simpler method, depending
> on faulthandler to catch a hang in the test and fail it?
> Or is the explicit timeout better?

If the patch fixes the hang, there is no good reason to write code to handle a new hang.

We have generic "watchdogs":

 - buildbot timeout (any Python version)
 - regrtest timeout implemented using faulthandler (only in Python 3.x)
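As a rough illustration of a faulthandler-based watchdog like the one in the list above, the sketch below uses faulthandler.dump_traceback_later. It does not reproduce regrtest's own code; run_with_watchdog is a hypothetical name, and writing to sys.__stderr__ is an assumption made so the sketch still works when sys.stderr has been replaced.

```python
import faulthandler
import sys


def run_with_watchdog(func, timeout):
    """Run func() under a process-wide watchdog (illustrative sketch).

    If func() is still running after `timeout` seconds, faulthandler dumps
    the tracebacks of all threads and, because exit=True, ends the process.
    """
    faulthandler.dump_traceback_later(timeout, exit=True, file=sys.__stderr__)
    try:
        return func()
    finally:
        # Disarm the watchdog once the call has returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

A test runner could wrap a whole test run this way, for example run_with_watchdog(run_all_tests, 60 * 60), where run_all_tests stands for whatever entry point the runner uses.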

If you run the .py test file directly on the command line, you can still use CTRL+C or CTRL+Z to interrupt or stop the process.

You may want to improve these generic watchdogs, but writing a specific watchdog for a single test function looks useless to me.

Remember that timeouts are not reliable: we sometimes have false failures because of very slow buildbots... For the regrtest timeout, I tried 10, 15, 20 and 30 minutes before choosing a timeout of 60 minutes. With lower values, we had many false failures.
msg162493 - (view) Author: Roundup Robot (python-dev) Date: 2012-06-07 19:42
New changeset 1b3d4ffcb4d1 by Richard Oudkerk in branch '3.2':
Issue #12157: pool.map() does not handle empty iterable correctly
http://hg.python.org/cpython/rev/1b3d4ffcb4d1

New changeset 3585cb1388f2 by Richard Oudkerk in branch 'default':
Merge fixes for #13854 and #12157.
http://hg.python.org/cpython/rev/3585cb1388f2

New changeset 7ab7836894c4 by Richard Oudkerk in branch '2.7':
Issue #12157: pool.map() does not handle empty iterable correctly
http://hg.python.org/cpython/rev/7ab7836894c4
History
Date User Action Args
2012-06-07 22:18:20 sbt set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2012-06-07 19:42:03 python-dev set nosy: + python-dev
messages: + msg162493
2012-06-06 12:25:09 sbt set nosy: + sbt
2012-02-04 11:55:08 rosslagerwall set nosy: + rosslagerwall
2012-02-04 10:33:59 neologix link issue13937 superseder
2011-12-07 17:49:26 neologix link issue13542 superseder
2011-06-25 23:02:01 haypo set messages: + msg139128
2011-06-25 22:31:00 terry.reedy set messages: + msg139125
stage: test needed -> patch review
2011-06-25 21:56:47 haypo set nosy: + haypo
2011-06-25 16:48:57 mouad set messages: + msg139101
2011-06-25 16:46:42 mouad set files: + issue-12157.patch
2011-06-25 14:55:34 mouad set messages: + msg139078
2011-06-25 14:32:56 mouad set files: + issue-12157.patch
2011-06-25 14:32:30 mouad set files: - support.patch
2011-06-25 14:32:27 mouad set files: - test_multiprocess.patch
2011-06-25 14:32:23 mouad set files: - pool.patch
2011-06-25 13:39:07 mouad set nosy: + mouad
messages: + msg139064
2011-06-25 13:23:31 mouad set files: + support.patch
2011-06-25 13:20:58 mouad set files: + test_multiprocess.patch
2011-06-25 13:19:13 mouad set files: + pool.patch
keywords: + patch
2011-05-28 22:10:28 neologix set nosy: + neologix
messages: + msg137157
2011-05-28 21:30:46 terry.reedy set versions: + Python 3.2, Python 3.3
nosy: + terry.reedy
messages: + msg137154
stage: test needed
2011-05-26 08:01:34 petri.lehtinen set nosy: + petri.lehtinen
2011-05-23 11:42:45 gkcn create