Title: Race condition in concurrent.futures
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: anacrolix, bquinlan, loewis, pitrou, python-dev, rosslagerwall, schmir
Priority: normal Keywords: patch

Created on 2012-03-25 18:36 by anacrolix, last changed 2012-08-20 13:09 by schmir. This issue is now closed.

File name Uploaded Description Edit anacrolix, 2012-03-25 18:36
concurrent.futures._AllCompletedWaiter-race-condition.patch anacrolix, 2012-03-25 18:37 review
concurrent.futures._AllCompletedWaiter.num_pending_calls-race-test.patch anacrolix, 2012-03-31 09:34 review
Messages (8)
msg156764 - (view) Author: Matt Joiner (anacrolix) Date: 2012-03-25 18:36
There's a race condition in concurrent.futures in _AllCompletedWaiter, which affects wait(return_when=ALL_COMPLETED).

The attached test will go into an infinite wait.
msg156765 - (view) Author: Matt Joiner (anacrolix) Date: 2012-03-25 18:37
Patch attached.
msg157063 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-29 16:16
Could your patch also include a proper test case in Lib/test/ ?
msg157121 - (view) Author: Matt Joiner (anacrolix) Date: 2012-03-30 02:45
I'll add this shortly.
msg157167 - (view) Author: Matt Joiner (anacrolix) Date: 2012-03-31 09:34
Patch with a test included. Being a nondeterministic bug, please adjust the thread count, or timing as necessary, the parameters in the patch are as low as I can get them and still reasonably reproduce the bug (linux 3.2, i386).

There's a few complications in testing this. Firstly timeouts cannot be used, because at the end of the timeout, all the futures are checked to categorize them into done and not done. Except for taking the entire duration of the timeout given to return, the bug is masked in this case.

Secondly, it can't be known how long a wait *should* take, so failing after some duration and assuming a stall due to the race decrementing num_pending_calls would be a guess dependent on the capabilities of the system.

If a timeout needs to be added (and fired from an outside thread for reasons given above), please let me know, and direct me to an example of this kind of testing elsewhere in the stdlib, so I can do this idiomatically.

Without applying the patch, the test case provided will (usually) hang indefinitely.
msg157219 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-31 18:30
New changeset 0312db5265d0 by Antoine Pitrou in branch '3.2':
Issue #14406: Fix a race condition when using `concurrent.futures.wait(return_when=ALL_COMPLETED)`.

New changeset 2c1432552213 by Antoine Pitrou in branch 'default':
Issue #14406: Fix a race condition when using `concurrent.futures.wait(return_when=ALL_COMPLETED)`.
msg157220 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-31 18:31
The patch looks fine, I've committed it. Thanks!
msg168657 - (view) Author: Ralf Schmitt (schmir) Date: 2012-08-20 13:09
I think having a lock here is unnecessary. The following code should work:

def _decrement_pending_calls(self):
    if self.num_pending_calls == len(self.finished_futures):

(well, maybe the method should also be renamed then)
Date User Action Args
2012-08-20 13:09:45schmirsetmessages: + msg168657
2012-08-20 12:40:31schmirsetnosy: + schmir
2012-03-31 18:31:20pitrousetstatus: open -> closed
resolution: fixed
messages: + msg157220

stage: resolved
2012-03-31 18:30:55python-devsetnosy: + python-dev
messages: + msg157219
2012-03-31 09:34:52anacrolixsetfiles: + concurrent.futures._AllCompletedWaiter.num_pending_calls-race-test.patch

messages: + msg157167
2012-03-30 02:45:42anacrolixsetmessages: + msg157121
2012-03-29 16:16:01pitrousetmessages: + msg157063
2012-03-25 18:41:23anacrolixsetnosy: + loewis, bquinlan, pitrou, rosslagerwall
2012-03-25 18:37:27anacrolixsetfiles: + concurrent.futures._AllCompletedWaiter-race-condition.patch
keywords: + patch
messages: + msg156765
2012-03-25 18:36:42anacrolixcreate