classification
Title: concurrent.futures.as_completed() installs waiters for already completed Futures
Type: performance Stage:
Components: Library (Lib) Versions: Python 3.3, Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: bquinlan, mark.dickinson, vstinner
Priority: normal Keywords: patch

Created on 2014-01-18 21:19 by glangford, last changed 2014-07-18 16:46 by glangford.

Files
File name Uploaded Description Edit
as_completed_proposed.py glangford, 2014-01-21 15:19
as_completed_proposed.patch vstinner, 2014-01-23 13:48 review
test_dupfuture.py glangford, 2014-01-23 14:43
Messages (7)
msg208418 - (view) Author: Glenn Langford (glangford) * Date: 2014-01-18 21:19
as_completed() calls _create_and_install_waiters() over all given Futures, even if some (or all) of the Futures have already completed.

The following fragment of the as_completed code (from 3.3.3):

with _AcquireFutures(fs):
        finished = set(
                f for f in fs
                if f._state in [CANCELLED_AND_NOTIFIED, FINISHED])
        pending = set(fs) - finished
        waiter = _create_and_install_waiters(fs, _AS_COMPLETED)

installs waiters into Futures contained in fs, rather than pending. 

A more efficient approach might be to only install waiters on the pending subset if necessary. This would be faster and would also result in less dwell time with all of the Future conditions locked via the _AcquireFutures context manager.

Also: waiters are appended with the Future condition lock acquired, but are removed (at the conclusion of as_completed) without the _AcquireFutures condition lock. Is this correct?
msg208593 - (view) Author: Glenn Langford (glangford) * Date: 2014-01-21 00:21
See also http://bugs.python.org/issue20319
msg208652 - (view) Author: Glenn Langford (glangford) * Date: 2014-01-21 15:19
Uploading proposed new version of as_completed() for review. Note the following changes:

- does not install waiters for Futures which are completed
- locks only one Future at a time to improve concurrency (rather than locking all Futures at once); traverses Futures in the order given, as no need to sort into a canonical order
- immediately yields each completed Future, without waiting to lock and examine other Futures
- fixes locking bug in waiter removal
msg208921 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-23 13:48
I attached Glenn's function as a patch.

Comments on as_completed_proposed.patch:

- _create_and_install_waiters() doesn't remove duplicates using set(), so you change the behaving of as_completed(). I don't think that it is correct. as_completed([f, f]) should yield f twice, but I didn't check the current behaviour. Future._waiters is a list and so accept duplicated waiters.

- you replaced the _AcquireFutures context manager on all futures with an individual lock. In my opinion, it should be changed in a different patch. I don't know which option is better, but if it is changed, it should be changed in the whole file.

@Glenn: You should take a look at the Python Developer's Guide to learn how to write a patch and how to contribute to Python ;-)
http://docs.python.org/devguide/
msg208942 - (view) Author: Glenn Langford (glangford) * Date: 2014-01-23 14:33
Comments on @Victor's comments - I will have a look at the dev guide.  :-)

I think you have identified another bug in the current code. 

"Future._waiters is a list and so accept duplicated waiters." What this means is that the following code is broken, because it attempts to remove the Future twice from the pending set.

for future in finished:
     yield future
     pending.remove(future)  ## KeyError triggered on as_completed( [f,f] )

See attached test which demonstrates the breakage.

Also, the behaviour of as_completed([f,f]) and wait([f,f], return_when=ALL_COMPLETED) is currently different. wait will only yield f once.
msg208943 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-23 14:36
> I think you have identified another bug in the current code.

Please open a separated issue in this case.
msg208974 - (view) Author: Glenn Langford (glangford) * Date: 2014-01-23 18:26
> - you replaced the _AcquireFutures context manager on all futures with an >individual lock. In my opinion, it should be changed in a different patch. I >don't know which option is better, but if it is changed, it should be changed >in the whole file.

I can pursue the replacement of _AcquireFutures in a different patch, however I don't necessarily agree that it should be changed in the whole file. as_completed() is different than wait(), since wait() is obligated to return done and notdone Futures, which might require a different implementation. Initially, it may be better to only change as_completed(), where there is a clear concurrency win.
History
Date User Action Args
2014-07-18 16:46:17glangfordsetnosy: - glangford
2014-01-23 18:26:14glangfordsetmessages: + msg208974
2014-01-23 14:43:21glangfordsetfiles: + test_dupfuture.py
2014-01-23 14:41:37glangfordsetfiles: - test_dupfuture.py
2014-01-23 14:36:16vstinnersetmessages: + msg208943
2014-01-23 14:33:33glangfordsetfiles: + test_dupfuture.py

messages: + msg208942
2014-01-23 13:48:03vstinnersetfiles: + as_completed_proposed.patch

nosy: + vstinner
messages: + msg208921

keywords: + patch
2014-01-23 12:45:20glangfordsetnosy: + mark.dickinson
2014-01-21 21:23:20glangfordsetnosy: + bquinlan
2014-01-21 15:19:29glangfordsetfiles: + as_completed_proposed.py

messages: + msg208652
2014-01-21 00:21:14glangfordsetmessages: + msg208593
2014-01-18 21:19:35glangfordcreate