Title: Provide an async-generator version of as_completed
Type: enhancement Stage: patch review
Components: asyncio Versions: Python 3.8
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, hniksic, mivade, yselivanov
Priority: normal Keywords: patch

Created on 2018-05-16 06:48 by hniksic, last changed 2018-10-31 02:17 by mivade.

Pull Requests
URL       Status  Linked
PR 10251  open    mivade, 2018-10-31 02:17
Messages (7)
msg316773 - Author: Hrvoje Nikšić (hniksic) Date: 2018-05-16 06:48
Judging by questions on the StackOverflow python-asyncio tag[1][2], it seems that users find it hard to understand how to use as_completed correctly. I have identified three issues:

* It's somewhat sparingly documented.

A StackOverflow user ([2]) didn't find it obvious that it runs the futures in parallel. Unless one is already aware of the meaning, the term "as completed" could suggest that they are executed and completed sequentially.

* Unlike its concurrent.futures counterpart, it's non-blocking.

This sounds like a good idea because it's usable from synchronous code, but it means that the futures it yields aren't completed; you have to await them first. This is confusing for a function with "completed" in the name, and is not the case with concurrent.futures.as_completed, nor with other waiting functions in asyncio (gather, wait, wait_for).

* It yields futures other than those that were passed in.

This prevents some usual patterns from working, e.g. associating the results with context data, as the Python docs themselves do for concurrent.futures.as_completed in .  See SO question [1] for a similar request in asyncio.
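To illustrate the last two points, here is a minimal sketch of the workaround users currently need with the existing asyncio.as_completed: the context is carried inside the awaitable, because the yielded object is neither complete nor one of the original futures. The names fetch, with_context and the example URLs are hypothetical and only serve the illustration.

```python
import asyncio


async def fetch(url, delay):
    # Hypothetical stand-in for real I/O; `url` and `delay` are illustrative.
    await asyncio.sleep(delay)
    return len(url)


async def with_context(context, awaitable):
    # Return the context together with the result, since the object
    # yielded by as_completed cannot be used as a lookup key.
    return context, await awaitable


async def main():
    jobs = {"http://a.example": 0.05, "http://bb.example": 0.01}
    wrapped = [with_context(url, fetch(url, delay)) for url, delay in jobs.items()]
    results = []
    for next_done in asyncio.as_completed(wrapped):
        # Each yielded object must still be awaited to obtain the result.
        url, size = await next_done
        results.append((url, size))
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```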

Here is my proposal to address the issues.

I believe the usage problems stem from as_completed predating the concept of async iterators. If we had async iterators back then, as_completed would have been an obvious candidate to be one. In that case it could be both "blocking" (but not for the event loop) and return the original futures. For example:

async def as_completed2(fs):
    pending = set(map(asyncio.ensure_future, fs))
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        yield from done

(It is straightforward to add support for a timeout argument.)

I propose to deprecate asyncio.as_completed and advertise the async-iterator version like the one presented here - under a nicer name, such as as_done(), or as_completed_async().

msg316774 - Author: Hrvoje Nikšić (hniksic) Date: 2018-05-16 06:49
Of course, `yield from done` would actually have to be `for future in done: yield future`, since async generators don't support yield from.
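
For reference, a runnable sketch of the proof of concept with that correction applied might look as follows. The helper work and the task names are hypothetical; the point is that the yielded futures are the original ones and are already done, so they can be used as dictionary keys.

```python
import asyncio


async def as_completed2(fs):
    # Proof of concept from the description, with an explicit loop
    # instead of `yield from` (not allowed in async generators).
    pending = set(map(asyncio.ensure_future, fs))
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for future in done:
            yield future


async def work(value, delay):
    # Hypothetical workload used only for the demonstration.
    await asyncio.sleep(delay)
    return value


async def main():
    specs = [("slow", 0.05), ("fast", 0.01)]
    # Map each original task to its context data.
    tasks = {asyncio.ensure_future(work(v, d)): v for v, d in specs}
    seen = []
    async for fut in as_completed2(tasks):
        # `fut` is one of the original tasks and has already finished.
        seen.append((tasks[fut], fut.done(), fut.result()))
    return seen


if __name__ == "__main__":
    print(asyncio.run(main()))
```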
msg316963 - Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-17 16:43
I like the idea. Let's revisit it after Python 3.7 is released.
msg317247 - Author: Hrvoje Nikšić (hniksic) Date: 2018-05-21 18:34
Another option occurred to me: as_completed could return an object that implements both the synchronous and the asynchronous iteration protocols:

class as_completed:
    def __init__(self, fs, *, loop=None, timeout=None):
        self.__fs = fs
        self.__loop = loop
        self.__timeout = timeout
        self._iter = None  # created lazily by __next__

    def __iter__(self):
        # current implementation here
        ...

    async def __aiter__(self):
        # new async implementation here (written as an async generator)
        ...

    def __next__(self):
        # defined for backward compatibility with code that expects
        # as_completed() to return an iterator rather than an iterable
        if self._iter is None:
            self._iter = iter(self)
        return next(self._iter)

With that design there wouldn't need to be a new function under a different name; instead, as_completed could just be documented as an asynchronous iterable, with the old synchronous iteration supported for backward compatibility.
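
A minimal sketch of how such a dual-protocol object could behave, assuming the synchronous path simply delegates to the existing asyncio.as_completed and the asynchronous path reuses the wait-based loop from the description. The class name AsCompleted and the helper work are hypothetical, and the loop and timeout arguments are left out for brevity.

```python
import asyncio


class AsCompleted:
    def __init__(self, fs):
        self._fs = list(fs)
        self._iter = None  # created lazily by __next__

    def __iter__(self):
        # Old behaviour: yields awaitables that must still be awaited.
        return iter(asyncio.as_completed(self._fs))

    def __next__(self):
        # Backward compatibility for callers that treat the object
        # as an iterator rather than an iterable.
        if self._iter is None:
            self._iter = iter(self)
        return next(self._iter)

    async def __aiter__(self):
        # New behaviour: yields the original futures once they are done.
        pending = set(map(asyncio.ensure_future, self._fs))
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for future in done:
                yield future


async def work(value, delay):
    # Hypothetical workload used only for the demonstration.
    await asyncio.sleep(delay)
    return value


async def main():
    sync_tasks = [asyncio.ensure_future(work(n, 0.01 * n)) for n in (3, 1, 2)]
    sync_results = [await aw for aw in AsCompleted(sync_tasks)]

    async_tasks = [asyncio.ensure_future(work(n, 0.01 * n)) for n in (3, 1, 2)]
    async_results = []
    async for fut in AsCompleted(async_tasks):
        async_results.append((fut in async_tasks, fut.result()))
    return sync_results, async_results


if __name__ == "__main__":
    print(asyncio.run(main()))
```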
msg326748 - Author: Hrvoje Nikšić (hniksic) Date: 2018-09-30 21:13
If there is interest in this, I'd like to attempt a PR for a sync/async variant of as_completed.

Note that the new docs are *much* clearer, so the first (documentation) problem from the description is now fixed. Although the documentation is still brief, it now contains the key pieces of information: 1) that the futures are actually run in parallel, and 2) that each yielded future produces the next result that becomes available. Neither was actually stated in the old docs (!), so this is a marked improvement.
msg328454 - Author: Michael DePalatis (mivade) * Date: 2018-10-25 15:50
Is there any progress on this? I was thinking the exact same thing regarding the backwards-compatible approach and would like to work on it if no one else is.
msg328527 - Author: Hrvoje Nikšić (hniksic) Date: 2018-10-26 08:57
I didn't start working on the PR, so please go ahead if you're interested.

One small suggestion: If you're implementing this, please note that the proof-of-concept implementation shown in the description is not very efficient because each call to `wait` has to iterate over all the futures (which can be potentially large in number) to set up and tear down the done callbacks on each one. A more efficient implementation would set up the callbacks only once - see for an example.
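
As a rough illustration of the "callbacks only once" idea (not the linked example, and not necessarily what the PR does), one possible sketch registers a single done callback per future that pushes the finished future onto an asyncio.Queue. The names as_completed_once and work are hypothetical.

```python
import asyncio


async def as_completed_once(fs):
    # Wrap each awaitable once; a set avoids registering the same
    # future twice if it was passed in more than once.
    futures = {asyncio.ensure_future(f) for f in fs}
    queue = asyncio.Queue()
    for fut in futures:
        # Each future gets exactly one callback for its whole lifetime;
        # the callback receives the finished future as its argument.
        fut.add_done_callback(queue.put_nowait)
    try:
        for _ in range(len(futures)):
            yield await queue.get()
    finally:
        # If the consumer stops early, detach the remaining callbacks.
        for fut in futures:
            fut.remove_done_callback(queue.put_nowait)


async def work(value, delay):
    # Hypothetical workload used only for the demonstration.
    await asyncio.sleep(delay)
    return value


async def main():
    tasks = [asyncio.ensure_future(work(n, 0.01 * n)) for n in (3, 1, 2)]
    results = []
    async for fut in as_completed_once(tasks):
        results.append((fut in tasks, fut.done(), fut.result()))
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Compared with calling `wait` in a loop, this does a constant amount of work per completed future instead of touching every pending future on each iteration.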
Date                 User        Action  Args
2018-10-31 02:17:01  mivade      set     keywords: + patch
                                         stage: patch review
                                         pull_requests: + pull_request9565
2018-10-26 08:57:34  hniksic     set     messages: + msg328527
2018-10-25 15:50:20  mivade      set     nosy: + mivade
                                         messages: + msg328454
2018-09-30 21:13:35  hniksic     set     messages: + msg326748
2018-05-21 18:34:42  hniksic     set     messages: + msg317247
2018-05-17 16:43:34  yselivanov  set     messages: + msg316963
2018-05-16 19:26:33  hniksic     set     type: enhancement
2018-05-16 06:49:56  hniksic     set     messages: + msg316774
2018-05-16 06:48:52  hniksic     create