classification
Title: Provide an async-generator version of as_completed
Type: enhancement Stage:
Components: asyncio Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, hniksic, yselivanov
Priority: normal Keywords:

Created on 2018-05-16 06:48 by hniksic, last changed 2018-09-30 21:13 by hniksic.

Messages (5)
msg316773 - (view) Author: Hrvoje Nikšić (hniksic) Date: 2018-05-16 06:48
Judging by questions on the StackOverflow python-asyncio tag[1][2], it seems that users find it hard to understand how to use as_completed correctly. I have identified three issues:

* It's somewhat sparingly documented.

A StackOverflow user ([2]) didn't find it obvious that it runs the futures in parallel. Unless one is already aware of the meaning, the term "as completed" could suggest that they are executed and completed sequentially.

* Unlike its concurrent.futures counterpart, it's non-blocking.

This sounds like a good idea because it's usable from synchronous code, but it means that the futures it yields aren't completed; you have to await them first. This is confusing for a function with "completed" in the name, and is not the case with concurrent.futures.as_completed, nor with other waiting functions in asyncio (gather, wait, wait_for).

* It yields futures other than those that were passed in.

This prevents some common patterns from working, e.g. associating the results with context data, as the Python docs themselves do for concurrent.futures.as_completed in https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example . See SO question [1] for a similar request in asyncio.
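To make the last two points concrete, here is a small sketch of how asyncio.as_completed behaves when iterated with a plain for loop. The work() coroutine and the delays are made up for the illustration; only asyncio.as_completed, asyncio.ensure_future, asyncio.sleep and asyncio.run are real asyncio calls.

```python
import asyncio


async def work(name, delay):
    # Hypothetical worker, used only for this illustration.
    await asyncio.sleep(delay)
    return name


async def demo():
    tasks = [
        asyncio.ensure_future(work("slow", 0.05)),
        asyncio.ensure_future(work("fast", 0.01)),
    ]
    results = []
    same_object = []
    for next_done in asyncio.as_completed(tasks):
        # next_done is a fresh awaitable, not one of the objects in `tasks`,
        # so it cannot be used as a key to look up per-task context data.
        same_object.append(any(next_done is t for t in tasks))
        # The yielded object is not complete yet; it has to be awaited.
        results.append(await next_done)
    return results, same_object


results, same_object = asyncio.run(demo())
print(results, same_object)
```

The results arrive in completion order, but nothing yielded by the loop is identical to a task that was passed in, which is why the "map future back to its context" pattern from the concurrent.futures docs does not carry over.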


Here is my proposal to address the issues.

I believe the usage problems stem from as_completed predating the concept of async iterators. If we had async iterators back then, as_completed would have been an obvious candidate to be one. In that case it could be both "blocking" (but not for the event loop) and return the original futures. For example:

async def as_completed2(fs):
    pending = set(map(asyncio.ensure_future, fs))
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        yield from done

(It is straightforward to add support for a timeout argument.)

I propose to deprecate asyncio.as_completed and advertise the async-iterator version like the one presented here - under a nicer name, such as as_done(), or as_completed_async().



[1] https://stackoverflow.com/questions/50028465/python-get-reference-to-original-task-after-ordering-tasks-by-completion
[2] https://stackoverflow.com/questions/50355944/yield-from-async-generator-in-python-asyncio
msg316774 - (view) Author: Hrvoje Nikšić (hniksic) Date: 2018-05-16 06:49
Of course, `yield from done` would actually have to be `for future in done: yield future`, since async generators don't support yield from.
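
For reference, a runnable sketch of the proposal with that correction applied, plus a usage example that associates each result with context data. The work() coroutine, the task_to_name dict and the delays are hypothetical and exist only for the demonstration; this is an illustration of the idea, not a proposed final implementation (no timeout or cancellation handling).

```python
import asyncio


async def as_completed2(fs):
    # ensure_future returns tasks/futures unchanged, so callers that pass
    # tasks get the very same objects back.
    pending = set(map(asyncio.ensure_future, fs))
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        # Async generators don't support `yield from`, hence the loop.
        for future in done:
            yield future


async def work(name, delay):
    # Hypothetical worker, used only for this illustration.
    await asyncio.sleep(delay)
    return name.upper()


async def demo():
    # Context data keyed by the original task objects.
    task_to_name = {
        asyncio.ensure_future(work(name, delay)): name
        for name, delay in [("slow", 0.05), ("fast", 0.01)]
    }
    pairs = []
    async for task in as_completed2(task_to_name):
        # The task is already done here, so result() does not block.
        pairs.append((task_to_name[task], task.result()))
    return pairs


pairs = asyncio.run(demo())
print(pairs)
```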
msg316963 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-17 16:43
I like the idea. Let's revisit it after Python 3.7 is released.
msg317247 - (view) Author: Hrvoje Nikšić (hniksic) Date: 2018-05-21 18:34
Another option occurred to me: as_completed could return an object that implements both the synchronous and the asynchronous iteration protocols:


class as_completed:
    def __init__(self, fs, *, loop=None, timeout=None):
        self.__fs = fs
        self.__loop = loop
        self.__timeout = timeout
        # lazily created by __next__ for backward compatibility
        self._iter = None

    def __iter__(self):
        # current implementation here
        ...

    async def __aiter__(self):
        # new async implementation here
        ...

    def __next__(self):
        # defined for backward compatibility with code that expects
        # as_completed() to return an iterator rather than an iterable
        if self._iter is None:
            self._iter = iter(self)
        return next(self._iter)

With that design there wouldn't need to be a new function under a different name; instead, as_completed could just be documented as an asynchronous iterable, with the old synchronous iteration supported for backward compatibility.
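
A minimal sketch of how such a dual-protocol object could look, assuming the synchronous side simply delegates to the existing asyncio.as_completed and the asynchronous side uses asyncio.wait as in the earlier proposal. The class name AsCompleted, the work() coroutine and the make_tasks() helper are hypothetical, and the timeout argument and the __next__ compatibility shim are left out to keep it short.

```python
import asyncio


class AsCompleted:
    """Hypothetical dual-protocol wrapper; not the asyncio implementation."""

    def __init__(self, fs):
        self._fs = list(fs)

    def __iter__(self):
        # Old behaviour: yields awaitables that the caller must await.
        return iter(asyncio.as_completed(self._fs))

    async def __aiter__(self):
        # New behaviour: yields the original futures once they are done.
        pending = set(map(asyncio.ensure_future, self._fs))
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for future in done:
                yield future


async def work(name, delay):
    # Hypothetical worker, used only for this illustration.
    await asyncio.sleep(delay)
    return name


def make_tasks():
    return [
        asyncio.ensure_future(work("slow", 0.05)),
        asyncio.ensure_future(work("fast", 0.01)),
    ]


async def demo():
    sync_style = []
    for awaitable in AsCompleted(make_tasks()):
        sync_style.append(await awaitable)

    async_style = []
    async for future in AsCompleted(make_tasks()):
        async_style.append(future.result())
    return sync_style, async_style


sync_style, async_style = asyncio.run(demo())
print(sync_style, async_style)
```

Existing code that writes `for f in as_completed(...)` keeps working, while new code can switch to `async for` and receive completed futures directly.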
msg326748 - (view) Author: Hrvoje Nikšić (hniksic) Date: 2018-09-30 21:13
If there is interest in this, I'd like to attempt a PR for a sync/async variant of as_completed.

Note that the new docs are *much* clearer, so the first (documentation) problem from the description is now fixed. Although the documentation is still brief, it now contains the key pieces of information: 1) that the futures are actually run in parallel, and 2) that each yielded future produces the next result that becomes available. Neither was actually stated in the old docs (!), so this is a marked improvement.
History
Date                 User        Action  Args
2018-09-30 21:13:35  hniksic     set     messages: + msg326748
2018-05-21 18:34:42  hniksic     set     messages: + msg317247
2018-05-17 16:43:34  yselivanov  set     messages: + msg316963
2018-05-16 19:26:33  hniksic     set     type: enhancement
2018-05-16 06:49:56  hniksic     set     messages: + msg316774
2018-05-16 06:48:52  hniksic     create