Message296982
asyncio.as_completed lets us schedule many coroutines (or Futures) and then handle each result as soon as it is available, in a loop or streaming style.
I propose to allow as_completed to work on very large numbers of coroutines, provided through a generator (rather than a list). In order to make this practical, we need to limit the number of coroutines that are scheduled simultaneously to a reasonable number.
For tasks that open files or sockets, a reasonable number might be 1000 or fewer. For other tasks, a much larger number might be reasonable, but we would still like some limit to prevent us running out of memory.
I suggest adding a "limit" argument to as_completed that limits the number of coroutines that it schedules simultaneously.
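To make the idea concrete, here is a minimal sketch of how such a limit might behave, written as a standalone helper rather than as a change to asyncio itself. The names limited_as_completed and demo are hypothetical, and unlike the real as_completed this version is an async generator that yields results directly instead of awaitables. It is an illustration under those assumptions, not the prototype implementation.

```python
import asyncio
from itertools import islice


async def limited_as_completed(coros, limit):
    """Yield results as tasks finish, with at most `limit` scheduled at once.

    Hypothetical helper, not part of asyncio. `coros` may be any iterable,
    including a generator; it is consumed lazily.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    coros = iter(coros)
    # Schedule only the first `limit` coroutines up front.
    pending = {asyncio.ensure_future(c) for c in islice(coros, limit)}
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        # Top up with one new coroutine per finished task.
        for c in islice(coros, len(done)):
            pending.add(asyncio.ensure_future(c))
        for task in done:
            yield task.result()  # re-raises if the task failed


async def demo(n=50, limit=5):
    """Return (sorted results, peak number of coroutines running at once)."""
    active = peak = 0

    async def double(x):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.001)
        active -= 1
        return x * 2

    coros = (double(x) for x in range(n))
    results = [r async for r in limited_as_completed(coros, limit)]
    return sorted(results), peak
```

Calling asyncio.run(demo()) should return all 50 doubled values while never having more than 5 coroutines in flight. Because the generator is only advanced when a slot frees up, memory use depends on the limit rather than on the total number of coroutines.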
For me, the key advantage of as_completed (in the proposed modified form) is that it enables a streaming style that looks quite like synchronous code, but is efficient in terms of memory usage (as you'd expect from a streaming style):
#!/usr/bin/env python3
import asyncio
import sys

# Maximum number of coroutines to schedule at once, from the command line
limit = int(sys.argv[1])

async def double(x):
    await asyncio.sleep(1)
    return x * 2

async def print_doubles():
    # A generator, so the million coroutines are created lazily
    coros = (double(x) for x in range(1000000))
    # limit= is the proposed new argument
    for res in asyncio.as_completed(coros, limit=limit):
        r = await res
        if r % 100000 == 0:
            print(r)

loop = asyncio.get_event_loop()
loop.run_until_complete(print_doubles())
loop.close()
Using my prototype implementation, this runs slightly faster and uses far less memory on my machine with a limit of 100,000 concurrent tasks than with 1 million (roughly 250 MB versus 2.2 GB):
$ /usr/bin/time --format "Memory usage: %MKB\tTime: %e seconds" ./example 1000000
Memory usage: 2234552KB Time: 97.52 seconds
$ /usr/bin/time --format "Memory usage: %MKB\tTime: %e seconds" ./example 100000
Memory usage: 252732KB Time: 94.13 seconds
I have been working on an implementation and there is some discussion in my blog posts: http://www.artificialworlds.net/blog/2017/06/12/making-100-million-requests-with-python-aiohttp/ and http://www.artificialworlds.net/blog/2017/06/27/adding-a-concurrency-limit-to-pythons-asyncio-as_completed/
Possibly the most controversial part of this proposal is that as_completed must accept a generator rather than requiring a list. This is fundamental to the style I outlined above, but we may be able to do better than my prototype's blanket acceptance of any generator.
Date | User | Action | Args
2017-06-27 01:05:22 | andybalaam | set | recipients: + andybalaam, yselivanov
2017-06-27 01:05:22 | andybalaam | set | messageid: <1498525522.6.0.617023291854.issue30782@psf.upfronthosting.co.za>
2017-06-27 01:05:22 | andybalaam | link | issue30782 messages
2017-06-27 01:05:20 | andybalaam | create |