Classification
Title: async/await performance is very low
Type: performance Stage: resolved
Components: asyncio Versions: Python 3.7
Process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Liran Nuna, serhiy.storchaka, yselivanov
Priority: normal Keywords:

Created on 2017-12-03 09:09 by Liran Nuna, last changed 2017-12-20 14:47 by yselivanov. This issue is now closed.

Files
File name Uploaded Description
batch_asyncio.py Liran Nuna, 2017-12-03 09:09 Benchmark in async/await
batch_asynq.py Liran Nuna, 2017-12-03 09:09 Benchmark in asynq
Pull Requests
URL Status Linked
PR 4186 closed Liran Nuna, 2017-12-03 09:09
Messages (15)
msg307492 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-03 09:09
The performance of async/await is very low when compared with code that implements similar functionality via iterators, such as Quora's asynq library (https://github.com/quora/asynq/tree/master/asynq).

Based on my benchmarks, asynq is almost twice as fast as async/await.

I also found some low-hanging performance fruit while benchmarking (see the attached GitHub PR).


$ time python batch_asyncio.py 

real	0m5.851s
user	0m5.760s
sys	0m0.088s
$ time python batch_asynq.py 

real	0m2.999s
user	0m2.900s
sys	0m0.076s
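
For context, the attached benchmark files are not reproduced in this report. A minimal, hypothetical sketch of the kind of gather-heavy, CPU-only workload being measured (not the actual batch_asyncio.py) might look like this:

import asyncio

async def fetch(x):
    # Stand-in for a batched "database" call that resolves synchronously.
    return x * x

async def handle_request():
    # Scatter-gather a batch of coroutines, as a request handler would.
    return await asyncio.gather(*(fetch(i) for i in range(20)))

def main():
    loop = asyncio.get_event_loop()
    for _ in range(10_000):
        loop.run_until_complete(handle_request())

if __name__ == "__main__":
    main()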
msg307494 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-03 09:09
Added a comparable benchmark written with asynq.
msg307496 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-12-03 09:50
I don't see any difference after applying PR 4186.
msg307498 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-03 10:11
The PR is just something I ran into while benchmarking. In my runs it gave a consistent ~3-5% performance increase.

At my company we are attempting to migrate from asynq to asyncio, but performance suffers (our servers' response times nearly double). I ran profiles and benchmarks myself, noticed this small bottleneck that I knew how to fix, and submitted a PR to the Python project in the hope that it helps.
msg307501 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-12-03 10:21
A ~3-5% difference is within random variance.

Add `1/0` to this method and repeat the benchmark (if the benchmark still runs to completion, that code path is not being exercised at all).
msg307517 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-03 15:18
In general, implementing coroutines using 'yield' expressions (not 'yield from'!) is slower than async/await, because the former approach needs a trampoline to manage the stack, whereas CPython itself handles that for 'yield from' and 'await'.  I suspect that any difference in performance is not related to 'async/await' vs 'yield' performance.
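
For illustration only, here is a minimal, generic sketch of the two styles (this is not asynq's actual implementation): a hand-written trampoline driving plain-'yield' coroutines, versus native coroutines where the interpreter itself handles the delegation.

def trampoline(gen):
    # Drive a plain-'yield' coroutine: whenever it yields a sub-generator,
    # push it onto an explicit stack and feed results back up by hand.
    stack, value = [gen], None
    while stack:
        try:
            sub = stack[-1].send(value)
        except StopIteration as exc:
            stack.pop()
            value = exc.value
        else:
            stack.append(sub)
            value = None
    return value

def inner(x):
    return x * 2
    yield  # never reached; only marks this function as a generator

def outer(x):
    doubled = yield inner(x)  # the trampoline runs inner() and sends the result back
    return doubled + 1

print(trampoline(outer(20)))  # -> 41

# The native equivalent: 'await' delegates directly, with no trampoline
# frame sitting between caller and callee.
import asyncio

async def ainner(x):
    return x * 2

async def aouter(x):
    doubled = await ainner(x)
    return doubled + 1

print(asyncio.get_event_loop().run_until_complete(aouter(20)))  # -> 41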

The attached benchmarks compare two completely different frameworks: asyncio and asynq.  They have different implementations of Task, Future, and event loop primitives.  Perhaps they schedule IO events and callbacks differently as well.

asyncio could be slower because all tasks' callbacks must be scheduled through the event loop, whereas some frameworks like Twisted schedule them right away, which makes them faster in some specific micro-benchmarks.  Or there might be an issue with 'asyncio.gather()', which is stressed heavily in the attached benchmarks.

What I can say for sure is that the Python implementation of async/await has nothing to do with the results you observe.

I suggest taking a look at 'asyncio.gather'; maybe we can make it faster.  Please open a new issue if you find any way to do so.
msg307626 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-05 03:12
> which makes them faster in some specific micro-benchmarks

I'm not talking about micro-benchmarks; we are actively using asynq in production. Our recent effort to migrate to async/await has taken a major performance hit: our response times nearly doubled.

I agree that the PR offers little (or no) improvement, but I implore you to explore performance bottlenecks in async/await.
msg307631 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-05 03:47
> I agree that the PR offers little (or no) improvement, but I implore you to explore performance bottlenecks in async/await.

And I'm saying that there are no "performance bottlenecks in async/await".  async/await is *not* asyncio.  async/await and yield are language constructs that use generator objects.

Your benchmark *does not* test async/await vs yield.  It compares asyncio.gather to batches in asynq.

Now, maybe asyncio.gather can be optimized, but we should open a separate issue for that if we come up with a better implementation.

Your benchmark doesn't test the performance of IO -- that's the thing we actually optimize in asyncio and that we usually benchmark.  asyncio.gather is a niche thing, and usually network applications don't have it as a bottleneck.

Closing this issue.
msg307632 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-05 03:53
Also, I suggest you try uvloop.  If your production code is still slower than with asynq, I suggest you profile it and post real profiles here.  I just don't think that asyncio.gather can be the main bottleneck you have; it must be something else.
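
(A minimal sketch of dropping in uvloop, assuming it is installed with pip; the rest of the program stays unchanged.)

import asyncio
import uvloop

# Install uvloop's event loop policy; loops created afterwards are uvloop loops.
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

async def main():
    await asyncio.sleep(0)  # placeholder for the real workload

asyncio.get_event_loop().run_until_complete(main())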
msg307640 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-05 04:26
> Also, I suggest you try uvloop

Sadly, uvloop does not offer any noticeable performance improvement for us. Our usage is very similar to the "benchmarks" I posted: we don't do any actual async I/O, because asynq does not offer that.

> I suggest you profile it and post real profiles here

Gladly! Would you like Python-level profiles (made with `cProfile`) or gmon.out profiles? The latter would be a little more difficult to produce, since we run a web server that needs to accept traffic, but I have plenty of `cProfile` profiles I could share with you.
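
(For reference, this is roughly how such a `cProfile` profile can be collected for an asyncio workload; the workload below is a placeholder, not our server code. From the command line, `python -m cProfile -o app.prof batch_asyncio.py` works as well.)

import asyncio
import cProfile
import pstats

async def workload():
    # Placeholder: a gather-heavy coroutine standing in for the real request handler.
    await asyncio.gather(*(asyncio.sleep(0) for _ in range(1000)))

profiler = cProfile.Profile()
profiler.enable()
asyncio.get_event_loop().run_until_complete(workload())
profiler.disable()

# Print the 20 most expensive entries by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)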

> I just don't think that asyncio.gather can be the main bottleneck you have; it must be something else

I think my PR and the examples I have provided reframe this issue: the problem is that the "synchronous" performance of async/await is very poor when it is used to run work that completes synchronously. The benchmarks are an example of what a request lifetime looks like for us: a lot of scatter-gather to batch database queries that (for the time being) resolve synchronously, which the benchmark simulates with simple math operations. To reiterate, I do not think `asyncio.gather` is the main performance bottleneck, but with my limited knowledge of CPython I do not know how to identify it more precisely.
msg308504 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-18 01:23
Liran, Task (the C implementation) was optimized in issue 32348; your benchmark now runs 15% faster.
msg308513 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-18 04:29
Yury, thank you very much for circling back. I wish I could be more helpful in pursuing performance improvements.
msg308514 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-18 04:33
NP.  I have another PR in the pipeline: https://github.com/python/cpython/pull/4913

Both optimizations make your benchmark run 30% faster on 3.7.  If you compile asyncio.gather() with Cython, you gain another 5-15%; if you use uvloop, another 10-20%.

If it's still slower than asynq, then the difference must be in how asynq schedules its callbacks; it might be more optimal than asyncio for some specific use cases.

FWIW I don't expect asynq to be any faster than asyncio (or than uvloop) for network IO.  And there's definitely no problem with async/await performance -- we're optimizing asyncio here, not the interpreter.
msg308704 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-20 05:34
Yury, those speed improvements are awesome and I'm really excited about them. Performance is slowly starting to match asynq, which would make migrating our code to async/await more feasible!

Today Python 3.6.4 was released, and these performance improvements did not make it into that version.

I'm not familiar with Python's release process. What are the steps or timeline for backporting these to 3.6 so they land in the next patch release, to avoid a lengthy wait for Python 3.7?
msg308735 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-20 14:47
Unfortunately they will not be backported; that's against our release policy, and I can't do anything about it.  You can backport them yourself and build your own CPython 3.6; that's what bigger users of Python (e.g. Facebook and Google) usually do.
History
Date                 User              Action  Args
2017-12-20 14:47:52  yselivanov        set     messages: + msg308735
2017-12-20 05:34:05  Liran Nuna        set     messages: + msg308704
2017-12-18 04:33:05  yselivanov        set     messages: + msg308514
2017-12-18 04:29:16  Liran Nuna        set     messages: + msg308513
2017-12-18 01:23:47  yselivanov        set     messages: + msg308504
2017-12-05 04:26:31  Liran Nuna        set     messages: + msg307640
2017-12-05 03:53:51  yselivanov        set     messages: + msg307632
2017-12-05 03:47:50  yselivanov        set     status: open -> closed; resolution: not a bug; messages: + msg307631; stage: resolved
2017-12-05 03:12:27  Liran Nuna        set     messages: + msg307626
2017-12-03 15:18:02  yselivanov        set     messages: + msg307517
2017-12-03 10:21:41  serhiy.storchaka  set     messages: + msg307501
2017-12-03 10:11:03  Liran Nuna        set     messages: + msg307498
2017-12-03 09:50:24  serhiy.storchaka  set     versions: + Python 3.7, - Python 3.6
2017-12-03 09:50:07  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg307496
2017-12-03 09:09:48  Liran Nuna        set     files: + batch_asynq.py; messages: + msg307494
2017-12-03 09:09:09  Liran Nuna        create