msg307492
Author: Liran Nuna (Liran Nuna)
Date: 2017-12-03 09:09
The performance of async/await is very low when compared to code that implements similar functionality via iterators, such as Quora's asynq library (https://github.com/quora/asynq/tree/master/asynq).
Based on my benchmarks, asynq is almost twice as fast as async/await.
I found some low-hanging performance fruit while benchmarking (see attached GitHub PR).
$ time python batch_asyncio.py
real 0m5.851s
user 0m5.760s
sys 0m0.088s
$ time python batch_asynq.py
real 0m2.999s
user 0m2.900s
sys 0m0.076s
|
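(The attached batch_asyncio.py is not reproduced in the thread. A minimal sketch of a gather-heavy, CPU-only asyncio benchmark of the kind described — names and workload are assumptions, not the actual attachment:)

```python
import asyncio

async def work(x):
    # Simulate a batched computation with no real IO, mirroring the
    # math-only workload described later in the thread.
    return x * x

async def main():
    total = 0
    for _ in range(1000):
        # Scatter-gather a batch of coroutines through the event loop;
        # this is the asyncio.gather() path the thread discusses.
        results = await asyncio.gather(*(work(i) for i in range(10)))
        total += sum(results)
    return total

print(asyncio.run(main()))  # prints 285000
```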
msg307494
Author: Liran Nuna (Liran Nuna)
Date: 2017-12-03 09:09
Added a comparable benchmark in asynq.
|
msg307496
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2017-12-03 09:50
I don't see any difference after applying PR 4186.
|
msg307498
Author: Liran Nuna (Liran Nuna)
Date: 2017-12-03 10:11
The PR is merely a finding I ran into while benchmarking. In my runs I saw a consistent ~3-5% performance increase.
At my company we are attempting to migrate from asynq to asyncio, but performance suffers (our servers' response times nearly double). I personally ran profiles and benchmarks and noticed this tiny bottleneck, which I knew how to fix, so I submitted a PR to the Python project in hopes it helps.
|
msg307501
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2017-12-03 10:21
A ~3-5% difference is within random variance.
Add 1/0 in this method and repeat the benchmark.
|
msg307517
Author: Yury Selivanov (yselivanov)
Date: 2017-12-03 15:18
In general, implementing coroutines using 'yield' expressions (not 'yield from'!) is slower than async/await, because the former approach needs a trampoline to manage the stack, whereas CPython itself handles that for 'yield from' and 'await'. I suspect that any difference in performance is not related to 'async/await' vs 'yield' performance.
The attached benchmarks compare two completely different frameworks: asyncio and asynq. Both have different implementations of Task, Future, and event loop primitives. Perhaps both of them schedule IO events and callbacks differently as well.
asyncio could be slower because all tasks' callbacks must be scheduled through the event loop, whereas some frameworks like Twisted schedule them right away, which makes them faster in some specific micro-benchmarks. Or there might be an issue with 'asyncio.gather()', which is stressed heavily in the attached benchmarks.
What I can say for sure is that the Python implementation of async/await has nothing to do with the results you observe.
I suggest taking a look at 'asyncio.gather'; maybe we can make it faster. Please open a new issue if you find any ways to do that.
|
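(To illustrate the trampoline point above — a simplified sketch, not asynq's actual implementation. A 'yield'-based framework must step nested generators itself in pure Python, whereas 'yield from' and 'await' let the interpreter delegate directly:)

```python
import inspect

def add_one(x):
    # A yield-style "coroutine": it produces its result by yielding it.
    yield x + 1

def outer(x):
    # It must yield the sub-coroutine and let the trampoline run it;
    # with 'yield from' (or 'await') the interpreter would do this itself.
    y = yield add_one(x)
    yield y * 2

def trampoline(gen):
    # Pure-Python stack management that 'yield from'/'await' get for free:
    # drive the generator, recursing into any sub-generator it yields and
    # sending the result back in.
    result = None
    while True:
        try:
            step = gen.send(result)
        except StopIteration:
            return result
        if inspect.isgenerator(step):
            result = trampoline(step)  # run the sub-coroutine first
        else:
            result = step

print(trampoline(outer(20)))  # prints 42
```

Every hop through `trampoline` is Python-level bytecode; the interpreter-level delegation of 'yield from'/'await' avoids that overhead.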
msg307626
Author: Liran Nuna (Liran Nuna)
Date: 2017-12-05 03:12
> which makes them faster in some specific micro-benchmarks
I'm not talking about micro-benchmarks; we are actively using asynq in production. Recent efforts to migrate to async/await have run into a major performance hit: our response times nearly doubled.
I agree that the PR offers little (or no) improvement, but I implore you to explore performance bottlenecks in async/await.
|
msg307631
Author: Yury Selivanov (yselivanov)
Date: 2017-12-05 03:47
> I agree that the PR offers little (or no) improvement, but I implore you to explore performance bottlenecks in async/await.
And I'm saying that there are no "performance bottlenecks in async/await". async/await is *not* asyncio. async/await and yield are language constructs that use generator objects.
Your benchmark *does not* test async/await vs yield. It compares asyncio.gather to batches in asynq.
Now, maybe asyncio.gather can be optimized, but we should open a separate issue for that if we can have a better implementation of it.
Your benchmark doesn't test the performance of IO -- that's the thing we actually optimize in asyncio and that we usually benchmark. asyncio.gather is a niche thing, and usually network applications don't have it as a bottleneck.
Closing this issue.
|
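(The distinction drawn above can be made concrete. A plain 'await' on a coroutine runs it inline without touching the event loop, whereas asyncio.gather wraps each coroutine in a Task and schedules its callbacks through the loop. A sketch — function names and iteration counts are illustrative, not from the attached benchmarks:)

```python
import asyncio
import time

async def work(x):
    return x * x

async def inline():
    # Plain await: each coroutine runs inline; no Task, no loop scheduling.
    return sum([await work(i) for i in range(10)])

async def scheduled():
    # gather: each coroutine becomes a Task scheduled via the event loop.
    return sum(await asyncio.gather(*(work(i) for i in range(10))))

async def main():
    for name, fn in (("await", inline), ("gather", scheduled)):
        t0 = time.perf_counter()
        for _ in range(1000):
            await fn()
        print(name, time.perf_counter() - t0)

asyncio.run(main())
```

Both variants compute the same result; any timing gap between them measures Task creation and loop scheduling, not async/await itself.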
msg307632
Author: Yury Selivanov (yselivanov)
Date: 2017-12-05 03:53
Also, I suggest you try uvloop. If your production code is still slower than with asynq, profile it and post real profiles here. I just don't think that asyncio.gather can be the main bottleneck you have; it must be something else.
|
msg307640
Author: Liran Nuna (Liran Nuna)
Date: 2017-12-05 04:26
> Also I suggest you to try uvloop
Sadly, uvloop does not offer any noticeable improvement in performance for us. Our usage is very similar to the "benchmarks" I posted - we don't do any actual async I/O because asynq does not offer that.
> I suggest you to profile it and post real profiles here
Gladly! Would you like profiles of Python itself (made with `cProfile`) or gmon.out profiles? The latter would be a little more difficult to produce since we run a web server which needs to accept traffic, but I have plenty of `cProfile` profiles I could share with you.
> I just don't think that asyncio.gather can be the main bottleneck you have, it must be something else
I think my PR and the examples I have provided set a different mindset for this issue: the problem is that the "sync" performance of async/await is very poor when it is used to execute things synchronously. The benchmarks model what a request lifetime looks like for us: a lot of scatter-gather to batch database queries that happen synchronously (for the time being), which the benchmark simulates with simple math operations. To reiterate, I do not think `asyncio.gather` is the main performance bottleneck, but I do not know how to better identify it with my limited knowledge of CPython.
|
msg308504
Author: Yury Selivanov (yselivanov)
Date: 2017-12-18 01:23
Liran, Task (C implementation) was optimized in issue 32348; your benchmark now runs 15% faster.
|
msg308513
Author: Liran Nuna (Liran Nuna)
Date: 2017-12-18 04:29
Yury, thank you very much for circling back. I wish I could be more helpful in pursuing performance improvements.
|
msg308514
Author: Yury Selivanov (yselivanov)
Date: 2017-12-18 04:33
NP. I have another PR in the pipeline: https://github.com/python/cpython/pull/4913
Both optimizations together make your benchmark run 30% faster on 3.7. If you compile asyncio.gather() with Cython, you will get another 5-15%. If you use uvloop, another 10-20%.
If it's still slower than asynq, then the issue must be in how asynq schedules its callbacks; it might be more optimal for some specific use cases than asyncio.
FWIW I don't expect asynq to be any faster than asyncio (or than uvloop) for network IO. And there's definitely no problem with async/await performance -- we're optimizing asyncio here, not the interpreter.
|
msg308704
Author: Liran Nuna (Liran Nuna)
Date: 2017-12-20 05:34
Yury, those speed improvements are awesome and I'm really excited about them. Performance is slowly starting to match asynq, which would make migrating our code to async/await more feasible!
Today Python 3.6.4 was released, and these performance improvements did not make it into this version.
I'm not familiar with Python's release process. What are the steps or timeline for these to be backported to 3.6 and released in the next minor version, to avoid a lengthy wait for Python 3.7?
|
msg308735
Author: Yury Selivanov (yselivanov)
Date: 2017-12-20 14:47
Unfortunately they will not be backported; that's against our release policies, and I can't do anything about it. You can backport them yourself and build your own CPython 3.6. That's what bigger users of Python (e.g. Facebook, Google) usually do.
|
Date | User | Action | Args
2022-04-11 14:58:55 | admin | set | github: 76385
2017-12-20 14:47:52 | yselivanov | set | messages: + msg308735
2017-12-20 05:34:05 | Liran Nuna | set | messages: + msg308704
2017-12-18 04:33:05 | yselivanov | set | messages: + msg308514
2017-12-18 04:29:16 | Liran Nuna | set | messages: + msg308513
2017-12-18 01:23:47 | yselivanov | set | messages: + msg308504
2017-12-05 04:26:31 | Liran Nuna | set | messages: + msg307640
2017-12-05 03:53:51 | yselivanov | set | messages: + msg307632
2017-12-05 03:47:50 | yselivanov | set | status: open -> closed, resolution: not a bug, messages: + msg307631, stage: resolved
2017-12-05 03:12:27 | Liran Nuna | set | messages: + msg307626
2017-12-03 15:18:02 | yselivanov | set | messages: + msg307517
2017-12-03 10:21:41 | serhiy.storchaka | set | messages: + msg307501
2017-12-03 10:11:03 | Liran Nuna | set | messages: + msg307498
2017-12-03 09:50:24 | serhiy.storchaka | set | versions: + Python 3.7, - Python 3.6
2017-12-03 09:50:07 | serhiy.storchaka | set | nosy: + serhiy.storchaka, messages: + msg307496
2017-12-03 09:09:48 | Liran Nuna | set | files: + batch_asynq.py, messages: + msg307494
2017-12-03 09:09:09 | Liran Nuna | create |