Classification
Title: async/await performance is very low
Type: performance Stage: resolved
Components: asyncio Versions: Python 3.7
Process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Liran Nuna, serhiy.storchaka, yselivanov
Priority: normal Keywords:

Created on 2017-12-03 09:09 by Liran Nuna, last changed 2017-12-20 14:47 by yselivanov. This issue is now closed.

Files
File name Uploaded Description
batch_asyncio.py Liran Nuna, 2017-12-03 09:09 Benchmark in async/await
batch_asynq.py Liran Nuna, 2017-12-03 09:09 Benchmark in asynq
Pull Requests
URL Status Linked
PR 4186 closed Liran Nuna, 2017-12-03 09:09
Messages (15)
msg307492 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-03 09:09
The performance of async/await is very low when compared with code that implements similar functionality via iterators, such as Quora's asynq library (https://github.com/quora/asynq/tree/master/asynq).

Based on my benchmarks, asynq is almost twice as fast as async/await.

I also found some low-hanging performance fruit while benchmarking (see the attached GitHub PR).


$ time python batch_asyncio.py 

real	0m5.851s
user	0m5.760s
sys	0m0.088s
$ time python batch_asynq.py 

real	0m2.999s
user	0m2.900s
sys	0m0.076s
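
For context, the attached benchmark files are not reproduced in this report. A minimal, hypothetical sketch of the kind of gather-heavy, CPU-only workload being measured (not the actual batch_asyncio.py) might look like this:

import asyncio

async def fetch(x):
    # Stand-in for a batched "database" call that resolves synchronously.
    return x * x

async def handle_request():
    # Scatter-gather a batch of coroutines, as a request handler would.
    return await asyncio.gather(*(fetch(i) for i in range(20)))

def main():
    loop = asyncio.get_event_loop()
    for _ in range(10_000):
        loop.run_until_complete(handle_request())

if __name__ == "__main__":
    main()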
msg307494 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-03 09:09
Added a comparable benchmark written with asynq.
msg307496 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-12-03 09:50
I don't see any difference after applying PR 4186.
msg307498 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-03 10:11
The PR is just something I ran into while benchmarking. In my runs it gave a consistent ~3-5% performance increase.

At my company we are attempting to migrate from asynq to asyncio, but performance suffers (our servers' response times nearly double). I ran profiles and benchmarks myself, noticed this small bottleneck that I knew how to fix, and submitted a PR to the Python project in the hope that it helps.
msg307501 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-12-03 10:21
A ~3-5% difference is within random variance.

Add `1/0` to this method and repeat the benchmark (if the benchmark still runs to completion, that code path is not being exercised at all).
msg307517 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-03 15:18
In general, implementing coroutines using 'yield' expressions (not 'yield from'!) is slower than async/await, because the former approach needs a trampoline to manage the stack, whereas CPython itself handles that for 'yield from' and 'await'.  I suspect that any difference in performance is not related to 'async/await' vs 'yield' performance.
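
For illustration only, here is a minimal, generic sketch of the two styles (this is not asynq's actual implementation): a hand-written trampoline driving plain-'yield' coroutines, versus native coroutines where the interpreter itself handles the delegation.

def trampoline(gen):
    # Drive a plain-'yield' coroutine: whenever it yields a sub-generator,
    # push it onto an explicit stack and feed results back up by hand.
    stack, value = [gen], None
    while stack:
        try:
            sub = stack[-1].send(value)
        except StopIteration as exc:
            stack.pop()
            value = exc.value
        else:
            stack.append(sub)
            value = None
    return value

def inner(x):
    return x * 2
    yield  # never reached; only marks this function as a generator

def outer(x):
    doubled = yield inner(x)  # the trampoline runs inner() and sends the result back
    return doubled + 1

print(trampoline(outer(20)))  # -> 41

# The native equivalent: 'await' delegates directly, with no trampoline
# frame sitting between caller and callee.
import asyncio

async def ainner(x):
    return x * 2

async def aouter(x):
    doubled = await ainner(x)
    return doubled + 1

print(asyncio.get_event_loop().run_until_complete(aouter(20)))  # -> 41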

The attached benchmarks compare two completely different frameworks: asyncio and asynq.  They have different implementations of Task, Future, and event loop primitives.  Perhaps they schedule IO events and callbacks differently as well.

asyncio could be slower because all tasks' callbacks must be scheduled through the event loop, whereas some frameworks like Twisted schedule them right away, which makes them faster in some specific micro-benchmarks.  Or there might be an issue with 'asyncio.gather()', which is stressed heavily in the attached benchmarks.

What I can say for sure is that the Python implementation of async/await has nothing to do with the results you observe.

I suggest taking a look at 'asyncio.gather'; maybe we can make it faster.  Please open a new issue if you find any way to do so.
msg307626 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-05 03:12
> which makes them faster in some specific micro-benchmarks

I'm not talking about micro-benchmarks; we are actively using asynq in production. Our recent effort to migrate to async/await has taken a major performance hit: our response times nearly doubled.

I agree that the PR offers little (or no) improvement, but I implore you to explore performance bottlenecks in async/await.
msg307631 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-05 03:47
> I agree that the PR offers little (or no) improvement, but I implore you to explore performance bottlenecks in async/await.

And I'm saying that there are no "performance bottlenecks in async/await".  async/await is *not* asyncio.  async/await and yield are language constructs that use generator objects.

Your benchmark *does not* test async/await vs yield.  It compares asyncio.gather to batches in asynq.

Now, maybe asyncio.gather can be optimized, but we should open a separate issue for that if we come up with a better implementation.

Your benchmark doesn't test the performance of IO -- that's the thing we actually optimize in asyncio and that we usually benchmark.  asyncio.gather is a niche thing, and usually network applications don't have it as a bottleneck.

Closing this issue.
msg307632 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-05 03:53
Also, I suggest you try uvloop.  If your production code is still slower than with asynq, I suggest you profile it and post real profiles here.  I just don't think that asyncio.gather can be the main bottleneck you have; it must be something else.
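
(A minimal sketch of dropping in uvloop, assuming it is installed with pip; the rest of the program stays unchanged.)

import asyncio
import uvloop

# Install uvloop's event loop policy; loops created afterwards are uvloop loops.
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

async def main():
    await asyncio.sleep(0)  # placeholder for the real workload

asyncio.get_event_loop().run_until_complete(main())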
msg307640 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-05 04:26
> Also, I suggest you try uvloop

Sadly, uvloop does not offer any noticeable performance improvement for us. Our usage is very similar to the "benchmarks" I posted: we don't do any actual async I/O, because asynq does not offer that.

> I suggest you profile it and post real profiles here

Gladly! Would you like Python-level profiles (made with `cProfile`) or gmon.out profiles? The latter would be a little more difficult to produce, since we run a web server that needs to accept traffic, but I have plenty of `cProfile` profiles I could share with you.
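
(For reference, this is roughly how such a `cProfile` profile can be collected for an asyncio workload; the workload below is a placeholder, not our server code. From the command line, `python -m cProfile -o app.prof batch_asyncio.py` works as well.)

import asyncio
import cProfile
import pstats

async def workload():
    # Placeholder: a gather-heavy coroutine standing in for the real request handler.
    await asyncio.gather(*(asyncio.sleep(0) for _ in range(1000)))

profiler = cProfile.Profile()
profiler.enable()
asyncio.get_event_loop().run_until_complete(workload())
profiler.disable()

# Print the 20 most expensive entries by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)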

> I just don't think that asyncio.gather can be the main bottleneck you have; it must be something else

I think my PR and the examples I have provided reframe this issue: the problem is that the "synchronous" performance of async/await is very poor when it is used to run work that completes synchronously. The benchmarks are an example of what a request lifetime looks like for us: a lot of scatter-gather to batch database queries that (for the time being) resolve synchronously, which the benchmark simulates with simple math operations. To reiterate, I do not think `asyncio.gather` is the main performance bottleneck, but with my limited knowledge of CPython I do not know how to identify it more precisely.
msg308504 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-18 01:23
Liran, Task (the C implementation) was optimized in issue 32348; your benchmark now runs 15% faster.
msg308513 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-18 04:29
Yury, thank you very much for circling back. I wish I could be more helpful in pursuing performance improvements.
msg308514 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-18 04:33
NP.  I have another PR in the pipeline: https://github.com/python/cpython/pull/4913

Both optimizations make your benchmark run 30% faster on 3.7.  If you compile asyncio.gather() with Cython, you gain another 5-15%; if you use uvloop, another 10-20%.

If it's still slower than asynq, then the difference must be in how asynq schedules its callbacks; it might be more optimal than asyncio for some specific use cases.

FWIW I don't expect asynq to be any faster than asyncio (or than uvloop) for network IO.  And there's definitely no problem with async/await performance -- we're optimizing asyncio here, not the interpreter.
msg308704 - Author: Liran Nuna (Liran Nuna) Date: 2017-12-20 05:34
Yury, those speed improvements are awesome and I'm really excited about them. Performance is slowly starting to match asynq, which would make migrating our code to async/await more feasible!

Today Python 3.6.4 was released, and these performance improvements did not make it into that version.

I'm not familiar with Python's release process. What are the steps or timeline for backporting these to 3.6 so they land in the next patch release, to avoid a lengthy wait for Python 3.7?
msg308735 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2017-12-20 14:47
Unfortunately they will not be backported; that's against our release policy, and I can't do anything about it.  You can backport them yourself and build your own CPython 3.6; that's what bigger users of Python (e.g. Facebook and Google) usually do.
History
Date                 User              Action  Args
2017-12-20 14:47:52  yselivanov        set     messages: + msg308735
2017-12-20 05:34:05  Liran Nuna        set     messages: + msg308704
2017-12-18 04:33:05  yselivanov        set     messages: + msg308514
2017-12-18 04:29:16  Liran Nuna        set     messages: + msg308513
2017-12-18 01:23:47  yselivanov        set     messages: + msg308504
2017-12-05 04:26:31  Liran Nuna        set     messages: + msg307640
2017-12-05 03:53:51  yselivanov        set     messages: + msg307632
2017-12-05 03:47:50  yselivanov        set     status: open -> closed; resolution: not a bug; messages: + msg307631; stage: resolved
2017-12-05 03:12:27  Liran Nuna        set     messages: + msg307626
2017-12-03 15:18:02  yselivanov        set     messages: + msg307517
2017-12-03 10:21:41  serhiy.storchaka  set     messages: + msg307501
2017-12-03 10:11:03  Liran Nuna        set     messages: + msg307498
2017-12-03 09:50:24  serhiy.storchaka  set     versions: + Python 3.7, - Python 3.6
2017-12-03 09:50:07  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg307496
2017-12-03 09:09:48  Liran Nuna        set     files: + batch_asynq.py; messages: + msg307494
2017-12-03 09:09:09  Liran Nuna        create