Title: Add 1.32x faster C implementation of asyncio.isfuture().
Type: performance Stage: patch review
Components: asyncio Versions: Python 3.8
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, jimmylai, pitrou, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2018-05-15 15:31 by jimmylai, last changed 2018-05-26 00:14 by yselivanov.

File name Uploaded Description Edit jimmylai, 2018-05-15 20:02 vstinner, 2018-05-18 23:04
Pull Requests
URL Status Linked Edit
PR 6876 open jimmylai, 2018-05-15 21:20
Messages (12)
msg316670 - (view) Author: Jimmy Lai (jimmylai) * Date: 2018-05-15 15:31
asyncio.isfuture called whenever ensure_future is called. Providing C implementation to make it fast.
msg316712 - (view) Author: Jimmy Lai (jimmylai) * Date: 2018-05-15 20:02
$./python.exe -o isfuture_optimized.json
$ ./python.exe -m perf compare_to isfuture_original.json isfuture_optimized.json
Mean +- std dev: [isfuture_original] 7.16 ms +- 0.23 ms -> [isfuture_optimized] 5.41 ms +- 0.25 ms: 1.32x faster (-24%)
msg317065 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-18 23:04
About the benchmark: you should not loop inside the function. I propose a variant of the benchmark:

Jimmy: Would you mind to run this variant on your PR?
msg317129 - (view) Author: Jimmy Lai (jimmylai) * Date: 2018-05-19 17:31
@vstinner Thanks for the new benchmark, it provides more detailed wins:
It's 1.64x faster for future object and 1.23x faster for non-future object.
$ ./python.exe -m perf compare_to isfuture_original_2.json isfuture_optimized_2.json
future: Mean +- std dev: [isfuture_original_2] 224 ns +- 8 ns -> [isfuture_optimized_2] 135 ns +- 2 ns: 1.66x faster (-40%)
task: Mean +- std dev: [isfuture_original_2] 224 ns +- 6 ns -> [isfuture_optimized_2] 137 ns +- 3 ns: 1.64x faster (-39%)
regular_func: Mean +- std dev: [isfuture_original_2] 443 ns +- 5 ns -> [isfuture_optimized_2] 361 ns +- 5 ns: 1.23x faster (-18%)
str: Mean +- std dev: [isfuture_original_2] 449 ns +- 15 ns -> [isfuture_optimized_2] 360 ns +- 12 ns: 1.25x faster (-20%)
msg317130 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-05-19 17:58
What is the impact on a regular application?  The fact that a micro-benchmark becomes X% faster isn't very interesting itself, especially as the original function, at less than 1┬Ás per call, is already very fast.
msg317131 - (view) Author: Jimmy Lai (jimmylai) * Date: 2018-05-19 18:16
@pitrou This change is part of optimization for asyncio.gather().
gather -> ensure_future -> isfuture/iscoroutine/isawaitable

We need C implementation for all those function to make gather really efficient for large scale application (e.g. Instagram)
Gather is really slow and cost ~2% CPU on our server.

The same optimization approach has been apply on other ciritcal asyncio modules, e.g. Future, get_event_loop, etc.
msg317132 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-05-19 18:17
Then it would be nice to post benchmarks using asyncio.gather() (on something not too trivial perhaps).
msg317146 - (view) Author: Jimmy Lai (jimmylai) * Date: 2018-05-20 00:32
@pitrou We'll measure the wins of gather when we implement it in C. Before that, we need to get all helpers ready in C.
msg317282 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-22 13:40
> @pitrou We'll measure the wins of gather when we implement it in C. Before that, we need to get all helpers ready in C.

You don't have to provide _asyncio.isfuture() (in C) to implement gather() in C.

If your goal is to optimize gather(), write a change to only implement gather() no?

What's the point of optimizing isfuture() is gather() is implemented in C and doesn't call the Python implementation anymore?

Implement isfuture() is C and expose it as _asyncio.isfuture() are two different things.
msg317693 - (view) Author: Jimmy Lai (jimmylai) * Date: 2018-05-25 17:42
In general, we would like to make all asyncio common functions efficient with C implementation because CPython asyncio overhead is very expensive.
In our application, overall it costs about 10% global CPU instructions after we used UVLoop (it's much worse when use default event loop).
gather() is only one of the high level function bottleneck. To make CPU overhead not a concern for asyncio user, we should make isfuture in C because it's called by many other event loop functions, e.g. in asyncio/, asyncio/, asyncio/
msg317721 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-26 00:10
I disagree with your rationale. At least, it is not how we decide to
optimize something or not in Python. Python is first made of people who
will have to maintain the code for the next 5 years if not longer. C code
is harder to maintain than Python code.

A significant speedup on a microbenchmark is nice, but asyncio still lacks
a more general benchmark suite...

I have no opinion on this specific optimisation.
msg317722 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-26 00:14
IMO this particular patch is OK and should go in.  I'll take another look at the PR in a few days (and will run some benchmarks myself before making the decision).
Date User Action Args
2018-05-26 00:14:20yselivanovsetmessages: + msg317722
2018-05-26 00:10:45vstinnersetmessages: + msg317721
2018-05-25 17:42:31jimmylaisetmessages: + msg317693
2018-05-22 13:40:29vstinnersetmessages: + msg317282
2018-05-20 00:32:02jimmylaisetmessages: + msg317146
2018-05-19 18:17:38pitrousetmessages: + msg317132
2018-05-19 18:16:08jimmylaisetmessages: + msg317131
2018-05-19 17:58:35pitrousetnosy: + pitrou
messages: + msg317130
2018-05-19 17:31:43jimmylaisetmessages: + msg317129
2018-05-18 23:04:29vstinnersetfiles: +

messages: + msg317065
2018-05-17 06:58:45vstinnersetnosy: + vstinner

versions: - Python 3.6, Python 3.7
2018-05-15 21:20:17jimmylaisetkeywords: + patch
stage: patch review
pull_requests: + pull_request6552
2018-05-15 20:02:21jimmylaisetfiles: +

messages: + msg316712
title: Optimize asyncio.isfuture by providing C implementation -> Add 1.32x faster C implementation of asyncio.isfuture().
2018-05-15 15:31:32jimmylaicreate