msg291261 - (view) |
Author: Giampaolo Rodola' (giampaolo.rodola) * |
Date: 2017-04-07 10:40 |
Patch in attachment modifies DefaultSelector.modify() so that it uses the underlying selector's modify() method instead of unregister() and register() resulting in a 2x speedup.
Without patch:
~/svn/cpython {master}$ ./python bench.py
0.006010770797729492
With patch:
~/svn/cpython {master}$ ./python bench.py
0.00330352783203125
|
msg291264 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2017-04-07 12:02 |
Hm, do you have a realistic benchmark which would show the benefit?
Because this is really a micro-benchmark, and I'm not convinced that
Selector.modify() is a significant bottleneck in a real-world
application.
|
msg291265 - (view) |
Author: Giampaolo Rodola' (giampaolo.rodola) * |
Date: 2017-04-07 12:19 |
modify() can be used often in verbose protocols such as FTP and SMTP where a lot of requests and responses are continuously exchanged between client and server, so you need to switch from EVENT_READ to EVENT_WRITE all the time. I did a similar change some years ago in pyftpdlib but I don't have another benchmark other than this one.
|
msg291266 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-04-07 12:24 |
Hi Giampaolo Rodola'! It seems like you proposed the same idea 4 years ago and I wrote a similar patch: issue #18932 :-)
I suggest you to use my perf module to produce more reliable benchmarks. Here is my results on my computer smithers tuned for benchmarks:
haypo@smithers$ ./python bench_selectors.py -o ref.json
[apply the patch]
haypo@smithers$ ./python bench_selectors.py -o patch.json
haypo@smithers$ ./python -m perf compare_to ref.json patch.json --table
+----------------------+---------+------------------------------+
| Benchmark | ref | patch |
+======================+=========+==============================+
| PollSelector.modify | 11.3 us | 8.22 us: 1.37x faster (-27%) |
+----------------------+---------+------------------------------+
| EpollSelector.modify | 13.5 us | 8.88 us: 1.52x faster (-34%) |
+----------------------+---------+------------------------------+
Not significant (1): SelectSelector.modify
@neologix: "Hm, do you have a realistic benchmark which would show the benefit?"
I don't think that selector.modify() can be a bottleneck, but IMHO the change is simple and safe enough to be worth it. In a network server with 10k client, an optimization making .modify() 1.52x faster is welcomed.
|
msg291267 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-04-07 12:28 |
@Giampaolo Rodola: CPython development moved to GitHub, can you please create a pull request instead of a patch? Thank you.
https://docs.python.org/devguide/pullrequest.html
Hum, I see that my old patch of issue #18932 (selectors_optimize_modify-2.patch) is different. I tried to factorize code. What do you think of my change?
|
msg291268 - (view) |
Author: Giampaolo Rodola' (giampaolo.rodola) * |
Date: 2017-04-07 12:40 |
Hey Stinner, thanks for chiming in! Your old patch uses unregister() and register() so I'm not sure what speedup that'll take as the whole point is to avoid doing that. You may want to rewrite it and benchmark it but it looks like it'll be slower.
|
msg291269 - (view) |
Author: Giampaolo Rodola' (giampaolo.rodola) * |
Date: 2017-04-07 12:49 |
PR: https://github.com/python/cpython/pull/1030
|
msg291270 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-04-07 12:58 |
@neologix: Do you have a GitHub account? It's hard to me to see if https://github.com/neologix is you or not, and your GitHub username is not filled in the Bug Tracker database.
|
msg291271 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-04-07 13:03 |
Giampaolo Rodola': "Your old patch uses unregister() and register() so I'm not sure what speedup that'll take as the whole point is to avoid doing that."
My patch calls generic unregister() and register() of the base classes, but it only uses one syscall per modify() call.
Oh by the way, it seems like my patch changes KqueueSelector, but your change doesn't. Am I right? KqueueSelector is the default selector on FreeBSD and macOS. IMHO it's worth it to optimize it as well.
|
msg291272 - (view) |
Author: Giampaolo Rodola' (giampaolo.rodola) * |
Date: 2017-04-07 13:11 |
> My patch calls generic unregister() and register() of the base
> classes, but it only uses one syscall per modify() call.
Doesn't that mean doing 3 operations (unregister(), register(), modify()) instead of the current 2 (unregister(), register())? I don't see how it can be faster than a single modify() syscall.
> Oh by the way, it seems like my patch changes KqueueSelector,
> but your change doesn't. Am I right?
You are right but it looks like you end up doing the same thing as unregister() and register(). kqueue() has no modify() method so I don't think it can benefit from this change.
|
msg291273 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-04-07 13:24 |
> Doesn't that mean doing 3 operations (unregister(), register(), modify()) instead of the current 2 (unregister(), register())? I don't see how it can be faster than a single modify() syscall.
The idea is to reuse _BaseSelectorImpl.register() and _BaseSelectorImpl.unregister() to factorize the code. These methods don't use syscall, they create the SelectorKey object and update _fd_to_key. So each class doesn't have to redo these things.
I don't insist to redo what I did, I'm just trying to explain my change because your change basically copy/paste the same code 3 times, and you forgot KqueueSelector, so you even may have to copy it a 4th time ;-)
|
msg291274 - (view) |
Author: Giampaolo Rodola' (giampaolo.rodola) * |
Date: 2017-04-07 13:39 |
> The idea is to reuse _BaseSelectorImpl.register() and
> _BaseSelectorImpl.unregister() to factorize the code.
You can't factorize the logic of modify() into those as they do two different things. I also don't like repeating the same thing 3 times but given how the module is organized I'm not sure how to do that as I need to pass 3 things around: the low-level selector (epoll, poll, whatever) and the read and write constants (POLLIN, EPOLLIN) which change depending on the selector being used.
The same thing applies to the devpoll class (http://bugs.python.org/issue18931).
I can write a second patch which to refactor the whole module if that is desirable but I prefer to do that in another PR.
|
msg291280 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2017-04-07 17:34 |
> I don't think that selector.modify() can be a bottleneck, but IMHO the change is simple and safe enough to be worth it. In a network server with 10k client, an optimization making .modify() 1.52x faster is welcomed.
IMHO it complicates the code for little benefit: that's why I asked
for a realistic benchmark. This patch could made modify() 10x faster,
if modify() only accounts for 1% of the overall overhead in a
realistic use-case, then it's not worth it.
|
msg291282 - (view) |
Author: Giampaolo Rodola' (giampaolo.rodola) * |
Date: 2017-04-07 18:09 |
In certain protocols modify() is supposed to be used on every interaction between client and server. E.g. an FTP server does this:
- register(fd, EVENT_READ); recv() # wait command from client
- modify(fd, EVENT_WRITE); send(response) # send response
- modify(fd, EVENT_READ); recv() # wait for new command
...so it's two calls for each command received.
In asyncio modify() is also used in different circumstances.
If you're worried about code duplication I can refactor selectors.py first, but IMO it would be a bad idea to reject this or future improvements because the current code status prevents code reuse. Right now there are already 3 classes sharing basically the same code (poll, epoll and devpoll related classes). Those can be refactored similarly to this:
https://github.com/giampaolo/pyftpdlib/blob/ab699b5f89223e03593f3e004d6a370b4c2e5308/pyftpdlib/ioloop.py#L465-L565
|
msg291286 - (view) |
Author: Giampaolo Rodola' (giampaolo.rodola) * |
Date: 2017-04-07 19:25 |
@neologix: here's a PR which refactors the poll-related classes: https://github.com/python/cpython/pull/1035/files
From there, we'll be able to avoid modify() code duplication.
|
msg291306 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2017-04-07 22:41 |
This refactoring was already suggested a long time ago, and at the
time both Guido and I didn't like it because it makes the code
actually more complicated: DRY in this case doesn't apply IMO.
Also, this whole thread is a repeat of:
http://bugs.python.org/issue18932
At the time, I already asked for one realistic use case demonstrating
the benefit of implementing modify() natively instead of
unregister()+register().
I know that this code makes modify() faster, but as I said multiple
times, I'd like to se the impact on something else than a
micro-benchmark: arguably if you propose this it's because you've
profiled/found it to be an actual bottleneck, do you have *one*
benchmark to support this change?
|
msg291340 - (view) |
Author: Giampaolo Rodola' (giampaolo.rodola) * |
Date: 2017-04-08 18:31 |
For the sake of experiment I'm attaching a toy echo server which uses modify() to switch between EVENT_READ and EVENT_WRITE. Without patch I get 35000 req/sec, with patch around 39000 req/sec (11.4% faster).
To be entirely honest a smarter echo server would switch between EVENT_READ / EVENT_WRITE only in case of BlockingIOError on send() or recv(), but unfortunately it's a condition which (for send()) does not occur often by testing on localhost (for recv() it naturally occurs +27000 times out of 39000). On the other hand, by experimenting with a real network it occurs quite soon:
import socket
sock = socket.socket()
sock.connect(('google.com', 80))
sock.setblocking(False)
print(sock.send(b'x' * 50000))
print(sock.send(b'x' * 50000)) # raise BlockingIOError
So basically my benchmark is emulating a worst case scenario in which send() always blocks on the first call and succeed on the next one, in order to mimic recv() which blocks half of the times.
The whole point is that the speedup entirely depends on how many times send() or recv() will block in a real world app connected to the internet (because that's when modify() is supposed to be used) but that's hard to determine, especially under heavy server loads which is hard to emulate. This also involves the SSL handshake when it fails with WANT_READ / WANT_WRITE, which is another circumstance where modify() is going to be used and for which I have no benchmarks.
I personally don't mind about selectors module per-se, but given that it's the core of important libs such as asyncio and curio I would like to hear a better rationale than "let's reject this 1.5x speedup because DRY does not apply in this case".
Also please consider that Tornado, Twisted and gevent use native modify() method.
|
msg292586 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2017-04-29 09:24 |
The rationale for rejecting wouldn't be "DRY does not apply in this
case", it would be that this makes the code more complicated, and that
a negligible speedup would not be worth it.
Now, thanks to your benchmark, a 10% speedup is not negligible, so
that seems like a reasonable change.
I'll have a look at your refactoring code, and once pushed, we can
optimize modify().
|
msg292602 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-04-29 22:23 |
I also like the plan starting with refactoring ;-)
|
msg295338 - (view) |
Author: Giampaolo Rodola' (giampaolo.rodola) * |
Date: 2017-06-07 14:10 |
OK, https://github.com/python/cpython/pull/1030 should be good to go.
|
msg295564 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-06-09 20:20 |
New changeset fbfaa6fd57f8dc8a3da808acb8422370fad2f27b by Victor Stinner (Giampaolo Rodola) in branch 'master':
bpo-30014: make poll-like selector's modify() method faster (#1030)
https://github.com/python/cpython/commit/fbfaa6fd57f8dc8a3da808acb8422370fad2f27b
|
msg308807 - (view) |
Author: Andrew Svetlov (asvetlov) * |
Date: 2017-12-20 20:36 |
PR was merged 6 months ago, closing the issue.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:45 | admin | set | github: 74200 |
2017-12-20 20:36:49 | asvetlov | set | status: open -> closed
nosy:
+ asvetlov messages:
+ msg308807
resolution: fixed stage: patch review -> resolved |
2017-06-09 20:20:43 | vstinner | set | messages:
+ msg295564 |
2017-06-07 14:10:40 | giampaolo.rodola | set | messages:
+ msg295338 |
2017-04-29 22:23:24 | vstinner | set | messages:
+ msg292602 |
2017-04-29 09:24:18 | neologix | set | messages:
+ msg292586 |
2017-04-08 18:31:53 | giampaolo.rodola | set | files:
+ echo_client.py |
2017-04-08 18:31:40 | giampaolo.rodola | set | files:
+ echo_server.py
messages:
+ msg291340 |
2017-04-07 22:41:26 | neologix | set | messages:
+ msg291306 |
2017-04-07 19:25:28 | giampaolo.rodola | set | messages:
+ msg291286 |
2017-04-07 18:09:29 | giampaolo.rodola | set | messages:
+ msg291282 |
2017-04-07 17:34:58 | neologix | set | messages:
+ msg291280 title: Speedup DefaultSelectors.modify() by 1.5x -> Speedup DefaultSelectors.modify() by 2x |
2017-04-07 14:31:15 | gvanrossum | set | nosy:
- gvanrossum
|
2017-04-07 13:39:46 | giampaolo.rodola | set | messages:
+ msg291274 |
2017-04-07 13:24:04 | vstinner | set | messages:
+ msg291273 |
2017-04-07 13:11:44 | giampaolo.rodola | set | messages:
+ msg291272 |
2017-04-07 13:03:27 | vstinner | set | title: Speedup DefaultSelectors.modify() by 2x -> Speedup DefaultSelectors.modify() by 1.5x |
2017-04-07 13:03:12 | vstinner | set | messages:
+ msg291271 |
2017-04-07 12:58:56 | vstinner | set | messages:
+ msg291270 |
2017-04-07 12:49:22 | giampaolo.rodola | set | messages:
+ msg291269 pull_requests:
+ pull_request1187 |
2017-04-07 12:40:38 | giampaolo.rodola | set | messages:
+ msg291268 |
2017-04-07 12:28:21 | vstinner | set | messages:
+ msg291267 |
2017-04-07 12:24:37 | vstinner | set | files:
+ bench_selectors_modify.py
messages:
+ msg291266 |
2017-04-07 12:19:16 | giampaolo.rodola | set | messages:
+ msg291265 |
2017-04-07 12:02:25 | neologix | set | messages:
+ msg291264 |
2017-04-07 11:01:15 | giampaolo.rodola | set | type: performance components:
+ Library (Lib), asyncio stage: patch review |
2017-04-07 10:41:47 | giampaolo.rodola | set | nosy:
+ gvanrossum, vstinner, neologix, yselivanov
|
2017-04-07 10:40:26 | giampaolo.rodola | set | files:
+ bench.py versions:
+ Python 3.7 |
2017-04-07 10:40:07 | giampaolo.rodola | create | |