Issue 30014: Speedup DefaultSelectors.modify() by 2x

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/74200

classification

Title:	Speedup DefaultSelectors.modify() by 2x
Type:	performance	Stage:	resolved
Components:	asyncio, Library (Lib)	Versions:	Python 3.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	asvetlov, giampaolo.rodola, neologix, vstinner, yselivanov
Priority:	normal	Keywords:	patch

Created on 2017-04-07 10:40 by giampaolo.rodola, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
selectors_modify.diff	giampaolo.rodola, 2017-04-07 10:40		review
bench.py	giampaolo.rodola, 2017-04-07 10:40
bench_selectors_modify.py	vstinner, 2017-04-07 12:24
echo_server.py	giampaolo.rodola, 2017-04-08 18:31
echo_client.py	giampaolo.rodola, 2017-04-08 18:31

Pull Requests
URL	Status	Linked	Edit
PR 1030	merged	giampaolo.rodola, 2017-04-07 12:49

Messages (22)
msg291261 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-04-07 10:40
Patch in attachment modifies DefaultSelector.modify() so that it uses the underlying selector's modify() method instead of unregister() and register() resulting in a 2x speedup. Without patch: ~/svn/cpython {master}$ ./python bench.py 0.006010770797729492 With patch: ~/svn/cpython {master}$ ./python bench.py 0.00330352783203125
msg291264 - (view)	Author: Charles-François Natali (neologix) *	Date: 2017-04-07 12:02
Hm, do you have a realistic benchmark which would show the benefit? Because this is really a micro-benchmark, and I'm not convinced that Selector.modify() is a significant bottleneck in a real-world application.
msg291265 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-04-07 12:19
modify() can be used often in verbose protocols such as FTP and SMTP where a lot of requests and responses are continuously exchanged between client and server, so you need to switch from EVENT_READ to EVENT_WRITE all the time. I did a similar change some years ago in pyftpdlib but I don't have another benchmark other than this one.
msg291266 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-04-07 12:24
Hi Giampaolo Rodola'! It seems like you proposed the same idea 4 years ago and I wrote a similar patch: issue #18932 :-) I suggest you to use my perf module to produce more reliable benchmarks. Here is my results on my computer smithers tuned for benchmarks: haypo@smithers$ ./python bench_selectors.py -o ref.json [apply the patch] haypo@smithers$ ./python bench_selectors.py -o patch.json haypo@smithers$ ./python -m perf compare_to ref.json patch.json --table +----------------------+---------+------------------------------+ \| Benchmark \| ref \| patch \| +======================+=========+==============================+ \| PollSelector.modify \| 11.3 us \| 8.22 us: 1.37x faster (-27%) \| +----------------------+---------+------------------------------+ \| EpollSelector.modify \| 13.5 us \| 8.88 us: 1.52x faster (-34%) \| +----------------------+---------+------------------------------+ Not significant (1): SelectSelector.modify @neologix: "Hm, do you have a realistic benchmark which would show the benefit?" I don't think that selector.modify() can be a bottleneck, but IMHO the change is simple and safe enough to be worth it. In a network server with 10k client, an optimization making .modify() 1.52x faster is welcomed.
msg291267 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-04-07 12:28
@Giampaolo Rodola: CPython development moved to GitHub, can you please create a pull request instead of a patch? Thank you. https://docs.python.org/devguide/pullrequest.html Hum, I see that my old patch of issue #18932 (selectors_optimize_modify-2.patch) is different. I tried to factorize code. What do you think of my change?
msg291268 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-04-07 12:40
Hey Stinner, thanks for chiming in! Your old patch uses unregister() and register() so I'm not sure what speedup that'll take as the whole point is to avoid doing that. You may want to rewrite it and benchmark it but it looks like it'll be slower.
msg291269 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-04-07 12:49
PR: https://github.com/python/cpython/pull/1030
msg291270 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-04-07 12:58
@neologix: Do you have a GitHub account? It's hard to me to see if https://github.com/neologix is you or not, and your GitHub username is not filled in the Bug Tracker database.
msg291271 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-04-07 13:03
Giampaolo Rodola': "Your old patch uses unregister() and register() so I'm not sure what speedup that'll take as the whole point is to avoid doing that." My patch calls generic unregister() and register() of the base classes, but it only uses one syscall per modify() call. Oh by the way, it seems like my patch changes KqueueSelector, but your change doesn't. Am I right? KqueueSelector is the default selector on FreeBSD and macOS. IMHO it's worth it to optimize it as well.
msg291272 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-04-07 13:11
> My patch calls generic unregister() and register() of the base > classes, but it only uses one syscall per modify() call. Doesn't that mean doing 3 operations (unregister(), register(), modify()) instead of the current 2 (unregister(), register())? I don't see how it can be faster than a single modify() syscall. > Oh by the way, it seems like my patch changes KqueueSelector, > but your change doesn't. Am I right? You are right but it looks like you end up doing the same thing as unregister() and register(). kqueue() has no modify() method so I don't think it can benefit from this change.
msg291273 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-04-07 13:24
> Doesn't that mean doing 3 operations (unregister(), register(), modify()) instead of the current 2 (unregister(), register())? I don't see how it can be faster than a single modify() syscall. The idea is to reuse _BaseSelectorImpl.register() and _BaseSelectorImpl.unregister() to factorize the code. These methods don't use syscall, they create the SelectorKey object and update _fd_to_key. So each class doesn't have to redo these things. I don't insist to redo what I did, I'm just trying to explain my change because your change basically copy/paste the same code 3 times, and you forgot KqueueSelector, so you even may have to copy it a 4th time ;-)
msg291274 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-04-07 13:39
> The idea is to reuse _BaseSelectorImpl.register() and > _BaseSelectorImpl.unregister() to factorize the code. You can't factorize the logic of modify() into those as they do two different things. I also don't like repeating the same thing 3 times but given how the module is organized I'm not sure how to do that as I need to pass 3 things around: the low-level selector (epoll, poll, whatever) and the read and write constants (POLLIN, EPOLLIN) which change depending on the selector being used. The same thing applies to the devpoll class (http://bugs.python.org/issue18931). I can write a second patch which to refactor the whole module if that is desirable but I prefer to do that in another PR.
msg291280 - (view)	Author: Charles-François Natali (neologix) *	Date: 2017-04-07 17:34
> I don't think that selector.modify() can be a bottleneck, but IMHO the change is simple and safe enough to be worth it. In a network server with 10k client, an optimization making .modify() 1.52x faster is welcomed. IMHO it complicates the code for little benefit: that's why I asked for a realistic benchmark. This patch could made modify() 10x faster, if modify() only accounts for 1% of the overall overhead in a realistic use-case, then it's not worth it.
msg291282 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-04-07 18:09
In certain protocols modify() is supposed to be used on every interaction between client and server. E.g. an FTP server does this: - register(fd, EVENT_READ); recv() # wait command from client - modify(fd, EVENT_WRITE); send(response) # send response - modify(fd, EVENT_READ); recv() # wait for new command ...so it's two calls for each command received. In asyncio modify() is also used in different circumstances. If you're worried about code duplication I can refactor selectors.py first, but IMO it would be a bad idea to reject this or future improvements because the current code status prevents code reuse. Right now there are already 3 classes sharing basically the same code (poll, epoll and devpoll related classes). Those can be refactored similarly to this: https://github.com/giampaolo/pyftpdlib/blob/ab699b5f89223e03593f3e004d6a370b4c2e5308/pyftpdlib/ioloop.py#L465-L565
msg291286 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-04-07 19:25
@neologix: here's a PR which refactors the poll-related classes: https://github.com/python/cpython/pull/1035/files From there, we'll be able to avoid modify() code duplication.
msg291306 - (view)	Author: Charles-François Natali (neologix) *	Date: 2017-04-07 22:41
This refactoring was already suggested a long time ago, and at the time both Guido and I didn't like it because it makes the code actually more complicated: DRY in this case doesn't apply IMO. Also, this whole thread is a repeat of: http://bugs.python.org/issue18932 At the time, I already asked for one realistic use case demonstrating the benefit of implementing modify() natively instead of unregister()+register(). I know that this code makes modify() faster, but as I said multiple times, I'd like to se the impact on something else than a micro-benchmark: arguably if you propose this it's because you've profiled/found it to be an actual bottleneck, do you have one benchmark to support this change?
msg291340 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-04-08 18:31
For the sake of experiment I'm attaching a toy echo server which uses modify() to switch between EVENT_READ and EVENT_WRITE. Without patch I get 35000 req/sec, with patch around 39000 req/sec (11.4% faster). To be entirely honest a smarter echo server would switch between EVENT_READ / EVENT_WRITE only in case of BlockingIOError on send() or recv(), but unfortunately it's a condition which (for send()) does not occur often by testing on localhost (for recv() it naturally occurs +27000 times out of 39000). On the other hand, by experimenting with a real network it occurs quite soon: import socket sock = socket.socket() sock.connect(('google.com', 80)) sock.setblocking(False) print(sock.send(b'x' * 50000)) print(sock.send(b'x' * 50000)) # raise BlockingIOError So basically my benchmark is emulating a worst case scenario in which send() always blocks on the first call and succeed on the next one, in order to mimic recv() which blocks half of the times. The whole point is that the speedup entirely depends on how many times send() or recv() will block in a real world app connected to the internet (because that's when modify() is supposed to be used) but that's hard to determine, especially under heavy server loads which is hard to emulate. This also involves the SSL handshake when it fails with WANT_READ / WANT_WRITE, which is another circumstance where modify() is going to be used and for which I have no benchmarks. I personally don't mind about selectors module per-se, but given that it's the core of important libs such as asyncio and curio I would like to hear a better rationale than "let's reject this 1.5x speedup because DRY does not apply in this case". Also please consider that Tornado, Twisted and gevent use native modify() method.
msg292586 - (view)	Author: Charles-François Natali (neologix) *	Date: 2017-04-29 09:24
The rationale for rejecting wouldn't be "DRY does not apply in this case", it would be that this makes the code more complicated, and that a negligible speedup would not be worth it. Now, thanks to your benchmark, a 10% speedup is not negligible, so that seems like a reasonable change. I'll have a look at your refactoring code, and once pushed, we can optimize modify().
msg292602 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-04-29 22:23
I also like the plan starting with refactoring ;-)
msg295338 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-06-07 14:10
OK, https://github.com/python/cpython/pull/1030 should be good to go.
msg295564 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-09 20:20
New changeset fbfaa6fd57f8dc8a3da808acb8422370fad2f27b by Victor Stinner (Giampaolo Rodola) in branch 'master': bpo-30014: make poll-like selector's modify() method faster (#1030) https://github.com/python/cpython/commit/fbfaa6fd57f8dc8a3da808acb8422370fad2f27b
msg308807 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2017-12-20 20:36
PR was merged 6 months ago, closing the issue.

History
Date	User	Action	Args
2022-04-11 14:58:45	admin	set	github: 74200
2017-12-20 20:36:49	asvetlov	set	status: open -> closed nosy: + asvetlov messages: + msg308807 resolution: fixed stage: patch review -> resolved
2017-06-09 20:20:43	vstinner	set	messages: + msg295564
2017-06-07 14:10:40	giampaolo.rodola	set	messages: + msg295338
2017-04-29 22:23:24	vstinner	set	messages: + msg292602
2017-04-29 09:24:18	neologix	set	messages: + msg292586
2017-04-08 18:31:53	giampaolo.rodola	set	files: + echo_client.py
2017-04-08 18:31:40	giampaolo.rodola	set	files: + echo_server.py messages: + msg291340
2017-04-07 22:41:26	neologix	set	messages: + msg291306
2017-04-07 19:25:28	giampaolo.rodola	set	messages: + msg291286
2017-04-07 18:09:29	giampaolo.rodola	set	messages: + msg291282
2017-04-07 17:34:58	neologix	set	messages: + msg291280 title: Speedup DefaultSelectors.modify() by 1.5x -> Speedup DefaultSelectors.modify() by 2x
2017-04-07 14:31:15	gvanrossum	set	nosy: - gvanrossum
2017-04-07 13:39:46	giampaolo.rodola	set	messages: + msg291274
2017-04-07 13:24:04	vstinner	set	messages: + msg291273
2017-04-07 13:11:44	giampaolo.rodola	set	messages: + msg291272
2017-04-07 13:03:27	vstinner	set	title: Speedup DefaultSelectors.modify() by 2x -> Speedup DefaultSelectors.modify() by 1.5x
2017-04-07 13:03:12	vstinner	set	messages: + msg291271
2017-04-07 12:58:56	vstinner	set	messages: + msg291270
2017-04-07 12:49:22	giampaolo.rodola	set	messages: + msg291269 pull_requests: + pull_request1187
2017-04-07 12:40:38	giampaolo.rodola	set	messages: + msg291268
2017-04-07 12:28:21	vstinner	set	messages: + msg291267
2017-04-07 12:24:37	vstinner	set	files: + bench_selectors_modify.py messages: + msg291266
2017-04-07 12:19:16	giampaolo.rodola	set	messages: + msg291265
2017-04-07 12:02:25	neologix	set	messages: + msg291264
2017-04-07 11:01:15	giampaolo.rodola	set	type: performance components: + Library (Lib), asyncio stage: patch review
2017-04-07 10:41:47	giampaolo.rodola	set	nosy: + gvanrossum, vstinner, neologix, yselivanov
2017-04-07 10:40:26	giampaolo.rodola	set	files: + bench.py versions: + Python 3.7
2017-04-07 10:40:07	giampaolo.rodola	create