classification
Title: Optimize selectors.EpollSelector.modify()
Type: performance Stage: resolved
Components: asyncio Versions: Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, felipecruz, giampaolo.rodola, gvanrossum, meador.inge, neologix, pitrou, python-dev, rhettinger, rosslagerwall, sbt, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2013-09-05 11:01 by giampaolo.rodola, last changed 2018-06-25 23:01 by giampaolo.rodola. This issue is now closed.

Files
File name Uploaded Description Edit
selectors_optimize_modify.patch vstinner, 2015-02-04 17:32 review
selectors_optimize_modify-2.patch vstinner, 2015-02-05 09:01 review
epoll.diff giampaolo.rodola, 2015-02-15 21:31 review
Messages (17)
msg196990 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-09-05 11:01
This is a follow up of issue 16853 (see comment http://bugs.python.org/issue16853#msg196984).
msg214991 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-03-27 23:00
Why can't this be fixed in 3.4.1? It isn't an API change.
msg235345 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-03 15:01
selectors.EpollSelector.modify() should use epoll.modify() instead of unregister()+register().
msg235390 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-04 17:32
Here is a patch:

Issue #18932, selectors: Optimize the modify() method of selectors

Optimize also register() and unregister() methods of KqueueSelector: only call kqueue.control() once.
msg235391 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-04 17:49
I tested SelectSelector, PollSelector, EpollSelector on Linux. I ran tests on FreeBSD, so also tested KqueueSelector. I didn't test DevpollSelector, but the code should be identical to PollSelector (the API is the same).
msg235392 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2015-02-04 17:53
_BaseSelectorImpl.modify() still calls unregister() and register(). To my understanding the whole point of this proposal was to avoid that in order to obtain the speedup and (possibly) avoid race conditions.
msg235396 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2015-02-04 19:25
Well, I'd like to see at least one benchmark.
msg235401 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-04 20:25
Benchmark on Fedora 21 (Linux 3.18.3, glibc 2.20, Python 3.5 rev 7494f3972726).

Original:

haypo@selma$ ./python -m timeit -s 'import os, selectors; s=selectors.SelectSelector(); r,w=os.pipe(); s.register(r, selectors.EVENT_READ)' 's.modify(r, selectors.EVENT_WRITE); s.modify(r, selectors.EVENT_READ)'
10000 loops, best of 3: 43.7 usec per loop
haypo@selma$ ./python -m timeit -s 'import os, selectors; s=selectors.PollSelector(); r,w=os.pipe(); s.register(r, selectors.EVENT_READ)' 's.modify(r, selectors.EVENT_WRITE); s.modify(r, selectors.EVENT_READ)'
10000 loops, best of 3: 45.1 usec per loop
haypo@selma$ ./python -m timeit -s 'import os, selectors; s=selectors.EpollSelector(); r,w=os.pipe(); s.register(r, selectors.EVENT_READ)' 's.modify(r, selectors.EVENT_WRITE); s.modify(r, selectors.EVENT_READ)'
10000 loops, best of 3: 48.4 usec per loop

Patched:

haypo@selma$ ./python -m timeit -s 'import os, selectors; s=selectors.SelectSelector(); r,w=os.pipe(); s.register(r, selectors.EVENT_READ)' 's.modify(r, selectors.EVENT_WRITE); s.modify(r, selectors.EVENT_READ)'
10000 loops, best of 3: 37.2 usec per loop
haypo@selma$ ./python -m timeit -s 'import os, selectors; s=selectors.PollSelector(); r,w=os.pipe(); s.register(r, selectors.EVENT_READ)' 's.modify(r, selectors.EVENT_WRITE); s.modify(r, selectors.EVENT_READ)'
10000 loops, best of 3: 40.5 usec per loop
haypo@selma$ ./python -m timeit -s 'import os, selectors; s=selectors.EpollSelector(); r,w=os.pipe(); s.register(r, selectors.EVENT_READ)' 's.modify(r, selectors.EVENT_WRITE); s.modify(r, selectors.EVENT_READ)'
10000 loops, best of 3: 39.9 usec per loop

* SelectSelector: 14% faster (-6.5 us)
* PollSelector: 10% faster (-4.6 us)
* EpollSelector: 18% faster (-8.5 us)

--

> _BaseSelectorImpl.modify() still calls unregister() and register().

I chose to factorize code in modify() and only put the specific code in _modify(). The advantage is to pass directly oldkey and key to _modify().

We may even try to keep the selector consistent if _modify() fails. For example, start with unregister, and only register if _modify() succeed.

--

In a previous (local) patch, the default implementation of _modify() was:

    self.unregister(fileobj)
    return self.register(fileobj, events, data)

And each specialized _modify() started with:

    oldkey = self.unregister(fileobj)
    key = self.register(fileobj, events, data)

Do you prefer this?
msg235425 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-05 09:01
Updated patch with a less surprising _modify() method:
- the default implementation simply calls unregister() + register(), as before
- drop SelectSelector._modify(): calling unregister() + register() is just fine
msg235426 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-05 09:07
> - drop SelectSelector._modify(): calling unregister() + register() is just fine

I checked with strace: PollSelector.modify() doesn't require any syscall, so I propose to also drop it (to just call unregister + register). What do you think?

I would like to reduce the number of syscalls, calling a C function of the glibc is cheap.

> This is a follow up of issue 16853 (see comment http://bugs.python.org/issue16853#msg196984).

"(...) the problem with unregister() + register() vs a real
modify (e.g. EPOLL_CTL_MOD) is that it's subject to a race condition,
if an event fires during the short window where the FD isn't
registered anymore."

The race condition only occurs with selectors which have a state machine in the kernel: epoll, kqueue, devpoll. SelectSelector and PollSelector are state-less (in the kernel, the state is build for a single syscall and then destroyed). Am I right?
msg235439 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2015-02-05 17:46
I mean something like this (for epoll), as in avoiding to call unregister() / register() at all:


diff -r 017e7391ab58 Lib/selectors.py
--- a/Lib/selectors.py  Wed Feb 04 08:37:02 2015 -0800
+++ b/Lib/selectors.py  Thu Feb 05 18:42:26 2015 +0100
@@ -412,6 +412,23 @@
                 pass
             return key
 
+        def modify(self, fileobj, events, data=None):
+            try:
+                key = self._fd_to_key[self._fileobj_lookup(fileobj)]
+            except KeyError:
+                raise KeyError("{!r} is not registered".format(fileobj)) from None
+            if data != key.data:
+                key = key._replace(data=data)
+                self._fd_to_key[key.fd] = key
+            if events != key.events:
+                epoll_events = 0
+                if events & EVENT_READ:
+                    epoll_events |= select.POLLIN
+                if events & EVENT_WRITE:
+                    epoll_events |= select.POLLOUT
+                self._epoll.modify(key.fd, epoll_events)
+            return key
+
         def select(self, timeout=None):
             if timeout is None:
                 timeout = -1


Before the patch:

~/svn/python/3.5$ ./python -m timeit -s 'import os, selectors; s=selectors.EpollSelector(); r,w=os.pipe(); s.register(r, selectors.EVENT_READ)' 's.modify(r, selectors.EVENT_WRITE); s.modify(r, selectors.EVENT_READ)'
100000 loops, best of 3: 14.7 usec per loop


After the patch:

~/svn/python/3.5$ ./python -m timeit -s 'import os, selectors; s=selectors.EpollSelector(); r,w=os.pipe(); s.register(r, selectors.EVENT_READ)' 's.modify(r, selectors.EVENT_WRITE); s.modify(r, selectors.EVENT_READ)'
100000 loops, best of 3: 3.07 usec per loop

That's equal to about a 4.5x (+450%) speedup if I'm not mistaken.
msg235452 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-05 20:59
2015-02-05 18:46 GMT+01:00 Giampaolo Rodola' <report@bugs.python.org>:
> +        def modify(self, fileobj, events, data=None):
> +            ...
> +            if events != key.events:

You should update the SelectorKey in the case, no?
msg235459 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2015-02-06 03:05
Yes, it appears so. I would suggest to come up with a patch which overrides modify() for all those selectors which can benefit from it (poll, epoll, kqueue) and refactor register() / unregister() (for select) in a separate issue / patch. Also, I suppose a couple of tests should be added (if not there already) in order to make sure that 'events' and 'data' arguments get actually changed in different situations.
msg236066 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-15 20:19
@Giampaolo: I'm not sure that I understood your proposition? Can you please write a patch?
msg236071 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2015-02-15 21:31
Attached is a patch (for epoll only) which updates the SelectorKey. This introduces a considerable slowdown compared to my first patch:

no patch:    16.2 usec per loop
your patch:  12.4 usec per loop
my patch:    10.9 usec per loop
first patch: 3.07 usec per loop

The culprit is the new nameduple's _replace() call. It's a shame a single line adds such a great slowdown. I think it makes sense to investigate whether we can introduce a faster replace mechanism for namedtuple.
msg236108 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2015-02-16 19:31
Does anyone have a realistic use case where modify() is actually a
non-negligible part?
msg320453 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-06-25 23:01
This was implemented in https://bugs.python.org/issue30014. Closing as duplicate.
History
Date User Action Args
2018-06-25 23:01:13giampaolo.rodolasetstatus: open -> closed
versions: + Python 2.7, - Python 3.5
messages: + msg320453

resolution: duplicate
stage: needs patch -> resolved
2015-02-16 19:31:27neologixsetmessages: + msg236108
2015-02-15 21:31:55giampaolo.rodolasetfiles: + epoll.diff
nosy: + rhettinger
messages: + msg236071

2015-02-15 20:19:18vstinnersetmessages: + msg236066
2015-02-06 03:08:46giampaolo.rodolasettype: performance
stage: needs patch
2015-02-06 03:05:42giampaolo.rodolasetmessages: + msg235459
2015-02-05 20:59:21vstinnersetmessages: + msg235452
2015-02-05 17:46:37giampaolo.rodolasetmessages: + msg235439
2015-02-05 09:07:34vstinnersetmessages: + msg235426
2015-02-05 09:01:53vstinnersetfiles: + selectors_optimize_modify-2.patch

messages: + msg235425
2015-02-04 20:25:57vstinnersetmessages: + msg235401
2015-02-04 19:25:15neologixsetmessages: + msg235396
2015-02-04 17:53:44giampaolo.rodolasetmessages: + msg235392
2015-02-04 17:49:18vstinnersetmessages: + msg235391
2015-02-04 17:32:21vstinnersetfiles: + selectors_optimize_modify.patch
keywords: + patch
messages: + msg235390
2015-02-03 15:01:38vstinnersetnosy: + yselivanov
components: + asyncio
2015-02-03 15:01:22vstinnersetmessages: + msg235345
title: selectors and modify() -> Optimize selectors.EpollSelector.modify()
2014-03-27 23:00:02gvanrossumsetmessages: + msg214991
2014-03-27 22:43:24asvetlovsetversions: + Python 3.5, - Python 3.4
2013-09-05 11:01:54giampaolo.rodolacreate