classification
Title: epoll.poll(timeout) and PollSelector.select(timeout) must round the timeout to the upper bound
Type: Stage:
Components: Versions: Python 3.4, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eric.snow, georg.brandl, gvanrossum, haypo, neologix, pitrou, python-dev
Priority: normal Keywords: patch

Created on 2014-01-20 10:33 by haypo, last changed 2014-02-10 21:04 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
epoll_timeout.patch haypo, 2014-01-20 10:33 review
select_round_timeout.patch haypo, 2014-01-20 14:08 review
asyncio_granularity.patch haypo, 2014-01-24 10:04 review
Messages (54)
msg208532 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-20 10:33
Hi, while comparing performances of Tulip and Trollius projects (asyncio module), I found a bug. The event loop is called too many times when a callback is scheduled with a timeout and the epoll selector is used, especially for timeout close to one millisecond.

In my opinion, it's a bug in epoll.poll(): it should wait *at least* timeout seconds if no event is received, not shorter.

Pseudo-code of Tulip:
---
timeout = scheduled[0].when - time.monotonic()
events = selector.select(timeout)
process_events(events)
while scheduled:
   if scheduled[0].when > time.monotonic():
      break
   ...
---

The problem is that "scheduled[0].when > time.monotonic()" test fails because epoll.poll() returns in less than timeout seconds.

The difference between the timeout and elapsed time is very small, less than 1 millisecond.

Attached patch fixes this issue by rounding the timeout to the upper bound.

The rounding is different than datetime.timedelta constructor for example, because the usecase is different. epoll.poll(0.0001) calls epoll_wait() with a timemout of 1 millisecond (1e-4 rounded to 1e-3).

epoll.poll(0.0) still calls epoll_wait() with a timeout of 0 millisecond. I don't think that it's possible to write a reliable unit test for that on our slow buildbots.
msg208533 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-20 10:54
Bug report in Tulip project, the bug affects also the new asyncio module of Python 3.4:
http://code.google.com/p/tulip/issues/detail?id=106
msg208534 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-01-20 10:58
Looks good to me on the principle.
msg208539 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-01-20 13:11
AFAICT, this also affects poll().
Although it's supposed to be passed an integer, passing a float will result
in a truncation towards 0:
"""
$ strace -e poll python -c "import select; p = select.poll(); p.poll(0.9)"
poll(0x23321b0, 0, 0)                   = 0 (Timeout)
"""

See also this line in PollSelector:
"""
        def select(self, timeout=None):
            timeout = None if timeout is None else max(int(1000 * timeout),
0)
"""

This will round timeout=1e-4 to 0.
msg208540 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-20 14:08
select.select(), selectors.PollSelector.select(), select.epoll.poll(), select.kqueue.control() have the bug.

OS multiplexers:

- select: microsecond resolution (10^-6), timeout converted by _PyTime_ObjectToTimeval() which converts 1e-7 to zero => BUG!
- poll: millisecond resolution (10^-3) but expects int type, select is ok, selectors uses int(1000 * timeout) which converts 1e-4 to zero => BUG!
- epoll: millisecond resolution (10^-3), select uses (int)(dtimeout * 1000.0) which converts 1e-4 to zero => BUG!
- devpoll: millisecond resolution (10^-3) but expects int type, select is ok, selectors doesn't support it yet => ok
- kqueue: nanosecond resolution (10^-9), timeout converted by _PyTime_ObjectToTimespec() which converts 1e-10 to zero => BUG!

_PyTime_ObjectToTimeval() is used in various places:

- datetime.datetime.fromtimestamp(), datetime.datetime.utcfromtimestamp()
- select.select(), 

_PyTime_ObjectToTimespec() is used in various places:

- posix.utime()
- signal.sigtimedwait()
- select.kqueue.control()
- time.clock_settime()

A new parameter should be added to _PyTime_ObjectToTimeval() and _PyTime_ObjectToTimespec() to choose the rounding method.

--

I saw the bug with poll and epoll because these functions have the worst resolution (millisecond). Attached patch fixes poll and epoll, and add a generic test on all selectors: select_round_timeout.patch.

According to my test, select.select() has an effective resolution of 0.1 ms (10^4) on my Fedora 19 (Linux kernel 3.12), so I don't think that the select rounding issue is important. Tell me if you consider that _PyTime_ObjectToTime*() (select, kqueue) should be fixed. If yes, I will open a separated issue (because the implementation is different and it may impact other unrelated functions).
msg208588 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-01-20 23:14
> A new parameter should be added to _PyTime_ObjectToTimeval() and
> _PyTime_ObjectToTimespec() to choose the rounding method.

That doesn't sound necessary. Just fix the select and selectors module so that the final (OS-level) timeout is always > 0 when the user timeout is > 0.
msg208595 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-21 00:43
New changeset 033137c12d88 by Victor Stinner in branch '3.3':
Issue #20311: select.epoll.poll() now rounds the timeout away from zero,
http://hg.python.org/cpython/rev/033137c12d88

New changeset 02f9db3e684e by Victor Stinner in branch 'default':
(Merge 3.3) Issue #20311: select.epoll.poll() now rounds the timeout away from
http://hg.python.org/cpython/rev/02f9db3e684e
msg208596 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-21 00:49
New changeset e042ea77a152 by Victor Stinner in branch 'default':
Issue #20311: selector.PollSelector.select() now rounds the timeout away from
http://hg.python.org/cpython/rev/e042ea77a152
msg208597 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-21 00:51
Rounding issues with poll() and epoll() have been fixed in Python 3.3 and 3.4. I'm now waiting for buildbots before closing the issue.

I created a new issue #20320 to address the rounding issue with select() and kqueue (and signal.sigtimedwait).
msg208599 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-21 01:16
The epoll test failed on a buildbot. The test uses a short delay (10 milliseconds) and compare two different clocks: time.perf_counter() and select.epoll.poll(). I didn't expect a failure very the greatest delay, the test uses even shorter delay (the shortest is 0.1 ms).

The best would be to trace syscalls to check if epoll_wait() is called with a non-zero timeout, but tracing syscalls is overkill and too low level for such test.

It's maybe better to drop this unstable test? Unstable on our slow buildbots.

http://buildbot.python.org/all/builders/x86%20Gentoo%20Non-Debug%203.3/builds/1392/steps/test/logs/stdio

======================================================================
FAIL: test_timeout_rounding (test.test_epoll.TestEPoll)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/buildslave/3.3.murray-gentoo-wide/build/Lib/test/test_epoll.py", line 58, in test_timeout_rounding
    self.assertGreaterEqual(dt, timeout)
AssertionError: 0.008995789801701903 not greater than or equal to 0.01
msg208601 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-21 01:28
Another failure:

http://buildbot.python.org/all/builders/x86%20Ubuntu%20Shared%203.3/builds/1360/steps/test/logs/stdio

======================================================================
FAIL: test_timeout_rounding (test.test_epoll.TestEPoll)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/srv/buildbot/buildarea/3.3.bolen-ubuntu/build/Lib/test/test_epoll.py", line 58, in test_timeout_rounding
    self.assertGreaterEqual(dt, timeout)
AssertionError: 0.009388947859406471 not greater than or equal to 0.01
msg208602 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-21 01:38
New changeset e1619465c49d by Victor Stinner in branch '3.3':
Issue #20311: Try to fix the unit test, use time.monotonic() instead of
http://hg.python.org/cpython/rev/e1619465c49d

New changeset 26c54e9a933b by Victor Stinner in branch 'default':
(Merge 3.3) Issue #20311: Try to fix the unit test, use time.monotonic()
http://hg.python.org/cpython/rev/26c54e9a933b
msg208696 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-21 20:01
New changeset 7ce7295393c2 by Victor Stinner in branch 'default':
Issue #20311: EpollSelector now also rounds the timeout towards zero, as
http://hg.python.org/cpython/rev/7ce7295393c2
msg208927 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-23 14:06
http://buildbot.python.org/all/builders/x86%20Ubuntu%20Shared%203.x/builds/9552

======================================================================
FAIL: test_timeout_rounding (test.test_epoll.TestEPoll)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_epoll.py", line 59, in test_timeout_rounding
    self.assertGreaterEqual(dt, timeout)
AssertionError: 0.009316698648035526 not greater than or equal to 0.01
msg208928 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-23 14:07
New changeset 3637ccf9516a by Victor Stinner in branch 'default':
Issue #20311: add debug help in test_selectors
http://hg.python.org/cpython/rev/3637ccf9516a
msg208976 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-01-23 18:42
======================================================================
> FAIL: test_timeout_rounding (test.test_epoll.TestEPoll)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File
"/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_epoll.py",
line 59, in test_timeout_rounding
>     self.assertGreaterEqual(dt, timeout)
> AssertionError: 0.009316698648035526 not greater than or equal to 0.01

Those failures are "expected", nothing guarantees that the syscall will
take at least the amount of time specified.
msg209001 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-23 21:39
> Those failures are "expected", nothing guarantees that the syscall
> will take at least the amount of time specified.

Ah? The manual page of epoll_wait() says:

"The  timeout  argument specifies the minimum number of milliseconds that epoll_wait() will block.  (This interval will be rounded up to the system clock granularity, and kernel scheduling delays mean that the blocking  interval may  overrun  by  a  small  amount.)"

I read minimum, not maximum here :-)

If epoll_wait(timeout_ms) may wait less than timeout_ms seconds, asyncio algorithm is wrong, or at least inefficient. It should loop until the time delta is at least total_timeout seconds. See the original issue:
http://code.google.com/p/tulip/issues/detail?id=106
msg209005 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-01-23 22:00
> "The  timeout  argument specifies the minimum number of milliseconds
> that epoll_wait() will block.  (This interval will be rounded up to
> the system clock granularity, and kernel scheduling delays mean that
> the blocking  interval may  overrun  by  a  small  amount.)"
> 
> I read minimum, not maximum here :-)

Aren't you using different clocks, though?
The excerpt above says "system clock" (which admittedly sounds a bit
vague :-)).
msg209006 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-23 22:01
Isn't any blocked syscall with a timeout can be interrupted by a signal? 
epoll_wait() can return EINTR.
msg209008 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-01-23 22:21
> Ah? The manual page of epoll_wait() says:
>
> "The  timeout  argument specifies the minimum number of milliseconds that
epoll_wait() will block.  (This interval will be rounded up to the system
clock granularity, and kernel scheduling delays mean that the blocking
 interval may  overrun  by  a  small  amount.)"
>
> I read minimum, not maximum here :-)

Yes, but we're talking about a 1e-4 accuracy here: I really doubt all
hardware supports high-resolution timers. epoll() returning 1e-4s before
the passed delay doesn't surprise me.

> If epoll_wait(timeout_ms) may wait less than timeout_ms seconds, asyncio
algorithm is wrong, or at least inefficient. It should loop until the time
delta is at least total_timeout seconds. See the original issue:
> http://code.google.com/p/tulip/issues/detail?id=106

Not really: sure, an early wakeup can cause spurious loops, but this should
be really rare: how often is the main event loop called with
sub-millisecond timeouts?
msg209015 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-23 23:04
Le 23 janv. 2014 23:21, e me.
> > If epoll_wait(timeout_ms) may wait less than timeout_ms seconds, asyncio
> algorithm is wrong, or at least inefficient. It should loop until the time
> delta is at least total_timeout seconds. See the original issue:
> > http://code.google.com/p/tulip/issues/detail?id=106
>
> Not really: sure, an early wakeup can cause spurious loops, but this
should
> be really rare: how often is the main event loop called with
> sub-millisecond timeouts?

Did you read my Tulip? Maybe I didn't explain correctly.

Usually, the shorter delay in an application is 10 ms. The problem is that
because of the rounding issue (which should now be fixed), the event loop
computed a delay smaller than 1 ms. And this delay was rounded to 0 ms by
epoll.poll(). Then the deadline was still not reached, the loop was called
again... The number of call depends on the performance of your cpu. The
high load should be short, probably less than 5 ms, but it occured regulary
and was 2totally inefficient.

Now asyncio should no more call epoll_wait() in non blocking mode (timeout
of 0 ms) if a task is scheduled. So the major inefficient bug is fixed.

But it would be nice to ensure that asyncio doesn't loop if nothing happens
whereas the deadline is known.

For example, it may call again epoll_wait() if it took less than timeout
seconds and returned no event, *and* the ready list is empty.

For the selector unit test: why does it fail whereas test_epoll has the
same test and test_epoll doesn't fail?

I fixed the rounding issue in selectmodule.c, but later I made a different
fix in selectors which should only be needed for unpatched selectmodule.c.
I did that to keep Tulip and Python code identical (including selectors, as
asked by Guido).

Maybe my rounding formula in selectors.py is wrong? 1e-3 number isn't IEEE
754 friendly, it cannot be stored exactly in binary (in a C double of 64
bits).

Maybe the code should be skipped depending on the python version?
msg209016 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-01-23 23:08
> For example, it may call again epoll_wait() if it took less than timeout
> seconds and returned no event, *and* the ready list is empty.

Easy solution: add 1 ms. to the timeout before calling epoll_wait().
Perhaps we need the same kind of thing for select() and poll() too.
msg209019 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-23 23:25
Le 24 janv. 2014 00:08, "Antoine Pitrou" <report@bugs.python.org> a écrit :
> > For example, it may call again epoll_wait() if it took less than timeout
> > seconds and returned no event, *and* the ready list is empty.
>
> Easy solution: add 1 ms. to the timeout before calling epoll_wait().
> Perhaps we need the same kind of thing for select() and poll() too.

It doesn't fix the case when EpollSelector.select() got an InterruptedError.

Adding 1 ms works around the (now fixed) timeout rounding issue but I
prefered to round differently to not sleep longer than requested (try to
respect deadline).

About the "system clock", it's possibke than Linux uses a different clock
for epoll_wait() than CLOCK_MONOTONIC, and than clocks have a different
resolution.

I didn't test the rounding issue on Windows where time.monotonic() has a
resolution between... 1 and 16 ms!
msg209021 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-01-23 23:27
> It doesn't fix the case when EpollSelector.select() got an InterruptedError.

That should be very rare. I don't see a problem with retrying on EINTR.

> Adding 1 ms works around the (now fixed) timeout rounding issue but I
> prefered to round differently to not sleep longer than requested (try to
> respect deadline).

What are you going to round exactly? The epoll timeout? The deadline?
msg209039 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-01-24 05:36
> Did you read my Tulip? Maybe I didn't explain correctly.

No, it was quite clear, and I think I understand the original issue:
calling epoll_wait(0) when passed a timeout of 0.9ms was bad, because it
led to busy-loop during 0.9ms.

But here, that's another story: once in a blue moon, depending on the
platform, epoll_wait(1ms) might return after e.g. 0.931ms as above.
In this case, we'll just call epoll once more with another extra 1ms
timeout: this doesn't yield excessive CPU usage, only a slight latency, but
this extra latency is completely normal, since we have to round the delay
somehow (e.g. if someone passes 0.001ms it will be rounded to 1ms).

Someone needing high-resolution wakeups shouldn't rely on epoll/select
timeouts, but use timerfd_create or something similar.

Furthermore, in practice this shouldn't occur often: if someone passes a
1.001 ms timeout, we'll round it to 2ms, so we won't have the problem.
If someone passes a 0.9ms timeout, it will be rounded to 1ms, so even if
epoll wakes up a little before 1ms, let's say 0.931ms as in the above
buildbot failure, it will still be greater than 0.9ms, so no problem either.

So IMO it's not a problem, and even if it is, there's no reason to
complicate the code.

In your test, just restrict the rounding check for delays which are
strictly less than the underlying syscall resolutuion, e.g. for epoll 1-6,
1-5, 1-4. That's enough to check the correct rounding.

> Maybe my rounding formula in selectors.py is wrong? 1e-3 number isn't IEEE
> 754 friendly, it cannot be stored exactly in binary (in a C double of 64
> bits).

Regardless of IEEE754 representation, in the end, epoll_wait() is passed an
integer (a C long).
So it's either passed 0, 1 or 2. Given the 0.009316698648035526 value, the
only integer that was possibly passed is 1 (ms), so I don't think rounding
is the issue.
msg209040 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-01-24 07:07
I've lost some context, but perhaps we should have the notion of
"granularity" of the poll/select timeout (e.g. 1 msec), and consider
events that are in the future by less than that granularity as ready?
Then we can round the timeout down: if someone passes a timeout of 1.1
msec, poll would wait for approximately 1 msec, and when it returns
the event would be considered "due now" as long as the balance (about
0.1 msec) was under 1 msec.

On Thu, Jan 23, 2014 at 9:36 PM, Charles-François Natali
<report@bugs.python.org> wrote:
>
> Charles-François Natali added the comment:
>
>> Did you read my Tulip? Maybe I didn't explain correctly.
>
> No, it was quite clear, and I think I understand the original issue:
> calling epoll_wait(0) when passed a timeout of 0.9ms was bad, because it
> led to busy-loop during 0.9ms.
>
> But here, that's another story: once in a blue moon, depending on the
> platform, epoll_wait(1ms) might return after e.g. 0.931ms as above.
> In this case, we'll just call epoll once more with another extra 1ms
> timeout: this doesn't yield excessive CPU usage, only a slight latency, but
> this extra latency is completely normal, since we have to round the delay
> somehow (e.g. if someone passes 0.001ms it will be rounded to 1ms).
>
> Someone needing high-resolution wakeups shouldn't rely on epoll/select
> timeouts, but use timerfd_create or something similar.
>
> Furthermore, in practice this shouldn't occur often: if someone passes a
> 1.001 ms timeout, we'll round it to 2ms, so we won't have the problem.
> If someone passes a 0.9ms timeout, it will be rounded to 1ms, so even if
> epoll wakes up a little before 1ms, let's say 0.931ms as in the above
> buildbot failure, it will still be greater than 0.9ms, so no problem either.
>
> So IMO it's not a problem, and even if it is, there's no reason to
> complicate the code.
>
> In your test, just restrict the rounding check for delays which are
> strictly less than the underlying syscall resolutuion, e.g. for epoll 1-6,
> 1-5, 1-4. That's enough to check the correct rounding.
>
>> Maybe my rounding formula in selectors.py is wrong? 1e-3 number isn't IEEE
>> 754 friendly, it cannot be stored exactly in binary (in a C double of 64
>> bits).
>
> Regardless of IEEE754 representation, in the end, epoll_wait() is passed an
> integer (a C long).
> So it's either passed 0, 1 or 2. Given the 0.009316698648035526 value, the
> only integer that was possibly passed is 1 (ms), so I don't think rounding
> is the issue.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue20311>
> _______________________________________
msg209056 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-24 10:04
Guido van Rossum added the comment:
> I've lost some context, but perhaps we should have the notion of
> "granularity" of the poll/select timeout (e.g. 1 msec), and consider
> events that are in the future by less than that granularity as ready?
> Then we can round the timeout down: if someone passes a timeout of 1.1
> msec, poll would wait for approximately 1 msec, and when it returns
> the event would be considered "due now" as long as the balance (about
> 0.1 msec) was under 1 msec.

Well, it sounds like a good compromise according to my experimentation in my last commits on epoll and selectors.

New attached asyncio_granularity.patch:

- Restore the previous rounding method for poll and epoll (in select and selectors modules), remove unit tests which were designed to test the rounding the timeout away from zero
- Add a new resolution attribute to selectors.BaseSelector (select: 1e-6, poll and epoll: 1e-3, kqueue: 1e-9)
- Add a new granularity attribute to asyncio.BaseEventLoop: maximum of time resolution and selector resolution
- Add a unit test for asyncio counting the number of calls to _run_once(), independent of the selector and the platform
- BaseEventLoop._run_once() rounds the deadline and current time using granularity

I tested the patch on Linux, FreeBSD and Windows (select, selectors and asyncio tests pass). Granularity:

- Linux. poll, epoll: 1 ms, select: 1 us (resolution of time.monotonic: 1 ns)
- FreeBSD. poll: 1 ms, select: 1 us, kqueue: 11 ns (resolution of time.monotonic: 11 ns)
- Windows (tested on my Windows 7 VM): select, proactor: 15.6 ms (resolution of time.monotonic(): 15.6 ms)

BaseProactorEventLoop doesn't use the resolution of IocpProactor because I don't know the resolution :-) Anyway, time.monotonic() has a low resolution (15.6 ms) and so it should be fine.

If the patch is accepted, my changes on Python 3.3 should also be reverted.
msg209164 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2014-01-25 08:10
Since the test is still failing on at least 3 stable buildbots, I've reverted the 3.3 changes for 3.3.4rc1.
msg209197 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-25 13:40
New changeset 900a1ff323bb by Victor Stinner in branch '3.3':
Issue #20311: Revert 033137c12d88, select.epoll.poll() rounds again the timeout
http://hg.python.org/cpython/rev/900a1ff323bb

New changeset caab3e191485 by Victor Stinner in branch 'default':
(Merge 3.3) Issue #20311: Revert 033137c12d88 (02f9db3e684e),
http://hg.python.org/cpython/rev/caab3e191485
msg209198 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-25 13:44
New changeset 90354a4c9dde by Victor Stinner in branch 'default':
Issue #20311: Revert e042ea77a152 and 7ce7295393c2, PollSelector.select() and
http://hg.python.org/cpython/rev/90354a4c9dde
msg209199 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-25 14:02
New changeset 3b8a2281d323 by Victor Stinner in branch 'default':
Issue #20311: selectors: Add a resolution attribute to BaseSelector.
http://hg.python.org/cpython/rev/3b8a2281d323

New changeset 4bc550c66228 by Victor Stinner in branch 'default':
Issue #20311: asyncio: Add a granularity attribute to BaseEventLoop: maximum
http://hg.python.org/cpython/rev/4bc550c66228
msg209201 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-25 14:44
I revert all changes in select an selectors, the timeout is rounded again towards zero, as it was before.

I applied my asyncio_granularity.patch:
- selectors.BaseSelector has a new abstract "resolution" property
- asyncio.BaseEventLoop has a new granularity attribute which is used in _run_once() to round timings

I don't like hurry to push changes without review, but I wanted to revert my changes on select and selectors and fix the issue in asyncio issue before Python 3.4 beta 3.

I'm now waiting for buildbots. Don't hesitate to review my changes on selectors (new resolution attribute) and asyncio (new granularity attribute).

For selectors, I chose an abstract attribute to ensure that the resolution is defined in subclasses of BaseSelector and because it does not make sense to modify the resolution.

For asyncio, it's a classic attribute because a subclass may want to adjust the granularity if it overrides the time() method, or for other reasons.
msg209213 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-01-25 16:04
> If the patch is accepted, my changes on Python 3.3 should also be
reverted.

I'm sorry, but I'm not convinced.
The selector's granularity is an implementation detail, and I don't think
it should be exposed.
Furthermore, it's not a mere function of the C type used to represent the
timeout to the underlying syscall (long, timeval): it also depends on the
operating system, the hardware, etc.

Once again, what's wrong with your initial approach of ceiling the timeout?
It does seem reasonable to round away from zero, to ensure - not taking OS
bugs/limitations - that select() will wait at least the passed timeout.
msg209238 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-25 21:24
Hi,

2014-01-25 Charles-François Natali <report@bugs.python.org>:
> I'm sorry, but I'm not convinced.
> The selector's granularity is an implementation detail, and I don't think
> it should be exposed.

I disagre, it's not a detail because it causes bugs, knowing the
resolution matters.

> Furthermore, it's not a mere function of the C type used to represent the
> timeout to the underlying syscall (long, timeval): it also depends on the
> operating system, the hardware, etc.

I exposed the resolution of the underlying C structure, I know that it
doesn't guarantee anything. But we did the same thing for clocks with
time.get_clock_info(name).resolution.

At least, if the C structure has a resolution of 1 ms, don't expect to
have a resolution better than 1 ms in practice. It's better than not
knowing it.

The OS limitations can be documented.

> Once again, what's wrong with your initial approach of ceiling the timeout?

It looks like changing the rounding method doesn't solve anything.
selector.select(timeout) may still take less than timeout, so it
doesn't give any guarantee.

IMO adding a granularity to asyncio is a better approach. asyncio can
for example adjusts its granularity at runtime if it see that the
announced selector resolution is wrong. And the granularity also uses
the clock resolution.

Victor
msg209245 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-01-25 22:11
> > Once again, what's wrong with your initial approach of ceiling the
timeout?
>
> It looks like changing the rounding method doesn't solve anything.
> selector.select(timeout) may still take less than timeout, so it
> doesn't give any guarantee.

But what problem does it cause if, once in a while, the call takes less
than the passed timeout?
If that's the case, you'll simply perform another loop, an wake up 1ms
later, that's all.

There's a lot of call written this way, and this has never been a problem:
so far, you still didn't give an example of problematic behavior.
msg209247 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-01-25 22:45
Looks like 3b8a2281d323aa9abf497192b01cf906b98ed3d8 broke the buildbots.

e.g. http://buildbot.python.org/all/builders/AMD64%20FreeBSD%2010.0%203.x/builds/1533
msg209248 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-01-25 22:48
FYI: on my local box I saw only the 2 failed tests in test_telnetlib.
msg209249 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-01-25 22:49
A better example:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%2010.0%203.x/builds/1538
msg209250 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-25 22:52
2014-01-25 Charles-François Natali <report@bugs.python.org>:
>> It looks like changing the rounding method doesn't solve anything.
>> selector.select(timeout) may still take less than timeout, so it
>> doesn't give any guarantee.
>
> But what problem does it cause if, once in a while, the call takes less
> than the passed timeout?
> If that's the case, you'll simply perform another loop, an wake up 1ms
> later, that's all.

"perform another loop" is my problem. If possible, I would like
useless calls to the loop. With my latest patch to asyncio, there is
no more useless calls to the loop (at least, not because of a time
rounding issue).

Oh, I forgot to mention that I'm also concerned by the backward
compatibility. Changing how poll and epoll round the timeout changes
the behaviour. It may break applications or maybe change performances
of applications (efficiency or reactivity). Changing the rounding
method in Python 3.3 was not a good idea.

@Charles-François: I'm not sure that we are talking about the same
thing. If you still want to change poll and epoll, go ahead. Since the
performance issue has been fixed directly in asyncio, I don't feel the
need of changing the poll or epoll anymore in Python 3.4. Or maybe you
disagree with my change in the asyncio module?
msg209251 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-25 22:53
New changeset f453443c96e4 by Victor Stinner in branch 'default':
Issue #20311: Fix test_telnetlib, set the resolution of the MockSelector
http://hg.python.org/cpython/rev/f453443c96e4
msg209252 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-01-25 22:56
I don't have the energy to read all the debate here, but over on python-tulip we continue to discuss this. Victor and I have currently agreed to drop the math.ceil() calls from the event loop and instead simply consider any event scheduled within "granularity" from the current clock value to be "ready". Victor will check it in.

Regarding the buildbot failures, Victor will have to review those.  They are in the telnet tests:

======================================================================
ERROR: test_read_until (test.test_telnetlib.ReadTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/test_telnetlib.py", line 179, in test_read_until
    data = telnet.read_until(b'match')
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/telnetlib.py", line 312, in read_until
    with _TelnetSelector() as selector:
TypeError: Can't instantiate abstract class MockSelector with abstract methods resolution

======================================================================
ERROR: test_expect (test.test_telnetlib.ExpectTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/test_telnetlib.py", line 388, in test_expect
    (_,_,data) = telnet.expect([b'match'])
  File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/telnetlib.py", line 617, in expect
    with _TelnetSelector() as selector:
TypeError: Can't instantiate abstract class MockSelector with abstract methods resolution

----------------------------------------------------------------------
msg209253 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-25 22:57
> Looks like 3b8a2281d323aa9abf497192b01cf9
06b98ed3d8 broke the buildbots.

Oh, I didn't watch buildbots, sorry. It should be fixed by f453443c96e4.

My first attempt used an attribute using a default value of None, but
it was harder to use: I had to check if the value is None or not. I
prefer to enforce an implementation to define the resolution. Or do
you know a selector which has no resolution?
msg209286 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-01-26 07:50
> > But what problem does it cause if, once in a while, the call takes less
> > than the passed timeout?
> > If that's the case, you'll simply perform another loop, an wake up 1ms
> > later, that's all.
>
> "perform another loop" is my problem. If possible, I would like
> useless calls to the loop. With my latest patch to asyncio, there is
> no more useless calls to the loop (at least, not because of a time
> rounding issue).

Still, you fail to provide an example: you will only perform another call
if you pass a tilmeout really close to an exact multiple of the
granularity, *and* the syscall returns early.
Just perform a test on your machine, you'll see that early wakeups are
*really* rare.
And even if that's the case, I argue that performing an extra loop isn't a
problem: for example, noone ever complained about the way timpeouts are
rounded in the select module.

If you think otherwise, please back your claim by an example.

> Oh, I forgot to mention that I'm also concerned by the backward
> compatibility. Changing how poll and epoll round the timeout changes
> the behaviour. It may break applications or maybe change performances
> of applications (efficiency or reactivity). Changing the rounding
> method in Python 3.3 was not a good idea.

Don't apply it to 3.3 then.

> @Charles-François: I'm not sure that we are talking about the same
> thing. If you still want to change poll and epoll, go ahead. Since the
> performance issue has been fixed directly in asyncio, I don't feel the
> need of changing the poll or epoll anymore in Python 3.4. Or maybe you
> disagree with my change in the asyncio module?

I disagree with the introduction of a granularity to selectors.
If you really want to keep the change in asyncio, just introduce a fixed
"slack" value, e.g. 1e-3 seconds.
msg209340 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-01-26 18:22
AFAICT the changes to selectmodule.c have been rolled back, both in the 3.3 and the 3.4 (default) branches. That takes care of backwards compatibility. :-)

So what we're talking about is whether (a) the selectors module (new in 3.3) should export a resolution property and what asyncio should do.

The selectors module in the repo does export a resolution property. It is 1 usec for select, 1msec for poll and epoll, and 1 nsec for kqueue. AFAIK these values are derived from the data type and scale used by the corresponding syscall.

The latest code in the asyncio module uses that resolution for exactly one purpose: if, after waking up from a selector.select() call, any events we are waiting for are less than the resolution in the future, we consider them ready.

I like this approach -- it trusts the syscall to do approximately the right thing most of the time, but it doesn't fall into the busy-wait trap. This trap is especially egregious for [e]poll, where the timing resolution is limited to 1 msec due to the syscall interface.

To explain the trap once more: suppose you have an event scheduled 0.9 msec in the future. Because of the 1 msec resolution of the [e]poll syscall API, the syscall is made with an argument of 0, which means it returns (almost) immediately. Given a fast CPU and no other IO ready, this might well take only a few usec. So the event loop does essentially nothing and then comes back to the poll call. The same event is now e.g. 0.89 msec in the future. So we call [e]poll with a timeout of 0 again, and we keep going around without ever releasing the CPU for the full 0.9 msec -- which nowadays is serious time in which a lot of other, more useful things could be accomplished by other threads or processes.

I think what's currently in the repo is all a fine solution to all that. Because the resolution is specified on a per-selector-class basis, we compensate appropriately for [e]poll but don't apply such a coarse granularity to the other syscalls, which are (potentially) more accurate.

If in the future we find that on some systems the kernel systematically rounds the timeout down using some coarser clock resolution we can add a system-specific compensation to the selectors module, and asyncio itself doesn't have to worry about it. (Though I personally don't expect we'll find this behavior.)

One final comment: IIUC the earlier fix involved using ceil() and other cleverness was essentially trying to predict *when* the next tick would wake us up, assuming that if the resolution is R, the kernel would wake us up at times that are integral multiples of R since the clock epoch. I don't think we can assume this at all, so I'm glad Victor rolled that back. :-)
msg209774 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-31 12:04
New changeset f9a09b40bc17 by Victor Stinner in branch 'default':
Issue #20311, #20452: poll and epoll now round the timeout away from zero,
http://hg.python.org/cpython/rev/f9a09b40bc17
msg210008 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-02-02 20:24
Buildbots are happy, I close the issue.
msg210023 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-02-02 21:57
Well, now that timeouts are properly rounded, the granularity is useless.
msg210024 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-02-02 21:58
> Well, now that timeouts are properly rounded, the granularity is useless.

I don't think so. Please read again the issue #20452, for example this message:

http://bugs.python.org/issue20452#msg209772
msg210029 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-02-02 22:03
> I don't think so. Please read again the issue #20452, for example this
message:
>
> http://bugs.python.org/issue20452#msg209772

"""
Ok, it looks better: waiting 99.9 ms took 99.6 ms and 99.9 ms, and waiting
9.9 ms took 9.7 ms. So as I said, the granularity (of 1 ms) is still needed
in asyncio (dt < timeout is sometimes False, but dt+granulary >= timeout is
always True).
"""

Sorry, I still fail to see how waking up after 99.6ms instead of 99.9ms is
an issue: I've asked you several times to provide an actual example of a
problem, and you still haven't.

Once again: a slight early wakeup isn't an issue, you'll just call
epoll()/select() once again.
All even loops work this way, and noone ever complained.
msg210034 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-02-02 22:10
> Once again: a slight early wakeup isn't an issue

That's your opinion, but I disagree.

Please open a new issue with a patch, or reopen at least this issue because it is now closed.

I already spent to much time on this issue. Buildbots are now happy, I don't want to change this again.
msg210042 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-02-02 22:33
> > Once again: a slight early wakeup isn't an issue
>
> That's your opinion, but I disagree.
>
> Please open a new issue with a patch, or reopen at least this issue because it is now closed.
>
> I already spent to much time on this issue. Buildbots are now happy, I don't want to change this again.

So, to sum up:
- you write a fragile and unelegant patch without a good reason
- you commit it without review
- you're asked several times to provide an example of the problems your patch is supposed to solve, but don't give any
- you don't take into account the fact that all proven and successful event loops work just fine without a similar patch
- you insist, and won't neither back your claims by examples, nor revert your patch

Fine, keep it this way, but I'm done with this, I've lost enough time.
msg210058 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-02-03 00:19
> So, to sum up:
> - you write a fragile and unelegant patch without a good reason
> - you commit it without review
> - you're asked several times to provide an example of the problems your patch is supposed to solve, but don't give any
> - you don't take into account the fact that all proven and successful event loops work just fine without a similar patch
> - you insist, and won't neither back your claims by examples, nor revert your patch

My goal is to ensure that each call to asyncio.BaseEventLoop._run_once() executes at least an handle. It is now described as a unit test in Lib/test/test_asyncio/test_events.py: test_timeout_rounding(). There is one exception: I don't care if the selector was interrupted by a signal. (As I wrote, it's not interesting to loop on the selector, it's fine to re-execute the whole _run_once() method.)

If I understood correctly your opinion, you don't want to reach this exact goal, and you would prefer to keep the code simpler. So you don't care if sometimes _run_once() exit without executing any task. Did I understand correctly your opinion?

But I don't understand if you would prefer to revert all my changes, or keep the rounding away from zero in selectors?
msg210060 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-02-03 00:30
neologix, you are getting dangerously close to attacking the person instead of the issue. Can you live with the current state of the code? If so, let us know (or be silent if you prefer). If you cannot live with it, please show example code that fails or is inefficient with the current version.
msg210865 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-10 21:04
New changeset d393df09e139 by Georg Brandl in branch '3.3':
#20311: revert changes to 3.3 branch for now until experts have decided how to resolve the issue.
http://hg.python.org/cpython/rev/d393df09e139
History
Date User Action Args
2014-02-10 21:04:38python-devsetmessages: + msg210865
2014-02-03 00:30:15gvanrossumsetmessages: + msg210060
2014-02-03 00:19:06hayposetmessages: + msg210058
2014-02-02 22:33:23neologixsetnosy: gvanrossum, georg.brandl, pitrou, haypo, neologix, python-dev, eric.snow
messages: + msg210042
2014-02-02 22:10:15hayposetmessages: + msg210034
2014-02-02 22:03:52neologixsetmessages: + msg210029
2014-02-02 21:58:48hayposetmessages: + msg210024
2014-02-02 21:57:10neologixsetmessages: + msg210023
2014-02-02 20:24:02hayposetstatus: open -> closed
resolution: fixed
messages: + msg210008
2014-02-02 12:53:45serhiy.storchakasetnosy: - serhiy.storchaka
2014-01-31 12:04:59python-devsetmessages: + msg209774
2014-01-26 18:22:30gvanrossumsetmessages: + msg209340
2014-01-26 07:50:48neologixsetmessages: + msg209286
2014-01-25 22:57:48hayposetmessages: + msg209253
2014-01-25 22:56:04gvanrossumsetmessages: + msg209252
2014-01-25 22:53:41python-devsetmessages: + msg209251
2014-01-25 22:52:20hayposetmessages: + msg209250
2014-01-25 22:49:47eric.snowsetmessages: + msg209249
2014-01-25 22:48:19eric.snowsetmessages: + msg209248
2014-01-25 22:45:58eric.snowsetnosy: + eric.snow
messages: + msg209247
2014-01-25 22:11:54neologixsetmessages: + msg209245
2014-01-25 21:24:17hayposetmessages: + msg209238
2014-01-25 16:04:04neologixsetmessages: + msg209213
2014-01-25 14:44:27hayposetmessages: + msg209201
2014-01-25 14:02:55python-devsetmessages: + msg209199
2014-01-25 13:44:31python-devsetmessages: + msg209198
2014-01-25 13:40:46python-devsetmessages: + msg209197
2014-01-25 08:10:34georg.brandlsetnosy: + georg.brandl
messages: + msg209164
2014-01-24 10:04:49hayposetfiles: + asyncio_granularity.patch

messages: + msg209056
2014-01-24 07:07:22gvanrossumsetmessages: + msg209040
2014-01-24 05:36:57neologixsetmessages: + msg209039
2014-01-23 23:27:14pitrousetmessages: + msg209021
2014-01-23 23:25:57hayposetmessages: + msg209019
2014-01-23 23:08:13pitrousetmessages: + msg209016
2014-01-23 23:04:38hayposetmessages: + msg209015
2014-01-23 22:21:08neologixsetmessages: + msg209008
2014-01-23 22:01:41serhiy.storchakasetmessages: + msg209006
2014-01-23 22:00:21pitrousetmessages: + msg209005
2014-01-23 21:39:53hayposetmessages: + msg209001
2014-01-23 18:42:47neologixsetmessages: + msg208976
2014-01-23 14:07:36python-devsetmessages: + msg208928
2014-01-23 14:06:47hayposetmessages: + msg208927
2014-01-21 20:01:05python-devsetmessages: + msg208696
2014-01-21 01:38:50python-devsetmessages: + msg208602
2014-01-21 01:28:06hayposetmessages: + msg208601
2014-01-21 01:16:57hayposetmessages: + msg208599
2014-01-21 00:51:06hayposetmessages: + msg208597
2014-01-21 00:49:19python-devsetmessages: + msg208596
2014-01-21 00:43:09python-devsetnosy: + python-dev
messages: + msg208595
2014-01-20 23:14:13pitrousetmessages: + msg208588
2014-01-20 14:08:26hayposetfiles: + select_round_timeout.patch

messages: + msg208540
title: epoll.poll(timeout) must round the timeout to the upper bound -> epoll.poll(timeout) and PollSelector.select(timeout) must round the timeout to the upper bound
2014-01-20 13:11:35neologixsetmessages: + msg208539
2014-01-20 10:58:55pitrousetmessages: + msg208534
2014-01-20 10:54:17hayposetmessages: + msg208533
2014-01-20 10:33:28haypocreate