Issue20311
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2014-01-20 10:33 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
epoll_timeout.patch | vstinner, 2014-01-20 10:33 | review | ||
select_round_timeout.patch | vstinner, 2014-01-20 14:08 | review | ||
asyncio_granularity.patch | vstinner, 2014-01-24 10:04 | review |
Messages (54) | |||
---|---|---|---|
msg208532 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-20 10:33 | |
Hi, while comparing performances of Tulip and Trollius projects (asyncio module), I found a bug. The event loop is called too many times when a callback is scheduled with a timeout and the epoll selector is used, especially for timeout close to one millisecond. In my opinion, it's a bug in epoll.poll(): it should wait *at least* timeout seconds if no event is received, not shorter. Pseudo-code of Tulip: --- timeout = scheduled[0].when - time.monotonic() events = selector.select(timeout) process_events(events) while scheduled: if scheduled[0].when > time.monotonic(): break ... --- The problem is that "scheduled[0].when > time.monotonic()" test fails because epoll.poll() returns in less than timeout seconds. The difference between the timeout and elapsed time is very small, less than 1 millisecond. Attached patch fixes this issue by rounding the timeout to the upper bound. The rounding is different than datetime.timedelta constructor for example, because the usecase is different. epoll.poll(0.0001) calls epoll_wait() with a timemout of 1 millisecond (1e-4 rounded to 1e-3). epoll.poll(0.0) still calls epoll_wait() with a timeout of 0 millisecond. I don't think that it's possible to write a reliable unit test for that on our slow buildbots. |
|||
msg208533 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-20 10:54 | |
Bug report in Tulip project, the bug affects also the new asyncio module of Python 3.4: http://code.google.com/p/tulip/issues/detail?id=106 |
|||
msg208534 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-01-20 10:58 | |
Looks good to me on the principle. |
|||
msg208539 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-01-20 13:11 | |
AFAICT, this also affects poll(). Although it's supposed to be passed an integer, passing a float will result in a truncation towards 0: """ $ strace -e poll python -c "import select; p = select.poll(); p.poll(0.9)" poll(0x23321b0, 0, 0) = 0 (Timeout) """ See also this line in PollSelector: """ def select(self, timeout=None): timeout = None if timeout is None else max(int(1000 * timeout), 0) """ This will round timeout=1e-4 to 0. |
|||
msg208540 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-20 14:08 | |
select.select(), selectors.PollSelector.select(), select.epoll.poll(), select.kqueue.control() have the bug. OS multiplexers: - select: microsecond resolution (10^-6), timeout converted by _PyTime_ObjectToTimeval() which converts 1e-7 to zero => BUG! - poll: millisecond resolution (10^-3) but expects int type, select is ok, selectors uses int(1000 * timeout) which converts 1e-4 to zero => BUG! - epoll: millisecond resolution (10^-3), select uses (int)(dtimeout * 1000.0) which converts 1e-4 to zero => BUG! - devpoll: millisecond resolution (10^-3) but expects int type, select is ok, selectors doesn't support it yet => ok - kqueue: nanosecond resolution (10^-9), timeout converted by _PyTime_ObjectToTimespec() which converts 1e-10 to zero => BUG! _PyTime_ObjectToTimeval() is used in various places: - datetime.datetime.fromtimestamp(), datetime.datetime.utcfromtimestamp() - select.select(), _PyTime_ObjectToTimespec() is used in various places: - posix.utime() - signal.sigtimedwait() - select.kqueue.control() - time.clock_settime() A new parameter should be added to _PyTime_ObjectToTimeval() and _PyTime_ObjectToTimespec() to choose the rounding method. -- I saw the bug with poll and epoll because these functions have the worst resolution (millisecond). Attached patch fixes poll and epoll, and add a generic test on all selectors: select_round_timeout.patch. According to my test, select.select() has an effective resolution of 0.1 ms (10^4) on my Fedora 19 (Linux kernel 3.12), so I don't think that the select rounding issue is important. Tell me if you consider that _PyTime_ObjectToTime*() (select, kqueue) should be fixed. If yes, I will open a separated issue (because the implementation is different and it may impact other unrelated functions). |
|||
msg208588 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-01-20 23:14 | |
> A new parameter should be added to _PyTime_ObjectToTimeval() and > _PyTime_ObjectToTimespec() to choose the rounding method. That doesn't sound necessary. Just fix the select and selectors module so that the final (OS-level) timeout is always > 0 when the user timeout is > 0. |
|||
msg208595 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-01-21 00:43 | |
New changeset 033137c12d88 by Victor Stinner in branch '3.3': Issue #20311: select.epoll.poll() now rounds the timeout away from zero, http://hg.python.org/cpython/rev/033137c12d88 New changeset 02f9db3e684e by Victor Stinner in branch 'default': (Merge 3.3) Issue #20311: select.epoll.poll() now rounds the timeout away from http://hg.python.org/cpython/rev/02f9db3e684e |
|||
msg208596 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-01-21 00:49 | |
New changeset e042ea77a152 by Victor Stinner in branch 'default': Issue #20311: selector.PollSelector.select() now rounds the timeout away from http://hg.python.org/cpython/rev/e042ea77a152 |
|||
msg208597 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-21 00:51 | |
Rounding issues with poll() and epoll() have been fixed in Python 3.3 and 3.4. I'm now waiting for buildbots before closing the issue. I created a new issue #20320 to address the rounding issue with select() and kqueue (and signal.sigtimedwait). |
|||
msg208599 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-21 01:16 | |
The epoll test failed on a buildbot. The test uses a short delay (10 milliseconds) and compare two different clocks: time.perf_counter() and select.epoll.poll(). I didn't expect a failure very the greatest delay, the test uses even shorter delay (the shortest is 0.1 ms). The best would be to trace syscalls to check if epoll_wait() is called with a non-zero timeout, but tracing syscalls is overkill and too low level for such test. It's maybe better to drop this unstable test? Unstable on our slow buildbots. http://buildbot.python.org/all/builders/x86%20Gentoo%20Non-Debug%203.3/builds/1392/steps/test/logs/stdio ====================================================================== FAIL: test_timeout_rounding (test.test_epoll.TestEPoll) ---------------------------------------------------------------------- Traceback (most recent call last): File "/var/lib/buildslave/3.3.murray-gentoo-wide/build/Lib/test/test_epoll.py", line 58, in test_timeout_rounding self.assertGreaterEqual(dt, timeout) AssertionError: 0.008995789801701903 not greater than or equal to 0.01 |
|||
msg208601 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-21 01:28 | |
Another failure: http://buildbot.python.org/all/builders/x86%20Ubuntu%20Shared%203.3/builds/1360/steps/test/logs/stdio ====================================================================== FAIL: test_timeout_rounding (test.test_epoll.TestEPoll) ---------------------------------------------------------------------- Traceback (most recent call last): File "/srv/buildbot/buildarea/3.3.bolen-ubuntu/build/Lib/test/test_epoll.py", line 58, in test_timeout_rounding self.assertGreaterEqual(dt, timeout) AssertionError: 0.009388947859406471 not greater than or equal to 0.01 |
|||
msg208602 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-01-21 01:38 | |
New changeset e1619465c49d by Victor Stinner in branch '3.3': Issue #20311: Try to fix the unit test, use time.monotonic() instead of http://hg.python.org/cpython/rev/e1619465c49d New changeset 26c54e9a933b by Victor Stinner in branch 'default': (Merge 3.3) Issue #20311: Try to fix the unit test, use time.monotonic() http://hg.python.org/cpython/rev/26c54e9a933b |
|||
msg208696 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-01-21 20:01 | |
New changeset 7ce7295393c2 by Victor Stinner in branch 'default': Issue #20311: EpollSelector now also rounds the timeout towards zero, as http://hg.python.org/cpython/rev/7ce7295393c2 |
|||
msg208927 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-23 14:06 | |
http://buildbot.python.org/all/builders/x86%20Ubuntu%20Shared%203.x/builds/9552 ====================================================================== FAIL: test_timeout_rounding (test.test_epoll.TestEPoll) ---------------------------------------------------------------------- Traceback (most recent call last): File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_epoll.py", line 59, in test_timeout_rounding self.assertGreaterEqual(dt, timeout) AssertionError: 0.009316698648035526 not greater than or equal to 0.01 |
|||
msg208928 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-01-23 14:07 | |
New changeset 3637ccf9516a by Victor Stinner in branch 'default': Issue #20311: add debug help in test_selectors http://hg.python.org/cpython/rev/3637ccf9516a |
|||
msg208976 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-01-23 18:42 | |
====================================================================== > FAIL: test_timeout_rounding (test.test_epoll.TestEPoll) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_epoll.py", line 59, in test_timeout_rounding > self.assertGreaterEqual(dt, timeout) > AssertionError: 0.009316698648035526 not greater than or equal to 0.01 Those failures are "expected", nothing guarantees that the syscall will take at least the amount of time specified. |
|||
msg209001 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-23 21:39 | |
> Those failures are "expected", nothing guarantees that the syscall > will take at least the amount of time specified. Ah? The manual page of epoll_wait() says: "The timeout argument specifies the minimum number of milliseconds that epoll_wait() will block. (This interval will be rounded up to the system clock granularity, and kernel scheduling delays mean that the blocking interval may overrun by a small amount.)" I read minimum, not maximum here :-) If epoll_wait(timeout_ms) may wait less than timeout_ms seconds, asyncio algorithm is wrong, or at least inefficient. It should loop until the time delta is at least total_timeout seconds. See the original issue: http://code.google.com/p/tulip/issues/detail?id=106 |
|||
msg209005 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-01-23 22:00 | |
> "The timeout argument specifies the minimum number of milliseconds > that epoll_wait() will block. (This interval will be rounded up to > the system clock granularity, and kernel scheduling delays mean that > the blocking interval may overrun by a small amount.)" > > I read minimum, not maximum here :-) Aren't you using different clocks, though? The excerpt above says "system clock" (which admittedly sounds a bit vague :-)). |
|||
msg209006 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2014-01-23 22:01 | |
Isn't any blocked syscall with a timeout can be interrupted by a signal? epoll_wait() can return EINTR. |
|||
msg209008 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-01-23 22:21 | |
> Ah? The manual page of epoll_wait() says: > > "The timeout argument specifies the minimum number of milliseconds that epoll_wait() will block. (This interval will be rounded up to the system clock granularity, and kernel scheduling delays mean that the blocking interval may overrun by a small amount.)" > > I read minimum, not maximum here :-) Yes, but we're talking about a 1e-4 accuracy here: I really doubt all hardware supports high-resolution timers. epoll() returning 1e-4s before the passed delay doesn't surprise me. > If epoll_wait(timeout_ms) may wait less than timeout_ms seconds, asyncio algorithm is wrong, or at least inefficient. It should loop until the time delta is at least total_timeout seconds. See the original issue: > http://code.google.com/p/tulip/issues/detail?id=106 Not really: sure, an early wakeup can cause spurious loops, but this should be really rare: how often is the main event loop called with sub-millisecond timeouts? |
|||
msg209015 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-23 23:04 | |
Le 23 janv. 2014 23:21, e me. > > If epoll_wait(timeout_ms) may wait less than timeout_ms seconds, asyncio > algorithm is wrong, or at least inefficient. It should loop until the time > delta is at least total_timeout seconds. See the original issue: > > http://code.google.com/p/tulip/issues/detail?id=106 > > Not really: sure, an early wakeup can cause spurious loops, but this should > be really rare: how often is the main event loop called with > sub-millisecond timeouts? Did you read my Tulip? Maybe I didn't explain correctly. Usually, the shorter delay in an application is 10 ms. The problem is that because of the rounding issue (which should now be fixed), the event loop computed a delay smaller than 1 ms. And this delay was rounded to 0 ms by epoll.poll(). Then the deadline was still not reached, the loop was called again... The number of call depends on the performance of your cpu. The high load should be short, probably less than 5 ms, but it occured regulary and was 2totally inefficient. Now asyncio should no more call epoll_wait() in non blocking mode (timeout of 0 ms) if a task is scheduled. So the major inefficient bug is fixed. But it would be nice to ensure that asyncio doesn't loop if nothing happens whereas the deadline is known. For example, it may call again epoll_wait() if it took less than timeout seconds and returned no event, *and* the ready list is empty. For the selector unit test: why does it fail whereas test_epoll has the same test and test_epoll doesn't fail? I fixed the rounding issue in selectmodule.c, but later I made a different fix in selectors which should only be needed for unpatched selectmodule.c. I did that to keep Tulip and Python code identical (including selectors, as asked by Guido). Maybe my rounding formula in selectors.py is wrong? 1e-3 number isn't IEEE 754 friendly, it cannot be stored exactly in binary (in a C double of 64 bits). Maybe the code should be skipped depending on the python version? |
|||
msg209016 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-01-23 23:08 | |
> For example, it may call again epoll_wait() if it took less than timeout > seconds and returned no event, *and* the ready list is empty. Easy solution: add 1 ms. to the timeout before calling epoll_wait(). Perhaps we need the same kind of thing for select() and poll() too. |
|||
msg209019 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-23 23:25 | |
Le 24 janv. 2014 00:08, "Antoine Pitrou" <report@bugs.python.org> a écrit : > > For example, it may call again epoll_wait() if it took less than timeout > > seconds and returned no event, *and* the ready list is empty. > > Easy solution: add 1 ms. to the timeout before calling epoll_wait(). > Perhaps we need the same kind of thing for select() and poll() too. It doesn't fix the case when EpollSelector.select() got an InterruptedError. Adding 1 ms works around the (now fixed) timeout rounding issue but I prefered to round differently to not sleep longer than requested (try to respect deadline). About the "system clock", it's possibke than Linux uses a different clock for epoll_wait() than CLOCK_MONOTONIC, and than clocks have a different resolution. I didn't test the rounding issue on Windows where time.monotonic() has a resolution between... 1 and 16 ms! |
|||
msg209021 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-01-23 23:27 | |
> It doesn't fix the case when EpollSelector.select() got an InterruptedError. That should be very rare. I don't see a problem with retrying on EINTR. > Adding 1 ms works around the (now fixed) timeout rounding issue but I > prefered to round differently to not sleep longer than requested (try to > respect deadline). What are you going to round exactly? The epoll timeout? The deadline? |
|||
msg209039 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-01-24 05:36 | |
> Did you read my Tulip? Maybe I didn't explain correctly. No, it was quite clear, and I think I understand the original issue: calling epoll_wait(0) when passed a timeout of 0.9ms was bad, because it led to busy-loop during 0.9ms. But here, that's another story: once in a blue moon, depending on the platform, epoll_wait(1ms) might return after e.g. 0.931ms as above. In this case, we'll just call epoll once more with another extra 1ms timeout: this doesn't yield excessive CPU usage, only a slight latency, but this extra latency is completely normal, since we have to round the delay somehow (e.g. if someone passes 0.001ms it will be rounded to 1ms). Someone needing high-resolution wakeups shouldn't rely on epoll/select timeouts, but use timerfd_create or something similar. Furthermore, in practice this shouldn't occur often: if someone passes a 1.001 ms timeout, we'll round it to 2ms, so we won't have the problem. If someone passes a 0.9ms timeout, it will be rounded to 1ms, so even if epoll wakes up a little before 1ms, let's say 0.931ms as in the above buildbot failure, it will still be greater than 0.9ms, so no problem either. So IMO it's not a problem, and even if it is, there's no reason to complicate the code. In your test, just restrict the rounding check for delays which are strictly less than the underlying syscall resolutuion, e.g. for epoll 1-6, 1-5, 1-4. That's enough to check the correct rounding. > Maybe my rounding formula in selectors.py is wrong? 1e-3 number isn't IEEE > 754 friendly, it cannot be stored exactly in binary (in a C double of 64 > bits). Regardless of IEEE754 representation, in the end, epoll_wait() is passed an integer (a C long). So it's either passed 0, 1 or 2. Given the 0.009316698648035526 value, the only integer that was possibly passed is 1 (ms), so I don't think rounding is the issue. |
|||
msg209040 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-01-24 07:07 | |
I've lost some context, but perhaps we should have the notion of "granularity" of the poll/select timeout (e.g. 1 msec), and consider events that are in the future by less than that granularity as ready? Then we can round the timeout down: if someone passes a timeout of 1.1 msec, poll would wait for approximately 1 msec, and when it returns the event would be considered "due now" as long as the balance (about 0.1 msec) was under 1 msec. On Thu, Jan 23, 2014 at 9:36 PM, Charles-François Natali <report@bugs.python.org> wrote: > > Charles-François Natali added the comment: > >> Did you read my Tulip? Maybe I didn't explain correctly. > > No, it was quite clear, and I think I understand the original issue: > calling epoll_wait(0) when passed a timeout of 0.9ms was bad, because it > led to busy-loop during 0.9ms. > > But here, that's another story: once in a blue moon, depending on the > platform, epoll_wait(1ms) might return after e.g. 0.931ms as above. > In this case, we'll just call epoll once more with another extra 1ms > timeout: this doesn't yield excessive CPU usage, only a slight latency, but > this extra latency is completely normal, since we have to round the delay > somehow (e.g. if someone passes 0.001ms it will be rounded to 1ms). > > Someone needing high-resolution wakeups shouldn't rely on epoll/select > timeouts, but use timerfd_create or something similar. > > Furthermore, in practice this shouldn't occur often: if someone passes a > 1.001 ms timeout, we'll round it to 2ms, so we won't have the problem. > If someone passes a 0.9ms timeout, it will be rounded to 1ms, so even if > epoll wakes up a little before 1ms, let's say 0.931ms as in the above > buildbot failure, it will still be greater than 0.9ms, so no problem either. > > So IMO it's not a problem, and even if it is, there's no reason to > complicate the code. > > In your test, just restrict the rounding check for delays which are > strictly less than the underlying syscall resolutuion, e.g. for epoll 1-6, > 1-5, 1-4. That's enough to check the correct rounding. > >> Maybe my rounding formula in selectors.py is wrong? 1e-3 number isn't IEEE >> 754 friendly, it cannot be stored exactly in binary (in a C double of 64 >> bits). > > Regardless of IEEE754 representation, in the end, epoll_wait() is passed an > integer (a C long). > So it's either passed 0, 1 or 2. Given the 0.009316698648035526 value, the > only integer that was possibly passed is 1 (ms), so I don't think rounding > is the issue. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue20311> > _______________________________________ |
|||
msg209056 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-24 10:04 | |
Guido van Rossum added the comment: > I've lost some context, but perhaps we should have the notion of > "granularity" of the poll/select timeout (e.g. 1 msec), and consider > events that are in the future by less than that granularity as ready? > Then we can round the timeout down: if someone passes a timeout of 1.1 > msec, poll would wait for approximately 1 msec, and when it returns > the event would be considered "due now" as long as the balance (about > 0.1 msec) was under 1 msec. Well, it sounds like a good compromise according to my experimentation in my last commits on epoll and selectors. New attached asyncio_granularity.patch: - Restore the previous rounding method for poll and epoll (in select and selectors modules), remove unit tests which were designed to test the rounding the timeout away from zero - Add a new resolution attribute to selectors.BaseSelector (select: 1e-6, poll and epoll: 1e-3, kqueue: 1e-9) - Add a new granularity attribute to asyncio.BaseEventLoop: maximum of time resolution and selector resolution - Add a unit test for asyncio counting the number of calls to _run_once(), independent of the selector and the platform - BaseEventLoop._run_once() rounds the deadline and current time using granularity I tested the patch on Linux, FreeBSD and Windows (select, selectors and asyncio tests pass). Granularity: - Linux. poll, epoll: 1 ms, select: 1 us (resolution of time.monotonic: 1 ns) - FreeBSD. poll: 1 ms, select: 1 us, kqueue: 11 ns (resolution of time.monotonic: 11 ns) - Windows (tested on my Windows 7 VM): select, proactor: 15.6 ms (resolution of time.monotonic(): 15.6 ms) BaseProactorEventLoop doesn't use the resolution of IocpProactor because I don't know the resolution :-) Anyway, time.monotonic() has a low resolution (15.6 ms) and so it should be fine. If the patch is accepted, my changes on Python 3.3 should also be reverted. |
|||
msg209164 - (view) | Author: Georg Brandl (georg.brandl) * | Date: 2014-01-25 08:10 | |
Since the test is still failing on at least 3 stable buildbots, I've reverted the 3.3 changes for 3.3.4rc1. |
|||
msg209197 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-01-25 13:40 | |
New changeset 900a1ff323bb by Victor Stinner in branch '3.3': Issue #20311: Revert 033137c12d88, select.epoll.poll() rounds again the timeout http://hg.python.org/cpython/rev/900a1ff323bb New changeset caab3e191485 by Victor Stinner in branch 'default': (Merge 3.3) Issue #20311: Revert 033137c12d88 (02f9db3e684e), http://hg.python.org/cpython/rev/caab3e191485 |
|||
msg209198 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-01-25 13:44 | |
New changeset 90354a4c9dde by Victor Stinner in branch 'default': Issue #20311: Revert e042ea77a152 and 7ce7295393c2, PollSelector.select() and http://hg.python.org/cpython/rev/90354a4c9dde |
|||
msg209199 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-01-25 14:02 | |
New changeset 3b8a2281d323 by Victor Stinner in branch 'default': Issue #20311: selectors: Add a resolution attribute to BaseSelector. http://hg.python.org/cpython/rev/3b8a2281d323 New changeset 4bc550c66228 by Victor Stinner in branch 'default': Issue #20311: asyncio: Add a granularity attribute to BaseEventLoop: maximum http://hg.python.org/cpython/rev/4bc550c66228 |
|||
msg209201 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-25 14:44 | |
I revert all changes in select an selectors, the timeout is rounded again towards zero, as it was before. I applied my asyncio_granularity.patch: - selectors.BaseSelector has a new abstract "resolution" property - asyncio.BaseEventLoop has a new granularity attribute which is used in _run_once() to round timings I don't like hurry to push changes without review, but I wanted to revert my changes on select and selectors and fix the issue in asyncio issue before Python 3.4 beta 3. I'm now waiting for buildbots. Don't hesitate to review my changes on selectors (new resolution attribute) and asyncio (new granularity attribute). For selectors, I chose an abstract attribute to ensure that the resolution is defined in subclasses of BaseSelector and because it does not make sense to modify the resolution. For asyncio, it's a classic attribute because a subclass may want to adjust the granularity if it overrides the time() method, or for other reasons. |
|||
msg209213 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-01-25 16:04 | |
> If the patch is accepted, my changes on Python 3.3 should also be reverted. I'm sorry, but I'm not convinced. The selector's granularity is an implementation detail, and I don't think it should be exposed. Furthermore, it's not a mere function of the C type used to represent the timeout to the underlying syscall (long, timeval): it also depends on the operating system, the hardware, etc. Once again, what's wrong with your initial approach of ceiling the timeout? It does seem reasonable to round away from zero, to ensure - not taking OS bugs/limitations - that select() will wait at least the passed timeout. |
|||
msg209238 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-25 21:24 | |
Hi, 2014-01-25 Charles-François Natali <report@bugs.python.org>: > I'm sorry, but I'm not convinced. > The selector's granularity is an implementation detail, and I don't think > it should be exposed. I disagre, it's not a detail because it causes bugs, knowing the resolution matters. > Furthermore, it's not a mere function of the C type used to represent the > timeout to the underlying syscall (long, timeval): it also depends on the > operating system, the hardware, etc. I exposed the resolution of the underlying C structure, I know that it doesn't guarantee anything. But we did the same thing for clocks with time.get_clock_info(name).resolution. At least, if the C structure has a resolution of 1 ms, don't expect to have a resolution better than 1 ms in practice. It's better than not knowing it. The OS limitations can be documented. > Once again, what's wrong with your initial approach of ceiling the timeout? It looks like changing the rounding method doesn't solve anything. selector.select(timeout) may still take less than timeout, so it doesn't give any guarantee. IMO adding a granularity to asyncio is a better approach. asyncio can for example adjusts its granularity at runtime if it see that the announced selector resolution is wrong. And the granularity also uses the clock resolution. Victor |
|||
msg209245 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-01-25 22:11 | |
> > Once again, what's wrong with your initial approach of ceiling the timeout? > > It looks like changing the rounding method doesn't solve anything. > selector.select(timeout) may still take less than timeout, so it > doesn't give any guarantee. But what problem does it cause if, once in a while, the call takes less than the passed timeout? If that's the case, you'll simply perform another loop, an wake up 1ms later, that's all. There's a lot of call written this way, and this has never been a problem: so far, you still didn't give an example of problematic behavior. |
|||
msg209247 - (view) | Author: Eric Snow (eric.snow) * | Date: 2014-01-25 22:45 | |
Looks like 3b8a2281d323aa9abf497192b01cf906b98ed3d8 broke the buildbots. e.g. http://buildbot.python.org/all/builders/AMD64%20FreeBSD%2010.0%203.x/builds/1533 |
|||
msg209248 - (view) | Author: Eric Snow (eric.snow) * | Date: 2014-01-25 22:48 | |
FYI: on my local box I saw only the 2 failed tests in test_telnetlib. |
|||
msg209249 - (view) | Author: Eric Snow (eric.snow) * | Date: 2014-01-25 22:49 | |
A better example: http://buildbot.python.org/all/builders/AMD64%20FreeBSD%2010.0%203.x/builds/1538 |
|||
msg209250 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-25 22:52 | |
2014-01-25 Charles-François Natali <report@bugs.python.org>: >> It looks like changing the rounding method doesn't solve anything. >> selector.select(timeout) may still take less than timeout, so it >> doesn't give any guarantee. > > But what problem does it cause if, once in a while, the call takes less > than the passed timeout? > If that's the case, you'll simply perform another loop, an wake up 1ms > later, that's all. "perform another loop" is my problem. If possible, I would like useless calls to the loop. With my latest patch to asyncio, there is no more useless calls to the loop (at least, not because of a time rounding issue). Oh, I forgot to mention that I'm also concerned by the backward compatibility. Changing how poll and epoll round the timeout changes the behaviour. It may break applications or maybe change performances of applications (efficiency or reactivity). Changing the rounding method in Python 3.3 was not a good idea. @Charles-François: I'm not sure that we are talking about the same thing. If you still want to change poll and epoll, go ahead. Since the performance issue has been fixed directly in asyncio, I don't feel the need of changing the poll or epoll anymore in Python 3.4. Or maybe you disagree with my change in the asyncio module? |
|||
msg209251 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-01-25 22:53 | |
New changeset f453443c96e4 by Victor Stinner in branch 'default': Issue #20311: Fix test_telnetlib, set the resolution of the MockSelector http://hg.python.org/cpython/rev/f453443c96e4 |
|||
msg209252 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-01-25 22:56 | |
I don't have the energy to read all the debate here, but over on python-tulip we continue to discuss this. Victor and I have currently agreed to drop the math.ceil() calls from the event loop and instead simply consider any event scheduled within "granularity" from the current clock value to be "ready". Victor will check it in. Regarding the buildbot failures, Victor will have to review those. They are in the telnet tests: ====================================================================== ERROR: test_read_until (test.test_telnetlib.ReadTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/test_telnetlib.py", line 179, in test_read_until data = telnet.read_until(b'match') File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/telnetlib.py", line 312, in read_until with _TelnetSelector() as selector: TypeError: Can't instantiate abstract class MockSelector with abstract methods resolution ====================================================================== ERROR: test_expect (test.test_telnetlib.ExpectTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/test/test_telnetlib.py", line 388, in test_expect (_,_,data) = telnet.expect([b'match']) File "/usr/home/buildbot/koobs-freebsd10/3.x.koobs-freebsd10/build/Lib/telnetlib.py", line 617, in expect with _TelnetSelector() as selector: TypeError: Can't instantiate abstract class MockSelector with abstract methods resolution ---------------------------------------------------------------------- |
|||
msg209253 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-01-25 22:57 | |
> Looks like 3b8a2281d323aa9abf497192b01cf9 06b98ed3d8 broke the buildbots. Oh, I didn't watch buildbots, sorry. It should be fixed by f453443c96e4. My first attempt used an attribute using a default value of None, but it was harder to use: I had to check if the value is None or not. I prefer to enforce an implementation to define the resolution. Or do you know a selector which has no resolution? |
|||
msg209286 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-01-26 07:50 | |
> > But what problem does it cause if, once in a while, the call takes less > > than the passed timeout? > > If that's the case, you'll simply perform another loop, an wake up 1ms > > later, that's all. > > "perform another loop" is my problem. If possible, I would like > useless calls to the loop. With my latest patch to asyncio, there is > no more useless calls to the loop (at least, not because of a time > rounding issue). Still, you fail to provide an example: you will only perform another call if you pass a tilmeout really close to an exact multiple of the granularity, *and* the syscall returns early. Just perform a test on your machine, you'll see that early wakeups are *really* rare. And even if that's the case, I argue that performing an extra loop isn't a problem: for example, noone ever complained about the way timpeouts are rounded in the select module. If you think otherwise, please back your claim by an example. > Oh, I forgot to mention that I'm also concerned by the backward > compatibility. Changing how poll and epoll round the timeout changes > the behaviour. It may break applications or maybe change performances > of applications (efficiency or reactivity). Changing the rounding > method in Python 3.3 was not a good idea. Don't apply it to 3.3 then. > @Charles-François: I'm not sure that we are talking about the same > thing. If you still want to change poll and epoll, go ahead. Since the > performance issue has been fixed directly in asyncio, I don't feel the > need of changing the poll or epoll anymore in Python 3.4. Or maybe you > disagree with my change in the asyncio module? I disagree with the introduction of a granularity to selectors. If you really want to keep the change in asyncio, just introduce a fixed "slack" value, e.g. 1e-3 seconds. |
|||
msg209340 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-01-26 18:22 | |
AFAICT the changes to selectmodule.c have been rolled back, both in the 3.3 and the 3.4 (default) branches. That takes care of backwards compatibility. :-) So what we're talking about is whether (a) the selectors module (new in 3.3) should export a resolution property and what asyncio should do. The selectors module in the repo does export a resolution property. It is 1 usec for select, 1msec for poll and epoll, and 1 nsec for kqueue. AFAIK these values are derived from the data type and scale used by the corresponding syscall. The latest code in the asyncio module uses that resolution for exactly one purpose: if, after waking up from a selector.select() call, any events we are waiting for are less than the resolution in the future, we consider them ready. I like this approach -- it trusts the syscall to do approximately the right thing most of the time, but it doesn't fall into the busy-wait trap. This trap is especially egregious for [e]poll, where the timing resolution is limited to 1 msec due to the syscall interface. To explain the trap once more: suppose you have an event scheduled 0.9 msec in the future. Because of the 1 msec resolution of the [e]poll syscall API, the syscall is made with an argument of 0, which means it returns (almost) immediately. Given a fast CPU and no other IO ready, this might well take only a few usec. So the event loop does essentially nothing and then comes back to the poll call. The same event is now e.g. 0.89 msec in the future. So we call [e]poll with a timeout of 0 again, and we keep going around without ever releasing the CPU for the full 0.9 msec -- which nowadays is serious time in which a lot of other, more useful things could be accomplished by other threads or processes. I think what's currently in the repo is all a fine solution to all that. Because the resolution is specified on a per-selector-class basis, we compensate appropriately for [e]poll but don't apply such a coarse granularity to the other syscalls, which are (potentially) more accurate. If in the future we find that on some systems the kernel systematically rounds the timeout down using some coarser clock resolution we can add a system-specific compensation to the selectors module, and asyncio itself doesn't have to worry about it. (Though I personally don't expect we'll find this behavior.) One final comment: IIUC the earlier fix involved using ceil() and other cleverness was essentially trying to predict *when* the next tick would wake us up, assuming that if the resolution is R, the kernel would wake us up at times that are integral multiples of R since the clock epoch. I don't think we can assume this at all, so I'm glad Victor rolled that back. :-) |
|||
msg209774 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-01-31 12:04 | |
New changeset f9a09b40bc17 by Victor Stinner in branch 'default': Issue #20311, #20452: poll and epoll now round the timeout away from zero, http://hg.python.org/cpython/rev/f9a09b40bc17 |
|||
msg210008 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-02 20:24 | |
Buildbots are happy, I close the issue. |
|||
msg210023 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-02-02 21:57 | |
Well, now that timeouts are properly rounded, the granularity is useless. |
|||
msg210024 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-02 21:58 | |
> Well, now that timeouts are properly rounded, the granularity is useless. I don't think so. Please read again the issue #20452, for example this message: http://bugs.python.org/issue20452#msg209772 |
|||
msg210029 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-02-02 22:03 | |
> I don't think so. Please read again the issue #20452, for example this message: > > http://bugs.python.org/issue20452#msg209772 """ Ok, it looks better: waiting 99.9 ms took 99.6 ms and 99.9 ms, and waiting 9.9 ms took 9.7 ms. So as I said, the granularity (of 1 ms) is still needed in asyncio (dt < timeout is sometimes False, but dt+granulary >= timeout is always True). """ Sorry, I still fail to see how waking up after 99.6ms instead of 99.9ms is an issue: I've asked you several times to provide an actual example of a problem, and you still haven't. Once again: a slight early wakeup isn't an issue, you'll just call epoll()/select() once again. All even loops work this way, and noone ever complained. |
|||
msg210034 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-02 22:10 | |
> Once again: a slight early wakeup isn't an issue That's your opinion, but I disagree. Please open a new issue with a patch, or reopen at least this issue because it is now closed. I already spent to much time on this issue. Buildbots are now happy, I don't want to change this again. |
|||
msg210042 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-02-02 22:33 | |
> > Once again: a slight early wakeup isn't an issue > > That's your opinion, but I disagree. > > Please open a new issue with a patch, or reopen at least this issue because it is now closed. > > I already spent to much time on this issue. Buildbots are now happy, I don't want to change this again. So, to sum up: - you write a fragile and unelegant patch without a good reason - you commit it without review - you're asked several times to provide an example of the problems your patch is supposed to solve, but don't give any - you don't take into account the fact that all proven and successful event loops work just fine without a similar patch - you insist, and won't neither back your claims by examples, nor revert your patch Fine, keep it this way, but I'm done with this, I've lost enough time. |
|||
msg210058 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-03 00:19 | |
> So, to sum up: > - you write a fragile and unelegant patch without a good reason > - you commit it without review > - you're asked several times to provide an example of the problems your patch is supposed to solve, but don't give any > - you don't take into account the fact that all proven and successful event loops work just fine without a similar patch > - you insist, and won't neither back your claims by examples, nor revert your patch My goal is to ensure that each call to asyncio.BaseEventLoop._run_once() executes at least an handle. It is now described as a unit test in Lib/test/test_asyncio/test_events.py: test_timeout_rounding(). There is one exception: I don't care if the selector was interrupted by a signal. (As I wrote, it's not interesting to loop on the selector, it's fine to re-execute the whole _run_once() method.) If I understood correctly your opinion, you don't want to reach this exact goal, and you would prefer to keep the code simpler. So you don't care if sometimes _run_once() exit without executing any task. Did I understand correctly your opinion? But I don't understand if you would prefer to revert all my changes, or keep the rounding away from zero in selectors? |
|||
msg210060 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-02-03 00:30 | |
neologix, you are getting dangerously close to attacking the person instead of the issue. Can you live with the current state of the code? If so, let us know (or be silent if you prefer). If you cannot live with it, please show example code that fails or is inefficient with the current version. |
|||
msg210865 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-10 21:04 | |
New changeset d393df09e139 by Georg Brandl in branch '3.3': #20311: revert changes to 3.3 branch for now until experts have decided how to resolve the issue. http://hg.python.org/cpython/rev/d393df09e139 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:57 | admin | set | github: 64510 |
2014-02-10 21:04:38 | python-dev | set | messages: + msg210865 |
2014-02-03 00:30:15 | gvanrossum | set | messages: + msg210060 |
2014-02-03 00:19:06 | vstinner | set | messages: + msg210058 |
2014-02-02 22:33:23 | neologix | set | nosy:
gvanrossum, georg.brandl, pitrou, vstinner, neologix, python-dev, eric.snow messages: + msg210042 |
2014-02-02 22:10:15 | vstinner | set | messages: + msg210034 |
2014-02-02 22:03:52 | neologix | set | messages: + msg210029 |
2014-02-02 21:58:48 | vstinner | set | messages: + msg210024 |
2014-02-02 21:57:10 | neologix | set | messages: + msg210023 |
2014-02-02 20:24:02 | vstinner | set | status: open -> closed resolution: fixed messages: + msg210008 |
2014-02-02 12:53:45 | serhiy.storchaka | set | nosy:
- serhiy.storchaka |
2014-01-31 12:04:59 | python-dev | set | messages: + msg209774 |
2014-01-26 18:22:30 | gvanrossum | set | messages: + msg209340 |
2014-01-26 07:50:48 | neologix | set | messages: + msg209286 |
2014-01-25 22:57:48 | vstinner | set | messages: + msg209253 |
2014-01-25 22:56:04 | gvanrossum | set | messages: + msg209252 |
2014-01-25 22:53:41 | python-dev | set | messages: + msg209251 |
2014-01-25 22:52:20 | vstinner | set | messages: + msg209250 |
2014-01-25 22:49:47 | eric.snow | set | messages: + msg209249 |
2014-01-25 22:48:19 | eric.snow | set | messages: + msg209248 |
2014-01-25 22:45:58 | eric.snow | set | nosy:
+ eric.snow messages: + msg209247 |
2014-01-25 22:11:54 | neologix | set | messages: + msg209245 |
2014-01-25 21:24:17 | vstinner | set | messages: + msg209238 |
2014-01-25 16:04:04 | neologix | set | messages: + msg209213 |
2014-01-25 14:44:27 | vstinner | set | messages: + msg209201 |
2014-01-25 14:02:55 | python-dev | set | messages: + msg209199 |
2014-01-25 13:44:31 | python-dev | set | messages: + msg209198 |
2014-01-25 13:40:46 | python-dev | set | messages: + msg209197 |
2014-01-25 08:10:34 | georg.brandl | set | nosy:
+ georg.brandl messages: + msg209164 |
2014-01-24 10:04:49 | vstinner | set | files:
+ asyncio_granularity.patch messages: + msg209056 |
2014-01-24 07:07:22 | gvanrossum | set | messages: + msg209040 |
2014-01-24 05:36:57 | neologix | set | messages: + msg209039 |
2014-01-23 23:27:14 | pitrou | set | messages: + msg209021 |
2014-01-23 23:25:57 | vstinner | set | messages: + msg209019 |
2014-01-23 23:08:13 | pitrou | set | messages: + msg209016 |
2014-01-23 23:04:38 | vstinner | set | messages: + msg209015 |
2014-01-23 22:21:08 | neologix | set | messages: + msg209008 |
2014-01-23 22:01:41 | serhiy.storchaka | set | messages: + msg209006 |
2014-01-23 22:00:21 | pitrou | set | messages: + msg209005 |
2014-01-23 21:39:53 | vstinner | set | messages: + msg209001 |
2014-01-23 18:42:47 | neologix | set | messages: + msg208976 |
2014-01-23 14:07:36 | python-dev | set | messages: + msg208928 |
2014-01-23 14:06:47 | vstinner | set | messages: + msg208927 |
2014-01-21 20:01:05 | python-dev | set | messages: + msg208696 |
2014-01-21 01:38:50 | python-dev | set | messages: + msg208602 |
2014-01-21 01:28:06 | vstinner | set | messages: + msg208601 |
2014-01-21 01:16:57 | vstinner | set | messages: + msg208599 |
2014-01-21 00:51:06 | vstinner | set | messages: + msg208597 |
2014-01-21 00:49:19 | python-dev | set | messages: + msg208596 |
2014-01-21 00:43:09 | python-dev | set | nosy:
+ python-dev messages: + msg208595 |
2014-01-20 23:14:13 | pitrou | set | messages: + msg208588 |
2014-01-20 14:08:26 | vstinner | set | files:
+ select_round_timeout.patch messages: + msg208540 title: epoll.poll(timeout) must round the timeout to the upper bound -> epoll.poll(timeout) and PollSelector.select(timeout) must round the timeout to the upper bound |
2014-01-20 13:11:35 | neologix | set | messages: + msg208539 |
2014-01-20 10:58:55 | pitrou | set | messages: + msg208534 |
2014-01-20 10:54:17 | vstinner | set | messages: + msg208533 |
2014-01-20 10:33:28 | vstinner | create |