classification
Title: Remove resolution from selectors and granularity from asyncio
Type: Stage:
Components: Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, jcea, neologix, python-dev, vstinner
Priority: normal Keywords: patch

Created on 2014-02-03 23:34 by vstinner, last changed 2014-09-26 02:02 by jcea. This issue is now closed.

Files
File name Uploaded Description Edit
remove_granularity.patch vstinner, 2014-02-03 23:34 review
asyncio_accuracy.py vstinner, 2014-02-04 09:20
add_granularity.patch vstinner, 2014-02-04 09:21 review
clock_resolution.patch vstinner, 2014-02-10 18:22 review
clock_resolution-2.patch vstinner, 2014-02-10 22:01 review
Messages (43)
msg210170 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-03 23:34
To solve a performance issue in asyncio, I added a new resolution attribute to selectors.BaseSelector and a new _granularity attribute to asyncio.BaseEventLoop. If I understood his complaint correctly, Charles-François (author and therefore maintainer of the new selectors module) doesn't like the new resolution attribute because it is a lie.

He's right: the kernel can use a clock with a worse resolution. For example, on Linux kernels older than 2.6.28, all selectors (select, poll, epoll) had a resolution of 1/HZ, where HZ is usually 100, 250 or 1000. So the best resolution was 1 ms (10^-3 s), whereas the timeval structure used by select() has a resolution of 1 us (10^-6 s).

On Windows, there is a different issue: the clock used by time.monotonic() has a resolution of 15.6 ms, whereas the type describing the timeout of the proactor selector (the GetQueuedCompletionStatus function) has a resolution of 1 ms. So the resolution of the clock, time.get_clock_info('monotonic').resolution, should also be taken into account.
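As an aside, the clock resolution mentioned above can be inspected directly; this is a minimal sketch (the exact implementation strings and resolutions printed depend on the platform):

```python
import time

# Print the advertised resolution of each clock. On Windows the
# monotonic clock (GetTickCount64) reports ~15.6 ms, while on Linux
# clock_gettime(CLOCK_MONOTONIC) typically reports 1 ns.
for name in ("monotonic", "perf_counter", "time"):
    info = time.get_clock_info(name)
    print("%-12s resolution=%s implementation=%s"
          % (name, info.resolution, info.implementation))
```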

The attached patch removes the two new attributes to make the code simpler.

I created this issue to reopen the discussion, but I would prefer to keep the attributes. I wrote the patch to be able to compare the different options; it's just to ensure that everyone agrees on the solution.

The patch is a compromise: it only partially solves the asyncio performance issue. With the patch, BaseEventLoop._run_once() may not execute a task even if there was a scheduled task, just because of timing rounding issues. So I modified the existing patch to tolerate "useless calls" to _run_once().

It was hard for me to analyze and choose the best solution for these issues because on my Fedora 20 (Linux kernel 3.12), the timings look "perfect". I'm unable to reproduce the sporadic timing issues (the selector sleeping 8 ms or 9 ms when we asked it to sleep 10 ms), whereas the test failed on many different buildbots. I suppose that, depending on the hardware and the Linux kernel version, the exact resolution of the selectors varies a lot.

I didn't test the patch on Windows yet.

See issues #20311 and #20452 for the background and raw numbers.

Note: the test_timeout_rounding() test of test_asyncio now passes on all buildbots: various platforms, selectors and hardware.

Note 2: If we decide to keep the BaseSelector.resolution attribute, I hesitate between keeping it as an attribute and replacing it with a method, because we might improve how the resolution is read later, which may require executing code. For example, on Linux older than 2.6.28, the resolution can be computed by retrieving the HZ constant, which requires checking the OS and kernel version.
msg210186 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-04 09:20
Here is a script to measure the accuracy of asyncio: the min/max difference between the scheduled time and the real elapsed time. It's not directly related to the attached patch, but it can help if you want to implement a different option.
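The attached asyncio_accuracy.py is not reproduced here, but the idea can be sketched roughly as follows. This minimal version uses asyncio.sleep() (which is driven by call_later() internally); the measure_accuracy() helper name is made up for the sketch:

```python
import asyncio
import time

def measure_accuracy(loop, delay, runs=5):
    """Return the (min, max) difference, in seconds, between the
    requested delay and the real elapsed time."""
    diffs = []
    for _ in range(runs):
        t0 = time.monotonic()
        loop.run_until_complete(asyncio.sleep(delay))
        diffs.append((time.monotonic() - t0) - delay)
    return min(diffs), max(diffs)

loop = asyncio.new_event_loop()
try:
    lo, hi = measure_accuracy(loop, 0.010)  # schedule 10 ms sleeps
    print("call_later(10.0 ms): difference in [%.1f us; %.1f us]"
          % (lo * 1e6, hi * 1e6))
finally:
    loop.close()
```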

Results on my Fedora 20 (Linux kernel 3.12) with Python compiled in debug mode.

Results with the original code:
---
[ asyncio accuracy ]
call_later(0.0001 us): difference in [59.4839 us; 101.6307 us], [+59483938.5%, +101630708.8%]
call_later(0.0005 us): difference in [59.1285 us; 129.9486 us], [+11825694.4%, +25989713.7%]
call_later(0.0010 us): difference in [58.9950 us; 102.4589 us], [+5899502.5%, +10245891.9%]
call_later(0.0015 us): difference in [59.0737 us; 100.6987 us], [+3938245.9%, +6713245.6%]
call_later(0.0020 us): difference in [58.6790 us; 112.8620 us], [+2933950.3%, +5643097.8%]
call_later(0.0030 us): difference in [58.9260 us; 118.2042 us], [+1964199.0%, +3940139.5%]
call_later(0.0040 us): difference in [59.3839 us; 117.6248 us], [+1484597.0%, +2940620.9%]
call_later(0.0050 us): difference in [58.9361 us; 115.3991 us], [+1178721.5%, +2307982.8%]
call_later(0.1 us): difference in [61.1 us; 133.5 us], [+61131.0%, +133512.9%]
call_later(0.5 us): difference in [59.0 us; 124.8 us], [+11793.8%, +24953.6%]
call_later(1.0 us): difference in [57.7 us; 113.7 us], [+5770.7%, +11372.1%]
call_later(1.5 us): difference in [57.2 us; 113.5 us], [+3816.0%, +7563.7%]
call_later(2.0 us): difference in [56.4 us; 130.1 us], [+2822.0%, +6506.8%]
call_later(3.0 us): difference in [55.1 us; 99.7 us], [+1836.5%, +3324.5%]
call_later(4.0 us): difference in [53.8 us; 98.5 us], [+1345.3%, +2462.4%]
call_later(5.0 us): difference in [54.6 us; 97.0 us], [+1091.2%, +1939.9%]
call_later(10.0 us): difference in [76.8 us; 114.3 us], [+767.6%, +1142.5%]
call_later(50.0 us): difference in [44.5 us; 1154.6 us], [+88.9%, +2309.3%]
call_later(0.1 ms): difference in [1.1 ms; 1.2 ms], [+1061.3%, +1153.8%]
call_later(100.0 us): difference in [1066.1 us; 1134.3 us], [+1066.1%, +1134.3%]
call_later(150.0 us): difference in [1005.0 us; 1104.6 us], [+670.0%, +736.4%]
call_later(200.0 us): difference in [940.3 us; 1024.9 us], [+470.2%, +512.4%]
call_later(300.0 us): difference in [854.2 us; 942.4 us], [+284.7%, +314.1%]
call_later(400.0 us): difference in [756.3 us; 841.4 us], [+189.1%, +210.3%]
call_later(0.5 ms): difference in [0.6 ms; 0.8 ms], [+129.2%, +151.6%]
call_later(500.0 us): difference in [648.7 us; 734.8 us], [+129.7%, +147.0%]
call_later(1.0 ms): difference in [0.1 ms; 0.3 ms], [+14.7%, +26.0%]
call_later(1.5 ms): difference in [0.7 ms; 0.8 ms], [+43.5%, +51.3%]
call_later(2.0 ms): difference in [0.2 ms; 0.2 ms], [+7.7%, +10.6%]
call_later(3.0 ms): difference in [0.2 ms; 0.2 ms], [+6.2%, +8.1%]
call_later(4.0 ms): difference in [0.2 ms; 0.2 ms], [+4.1%, +6.0%]
call_later(5.0 ms): difference in [0.2 ms; 0.3 ms], [+3.0%, +5.6%]
call_later(10.0 ms): difference in [0.1 ms; 0.3 ms], [+1.3%, +2.6%]
call_later(10.0 ms): difference in [0.2 ms; 0.3 ms], [+2.0%, +2.8%]
call_later(15.0 ms): difference in [0.2 ms; 0.3 ms], [+1.2%, +1.7%]
call_later(20.0 ms): difference in [0.2 ms; 1.9 ms], [+1.1%, +9.3%]
call_later(30.0 ms): difference in [0.2 ms; 0.3 ms], [+0.7%, +1.0%]
call_later(40.0 ms): difference in [0.2 ms; 0.3 ms], [+0.5%, +0.8%]
call_later(50.0 ms): difference in [0.2 ms; 0.3 ms], [+0.4%, +0.5%]
call_later(100.0 ms): difference in [0.3 ms; 0.4 ms], [+0.3%, +0.4%]
call_later(100.000 ms): difference in [0.254 ms; 0.384 ms], [+0.3%, +0.4%]
call_later(150.0 ms): difference in [0.3 ms; 0.4 ms], [+0.2%, +0.3%]
call_later(200.0 ms): difference in [0.2 ms; 0.4 ms], [+0.1%, +0.2%]
call_later(300.0 ms): difference in [0.5 ms; 0.5 ms], [+0.2%, +0.2%]
call_later(400.0 ms): difference in [0.5 ms; 0.7 ms], [+0.1%, +0.2%]
call_later(500.0 ms): difference in [0.7 ms; 0.8 ms], [+0.1%, +0.2%]
call_later(500.000 ms): difference in [0.297 ms; 0.786 ms], [+0.1%, +0.2%]
call_later(1000.000 ms): difference in [0.439 ms; 1.285 ms], [+0.0%, +0.1%]
call_later(1500.000 ms): difference in [0.916 ms; 1.759 ms], [+0.1%, +0.1%]
call_later(2000.000 ms): difference in [0.678 ms; 2.297 ms], [+0.0%, +0.1%]
call_later(3000.000 ms): difference in [0.444 ms; 3.275 ms], [+0.0%, +0.1%]
call_later(4000.000 ms): difference in [0.516 ms; 4.255 ms], [+0.0%, +0.1%]
call_later(5000.000 ms): difference in [1.051 ms; 5.242 ms], [+0.0%, +0.1%]
Global accuracy in [44.5 us; 5.2 ms]

Loop granularity: 1.0 ms
Selector resolution: 1.0 ms
Event loop: _UnixSelectorEventLoop
Selector: EpollSelector
---

IMO a maximum difference of 5.2 ms with a granularity of 1 ms is very good. For a call scheduled in 1 ms, the difference is in range [0.1 ms; 0.3 ms] which is also very good.

Results with remove_granularity.patch:
---
[ asyncio accuracy ]
call_later(0.0001 us): difference in [65.1941 us; 95.9378 us], [+65194110.0%, +95937766.7%]
call_later(0.0005 us): difference in [64.9974 us; 109.2616 us], [+12999486.8%, +21852313.2%]
call_later(0.0010 us): difference in [61.7960 us; 106.4841 us], [+6179597.8%, +10648409.5%]
call_later(0.0015 us): difference in [61.6555 us; 106.4587 us], [+4110369.8%, +7097245.5%]
call_later(0.0020 us): difference in [61.0229 us; 105.4591 us], [+3051145.6%, +5272955.3%]
call_later(0.0030 us): difference in [61.7439 us; 115.1779 us], [+2058130.7%, +3839261.8%]
call_later(0.0040 us): difference in [60.6982 us; 106.1491 us], [+1517455.2%, +2653727.0%]
call_later(0.0050 us): difference in [61.3559 us; 105.6279 us], [+1227117.7%, +2112558.7%]
call_later(0.1 us): difference in [68.4 us; 114.5 us], [+68436.0%, +114519.0%]
call_later(0.5 us): difference in [63.2 us; 99.4 us], [+12644.4%, +19883.2%]
call_later(1.0 us): difference in [61.6 us; 99.5 us], [+6164.7%, +9948.8%]
call_later(1.5 us): difference in [60.8 us; 96.3 us], [+4055.1%, +6417.1%]
call_later(2.0 us): difference in [59.9 us; 100.7 us], [+2995.2%, +5034.9%]
call_later(3.0 us): difference in [58.6 us; 94.5 us], [+1952.4%, +3150.7%]
call_later(4.0 us): difference in [57.3 us; 106.6 us], [+1432.8%, +2664.1%]
call_later(5.0 us): difference in [61.1 us; 97.1 us], [+1221.9%, +1942.7%]
call_later(10.0 us): difference in [86.5 us; 143.5 us], [+865.3%, +1434.7%]
call_later(50.0 us): difference in [33.3 us; 1164.4 us], [+66.6%, +2328.8%]
call_later(0.1 ms): difference in [1.1 ms; 1.2 ms], [+1055.1%, +1178.2%]
call_later(100.0 us): difference in [1071.8 us; 1147.2 us], [+1071.8%, +1147.2%]
call_later(150.0 us): difference in [997.9 us; 1097.1 us], [+665.3%, +731.4%]
call_later(200.0 us): difference in [974.7 us; 1040.0 us], [+487.3%, +520.0%]
call_later(300.0 us): difference in [868.3 us; 932.4 us], [+289.4%, +310.8%]
call_later(400.0 us): difference in [770.8 us; 830.7 us], [+192.7%, +207.7%]
call_later(0.5 ms): difference in [0.6 ms; 0.8 ms], [+129.2%, +151.6%]
call_later(500.0 us): difference in [675.5 us; 723.7 us], [+135.1%, +144.7%]
call_later(1.0 ms): difference in [0.1 ms; 0.3 ms], [+14.6%, +30.2%]
call_later(1.5 ms): difference in [0.7 ms; 0.8 ms], [+43.8%, +52.0%]
call_later(2.0 ms): difference in [0.1 ms; 0.2 ms], [+7.3%, +12.2%]
call_later(3.0 ms): difference in [0.2 ms; 0.3 ms], [+5.8%, +9.0%]
call_later(4.0 ms): difference in [0.2 ms; 0.2 ms], [+3.9%, +6.1%]
call_later(5.0 ms): difference in [0.2 ms; 0.3 ms], [+3.7%, +5.6%]
call_later(10.0 ms): difference in [0.2 ms; 0.3 ms], [+1.8%, +2.6%]
call_later(10.0 ms): difference in [0.2 ms; 0.3 ms], [+2.1%, +2.8%]
call_later(15.0 ms): difference in [0.2 ms; 0.3 ms], [+1.4%, +1.8%]
call_later(20.0 ms): difference in [0.2 ms; 0.3 ms], [+1.1%, +1.5%]
call_later(30.0 ms): difference in [0.2 ms; 0.3 ms], [+0.7%, +1.0%]
call_later(40.0 ms): difference in [0.2 ms; 0.3 ms], [+0.6%, +0.7%]
call_later(50.0 ms): difference in [0.2 ms; 0.3 ms], [+0.4%, +0.6%]
call_later(100.0 ms): difference in [0.3 ms; 0.3 ms], [+0.3%, +0.3%]
call_later(100.000 ms): difference in [0.269 ms; 0.325 ms], [+0.3%, +0.3%]
call_later(150.0 ms): difference in [0.3 ms; 0.4 ms], [+0.2%, +0.3%]
call_later(200.0 ms): difference in [0.4 ms; 0.5 ms], [+0.2%, +0.2%]
call_later(300.0 ms): difference in [0.3 ms; 0.6 ms], [+0.1%, +0.2%]
call_later(400.0 ms): difference in [0.5 ms; 0.7 ms], [+0.1%, +0.2%]
call_later(500.0 ms): difference in [0.7 ms; 0.8 ms], [+0.1%, +0.2%]
call_later(500.000 ms): difference in [0.698 ms; 0.748 ms], [+0.1%, +0.1%]
call_later(1000.000 ms): difference in [1.026 ms; 1.286 ms], [+0.1%, +0.1%]
call_later(1500.000 ms): difference in [0.645 ms; 1.744 ms], [+0.0%, +0.1%]
call_later(2000.000 ms): difference in [1.007 ms; 2.304 ms], [+0.1%, +0.1%]
call_later(3000.000 ms): difference in [1.165 ms; 3.256 ms], [+0.0%, +0.1%]
call_later(4000.000 ms): difference in [0.742 ms; 4.235 ms], [+0.0%, +0.1%]
call_later(5000.000 ms): difference in [1.064 ms; 5.267 ms], [+0.0%, +0.1%]
Global accuracy in [33.3 us; 5.3 ms]

Event loop: _UnixSelectorEventLoop
Selector: EpollSelector
---

No surprise, the accuracy is the same.

Another approach: always add the granularity to the timeout in asyncio (with the selector rounding toward zero, as in Python 3.3). It looks like the test_timeout_rounding() test still passes with this option. Results with add_granularity.patch:
---
[ asyncio accuracy ]
call_later(0.0001 us): difference in [1136.6910 us; 1234.2710 us], [+1136690976.7%, +1234270997.9%]
call_later(0.0005 us): difference in [1134.2053 us; 1230.9015 us], [+226841068.5%, +246180307.7%]
call_later(0.0010 us): difference in [1149.2229 us; 1219.4788 us], [+114922288.5%, +121947883.3%]
call_later(0.0015 us): difference in [1162.8546 us; 1222.8143 us], [+77523641.3%, +81520955.4%]
call_later(0.0020 us): difference in [1163.5149 us; 1217.9579 us], [+58175744.6%, +60897895.7%]
call_later(0.0030 us): difference in [1179.6120 us; 1227.1221 us], [+39320401.2%, +40904068.7%]
call_later(0.0040 us): difference in [1165.2680 us; 1209.9680 us], [+29131699.2%, +30249199.0%]
call_later(0.0050 us): difference in [1170.0682 us; 1226.5110 us], [+23401363.4%, +24530219.5%]
call_later(0.1 us): difference in [1162.8 us; 1248.8 us], [+1162765.9%, +1248821.0%]
call_later(0.5 us): difference in [1154.8 us; 1247.5 us], [+230968.4%, +249492.4%]
call_later(1.0 us): difference in [1146.2 us; 1231.9 us], [+114622.4%, +123185.5%]
call_later(1.5 us): difference in [1125.9 us; 1220.8 us], [+75061.3%, +81385.5%]
call_later(2.0 us): difference in [1121.4 us; 1211.2 us], [+56071.7%, +60558.3%]
call_later(3.0 us): difference in [1138.9 us; 1228.9 us], [+37962.0%, +40962.5%]
call_later(4.0 us): difference in [1146.9 us; 1227.2 us], [+28673.0%, +30680.0%]
call_later(5.0 us): difference in [1149.5 us; 1247.1 us], [+22990.4%, +24942.4%]
call_later(10.0 us): difference in [1174.3 us; 1231.1 us], [+11743.4%, +12311.1%]
call_later(50.0 us): difference in [1112.6 us; 1187.8 us], [+2225.1%, +2375.6%]
call_later(0.1 ms): difference in [1.1 ms; 1.2 ms], [+1059.0%, +1154.9%]
call_later(100.0 us): difference in [1061.7 us; 1133.5 us], [+1061.7%, +1133.5%]
call_later(150.0 us): difference in [1016.0 us; 1080.4 us], [+677.4%, +720.2%]
call_later(200.0 us): difference in [938.1 us; 1194.4 us], [+469.0%, +597.2%]
call_later(300.0 us): difference in [834.9 us; 930.1 us], [+278.3%, +310.0%]
call_later(400.0 us): difference in [754.3 us; 852.6 us], [+188.6%, +213.1%]
call_later(0.5 ms): difference in [0.6 ms; 0.7 ms], [+124.3%, +149.0%]
call_later(500.0 us): difference in [646.8 us; 738.0 us], [+129.4%, +147.6%]
call_later(1.0 ms): difference in [0.1 ms; 0.3 ms], [+11.8%, +26.1%]
call_later(1.5 ms): difference in [0.6 ms; 0.8 ms], [+41.1%, +50.3%]
call_later(2.0 ms): difference in [0.1 ms; 0.2 ms], [+6.2%, +10.8%]
call_later(3.0 ms): difference in [0.1 ms; 0.2 ms], [+4.3%, +7.9%]
call_later(4.0 ms): difference in [0.1 ms; 0.2 ms], [+3.4%, +5.7%]
call_later(5.0 ms): difference in [0.2 ms; 0.4 ms], [+3.2%, +7.1%]
call_later(10.0 ms): difference in [0.1 ms; 0.2 ms], [+1.4%, +2.4%]
call_later(10.0 ms): difference in [0.2 ms; 0.3 ms], [+1.6%, +2.6%]
call_later(15.0 ms): difference in [0.2 ms; 0.3 ms], [+1.3%, +1.7%]
call_later(20.0 ms): difference in [0.2 ms; 0.3 ms], [+1.0%, +1.5%]
call_later(30.0 ms): difference in [0.2 ms; 0.3 ms], [+0.8%, +0.9%]
call_later(40.0 ms): difference in [0.2 ms; 0.3 ms], [+0.5%, +0.7%]
call_later(50.0 ms): difference in [0.2 ms; 0.3 ms], [+0.4%, +0.5%]
call_later(100.0 ms): difference in [0.2 ms; 0.3 ms], [+0.2%, +0.3%]
call_later(100.000 ms): difference in [0.247 ms; 0.345 ms], [+0.2%, +0.3%]
call_later(150.0 ms): difference in [0.3 ms; 0.4 ms], [+0.2%, +0.3%]
call_later(200.0 ms): difference in [0.3 ms; 0.4 ms], [+0.2%, +0.2%]
call_later(300.0 ms): difference in [0.2 ms; 0.5 ms], [+0.1%, +0.2%]
call_later(400.0 ms): difference in [0.6 ms; 0.6 ms], [+0.1%, +0.2%]
call_later(500.0 ms): difference in [0.4 ms; 0.7 ms], [+0.1%, +0.1%]
call_later(500.000 ms): difference in [0.365 ms; 0.781 ms], [+0.1%, +0.2%]
call_later(1000.000 ms): difference in [1.176 ms; 1.239 ms], [+0.1%, +0.1%]
call_later(1500.000 ms): difference in [1.650 ms; 1.727 ms], [+0.1%, +0.1%]
call_later(2000.000 ms): difference in [0.318 ms; 2.241 ms], [+0.0%, +0.1%]
call_later(3000.000 ms): difference in [1.169 ms; 3.254 ms], [+0.0%, +0.1%]
call_later(4000.000 ms): difference in [1.297 ms; 4.238 ms], [+0.0%, +0.1%]
call_later(5000.000 ms): difference in [0.975 ms; 5.272 ms], [+0.0%, +0.1%]
Global accuracy in [117.8 us; 5.3 ms]

Loop granularity: 1.0 ms
Selector resolution: 1.0 ms
Event loop: _UnixSelectorEventLoop
Selector: EpollSelector
---

The accuracy is almost the same, except that the minimum difference (117.8 us) is larger than with the current implementation (44.5 us). But I don't know whether my micro-benchmark is reliable.

Note: I also tried always adding 2 * granularity to the asyncio timeout, but the minimum difference increases to 1 ms (as expected), which is much worse.
msg210343 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-02-05 19:56
> To solve a performance issue in asyncio, I added a new resolution
> attribute to selectors.BaseSelector and a new _granularity attribute to
> asyncio.BaseEventLoop. If I understood correctly, Charles-François (author
> and so maintainer of the new selectors module) doesn't like the new
> resolution attribute because it is a lie (if I understood correctly
> Charles-François complain).

Here are the reasons I don't like this attribute:
- it's a low-level implementation detail, which shouldn't be part of the API
- it's actually impossible to provide a sensible value for the granularity, because it depends not only on the operating system, but also on the actual version of the OS and on the hardware, as Victor's extensive tests showed
- what's the definition of this "granularity" anyway?
- but most importantly, it's useless: the performance problem initially identified was due to select/epoll/etc. rounding timeouts toward zero, which meant that if they got passed a timeout smaller than the syscall resolution, the syscall would end up being called in a tight loop. Now that timeouts are rounded away from 0, that's not an issue anymore.

Let me state this last point once again: no busy loop can occur now that
timeouts are rounded up.
Sure, some syscalls, on some OSes, can sometimes return a little earlier than expected, e.g. epoll can return after 0.98 ms instead of 1 ms. But that's not an issue: if that happens, you'll just loop an extra time; *all* event loops work this way.
Furthermore, note that due to rounding, the actual probability of the syscall returning earlier than expected is really low: for example, if the loop wants to wake up in 5.3 ms, the timeout will be rounded to 6 ms, and even a slight early wakeup won't be enough to wake up before 5.3 ms (typically it'll return after 5.98 ms or so).
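The rounding behaviour described here can be sketched as follows (to my reading, this matches how selectors.py converts a float timeout for the poll-based selectors, but treat it as an illustration rather than the exact library code):

```python
import math

def round_timeout_ms(timeout_s):
    """Round a timeout in seconds up to a whole number of
    milliseconds (ceiling, i.e. away from zero), so a small but
    positive timeout can never collapse to 0 and busy-loop."""
    return math.ceil(timeout_s * 1e3)

print(round_timeout_ms(0.0053))  # 5.3 ms is rounded up to 6 ms
print(round_timeout_ms(0.0005))  # 0.5 ms becomes 1 ms, never 0
```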

> The patch is a comprise, it solves partially the asyncio performance
> issue. With the patch, BaseEventLoop._run_once() may not executed a task
> even if there was a scheduled task, just because of timing rounding issues.
> So I modified the existing patch to tolerate "useless calls" to _run_once().

If you want to keep the current approach, nothing prevents you from using a fixed "slack" value, independent of the selector (e.g. 1 ms seems reasonable).
msg210411 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-02-06 20:06
>
> Let me state this last point once again: no busy loop can occur now that
> timeouts are rounded up.
>

How sure are you? Suppose I use poll() with a 0.5 msec timeout. This presumably gets rounded up to 1 msec. But if the system clock has e.g. a 10 msec resolution, won't this still wait 0 msec? Or will it wait until the next "tick" occurs, which could be anywhere between 0 and 10 msec in the future? But if so, why wouldn't a poll() with a 0 msec timeout also wait between 0 and 10 msec?
msg210442 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-02-07 09:13
> How sure are you? Suppose I use poll() with a 0.5 msec timeout. This
> presumably gets rounded up to 1 msec. But if the system clock has e.g. a 10
> msec resolution, won't this still wait 0 msec? Or will it wait until the
> next "tick" occurs, which could be anywhere between 0 and 10 msec in the
> future?

It depends :-)
With high-resolution timers, you'll get 1ms.
Without, you'll likely get 10ms (time quantum).
An implementation returning without delay would be seriously broken.

> But if so, why wouldn't a poll() with a 0 msec timeout also wait
> between 0 and 10 msec?

Because the kernel doesn't suspend the current task/thread; it just checks the list of currently ready FDs and returns immediately.
See e.g.
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/select.c?id=797a796a13df6b84a4791e57306737059b5b2384#n772
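This non-blocking behaviour is easy to observe from Python; a quick sketch (uses a pipe because select() on Windows rejects empty fd sets and pipe fds, so this is POSIX-only; the timing threshold is deliberately generous):

```python
import os
import select
import time

# A zero timeout never suspends the caller: the kernel scans the fd
# set and returns immediately, so the call takes microseconds even
# on a kernel with a 10 ms tick.
r, w = os.pipe()
t0 = time.perf_counter()
select.select([r], [], [], 0)      # poll, no blocking
elapsed = time.perf_counter() - t0
os.close(r)
os.close(w)
print("select(timeout=0) took %.3f ms" % (elapsed * 1e3))
```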
msg210484 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-02-07 15:48
OK, this sounds like rounding up is important to avoid busy-waiting (but maybe only when rounding down would give us zero), and we shouldn't have to worry about a coarser timer; it will just make us late, and that's always acceptable.

msg210487 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-07 16:18
Regarding Windows: it looks like select() and GetQueuedCompletionStatus() (the poll function of IocpProactor) round the timeout away from zero (the timeout is a number of milliseconds, an integer).

Examples with GetQueuedCompletionStatus():
- 2 ms (timeout) => 15 ms (elapsed time)
- 100 ms => 109 ms
- 157 ms => 172 ms

Example with select():
- 2 ms => 16 ms
- 100 ms => 110 ms
- 156 ms => 171 ms
msg210489 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-07 16:39
"If you want to keep the current approach, nothing prevents from using a fixed "slack" value, independant of the selector (e.g. 1ms seems reasonable)."

select() and kqueue() are able to sleep less than 1 ms. Using a slack of 1 ms would reduce the accuracy. I don't see why we should limit the accuracy. Why 1 ms? Because of poll/epoll? What about Windows and its resolution of 15.6 ms?

Well, under 1 ms, don't expect accurate results :-) For example on my fast Linux box, a select() with a timeout of 61 us takes 137 us (elapsed time measured in Python).
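The measurement Victor describes can be reproduced with a few lines (a sketch, POSIX-only because it selects on a pipe; the exact numbers will of course vary per machine and kernel):

```python
import os
import select
import time

# Ask select() for a 61 us timeout and measure the real elapsed
# time. Below ~1 ms, syscall overhead and timer granularity
# dominate, so the elapsed time is well above what was requested.
r, w = os.pipe()
t0 = time.perf_counter()
select.select([r], [], [], 61e-6)
elapsed = time.perf_counter() - t0
os.close(r)
os.close(w)
print("select(61 us) took %.1f us" % (elapsed * 1e6))
```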
msg210493 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-07 16:54
"Let me state this last point once again: no busy loop can occur now that timeouts are rounded up."

Agreed. The busy loop issue was solved by rounding the timeout away from zero in select and selectors module.

"Sure, some syscalls, on some OS, can sometimes return a little bit earlier than expected, e.g. epoll can return after 0.98ms instead of 1ms. But that's not an issue, if that's the case you'll just loop an extra time, *all* event loops just work this way."

I would like to avoid this *useless* loop, because such cases can be common. What's the point of polling again? We just polled all the file descriptors.

In msg209772: I saw on "x86 Gentoo 3.x" that "waiting 99.9 ms took 99.6 ms and 99.9 ms, and waiting 9.9 ms took 9.7 ms."

I remember seeing larger differences, but I don't remember in which state the code was. So let's get more data from the buildbots: I'm adding the "dt < timeout" message again.
msg210494 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-07 16:57
New changeset 3b94a4ef244e by Victor Stinner in branch 'default':
Issue #20505: add debug info
http://hg.python.org/cpython/rev/3b94a4ef244e
msg210501 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-07 17:37
> New changeset 3b94a4ef244e by Victor Stinner in branch 'default':
> Issue #20505: add debug info

Some results.

AMD64 Windows7 SP1 3.x:
==   Windows-7-6.1.7601-SP1 little-endian
SelectSelector.select(156.001 ms) took 156.000 ms (granularity=15.600 ms, resolution=0.001 ms)

x86 Ubuntu Shared 3.x:
==   Linux-2.6.31.5-linode21-i686-with-debian-lenny-sid little-endian
EpollSelector.select(99.947 ms) took 99.534 ms (granularity=1.000 ms, resolution=1.000 ms)

Oh, for Windows: select() was called with a timeout of 157 ms but slept during 156 ms. So sometimes it's a little less than the timeout (1 ms).
msg210532 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-02-07 19:25
> select() and kqueue() are able to sleep less than 1 ms. Using a slack of
> 1 ms would reduce the accuracy. I don't see why we should limit the
> accuracy. Why 1 ms? Because of poll/epoll? What about Windows and its
> resolution of 15.6 ms?
>
> Well, under 1 ms, don't expect accurate results :-) For example on my
> fast Linux box, a select() with a timeout of 61 us takes 137 us (elapsed
> time measued in Python).

That's precisely why I suggested 1ms.

And, by definition, it would only be used if the select()/poll() call
returns earlier in the first place, so it doesn't reduce the accuracy.

> I would like to avoid this *useless* loop, because such case can be
> common.
> What's the point of polling again? We just polled all file descriptors.

"If it ain't broke, don't fix it": for the 10th time, all event loops work
like this, and this has *never* been an issue.
So I suggest we stop making up complicated solutions to non-existent
problems.
msg210558 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-02-07 21:22
In the end I think Charles-Francois is right -- the rounding up in selectmodule.c (for epoll) and selectors.py (for poll and, redundantly[*], for epoll) should be sufficient to avoid the busy-wait behavior with delays under 1ms.

We're not promising real-time guarantees at any level, so both returning too early and too late must be acceptable. If a test fails frequently despite relaxed constraints, that's a problem with the test.

So I think Victor should apply his first patch (remove_granularity.patch) to both CPython and Tulip and then we can adjust any tests that still cause occasional red buildbots.

[*] It's not redundant when using Tulip on Python 3.3, so I prefer to keep it.
msg210573 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-07 22:37
New changeset d853955491a2 by Victor Stinner in branch 'default':
Issue #20505: Remove resolution and _granularity from selectors and asyncio
http://hg.python.org/cpython/rev/d853955491a2
msg210574 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-07 22:43
More data from buildbots (before d853955491a2).

x86 Ubuntu Shared 3.x:
== Linux-2.6.31.5-linode21-i686-with-debian-lenny-sid little-endian
EpollSelector.select(99.947 ms) took 99.291 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(99.928 ms) took 99.553 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(499.962 ms) took 100.384 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(398.823 ms) took 398.241 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(9.918 ms) took 9.209 ms (granularity=1.000 ms, resolution=1.000 ms)
PollSelector.select(499.963 ms) took 101.114 ms (granularity=1.000 ms, resolution=1.000 ms)
SelectSelector.select(499.940 ms) took 101.417 ms (granularity=0.001 ms, resolution=0.001 ms)

x86 Gentoo Non-Debug 3.x
==   Linux-2.6.35-vs2.3.0.36.32-gentoo-i686-Intel-R-_Core-TM-2_CPU_6600_@_2.40GHz-with-gentoo-2.1 little-endian
[130/389] test_asyncio
EpollSelector.select(99.986 ms) took 99.796 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(99.980 ms) took 99.346 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(499.987 ms) took 100.093 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(9.979 ms) took 9.573 ms (granularity=1.000 ms, resolution=1.000 ms)
PollSelector.select(499.987 ms) took 100.038 ms (granularity=1.000 ms, resolution=1.000 ms)
SelectSelector.select(499.987 ms) took 100.048 ms (granularity=0.001 ms, resolution=0.001 ms)

x86 Gentoo 3.x:
==   Linux-2.6.35-vs2.3.0.36.32-gentoo-i686-Intel-R-_Core-TM-2_CPU_6600_@_2.40GHz-with-gentoo-2.1 little-endian
[302/389/1] test_asyncio
EpollSelector.select(99.971 ms) took 99.771 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(99.955 ms) took 99.737 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(499.973 ms) took 100.077 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(399.545 ms) took 399.015 ms (granularity=1.000 ms, resolution=1.000 ms)
EpollSelector.select(9.949 ms) took 9.552 ms (granularity=1.000 ms, resolution=1.000 ms)
PollSelector.select(499.973 ms) took 100.059 ms (granularity=1.000 ms, resolution=1.000 ms)
SelectSelector.select(499.973 ms) took 100.072 ms (granularity=0.001 ms, resolution=0.001 ms)

The "select(499.973 ms) took 100.077 ms" lines correspond to tests where a signal interrupts the select.
msg210575 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-07 22:49
> So I think Victor should apply his first patch (remove_granularity.patch) to both CPython and Tulip and then we can adjust any tests that still cause occasional red buildbots.

Ok, I did that. So I understand that the code was a micro-optimization with minor or zero gain.

Sorry Charles-François for having insisted so much to keep it :)
msg210821 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-10 10:40
Sorry, I have to reopen this topic again: a buildbot failed :-( Builds 10136, 10134 and 10131 failed on the "x86 XP-4 3.x" buildbot. The failure is sporadic, probably because the test fails only sometimes, depending on how the timing is rounded.

http://buildbot.python.org/all/builders/x86%20XP-4%203.x/builds/10136/steps/test/logs/stdio

======================================================================
FAIL: test_timeout_rounding (test.test_asyncio.test_events.SelectEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_asyncio\test_events.py", line 1185, in test_timeout_rounding
    self.assertLessEqual(self.loop._run_once_counter, 8)
AssertionError: 114 not less than or equal to 8

http://buildbot.python.org/all/builders/x86%20XP-4%203.x/builds/10134/steps/test/logs/stdio

======================================================================
FAIL: test_timeout_rounding (test.test_asyncio.test_events.SelectEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_asyncio\test_events.py", line 1185, in test_timeout_rounding
    self.assertLessEqual(self.loop._run_once_counter, 8)
AssertionError: 181 not less than or equal to 8

http://buildbot.python.org/all/builders/x86%20XP-4%203.x/builds/10131/steps/test/logs/stdio

======================================================================
FAIL: test_timeout_rounding (test.test_asyncio.test_events.SelectEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_asyncio\test_events.py", line 1185, in test_timeout_rounding
    self.assertLessEqual(self.loop._run_once_counter, 8)
AssertionError: 158 not less than or equal to 8
msg210822 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-10 10:48
New changeset 32dc137f5e26 by Victor Stinner in branch 'default':
Issue #20505: Add debug info to analyze sporaric failures of
http://hg.python.org/cpython/rev/32dc137f5e26
msg210855 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-10 17:56
Ok, I reproduced the issue on my Windows 7 after enabling HPET:

== CPython 3.4.0b3+ (default:1b96d08c3895, Feb 10 2014, 18:44:24) [MSC v.1600 32 bit (Intel)]
==   Windows-7-6.1.7601-SP1 little-endian
[1/1] test_asyncio
test_timeout_rounding (test.test_asyncio.test_events.ProactorEventLoopTests) ...
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
FAIL
test_timeout_rounding (test.test_asyncio.test_events.SelectEventLoopTests) ...
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
asyncio: selector.select(0.100 ms) took 0.000 ms
FAIL

======================================================================
FAIL: test_timeout_rounding (test.test_asyncio.test_events.ProactorEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\haypo\python_default\lib\test\test_asyncio\test_events.py", line 1188, in test_timeout_rounding
    'selector': self.loop._selector.__class__.__name__})
AssertionError: 18 not less than or equal to 8 : {'time_info': namespace(adjustable=True, implementation='GetSystemTimeAsFileTime()', monotonic=False, resolution=0.015600099999999999), 'selector': 'IocpProactor', 'monotonic_info': namespace(adjustable=False, implementation='GetTickCount64()', monotonic=True, resolution=0.015600099999999999)}

======================================================================
FAIL: test_timeout_rounding (test.test_asyncio.test_events.SelectEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\haypo\python_default\lib\test\test_asyncio\test_events.py", line 1188, in test_timeout_rounding
    'selector': self.loop._selector.__class__.__name__})
AssertionError: 13 not less than or equal to 8 : {'time_info': namespace(adjustable=True, implementation='GetSystemTimeAsFileTime()', monotonic=False, resolution=0.015600099999999999), 'selector': 'SelectSelector', 'monotonic_info': namespace(adjustable=False, implementation='GetTickCount64()', monotonic=True, resolution=0.015600099999999999)}

It looks like GetQueuedCompletionStatus(ms) takes between (ms - 1) and (ms) milliseconds. Examples:

GetQueuedCompletionStatus(1 ms)->None took 0.417 ms (monotonic: 0.000 ms)
GetQueuedCompletionStatus(1 ms)->None took 0.307 ms (monotonic: 0.000 ms)
GetQueuedCompletionStatus(2 ms)->None took 1.101 ms (monotonic: 16.000 ms)
GetQueuedCompletionStatus(10 ms)->None took 9.971 ms (monotonic: 0.000 ms)
GetQueuedCompletionStatus(10 ms)->None took 9.908 ms (monotonic: 15.000 ms)
GetQueuedCompletionStatus(2 ms)->None took 1.634 ms (monotonic: 0.000 ms)

I used time.perf_counter() to measure the elapsed time, with the elapsed "monotonic" time in parentheses. As you can see, the monotonic clock is usually unchanged by such short sleeps.
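Such measurements can be reproduced with a small sketch (timed_select is a hypothetical helper, not asyncio code; the actual numbers depend on the OS and timer hardware):

```python
import select
import time

def timed_select(timeout):
    """Sleep in select() with no file descriptors and report the elapsed
    time measured with two clocks: perf_counter() has a fine resolution,
    while monotonic() may be coarse (GetTickCount64() on Windows has a
    ~15.6 ms step).  Note: empty fd lists only work as a sleep on POSIX."""
    t0_perf = time.perf_counter()
    t0_mono = time.monotonic()
    select.select([], [], [], timeout)  # acts as a plain sleep on POSIX
    return (time.perf_counter() - t0_perf,
            time.monotonic() - t0_mono)

elapsed_perf, elapsed_mono = timed_select(0.010)  # ask for 10 ms
print("perf_counter: %.3f ms, monotonic: %.3f ms"
      % (elapsed_perf * 1e3, elapsed_mono * 1e3))
```

On a platform with a coarse monotonic clock, the second number frequently stays at 0.000 ms even though the first shows ~10 ms really elapsed.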
msg210856 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-10 18:12
On the same Windows 7 virtual machine, but with HPET disabled, the test passes.

[1/1] test_asyncio
test_timeout_rounding (test.test_asyncio.test_events.ProactorEventLoopTests) ...
 GetQueuedCompletionStatus(2 ms)->None took 4.350 ms (monotonic: 16.000 ms)
GetQueuedCompletionStatus(10 ms)->None took 14.737 ms (monotonic: 16.000 ms)
GetQueuedCompletionStatus(2 ms)->None took 14.700 ms (monotonic: 15.000 ms)
GetQueuedCompletionStatus(2 ms)->None took 12.553 ms (monotonic: 16.000 ms)
GetQueuedCompletionStatus(2 ms)->None took 15.076 ms (monotonic: 15.000 ms)
GetQueuedCompletionStatus(2 ms)->None took 14.524 ms (monotonic: 16.000 ms)
GetQueuedCompletionStatus(2 ms)->None took 14.861 ms (monotonic: 16.000 ms)
GetQueuedCompletionStatus(2 ms)->None took 14.617 ms (monotonic: 15.000 ms)
GetQueuedCompletionStatus(0 ms)->None took 0.012 ms (monotonic: 0.000 ms)
ok

As you can see, the elapsed monotonic time is only zero when the timeout was zero; otherwise it is at least 15 ms.

So to fix the performance issue on Windows with HPET enabled, I see 3 options:

- round the timeout of GetQueuedCompletionStatus() (asyncio module) and select.select() (select module) using the resolution of the clock
- use a more precise clock: time.perf_counter(). I don't like this clock because it is not reliable, especially for durations longer than 2 minutes
- reintroduce the granularity in asyncio (and maybe resolution in selectors): for this specific issue (Windows with HPET enabled), only the resolution of the clock matters in fact

IMO the granularity is the best option because it only modifies asyncio and is simple. We can try to leave selectors unchanged.

Note: only poll and epoll selectors round away from zero. select and kqueue round towards zero. But it's not important IMO.
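The first option amounts to rounding the timeout up to a whole number of clock ticks; a minimal sketch (round_to_resolution is a hypothetical helper, not an asyncio API):

```python
import math
import time

def round_to_resolution(timeout, resolution):
    """Round a timeout (seconds) up to a whole number of clock ticks,
    so the selector never sleeps for less time than the clock can see."""
    return math.ceil(timeout / resolution) * resolution

# With the 15.6 ms Windows clock, a 10 ms timeout becomes one full tick:
print(round_to_resolution(0.010, 0.0156))

# In real code the resolution would come from the time module:
resolution = time.get_clock_info('monotonic').resolution
```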
msg210857 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-10 18:18
New changeset 20dc8d6430eb by Victor Stinner in branch 'default':
Issue #20505: Use even shorter sleep in test_timeout_rounding() to make the
http://hg.python.org/cpython/rev/20dc8d6430eb
msg210858 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-10 18:22
Here is clock_resolution.patch: reintroduce the old "granularity", but only use the resolution of the clock (expecting the selector to round away from zero).

clock_resolution.patch fixes test_timeout_rounding() on my Windows 7 with HPET enabled, even with my more aggressive version of the test (the test also sleeps for delays close to nanoseconds, not only close to milliseconds).

Maybe we should also fix #20320: round away from zero for select(), kqueue and signal.sigtimedwait().
msg210859 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-02-10 18:29
How do you know that the timer used by the select/poll/etc. call has the same resolution?

The variable 'now' should probably be given a more suitable name.

Can the clock resolution be zero? If not, I recommend adjusting the comparisons so that an event scheduled at exactly the rounded-up 'now' value is not considered ready -- it should be strictly before. (But if the clock resolution happens to be zero, an event scheduled at exactly 'now' should be considered ready.)
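The comparison being suggested could be sketched as follows (is_ready is a hypothetical helper, not the actual BaseEventLoop code; the numbers are exact binary fractions to avoid float noise):

```python
def is_ready(when, now, resolution):
    """Decide whether an event scheduled for time `when` is ready.

    With a non-zero clock resolution, an event scheduled at exactly the
    rounded-up now (now + resolution) is *not* ready: the comparison is
    strict.  With a zero resolution, an event scheduled at exactly `now`
    is considered ready."""
    if resolution:
        return when < now + resolution
    return when <= now

print(is_ready(1.0, 1.0, 0.0))    # True: exact match, perfect clock
print(is_ready(1.5, 1.0, 0.5))    # False: exactly at the rounded-up now
print(is_ready(1.25, 1.0, 0.5))   # True: strictly before now + resolution
```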
msg210875 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-10 22:01
> How do you know that the timer used by the select/poll/etc. call has the same resolution?

If I understood correctly, there are 3 kinds of clocks on Windows:

- kernel heartbeat: GetSystemTimeAdjustment() gives the resolution (a few milliseconds)
- multimedia timers
- performance counter: the resolution is 1 / QueryPerformanceFrequency() (at least 1 microsecond)

GetSystemTimeAsFileTime() (time.time), GetTickCount[64]() (time.monotonic) and GetProcessTimes() (time.process_time) use the kernel heartbeat (I invented this name :-)). GetTickCount() is not adjusted.

QueryPerformanceCounter() is the performance counter, it is used by time.perf_counter().

GetSystemTimeAdjustment():
http://msdn.microsoft.com/en-us/library/windows/desktop/ms724394%28v=vs.85%29.aspx

For more information, see the PEP 418:
http://www.python.org/dev/peps/pep-0418/

> Can the clock resolution be zero?

There is a unit test to ensure that the resolution of all clocks is greater than 0 and smaller or equal than 1.0.
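That check can be expressed directly with time.get_clock_info() (a minimal sketch of what such a test asserts, not the actual CPython test):

```python
import time

# Every clock must advertise a sane resolution: strictly positive
# and no coarser than one second.
for name in ('time', 'monotonic', 'perf_counter', 'process_time'):
    info = time.get_clock_info(name)
    assert 0.0 < info.resolution <= 1.0, (name, info.resolution)
    print("%s: resolution=%s" % (name, info.resolution))
```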

> If not, I recommend adjusting the comparisons so that an event scheduled at exactly the rounded-up 'now' value is not considered ready -- it should be strictly before.

Ok, here is an updated patch.
msg210879 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-02-10 22:12
OK, since the resolution is 1 nsec on my Ubuntu and OSX systems and 15.6 msec on my Windows box, this patch looks fine.
msg210884 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-10 22:43
New changeset 6b850722849e by Victor Stinner in branch 'default':
Issue #20505: BaseEventLoop uses again the resolution of the clock to decide if
http://hg.python.org/cpython/rev/6b850722849e
msg210885 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-10 22:53
> OK, since the resolution is 1 nsec on my Ubuntu

Yeah, Linux always announces 1 nanosecond, even if the real resolution is not that good:
https://lkml.org/lkml/2012/2/9/100

On Mac OS X, the resolution is also probably hardcoded to 1 nanosecond. (Python reads the resolution at runtime on Linux, Mac OS X and Windows.)

Since the granularity is back, why not also use the resolution of the selector in asyncio? :-) It would avoid useless calls, for free. It can be a private attribute if you don't want to make it public. But I don't see why you would want to hide such data, since time.get_clock_info(name).resolution is the same lie (at least on Linux and Mac OS X) :-) It's just a question of exposing OS data. No need to compute the effective resolution.
msg210886 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-02-10 22:59
Please, no. This has to stop.
msg210903 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-11 08:04
New changeset 75881b85695f by Victor Stinner in branch 'default':
Issue #20505: Fix TestLoop, set the clock resolution
http://hg.python.org/cpython/rev/75881b85695f
msg210904 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-02-11 08:07
Wow, 10 messages in one night...
Could you try to gather all your findings at once? So many messages are difficult to follow.

> GetQueuedCompletionStatus(1 ms)->None took 0.307 ms (monotonic: 0.000 ms)

So basically, on Windows, select(1ms) can be non-blocking?

If yes, this OS is even more broken than I thought, and using the
granularity (in asyncio :-) is fine.
msg210905 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-11 08:12
Victor>> Since the granularity is back, why not using also the resolution of the selector in asyncio? :-)
Guido> Please, no. This has to stop.

Ok, sorry. If we still have some sporadic failures on UNIX, I think I will maybe use Charles-François's suggestion: use at least 1 ms for the resolution in asyncio loop: max(time.get_clock_info('monotonic').resolution, 1e-3).

But it looks like the issue is now fixed again on Windows.
msg210906 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-11 08:14
>> GetQueuedCompletionStatus(1 ms)->None took 0.307 ms (monotonic: 0.000 ms)
> So basically, on Windows, select(1ms) can be non-blocking?

I would not say non-blocking: it's just that select(N milliseconds) waits between N-1 and N milliseconds on Windows when HPET hardware (High Precision Event Timer) is present and enabled. Example:

GetQueuedCompletionStatus(10 ms)->None took 9.971 ms (monotonic: 0.000 ms)
msg210908 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-11 08:50
> Wow, 10 messages in one night... Could you try to gather all your finding at once, because reading so many messages in difficult to follow?

It's not easy because I collect information from various buildbots and different virtual machines. But I plan to summarize the overall work done on timing in asyncio, select and selectors.

The first problem is that clocks and selectors have a different resolution. On Linux, poll() has a resolution of 1 ms, whereas clock_gettime(CLOCK_MONOTONIC) has a resolution better than 1 us (0.3 us according to my last test). On Windows, GetTickCount64() has a resolution of 15.6 ms, whereas GetQueuedCompletionStatus() has a resolution of 1 ms when HPET is enabled.

Note: It looks like GetQueuedCompletionStatus() has the same resolution as GetTickCount64() when HPET is disabled.

The second problem is that before my changes, asyncio expected selector.select(timeout) to take at least timeout seconds, whereas it may take 1 or 2 ms less than the timeout. This difference is very small (1 ms), but it's enough to create a busy loop. The busy loop is quite short (1 ms), but it may occur each time call_at() or call_later() is used (ex: each time asyncio.sleep() is used).
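The effect of a coarse clock can be demonstrated on any platform by quantizing a fine clock down to the Windows tick size (a sketch; quantize is a hypothetical helper that models GetTickCount64()):

```python
import time

def quantize(t, resolution):
    """Model a coarse clock: only whole multiples of `resolution` are
    visible, like GetTickCount64() and its ~15.6 ms step on Windows."""
    return (t // resolution) * resolution

RESOLUTION = 0.0156
t0 = quantize(time.monotonic(), RESOLUTION)
time.sleep(0.010)  # sleep 10 ms, less than one tick of the coarse clock
t1 = quantize(time.monotonic(), RESOLUTION)
# More often than not the quantized clock has not moved at all, so an
# event loop trusting it believes no time has elapsed and calls
# select() again with almost the same timeout: the busy loop.
print("coarse clock advanced by %.1f ms" % ((t1 - t0) * 1e3))
```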


Theoretical resolution of selectors (resolution of the C types):

* poll, epoll, GetQueuedCompletionStatus: 1 ms (10^-3)
* select: 1 us (10^-6)
* kqueue: 1 ns (10^-9)


Effective resolution of selectors:

* poll, epoll: 1 ms... but may sleep 1 or 2 ms less than the timeout, ex: on Linux, epoll_wait(100) (100 ms) took 99.6 ms according to clock_gettime(CLOCK_MONOTONIC)
* GetQueuedCompletionStatus, HPET enabled: 1 ms... but may sleep 1 ms less than the timeout. Example on Windows with HPET enabled: GetQueuedCompletionStatus(10 ms) took 9.971 ms according to the performance counter
* GetQueuedCompletionStatus, HPET disabled: 15.6 ms

For select, kqueue, I don't have reliable numbers, but I'm less worried by them since a Python application is usually slower than the resolution of these selectors.


Effective resolution of clocks (according to tables in PEP 418):

* QueryPerformanceCounter(): 10 ns
* clock_gettime(CLOCK_MONOTONIC) on Linux: 322 ns
* GetTickCount64(): 15.6 ms


To fix the busy loop, poll and epoll selectors have been modified to round the timeout away from zero, instead of rounding towards zero. So a timeout of 1 ns is converted to a timeout of 1 ms instead of 0.
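The fix for poll/epoll boils down to a ceiling instead of a truncation when converting the timeout to milliseconds (a sketch under the assumption of millisecond granularity; poll_timeout_ms is a hypothetical name, not the exact CPython code):

```python
import math

def poll_timeout_ms(timeout):
    """Convert a timeout in seconds to the whole milliseconds passed to
    poll()/epoll_wait(), rounding away from zero: a tiny positive
    timeout (e.g. 1 ns) becomes a 1 ms sleep instead of a non-blocking
    0 ms call."""
    if timeout is None:
        return None  # block forever
    return math.ceil(timeout * 1e3)

print(poll_timeout_ms(1e-9))   # 1, not 0: no busy loop
print(poll_timeout_ms(0.0995)) # 100
```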


On Windows, the clock resolution (15.6 ms) is worse than the resolution of the selector (~1 ms). The problem is that for timeout smaller than 15.6 ms, the clock value is unchanged after the sleep. So asyncio now uses the resolution of the clock to check if a scheduled task should be executed.


asyncio now has a unit test scheduling task with very short delay (between 10 ms and 0.1 ns) and counts the number of calls to _run_once() to detect busy loops. The unit test knows that the selector sometimes sleep 1 or 2 ms less than the timeout and tolerates that in its checks.


For selectors, select and kqueue still round towards zero. I reopened #20320 which has a patch to round away from zero and so avoid very short busy loops (1 us).


If the test_timeout_rounding() test still fails on UNIX even after #20320 is fixed, we should use a coarser resolution in the asyncio event loop (BaseEventLoop._clock_resolution).
msg210910 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-11 09:09
New changeset 03cb6ddc7040 by Victor Stinner in branch 'default':
Issue #20505: Improve debug info in asyncio event loop
http://hg.python.org/cpython/rev/03cb6ddc7040
msg210911 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-11 09:10
New changeset 9c1840e8d643 by Victor Stinner in branch 'default':
Issue #20505: Oops, only print debug info if selector.select(timeout) took less
http://hg.python.org/cpython/rev/9c1840e8d643
msg210912 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-02-11 09:14
> It's not easy because I collect informations from various buildbots and
different virtual machines. But I planned to try to summarize the overall
work done on time in asyncio, select and selectors.

Thanks for the summary.

> The first problem is that clocks and selectors have a different
resolution. On Linux, poll() has a resolution of 1 ms, whereas
clock_gettime(CLOCK_MONOTONIC) has a resolution better than 1 us (0.3 us
according to my last test).

time() having a better resolution than select() isn't an issue, so we're
good.

> On Windows, GetTickCount64() has a resolution of 15.6 ms, whereas
GetQueuedCompletionStatus() has a resolution of 1 ms when HPET is enabled.
>
> Note: It looks like GetQueuedCompletionStatus() has the same resolution
than GetTickCount64() when HPET is disabled.

OK, so basically this means that with HPET, select() does block, but the
clock resolution is too low, and reports an inaccurate elapsed time (and
sometimes returns 0).
So actually, no busy loop occurs even in this case.
Wouldn't it be possible to use another clock for asyncio on Windows? I find it surprising that we can't get better than 15.6 ms with HPET...
msg210913 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-11 09:29
New changeset a631b01d1715 by Victor Stinner in branch 'default':
Issue #20505: use also the monotonic time to decide if asyncio debug traces
http://hg.python.org/cpython/rev/a631b01d1715
msg210914 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-11 09:33
More debug traces, Windows 7 with HPET enabled:

asyncio: IocpProactor.select(10.0000 ms) took 9.486 ms (monotonic=0.000 ms, clock res=15.600 ms)
asyncio: IocpProactor.select(0.0010 ms) took 0.942 ms (monotonic=0.000 ms, clock  res=15.600 ms)
asyncio: IocpProactor.select(0.0000 ms) took 0.553 ms (monotonic=0.000 ms, clock res=15.600 ms)
asyncio: IocpProactor.select(0.0000 ms) took 0.517 ms (monotonic=0.000 ms, clock res=15.600 ms)

asyncio: SelectSelector.select(0.1000 ms) took 2.276 ms (monotonic=0.000 ms, clock res=15.600 ms)
asyncio: SelectSelector.select(1.0000 us) took 30.810 us (monotonic=0.000 us, clock res=15600.100 us)
asyncio: SelectSelector.select(0.0100 us) took 30.350 us (monotonic=0.000 us, clock res=15600.100 us)
asyncio: SelectSelector.select(0.0001 us) took 28.930 us (monotonic=0.000 us, clock res=15600.100 us)

Note: IocpProactor.select() rounds the timeout aways from zero with a resolution of 1 ms, whereas SelectSelector.select() rounds towards zero. It may explain why IocpSelector sleeps at least 500 us, whereas SelectSelector sleeps sometimes 30 us.
msg210915 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-11 09:38
Windows 7, HPET disabled:

asyncio: IocpProactor.select(100.0000 ms) took 99.871 ms (monotonic=109.000 ms, clock res=15.600 ms)
asyncio: IocpProactor.select(10.0000 ms) took 3.779 ms (monotonic=16.000 ms, clock res=15.600 ms)

asyncio: SelectSelector.select(100.0000 ms) took 99.608 ms (monotonic=109.000 ms, clock res=15.600 ms)
asyncio: SelectSelector.select(10.0000 ms) took 2.830 ms (monotonic=16.000 ms, clock res=15.600 ms)
asyncio: SelectSelector.select(1.0000 us) took 42.760 us (monotonic=0.000 us, clock res=15600.100 us)
asyncio: SelectSelector.select(0.0100 us) took 42.220 us (monotonic=0.000 us, clock res=15600.100 us)
asyncio: SelectSelector.select(0.0001 us) took 41.530 us (monotonic=0.000 us, clock res=15600.100 us)

If you only look at the monotonic time, we always get elapsed > timeout with IocpProactor, but not with SelectSelector.select().
msg210969 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-11 16:49
test_timeout_rounding() passes on all major buildbots. test_asyncio hangs or fails on some buildbots, but there are dedicated issues for that (like #20495 and #20600).

I'm closing this one. I reopened the rounding issue for select and kqueue (#20320); please help me to close that last issue!

--

For the record, some debug info on Windows:

AMD64 Windows7 SP1 3.x:

asyncio: IocpProactor.select(10.0000 ms) took 6.764 ms (monotonic=15.000 ms, clock res=15.600 ms)
asyncio: SelectSelector.select(100.0000 ms) took 97.473 ms (monotonic=109.000 ms, clock res=15.600 ms)

asyncio: SelectSelector.select(100.0000 ms) took 97.397 ms (monotonic=109.000 ms, clock res=15.600 ms)
asyncio: SelectSelector.select(0.9998 us) took 32.643 us (monotonic=0.000 us, clock res=15600.100 us)
asyncio: SelectSelector.select(0.0098 us) took 28.419 us (monotonic=0.000 us, clock res=15600.100 us)

x86 Windows Server 2003 [SB] 3.x:

asyncio: IocpProactor.select(100.0000 ms) took 96.972 ms (monotonic=110.000 ms, clock res=15.625 ms)
asyncio: IocpProactor.select(10.0000 ms) took 8.663 ms (monotonic=16.000 ms, clock res=15.625 ms)

asyncio: SelectSelector.select(100.0000 ms) took 98.644 ms (monotonic=110.000 ms, clock res=15.625 ms)
asyncio: SelectSelector.select(10.0000 ms) took 5.911 ms (monotonic=16.000 ms, clock res=15.625 ms)
asyncio: SelectSelector.select(0.9998 us) took 75.635 us (monotonic=0.000 us, clock res=15625.000 us)
asyncio: SelectSelector.select(0.0098 us) took 74.903 us (monotonic=0.000 us, clock res=15625.000 us)

AMD64 Windows Server 2008 [SB] 3.x:

asyncio: IocpProactor.select(100.0000 ms) took 95.484 ms (monotonic=109.000 ms, clock res=15.600 ms)
asyncio: IocpProactor.select(10.0000 ms) took 1.819 ms (monotonic=15.000 ms, clock res=15.600 ms)

asyncio: SelectSelector.select(100.0000 ms) took 97.741 ms (monotonic=109.000 ms, clock res=15.600 ms)
asyncio: SelectSelector.select(10.0000 ms) took 1.536 ms (monotonic=16.000 ms, clock res=15.600 ms)
asyncio: SelectSelector.select(0.0100 us) took 46.560 us (monotonic=0.000 us, clock res=15600.100 us)
asyncio: SelectSelector.select(0.0001 us) took 46.080 us (monotonic=0.000 us, clock res=15600.100 us)
msg210970 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-11 16:50
Thanks again Guido and Charles-François for your help on this tricky issue. Sorry for having flood your mail box :-)
msg210975 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-11 16:54
New changeset 360976a6d8b9 by Victor Stinner in branch 'default':
Issue #20505: Remove debug code
http://hg.python.org/cpython/rev/360976a6d8b9
msg213805 - (view) Author: Roundup Robot (python-dev) Date: 2014-03-17 06:30
New changeset 4f1df287392c by Victor Stinner in branch '3.4':
Issue #20505: Use even shorter sleep in test_timeout_rounding() to make the
http://hg.python.org/cpython/rev/4f1df287392c

New changeset 6733d9dfffbb by Victor Stinner in branch '3.4':
Issue #20505: BaseEventLoop uses again the resolution of the clock to decide if
http://hg.python.org/cpython/rev/6733d9dfffbb

New changeset 375e4f9c9732 by Victor Stinner in branch '3.4':
Issue #20505: Fix TestLoop, set the clock resolution
http://hg.python.org/cpython/rev/375e4f9c9732

New changeset e44ff3b7a497 by Victor Stinner in branch '3.4':
Issue #20505: Improve debug info in asyncio event loop
http://hg.python.org/cpython/rev/e44ff3b7a497

New changeset af840e781700 by Victor Stinner in branch '3.4':
Issue #20505: Oops, only print debug info if selector.select(timeout) took less
http://hg.python.org/cpython/rev/af840e781700

New changeset 2faf4b7c52ed by Victor Stinner in branch '3.4':
Issue #20505: use also the monotonic time to decide if asyncio debug traces
http://hg.python.org/cpython/rev/2faf4b7c52ed

New changeset 702b20fa7af2 by Victor Stinner in branch '3.4':
Issue #20505: Remove debug code
http://hg.python.org/cpython/rev/702b20fa7af2
History
Date User Action Args
2014-09-26 02:02:58  jcea  set  nosy: + jcea
2014-03-17 06:30:49  python-dev  set  messages: + msg213805
2014-02-11 16:54:19  python-dev  set  messages: + msg210975
2014-02-11 16:50:24  vstinner  set  status: open -> closed; resolution: fixed; messages: + msg210970
2014-02-11 16:49:50  vstinner  set  messages: + msg210969
2014-02-11 09:38:04  vstinner  set  messages: + msg210915
2014-02-11 09:33:50  vstinner  set  messages: + msg210914
2014-02-11 09:29:02  python-dev  set  messages: + msg210913
2014-02-11 09:14:10  neologix  set  messages: + msg210912
2014-02-11 09:10:50  python-dev  set  messages: + msg210911
2014-02-11 09:09:33  python-dev  set  messages: + msg210910
2014-02-11 08:50:38  vstinner  set  messages: + msg210908
2014-02-11 08:14:38  vstinner  set  messages: + msg210906
2014-02-11 08:12:25  vstinner  set  messages: + msg210905
2014-02-11 08:07:03  neologix  set  messages: + msg210904
2014-02-11 08:04:34  python-dev  set  messages: + msg210903
2014-02-10 22:59:00  gvanrossum  set  messages: + msg210886
2014-02-10 22:53:04  vstinner  set  messages: + msg210885
2014-02-10 22:43:18  python-dev  set  messages: + msg210884
2014-02-10 22:12:24  gvanrossum  set  messages: + msg210879
2014-02-10 22:01:45  vstinner  set  files: + clock_resolution-2.patch; messages: + msg210875
2014-02-10 18:29:32  gvanrossum  set  messages: + msg210859
2014-02-10 18:22:31  vstinner  set  files: + clock_resolution.patch; messages: + msg210858
2014-02-10 18:18:33  python-dev  set  messages: + msg210857
2014-02-10 18:12:24  vstinner  set  messages: + msg210856
2014-02-10 17:56:44  vstinner  set  messages: + msg210855
2014-02-10 10:48:24  python-dev  set  messages: + msg210822
2014-02-10 10:40:47  vstinner  set  status: closed -> open; resolution: fixed -> (no value); messages: + msg210821
2014-02-07 22:49:19  vstinner  set  status: open -> closed; resolution: fixed; messages: + msg210575
2014-02-07 22:43:38  vstinner  set  messages: + msg210574
2014-02-07 22:37:00  python-dev  set  messages: + msg210573
2014-02-07 21:22:21  gvanrossum  set  messages: + msg210558
2014-02-07 19:25:49  neologix  set  messages: + msg210532
2014-02-07 17:37:45  vstinner  set  messages: + msg210501
2014-02-07 16:57:58  python-dev  set  nosy: + python-dev; messages: + msg210494
2014-02-07 16:54:39  vstinner  set  messages: + msg210493
2014-02-07 16:39:32  vstinner  set  messages: + msg210489
2014-02-07 16:18:22  vstinner  set  messages: + msg210487
2014-02-07 15:48:35  gvanrossum  set  messages: + msg210484
2014-02-07 09:13:31  neologix  set  messages: + msg210442
2014-02-06 20:06:25  gvanrossum  set  messages: + msg210411
2014-02-05 19:56:18  neologix  set  messages: + msg210343
2014-02-04 09:21:55  vstinner  set  files: + add_granularity.patch
2014-02-04 09:20:43  vstinner  set  files: + asyncio_accuracy.py; messages: + msg210186
2014-02-03 23:34:16  vstinner  create