Issue20505
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2014-02-03 23:34 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
remove_granularity.patch | vstinner, 2014-02-03 23:34 | review | ||
asyncio_accuracy.py | vstinner, 2014-02-04 09:20 | |||
add_granularity.patch | vstinner, 2014-02-04 09:21 | review | ||
clock_resolution.patch | vstinner, 2014-02-10 18:22 | review | ||
clock_resolution-2.patch | vstinner, 2014-02-10 22:01 | review |
Messages (43) | |||
---|---|---|---|
msg210170 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-03 23:34 | |
To solve a performance issue in asyncio, I added a new resolution attribute to selectors.BaseSelector and a new _granularity attribute to asyncio.BaseEventLoop. If I understood correctly, Charles-François (author and so maintainer of the new selectors module) doesn't like the new resolution attribute because it is a lie (if I understood correctly Charles-François complain). He's right: the kernel can use a clock with a worse resolution. For example, on Linux kernel older than 2.6.28, all selectors (select, poll, epoll) had a resolution of 1/HZ where HZ is usually 100, 250 or 1000. So the best resolution was 1 ms (10^-3), whereas the timeval structure used in select() has a resolution of 1 us (10^-6). On Windows, there is a different issue: the clock used by time.monotonic() has a resolution of 15.6 ms, whereas the type to describe the timeouf of the proactor selector (GetQueuedCompletionStatus function) has a resolution of 1 ms. So the resolution of the clock, time.get_clock_info('monotonic').resolution, should also be used. Attached patch removes the two new attributes to make the code simpler. I created this issue to reopen the discussion, but I would prefer to keep the attributes. I wrote the patch to be able to compare the different options. It's just to ensure that everyone agree on the solution. The patch is a comprise, it solves partially the asyncio performance issue. With the patch, BaseEventLoop._run_once() may not executed a task even if there was a scheduled task, just because of timing rounding issues. So I modified the existing patch to tolerate "useless calls" to _run_once(). It was hard for me to analyze and choose the best solution for these issues because on my Fedora 20 (Linux kernel 3.12), the timings look "perfect". I'm unable to reproduce the sporadic timing issues (selector sleeping 8 ms or 9 ms whereas we asked to sleep 10 ms), whereas the test failed on many different buildbots. I suppose that depending on the hardware and the Linux kernel version, the exact resolution selectors is very different. I didn't test the patch on Windows yet. See issues #20311 and #20452 for the background and raw numbers. Note: the test_timeout_rounding() test of test_asyncio now pass on all buildbots: various platforms, selectors and hardwares. Note 2: If we decide to keep the BaseSelector.resolution attribute, I hesitate to replace it with a method because we might improve the read resolution later, which may require to execute code. For example, on Linux older than 2.6.28, the resolution can be computed by retrieving the HZ constant. It requires to check the OS and kernel version. |
|||
msg210186 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-04 09:20 | |
Here is a script to measure the accuracy of asyncio: min/max difference between the scheduled time and the real elapsed time. It's not directly related to the attached patch, but it can help if you want to implement a different option like Results on my Fedora 20 (Linux kernel 3.12) with Python compiled in debug mode. Results with the original code: --- [ asyncio accuracy ] call_later(0.0001 us): difference in [59.4839 us; 101.6307 us], [+59483938.5%, +101630708.8%] call_later(0.0005 us): difference in [59.1285 us; 129.9486 us], [+11825694.4%, +25989713.7%] call_later(0.0010 us): difference in [58.9950 us; 102.4589 us], [+5899502.5%, +10245891.9%] call_later(0.0015 us): difference in [59.0737 us; 100.6987 us], [+3938245.9%, +6713245.6%] call_later(0.0020 us): difference in [58.6790 us; 112.8620 us], [+2933950.3%, +5643097.8%] call_later(0.0030 us): difference in [58.9260 us; 118.2042 us], [+1964199.0%, +3940139.5%] call_later(0.0040 us): difference in [59.3839 us; 117.6248 us], [+1484597.0%, +2940620.9%] call_later(0.0050 us): difference in [58.9361 us; 115.3991 us], [+1178721.5%, +2307982.8%] call_later(0.1 us): difference in [61.1 us; 133.5 us], [+61131.0%, +133512.9%] call_later(0.5 us): difference in [59.0 us; 124.8 us], [+11793.8%, +24953.6%] call_later(1.0 us): difference in [57.7 us; 113.7 us], [+5770.7%, +11372.1%] call_later(1.5 us): difference in [57.2 us; 113.5 us], [+3816.0%, +7563.7%] call_later(2.0 us): difference in [56.4 us; 130.1 us], [+2822.0%, +6506.8%] call_later(3.0 us): difference in [55.1 us; 99.7 us], [+1836.5%, +3324.5%] call_later(4.0 us): difference in [53.8 us; 98.5 us], [+1345.3%, +2462.4%] call_later(5.0 us): difference in [54.6 us; 97.0 us], [+1091.2%, +1939.9%] call_later(10.0 us): difference in [76.8 us; 114.3 us], [+767.6%, +1142.5%] call_later(50.0 us): difference in [44.5 us; 1154.6 us], [+88.9%, +2309.3%] call_later(0.1 ms): difference in [1.1 ms; 1.2 ms], [+1061.3%, +1153.8%] call_later(100.0 us): difference in [1066.1 us; 1134.3 us], [+1066.1%, +1134.3%] call_later(150.0 us): difference in [1005.0 us; 1104.6 us], [+670.0%, +736.4%] call_later(200.0 us): difference in [940.3 us; 1024.9 us], [+470.2%, +512.4%] call_later(300.0 us): difference in [854.2 us; 942.4 us], [+284.7%, +314.1%] call_later(400.0 us): difference in [756.3 us; 841.4 us], [+189.1%, +210.3%] call_later(0.5 ms): difference in [0.6 ms; 0.8 ms], [+129.2%, +151.6%] call_later(500.0 us): difference in [648.7 us; 734.8 us], [+129.7%, +147.0%] call_later(1.0 ms): difference in [0.1 ms; 0.3 ms], [+14.7%, +26.0%] call_later(1.5 ms): difference in [0.7 ms; 0.8 ms], [+43.5%, +51.3%] call_later(2.0 ms): difference in [0.2 ms; 0.2 ms], [+7.7%, +10.6%] call_later(3.0 ms): difference in [0.2 ms; 0.2 ms], [+6.2%, +8.1%] call_later(4.0 ms): difference in [0.2 ms; 0.2 ms], [+4.1%, +6.0%] call_later(5.0 ms): difference in [0.2 ms; 0.3 ms], [+3.0%, +5.6%] call_later(10.0 ms): difference in [0.1 ms; 0.3 ms], [+1.3%, +2.6%] call_later(10.0 ms): difference in [0.2 ms; 0.3 ms], [+2.0%, +2.8%] call_later(15.0 ms): difference in [0.2 ms; 0.3 ms], [+1.2%, +1.7%] call_later(20.0 ms): difference in [0.2 ms; 1.9 ms], [+1.1%, +9.3%] call_later(30.0 ms): difference in [0.2 ms; 0.3 ms], [+0.7%, +1.0%] call_later(40.0 ms): difference in [0.2 ms; 0.3 ms], [+0.5%, +0.8%] call_later(50.0 ms): difference in [0.2 ms; 0.3 ms], [+0.4%, +0.5%] call_later(100.0 ms): difference in [0.3 ms; 0.4 ms], [+0.3%, +0.4%] call_later(100.000 ms): difference in [0.254 ms; 0.384 ms], [+0.3%, +0.4%] call_later(150.0 ms): difference in [0.3 ms; 0.4 ms], [+0.2%, +0.3%] call_later(200.0 ms): difference in [0.2 ms; 0.4 ms], [+0.1%, +0.2%] call_later(300.0 ms): difference in [0.5 ms; 0.5 ms], [+0.2%, +0.2%] call_later(400.0 ms): difference in [0.5 ms; 0.7 ms], [+0.1%, +0.2%] call_later(500.0 ms): difference in [0.7 ms; 0.8 ms], [+0.1%, +0.2%] call_later(500.000 ms): difference in [0.297 ms; 0.786 ms], [+0.1%, +0.2%] call_later(1000.000 ms): difference in [0.439 ms; 1.285 ms], [+0.0%, +0.1%] call_later(1500.000 ms): difference in [0.916 ms; 1.759 ms], [+0.1%, +0.1%] call_later(2000.000 ms): difference in [0.678 ms; 2.297 ms], [+0.0%, +0.1%] call_later(3000.000 ms): difference in [0.444 ms; 3.275 ms], [+0.0%, +0.1%] call_later(4000.000 ms): difference in [0.516 ms; 4.255 ms], [+0.0%, +0.1%] call_later(5000.000 ms): difference in [1.051 ms; 5.242 ms], [+0.0%, +0.1%] Global accuracy in [44.5 us; 5.2 ms] Loop granularity: 1.0 ms Selector resolution: 1.0 ms Event loop: _UnixSelectorEventLoop Selector: EpollSelector --- IMO a maximum difference of 5.2 ms with a granularity of 1 ms is very good. For a call scheduled in 1 ms, the difference is in range [0.1 ms; 0.3 ms] which is also very good. Results with remove_granularity.patch: --- [ asyncio accuracy ] call_later(0.0001 us): difference in [65.1941 us; 95.9378 us], [+65194110.0%, +95937766.7%] call_later(0.0005 us): difference in [64.9974 us; 109.2616 us], [+12999486.8%, +21852313.2%] call_later(0.0010 us): difference in [61.7960 us; 106.4841 us], [+6179597.8%, +10648409.5%] call_later(0.0015 us): difference in [61.6555 us; 106.4587 us], [+4110369.8%, +7097245.5%] call_later(0.0020 us): difference in [61.0229 us; 105.4591 us], [+3051145.6%, +5272955.3%] call_later(0.0030 us): difference in [61.7439 us; 115.1779 us], [+2058130.7%, +3839261.8%] call_later(0.0040 us): difference in [60.6982 us; 106.1491 us], [+1517455.2%, +2653727.0%] call_later(0.0050 us): difference in [61.3559 us; 105.6279 us], [+1227117.7%, +2112558.7%] call_later(0.1 us): difference in [68.4 us; 114.5 us], [+68436.0%, +114519.0%] call_later(0.5 us): difference in [63.2 us; 99.4 us], [+12644.4%, +19883.2%] call_later(1.0 us): difference in [61.6 us; 99.5 us], [+6164.7%, +9948.8%] call_later(1.5 us): difference in [60.8 us; 96.3 us], [+4055.1%, +6417.1%] call_later(2.0 us): difference in [59.9 us; 100.7 us], [+2995.2%, +5034.9%] call_later(3.0 us): difference in [58.6 us; 94.5 us], [+1952.4%, +3150.7%] call_later(4.0 us): difference in [57.3 us; 106.6 us], [+1432.8%, +2664.1%] call_later(5.0 us): difference in [61.1 us; 97.1 us], [+1221.9%, +1942.7%] call_later(10.0 us): difference in [86.5 us; 143.5 us], [+865.3%, +1434.7%] call_later(50.0 us): difference in [33.3 us; 1164.4 us], [+66.6%, +2328.8%] call_later(0.1 ms): difference in [1.1 ms; 1.2 ms], [+1055.1%, +1178.2%] call_later(100.0 us): difference in [1071.8 us; 1147.2 us], [+1071.8%, +1147.2%] call_later(150.0 us): difference in [997.9 us; 1097.1 us], [+665.3%, +731.4%] call_later(200.0 us): difference in [974.7 us; 1040.0 us], [+487.3%, +520.0%] call_later(300.0 us): difference in [868.3 us; 932.4 us], [+289.4%, +310.8%] call_later(400.0 us): difference in [770.8 us; 830.7 us], [+192.7%, +207.7%] call_later(0.5 ms): difference in [0.6 ms; 0.8 ms], [+129.2%, +151.6%] call_later(500.0 us): difference in [675.5 us; 723.7 us], [+135.1%, +144.7%] call_later(1.0 ms): difference in [0.1 ms; 0.3 ms], [+14.6%, +30.2%] call_later(1.5 ms): difference in [0.7 ms; 0.8 ms], [+43.8%, +52.0%] call_later(2.0 ms): difference in [0.1 ms; 0.2 ms], [+7.3%, +12.2%] call_later(3.0 ms): difference in [0.2 ms; 0.3 ms], [+5.8%, +9.0%] call_later(4.0 ms): difference in [0.2 ms; 0.2 ms], [+3.9%, +6.1%] call_later(5.0 ms): difference in [0.2 ms; 0.3 ms], [+3.7%, +5.6%] call_later(10.0 ms): difference in [0.2 ms; 0.3 ms], [+1.8%, +2.6%] call_later(10.0 ms): difference in [0.2 ms; 0.3 ms], [+2.1%, +2.8%] call_later(15.0 ms): difference in [0.2 ms; 0.3 ms], [+1.4%, +1.8%] call_later(20.0 ms): difference in [0.2 ms; 0.3 ms], [+1.1%, +1.5%] call_later(30.0 ms): difference in [0.2 ms; 0.3 ms], [+0.7%, +1.0%] call_later(40.0 ms): difference in [0.2 ms; 0.3 ms], [+0.6%, +0.7%] call_later(50.0 ms): difference in [0.2 ms; 0.3 ms], [+0.4%, +0.6%] call_later(100.0 ms): difference in [0.3 ms; 0.3 ms], [+0.3%, +0.3%] call_later(100.000 ms): difference in [0.269 ms; 0.325 ms], [+0.3%, +0.3%] call_later(150.0 ms): difference in [0.3 ms; 0.4 ms], [+0.2%, +0.3%] call_later(200.0 ms): difference in [0.4 ms; 0.5 ms], [+0.2%, +0.2%] call_later(300.0 ms): difference in [0.3 ms; 0.6 ms], [+0.1%, +0.2%] call_later(400.0 ms): difference in [0.5 ms; 0.7 ms], [+0.1%, +0.2%] call_later(500.0 ms): difference in [0.7 ms; 0.8 ms], [+0.1%, +0.2%] call_later(500.000 ms): difference in [0.698 ms; 0.748 ms], [+0.1%, +0.1%] call_later(1000.000 ms): difference in [1.026 ms; 1.286 ms], [+0.1%, +0.1%] call_later(1500.000 ms): difference in [0.645 ms; 1.744 ms], [+0.0%, +0.1%] call_later(2000.000 ms): difference in [1.007 ms; 2.304 ms], [+0.1%, +0.1%] call_later(3000.000 ms): difference in [1.165 ms; 3.256 ms], [+0.0%, +0.1%] call_later(4000.000 ms): difference in [0.742 ms; 4.235 ms], [+0.0%, +0.1%] call_later(5000.000 ms): difference in [1.064 ms; 5.267 ms], [+0.0%, +0.1%] Global accuracy in [33.3 us; 5.3 ms] Event loop: _UnixSelectorEventLoop Selector: EpollSelector --- No surprise, the accuracy is the same. Another approach: always adds granularity to the timeout in asyncio (and selector rounds toward zero, as in Python 3.3). It looks like the test_timeout_rounding() test still pass with this option. Results with add_granularity.patch: --- [ asyncio accuracy ] call_later(0.0001 us): difference in [1136.6910 us; 1234.2710 us], [+1136690976.7%, +1234270997.9%] call_later(0.0005 us): difference in [1134.2053 us; 1230.9015 us], [+226841068.5%, +246180307.7%] call_later(0.0010 us): difference in [1149.2229 us; 1219.4788 us], [+114922288.5%, +121947883.3%] call_later(0.0015 us): difference in [1162.8546 us; 1222.8143 us], [+77523641.3%, +81520955.4%] call_later(0.0020 us): difference in [1163.5149 us; 1217.9579 us], [+58175744.6%, +60897895.7%] call_later(0.0030 us): difference in [1179.6120 us; 1227.1221 us], [+39320401.2%, +40904068.7%] call_later(0.0040 us): difference in [1165.2680 us; 1209.9680 us], [+29131699.2%, +30249199.0%] call_later(0.0050 us): difference in [1170.0682 us; 1226.5110 us], [+23401363.4%, +24530219.5%] call_later(0.1 us): difference in [1162.8 us; 1248.8 us], [+1162765.9%, +1248821.0%] call_later(0.5 us): difference in [1154.8 us; 1247.5 us], [+230968.4%, +249492.4%] call_later(1.0 us): difference in [1146.2 us; 1231.9 us], [+114622.4%, +123185.5%] call_later(1.5 us): difference in [1125.9 us; 1220.8 us], [+75061.3%, +81385.5%] call_later(2.0 us): difference in [1121.4 us; 1211.2 us], [+56071.7%, +60558.3%] call_later(3.0 us): difference in [1138.9 us; 1228.9 us], [+37962.0%, +40962.5%] call_later(4.0 us): difference in [1146.9 us; 1227.2 us], [+28673.0%, +30680.0%] call_later(5.0 us): difference in [1149.5 us; 1247.1 us], [+22990.4%, +24942.4%] call_later(10.0 us): difference in [1174.3 us; 1231.1 us], [+11743.4%, +12311.1%] call_later(50.0 us): difference in [1112.6 us; 1187.8 us], [+2225.1%, +2375.6%] call_later(0.1 ms): difference in [1.1 ms; 1.2 ms], [+1059.0%, +1154.9%] call_later(100.0 us): difference in [1061.7 us; 1133.5 us], [+1061.7%, +1133.5%] call_later(150.0 us): difference in [1016.0 us; 1080.4 us], [+677.4%, +720.2%] call_later(200.0 us): difference in [938.1 us; 1194.4 us], [+469.0%, +597.2%] call_later(300.0 us): difference in [834.9 us; 930.1 us], [+278.3%, +310.0%] call_later(400.0 us): difference in [754.3 us; 852.6 us], [+188.6%, +213.1%] call_later(0.5 ms): difference in [0.6 ms; 0.7 ms], [+124.3%, +149.0%] call_later(500.0 us): difference in [646.8 us; 738.0 us], [+129.4%, +147.6%] call_later(1.0 ms): difference in [0.1 ms; 0.3 ms], [+11.8%, +26.1%] call_later(1.5 ms): difference in [0.6 ms; 0.8 ms], [+41.1%, +50.3%] call_later(2.0 ms): difference in [0.1 ms; 0.2 ms], [+6.2%, +10.8%] call_later(3.0 ms): difference in [0.1 ms; 0.2 ms], [+4.3%, +7.9%] call_later(4.0 ms): difference in [0.1 ms; 0.2 ms], [+3.4%, +5.7%] call_later(5.0 ms): difference in [0.2 ms; 0.4 ms], [+3.2%, +7.1%] call_later(10.0 ms): difference in [0.1 ms; 0.2 ms], [+1.4%, +2.4%] call_later(10.0 ms): difference in [0.2 ms; 0.3 ms], [+1.6%, +2.6%] call_later(15.0 ms): difference in [0.2 ms; 0.3 ms], [+1.3%, +1.7%] call_later(20.0 ms): difference in [0.2 ms; 0.3 ms], [+1.0%, +1.5%] call_later(30.0 ms): difference in [0.2 ms; 0.3 ms], [+0.8%, +0.9%] call_later(40.0 ms): difference in [0.2 ms; 0.3 ms], [+0.5%, +0.7%] call_later(50.0 ms): difference in [0.2 ms; 0.3 ms], [+0.4%, +0.5%] call_later(100.0 ms): difference in [0.2 ms; 0.3 ms], [+0.2%, +0.3%] call_later(100.000 ms): difference in [0.247 ms; 0.345 ms], [+0.2%, +0.3%] call_later(150.0 ms): difference in [0.3 ms; 0.4 ms], [+0.2%, +0.3%] call_later(200.0 ms): difference in [0.3 ms; 0.4 ms], [+0.2%, +0.2%] call_later(300.0 ms): difference in [0.2 ms; 0.5 ms], [+0.1%, +0.2%] call_later(400.0 ms): difference in [0.6 ms; 0.6 ms], [+0.1%, +0.2%] call_later(500.0 ms): difference in [0.4 ms; 0.7 ms], [+0.1%, +0.1%] call_later(500.000 ms): difference in [0.365 ms; 0.781 ms], [+0.1%, +0.2%] call_later(1000.000 ms): difference in [1.176 ms; 1.239 ms], [+0.1%, +0.1%] call_later(1500.000 ms): difference in [1.650 ms; 1.727 ms], [+0.1%, +0.1%] call_later(2000.000 ms): difference in [0.318 ms; 2.241 ms], [+0.0%, +0.1%] call_later(3000.000 ms): difference in [1.169 ms; 3.254 ms], [+0.0%, +0.1%] call_later(4000.000 ms): difference in [1.297 ms; 4.238 ms], [+0.0%, +0.1%] call_later(5000.000 ms): difference in [0.975 ms; 5.272 ms], [+0.0%, +0.1%] Global accuracy in [117.8 us; 5.3 ms] Loop granularity: 1.0 ms Selector resolution: 1.0 ms Event loop: _UnixSelectorEventLoop Selector: EpollSelector --- The accuracy is almost the same, except that the minimum accuracy (117.8 us) is bigger than the current implementation (44.5 us). But I don't know if my micro-benchmark is reliable. Note: I also tested to always add 2 * granularity to asyncio timeout, but the minimum difference increases to 1 ms (as expected) which is much worse. |
|||
msg210343 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-02-05 19:56 | |
> To solve a performance issue in asyncio, I added a new resolution attribute to selectors.BaseSelector and a new _granularity attribute to asyncio.BaseEventLoop. If I understood correctly, Charles-François (author and so maintainer of the new selectors module) doesn't like the new resolution attribute because it is a lie (if I understood correctly Charles-François complain). Here are the reasons I don't like this attribute: - it's a low-level implementation detail, which shouldn't be part of the API - it's actually impossible to provide a sensible value for the granularity, because it depends not only on the operating system, but also on the actual version on the OS, and the hardware, as Victor's extensive tests showed - what's the definition of this "granularity" anyway? - but most importantly, it's useless: the performance problem initially identified was due to the rounding of select/epoll/etc of timeouts towards zero, which means that it they got passed a timeout smaller than the syscall resolution, the syscall would end up being called in a tight loop. Now that the timeouts are rounded away from 0, that's not an issue anymore Let me state this last point once again: no busy loop can occur now that timeouts are rounded up. Sure, some syscalls, on some OS, can sometimes return a little bit earlier than expected, e.g. epoll can return after 0.98ms instead of 1ms. But that's not an issue, if that's the case you'll just loop an extra time, *all* event loops just work this way. Furthermore, note than due to rounding, the actual probability of having the syscall return earlier than expected is really low: for example, if the loop wants to wake up in 5.3ms, it will be rounded to 6ms, and even a slight early wakeup won't be enough to wake up before 5.3ms (typically it'll return after 5.98ms or so). > The patch is a comprise, it solves partially the asyncio performance issue. With the patch, BaseEventLoop._run_once() may not executed a task even if there was a scheduled task, just because of timing rounding issues. So I modified the existing patch to tolerate "useless calls" to _run_once(). If you want to keep the current approach, nothing prevents from using a fixed "slack" value, independant of the selector (e.g. 1ms seems reasonable). |
|||
msg210411 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-02-06 20:06 | |
> > Let me state this last point once again: no busy loop can occur now that > timeouts are rounded up. > How sure are you? Suppose I use poll() with a 0.5 msec timeout. This presumably gets rounded up to 1 msec. But if the system clock has e.g. a 10 msec resolution, won't this still wait 0 msec? Or will it wait until the next "tick" occurs, which could be anywhere between 0 and 10 msec in the future? But if so, why wouldn't a poll() with a 0 msec timeout also wait between 0 and 10 msec? |
|||
msg210442 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-02-07 09:13 | |
> How sure are you? Suppose I use poll() with a 0.5 msec timeout. This > presumably gets rounded up to 1 msec. But if the system clock has e.g. a 10 > msec resolution, won't this still wait 0 msec? Or will it wait until the > next "tick" occurs, which could be anywhere between 0 and 10 msec in the > future? It depends :-) With high-resolution timers, you'll get 1ms. Without, you'll likely get 10ms (time quantum). An implementation returning without delay would be seriously broken. > But if so, why wouldn't a poll() with a 0 msec timeout also wait > between 0 and 10 msec? Because the kernel doesn't suspend the current task/thread, it just checks the list of currently FDs, and returns immediately. See e.g. http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/select.c?id=797a796a13df6b84a4791e57306737059b5b2384#n772 |
|||
msg210484 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-02-07 15:48 | |
OK, this sounds like rounding up is important to avoid busy-wait (but maybe only when rounding down would give us zero), and we shouldn't have to worry about a courser timer, it will just make us late and that's always acceptable. On Fri, Feb 7, 2014 at 1:13 AM, Charles-François Natali < report@bugs.python.org> wrote: > > Charles-François Natali added the comment: > > > How sure are you? Suppose I use poll() with a 0.5 msec timeout. This > > presumably gets rounded up to 1 msec. But if the system clock has e.g. a > 10 > > msec resolution, won't this still wait 0 msec? Or will it wait until the > > next "tick" occurs, which could be anywhere between 0 and 10 msec in the > > future? > > It depends :-) > With high-resolution timers, you'll get 1ms. > Without, you'll likely get 10ms (time quantum). > An implementation returning without delay would be seriously broken. > > > But if so, why wouldn't a poll() with a 0 msec timeout also wait > > between 0 and 10 msec? > > Because the kernel doesn't suspend the current task/thread, it just checks > the list of currently FDs, and returns immediately. > See e.g. > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/select.c?id=797a796a13df6b84a4791e57306737059b5b2384#n772 > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue20505> > _______________________________________ > |
|||
msg210487 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-07 16:18 | |
About Windows, it looks like select() and GetQueuedCompletionStatus() (poll function the IocpProactor) round the timeout away from zero (the timeout is a number of milliseconds, an integer). Examples with GetQueuedCompletionStatus(): - 2 ms (timeout) => 15 ms (elasped time) - 100 ms => 109 ms - 157 ms => 172 ms Example with select(): - 2 ms => 16 ms - 100 ms => 110 ms - 156 ms => 171 ms |
|||
msg210489 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-07 16:39 | |
"If you want to keep the current approach, nothing prevents from using a fixed "slack" value, independant of the selector (e.g. 1ms seems reasonable)." select() and kqueue() are able to sleep less than 1 ms. Using a slack of 1 ms would reduce the accuracy. I don't see why we should limit the accuracy. Why 1 ms? Because of poll/epoll? What about Windows and its resolution of 15.6 ms? Well, under 1 ms, don't expect accurate results :-) For example on my fast Linux box, a select() with a timeout of 61 us takes 137 us (elapsed time measued in Python). |
|||
msg210493 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-07 16:54 | |
"Let me state this last point once again: no busy loop can occur now that timeouts are rounded up." Agreed. The busy loop issue was solved by rounding the timeout away from zero in select and selectors module. "Sure, some syscalls, on some OS, can sometimes return a little bit earlier than expected, e.g. epoll can return after 0.98ms instead of 1ms. But that's not an issue, if that's the case you'll just loop an extra time, *all* event loops just work this way." I would like to avoid this *useless* loop, because such case can be common. What's the point of polling again? We just polled all file descriptors. In msg209772: I saw on "x86 Gentoo 3.x" that "waiting 99.9 ms took 99.6 ms and 99.9 ms, and waiting 9.9 ms took 9.7 ms." But I remember that I saw larger difference, but I don't remember in which state the code was. So let's get more data from buildbots: I add again the "dt < timeout" message. |
|||
msg210494 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-07 16:57 | |
New changeset 3b94a4ef244e by Victor Stinner in branch 'default': Issue #20505: add debug info http://hg.python.org/cpython/rev/3b94a4ef244e |
|||
msg210501 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-07 17:37 | |
> New changeset 3b94a4ef244e by Victor Stinner in branch 'default': > Issue #20505: add debug info Some results. AMD64 Windows7 SP1 3.x: == Windows-7-6.1.7601-SP1 little-endian SelectSelector.select(156.001 ms) took 156.000 ms (granularity=15.600 ms, resolution=0.001 ms) x86 Ubuntu Shared 3.x: == Linux-2.6.31.5-linode21-i686-with-debian-lenny-sid little-endian EpollSelector.select(99.947 ms) took 99.534 ms (granularity=1.000 ms, resolution=1.000 ms) Oh, for Windows: select() was called with a timeout of 157 ms but slept during 156 ms. So sometimes it's a little less than the timeout (1 ms). |
|||
msg210532 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-02-07 19:25 | |
> select() and kqueue() are able to sleep less than 1 ms. Using a slack of 1 ms would reduce the accuracy. I don't see why we should limit the accuracy. Why 1 ms? Because of poll/epoll? What about Windows and its resolution of 15.6 ms? > > Well, under 1 ms, don't expect accurate results :-) For example on my fast Linux box, a select() with a timeout of 61 us takes 137 us (elapsed time measued in Python). That's precisely why I suggested 1ms. And, by definition, it would only be used if the select()/poll() call returns earlier in the first place, so it doesn't reduce the accuracy. > I would like to avoid this *useless* loop, because such case can be common. > What's the point of polling again? We just polled all file descriptors. "If it ain't broke, don't fix it": for the 10th time, all event loops work like this, and this has *never* been an issue. So I suggest we stop making up complicated solutions to non-existent problems. |
|||
msg210558 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-02-07 21:22 | |
In the end I think Charles-Francois is right -- the rounding up in selectmodule.c (for epoll) and selectors.py (for poll and, redundantly[*], for epoll) should be sufficient to avoid the busy-wait behavior with delays under 1ms. We're not promising real-time guarantees at any level, so both returning too early and too late must be acceptable. If a test fails frequently despite relaxed constraints, that's a problem with the test. So I think Victor should apply his first patch (remove_granularity.patch) to both CPython and Tulip and then we can adjust any tests that still cause occasional red buildbots. [*] It's not redundant when using Tulip on Python 3.3, so I prefer to keep it. |
|||
msg210573 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-07 22:37 | |
New changeset d853955491a2 by Victor Stinner in branch 'default': Issue #20505: Remove resolution and _granularity from selectors and asyncio http://hg.python.org/cpython/rev/d853955491a2 |
|||
msg210574 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-07 22:43 | |
More data from buildbots (before d853955491a2). x86 Ubuntu Shared 3.x: == Linux-2.6.31.5-linode21-i686-with-debian-lenny-sid little-endian EpollSelector.select(99.947 ms) took 99.291 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(99.928 ms) took 99.553 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(499.962 ms) took 100.384 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(398.823 ms) took 398.241 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(9.918 ms) took 9.209 ms (granularity=1.000 ms, resolution=1.000 ms) PollSelector.select(499.963 ms) took 101.114 ms (granularity=1.000 ms, resolution=1.000 ms) SelectSelector.select(499.940 ms) took 101.417 ms (granularity=0.001 ms, resolution=0.001 ms) x86 Gentoo Non-Debug 3.x == Linux-2.6.35-vs2.3.0.36.32-gentoo-i686-Intel-R-_Core-TM-2_CPU_6600_@_2.40GHz-with-gentoo-2.1 little-endian [130/389] test_asyncio EpollSelector.select(99.986 ms) took 99.796 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(99.980 ms) took 99.346 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(499.987 ms) took 100.093 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(9.979 ms) took 9.573 ms (granularity=1.000 ms, resolution=1.000 ms) PollSelector.select(499.987 ms) took 100.038 ms (granularity=1.000 ms, resolution=1.000 ms) SelectSelector.select(499.987 ms) took 100.048 ms (granularity=0.001 ms, resolution=0.001 ms) x86 Gentoo 3.x: == Linux-2.6.35-vs2.3.0.36.32-gentoo-i686-Intel-R-_Core-TM-2_CPU_6600_@_2.40GHz-with-gentoo-2.1 little-endian [302/389/1] test_asyncio EpollSelector.select(99.971 ms) took 99.771 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(99.955 ms) took 99.737 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(499.973 ms) took 100.077 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(399.545 ms) took 399.015 ms (granularity=1.000 ms, resolution=1.000 ms) EpollSelector.select(9.949 ms) took 9.552 ms (granularity=1.000 ms, resolution=1.000 ms) PollSelector.select(499.973 ms) took 100.059 ms (granularity=1.000 ms, resolution=1.000 ms) SelectSelector.select(499.973 ms) took 100.072 ms (granularity=0.001 ms, resolution=0.001 ms) "select(499.973 ms) took 100.077" are tests on signals interrupting the select. |
|||
msg210575 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-07 22:49 | |
> So I think Victor should apply his first patch (remove_granularity.patch) to both CPython and Tulip and then we can adjust any tests that still cause occasional red buildbots. Ok, I did that. So I understand that the code was a micro-optimization for a minor or zero gain. Sorry Charles-François for having insisted so much to keep it :) |
|||
msg210821 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-10 10:40 | |
Sorry, I have to reopen this topic again, a buildbot failed :-( Builds 10136, 10134 and 10131 failed on "failure x86 XP-4 3.x" buildbot. It's sporadic probably because the test fails sometimes depending how the timing is rounded. http://buildbot.python.org/all/builders/x86%20XP-4%203.x/builds/10136/steps/test/logs/stdio ====================================================================== FAIL: test_timeout_rounding (test.test_asyncio.test_events.SelectEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_asyncio\test_events.py", line 1185, in test_timeout_rounding self.assertLessEqual(self.loop._run_once_counter, 8) AssertionError: 114 not less than or equal to 8 http://buildbot.python.org/all/builders/x86%20XP-4%203.x/builds/10134/steps/test/logs/stdio ====================================================================== FAIL: test_timeout_rounding (test.test_asyncio.test_events.SelectEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_asyncio\test_events.py", line 1185, in test_timeout_rounding self.assertLessEqual(self.loop._run_once_counter, 8) AssertionError: 181 not less than or equal to 8 http://buildbot.python.org/all/builders/x86%20XP-4%203.x/builds/10131/steps/test/logs/stdio ====================================================================== FAIL: test_timeout_rounding (test.test_asyncio.test_events.SelectEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_asyncio\test_events.py", line 1185, in test_timeout_rounding self.assertLessEqual(self.loop._run_once_counter, 8) AssertionError: 158 not less than or equal to 8 |
|||
msg210822 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-10 10:48 | |
New changeset 32dc137f5e26 by Victor Stinner in branch 'default': Issue #20505: Add debug info to analyze sporaric failures of http://hg.python.org/cpython/rev/32dc137f5e26 |
|||
msg210855 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-10 17:56 | |
Ok, I reproduced the issue on my Windows 7 after enabling HPET: == CPython 3.4.0b3+ (default:1b96d08c3895, Feb 10 2014, 18:44:24) [MSC v.1600 32 bit (Intel)] == Windows-7-6.1.7601-SP1 little-endian [1/1] test_asyncio test_timeout_rounding (test.test_asyncio.test_events.ProactorEventLoopTests) ... asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms FAIL test_timeout_rounding (test.test_asyncio.test_events.SelectEventLoopTests) ... a syncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms asyncio: selector.select(0.100 ms) took 0.000 ms FAIL ====================================================================== FAIL: test_timeout_rounding (test.test_asyncio.test_events.ProactorEventLoopTest s) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\haypo\python_default\lib\test\test_asyncio\test_events.py", line 1188 , in test_timeout_rounding 'selector': self.loop._selector.__class__.__name__}) AssertionError: 18 not less than or equal to 8 : {'time_info': namespace(adjusta ble=True, implementation='GetSystemTimeAsFileTime()', monotonic=False, resolutio n=0.015600099999999999), 'selector': 'IocpProactor', 'monotonic_info': namespace (adjustable=False, implementation='GetTickCount64()', monotonic=True, resolution =0.015600099999999999)} ====================================================================== FAIL: test_timeout_rounding (test.test_asyncio.test_events.SelectEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\haypo\python_default\lib\test\test_asyncio\test_events.py", line 1188 , in test_timeout_rounding 'selector': self.loop._selector.__class__.__name__}) AssertionError: 13 not less than or equal to 8 : {'time_info': namespace(adjusta ble=True, implementation='GetSystemTimeAsFileTime()', monotonic=False, resolutio n=0.015600099999999999), 'selector': 'SelectSelector', 'monotonic_info': namespa ce(adjustable=False, implementation='GetTickCount64()', monotonic=True, resoluti on=0.015600099999999999)} It looks like GetQueuedCompletionStatus(ms) takes between (ms - 1) and (ms) milliseconds. Examples: GetQueuedCompletionStatus(1 ms)->None took 0.417 ms (monotonic: 0.000 ms) GetQueuedCompletionStatus(1 ms)->None took 0.307 ms (monotonic: 0.000 ms) GetQueuedCompletionStatus(2 ms)->None took 1.101 ms (monotonic: 16.000 ms) GetQueuedCompletionStatus(10 ms)->None took 9.971 ms (monotonic: 0.000 ms) GetQueuedCompletionStatus(10 ms)->None took 9.908 ms (monotonic: 15.000 ms) GetQueuedCompletionStatus(2 ms)->None took 1.634 ms (monotonic: 0.000 ms) I used time.perf_counter() to measure elapsed time, with elapsed "monotonic" time in parenthesis. As you can see, the monotonic clock is usually unchanged for such short sleep. |
|||
msg210856 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-10 18:12 | |
On the same Windows 7 virtual machine, but with HPET disabled, the test pass. [1/1] test_asyncio test_timeout_rounding (test.test_asyncio.test_events.ProactorEventLoopTests) ... GetQueuedCompletionStatus(2 ms)->None took 4.350 ms (monotonic: 16.000 ms) GetQueuedCompletionStatus(10 ms)->None took 14.737 ms (monotonic: 16.000 ms) GetQueuedCompletionStatus(2 ms)->None took 14.700 ms (monotonic: 15.000 ms) GetQueuedCompletionStatus(2 ms)->None took 12.553 ms (monotonic: 16.000 ms) GetQueuedCompletionStatus(2 ms)->None took 15.076 ms (monotonic: 15.000 ms) GetQueuedCompletionStatus(2 ms)->None took 14.524 ms (monotonic: 16.000 ms) GetQueuedCompletionStatus(2 ms)->None took 14.861 ms (monotonic: 16.000 ms) GetQueuedCompletionStatus(2 ms)->None took 14.617 ms (monotonic: 15.000 ms) GetQueuedCompletionStatus(0 ms)->None took 0.012 ms (monotonic: 0.000 ms) ok As you can see, the elapsed monotonic time if only zero when the timeout was zero, otherwise it is at least 15 ms. So to fix the performance issue on Windows with HPET enabled, I see 3 options: - round the timeout of GetQueuedCompletionStatus() (asyncio module) and select.select() (select module) using the resolution of the clock - use a more precise clock: time.perf_counter(). I don't like this clock because it is not reliable, especially on duration longer than 2 minutes - reintroduce the granularity in asyncio (and maybe resolution in selectors): for this specific issue (Windows with HPET enabled), only the resolution of the clock matters in fact IMO the granularity is the best option because it only modify asyncio and is simple. We can try to leave selectors unchanged. Note: only poll and epoll selectors round away from zero. select and kqueue round towards zero. But it's not important IMO. |
|||
msg210857 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-10 18:18 | |
New changeset 20dc8d6430eb by Victor Stinner in branch 'default': Issue #20505: Use even shorter sleep in test_timeout_rounding() to make the http://hg.python.org/cpython/rev/20dc8d6430eb |
|||
msg210858 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-10 18:22 | |
Here is clock_resolution.patch: reintroduce the old "granularity" but only use the resolution of the clock (expect the selector to round away from zero). clock_resolution.patch fixes test_timeout_rounding() on my Windows 7 with HPET enabled, even with my more agressive version of the test (test also sleep close to nanoseconds, not only close to milliseconds). Maybe we should also fix #20320: round away from zero for select(), kqueue and signal.sigtimedwait(). |
|||
msg210859 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-02-10 18:29 | |
How do you know that the timer used by the select/poll/etc. call has the same resolution? The variable 'now' should probably be given a more suitable name. Can the clock resolution be zero? If not, I recommend adjusting the comparisons so that an event scheduled at exactly the rounded-up 'now' value is not considered ready -- it should be strictly before. (But if the clock resolution happens to be zero, an event scheduled at exactly 'now' should be considered ready.) |
|||
msg210875 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-10 22:01 | |
> How do you know that the timer used by the select/poll/etc. call has the same resolution? If I understood correctly, there a 3 kind of clocks on Windows: - kernel heartbeat: GetSystemTimeAdjustment() gives the resolution (a few milliseconds) - multimedia timers - performance counter: the resolution is 1 / QueryPerformanceFrequency() (at least 1 microsecond) GetSystemTimeAsFileTime() (time.time), GetTickCount[64]() (time.monotonic) and GetProcessTimes() (time.process_time) use the kernel heartbeat (I invented this name :-)). GetTickCount() is not adjusted. QueryPerformanceCounter() is the performance counter, it is used by time.perf_counter(). GetSystemTimeAdjustment(): http://msdn.microsoft.com/en-us/library/windows/desktop/ms724394%28v=vs.85%29.aspx For more information, see the PEP 418: http://www.python.org/dev/peps/pep-0418/ > Can the clock resolution be zero? There is a unit test to ensure that the resolution of all clocks is greater than 0 and smaller or equal than 1.0. > If not, I recommend adjusting the comparisons so that an event scheduled at exactly the rounded-up 'now' value is not considered ready -- it should be strictly before. Ok, here is an updated patch. |
|||
msg210879 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-02-10 22:12 | |
OK, since the resolution is 1 nsec on my Ubuntu and OSX systems and 15.6 msec on my Windows box, this patch looks fine. |
|||
msg210884 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-10 22:43 | |
New changeset 6b850722849e by Victor Stinner in branch 'default': Issue #20505: BaseEventLoop uses again the resolution of the clock to decide if http://hg.python.org/cpython/rev/6b850722849e |
|||
msg210885 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-10 22:53 | |
> OK, since the resolution is 1 nsec on my Ubuntu Yeah, Linux always announce 1 nanosecond, even the real resolution is not so good: https://lkml.org/lkml/2012/2/9/100 On Mac OS X, the resolution is also probably hardcoded to 1 nanosecond. (Python reads the resolution at runtime on Linux, Mac OS X and Windows.) Since the granularity is back, why not using also the resolution of the selector in asyncio? :-) It would avoid useless calls for free. It can be a private attribute if you don't want to make it public. But I don't see why you would like to hide such data since time.get_clock_info(name).resolution is the same lie (at least on Linux and Mac OS X) :-) It's just a question of exposing OS data. No need to compute the effective resolution. |
|||
msg210886 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2014-02-10 22:59 | |
Please, no. This has to stop. |
|||
msg210903 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-11 08:04 | |
New changeset 75881b85695f by Victor Stinner in branch 'default': Issue #20505: Fix TestLoop, set the clock resolution http://hg.python.org/cpython/rev/75881b85695f |
|||
msg210904 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-02-11 08:07 | |
Wow, 10 messages in one night... Could you try to gather all your finding at once, because reading so many messages in difficult to follow? > GetQueuedCompletionStatus(1 ms)->None took 0.307 ms (monotonic: 0.000 ms) So basically, on Windows, select(1ms) can be non-blocking? If yes, this OS is even more broken than I thought, and using the granularity (in asyncio :-) is fine. |
|||
msg210905 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-11 08:12 | |
Victor>> Since the granularity is back, why not using also the resolution of the selector in asyncio? :-) Guido> Please, no. This has to stop. Ok, sorry. If we still have some sporadic failures on UNIX, I think I will maybe use Charles-François's suggestion: use at least 1 ms for the resolution in asyncio loop: max(time.get_clock_info('monotonic').resolution, 1e-3). But it looks like the issue is now fixed again on Windows. |
|||
msg210906 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-11 08:14 | |
>> GetQueuedCompletionStatus(1 ms)->None took 0.307 ms (monotonic: 0.000 ms) > So basically, on Windows, select(1ms) can be non-blocking? I would not say non-blocking: it's just that select(N milliseconds) waits between N-1 and N milliseconds on Windows when HPET hardware (High Precision Event Timer) is present and enabled. Example: GetQueuedCompletionStatus(10 ms)->None took 9.971 ms (monotonic: 0.000 ms) |
|||
msg210908 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-11 08:50 | |
> Wow, 10 messages in one night... Could you try to gather all your finding at once, because reading so many messages in difficult to follow? It's not easy because I collect informations from various buildbots and different virtual machines. But I planned to try to summarize the overall work done on time in asyncio, select and selectors. The first problem is that clocks and selectors have a different resolution. On Linux, poll() has a resolution of 1 ms, whereas clock_gettime(CLOCK_MONOTONIC) has a resolution better than 1 us (0.3 us according to my last test). On Windows, GetTickCount64() has a resolution of 15.6 ms, whereas GetQueuedCompletionStatus() has a resolution of 1 ms when HPET is enabled. Note: It looks like GetQueuedCompletionStatus() has the same resolution than GetTickCount64() when HPET is disabled. The second problem is that before my changes asyncio expected selector.select(timeout) to take at least timeout seconds, whereas it may take 1 or 2 ms less than timeout. This difference is very small (1 ms), but it's enough to create a busy loop. This busy loop is quite short (1 ms), but it may occur each time than call_at() or call_later() is used (ex: each time syncio.sleep() is used). Theorical resolution of selectors (resolution of the C types): * poll, epoll, GetQueuedCompletionStatus: 1 ms (10^-3) * select: 1 us (10^-6) * kqueue: 1 ns (10^-9) Effective resolution of selectors: * poll, epoll: 1 ms... but may sleep 1 or 2 ms less than the timeout, ex: on Linux, epoll_wait(100) (100 ms) took 99.6 ms according to clock_gettime(CLOCK_MONOTONIC) * GetQueuedCompletionStatus, HPET enabled: 1 ms... but may sleep 1 ms less than the timeout. Example on Windows with HPET enabled, GetQueuedCompletionStatus(10 ms) took 9.971 ms according to the performance counter) * GetQueuedCompletionStatus, HPET disabled: 15.6 ms For select, kqueue, I don't have reliable numbers, but I'm less worried by them since a Python application is usually slower than the resolution of these selectors. Effective resolution of clocks (according to tables in PEP 418): * QueryPerformanceCounter(): 10 ns * clock_gettime(CLOCK_MONOTONIC) on Linux: 322 ns * GetTickCount64(): 15.6 ms To fix the busy loop, poll and epoll selectors have been modified to round the timeout away from zero, instead of rounding towards zero. So a timeout of 1 ns is converted to a timeout of 1 ms instead of 0. On Windows, the clock resolution (15.6 ms) is worse than the resolution of the selector (~1 ms). The problem is that for timeout smaller than 15.6 ms, the clock value is unchanged after the sleep. So asyncio now uses the resolution of the clock to check if a scheduled task should be executed. asyncio now has a unit test scheduling task with very short delay (between 10 ms and 0.1 ns) and counts the number of calls to _run_once() to detect busy loops. The unit test knows that the selector sometimes sleep 1 or 2 ms less than the timeout and tolerates that in its checks. For selectors, select and kqueue still towards zero. I reopened #20320 which has a patch to round aways from zero and so avoids very short busy loop (1 us). If the test_timeout_rounding() test still fail on UNIX even if #20320 is fixed, we should use a more coarse resolution in asyncio event loop (BaseEventLoop._clock_resolution). |
|||
msg210910 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-11 09:09 | |
New changeset 03cb6ddc7040 by Victor Stinner in branch 'default': Issue #20505: Improve debug info in asyncio event loop http://hg.python.org/cpython/rev/03cb6ddc7040 |
|||
msg210911 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-11 09:10 | |
New changeset 9c1840e8d643 by Victor Stinner in branch 'default': Issue #20505: Oops, only print debug info if selector.select(timeout) took less http://hg.python.org/cpython/rev/9c1840e8d643 |
|||
msg210912 - (view) | Author: Charles-François Natali (neologix) * | Date: 2014-02-11 09:14 | |
> It's not easy because I collect informations from various buildbots and different virtual machines. But I planned to try to summarize the overall work done on time in asyncio, select and selectors. Thanks for the summary. > The first problem is that clocks and selectors have a different resolution. On Linux, poll() has a resolution of 1 ms, whereas clock_gettime(CLOCK_MONOTONIC) has a resolution better than 1 us (0.3 us according to my last test). time() having a better resolution than select() isn't an issue, so we're good. > On Windows, GetTickCount64() has a resolution of 15.6 ms, whereas GetQueuedCompletionStatus() has a resolution of 1 ms when HPET is enabled. > > Note: It looks like GetQueuedCompletionStatus() has the same resolution than GetTickCount64() when HPET is disabled. OK, so basically this means that with HPET, select() does block, but the clock resolution is too low, and reports an inaccurate elapsed time (and sometimes returns 0). So actually, no busy loop occurs even in this case. Wouldn't it be possible to use another clock for asyncio on Windows? I find surprising that we can't get better than 15.6ms with HPET... |
|||
msg210913 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-11 09:29 | |
New changeset a631b01d1715 by Victor Stinner in branch 'default': Issue #20505: use also the monotonic time to decide if asyncio debug traces http://hg.python.org/cpython/rev/a631b01d1715 |
|||
msg210914 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-11 09:33 | |
More debug traces, Windows 7 with HPET enabled: asyncio: IocpProactor.select(10.0000 ms) took 9.486 ms (monotonic=0.000 ms, clock res=15.600 ms) asyncio: IocpProactor.select(0.0010 ms) took 0.942 ms (monotonic=0.000 ms, clock res=15.600 ms) asyncio: IocpProactor.select(0.0000 ms) took 0.553 ms (monotonic=0.000 ms, clock res=15.600 ms) asyncio: IocpProactor.select(0.0000 ms) took 0.517 ms (monotonic=0.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(0.1000 ms) took 2.276 ms (monotonic=0.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(1.0000 us) took 30.810 us (monotonic=0.000 us, clock res=15600.100 us) asyncio: SelectSelector.select(0.0100 us) took 30.350 us (monotonic=0.000 us, clock res=15600.100 us) asyncio: SelectSelector.select(0.0001 us) took 28.930 us (monotonic=0.000 us, clock res=15600.100 us) Note: IocpProactor.select() rounds the timeout aways from zero with a resolution of 1 ms, whereas SelectSelector.select() rounds towards zero. It may explain why IocpSelector sleeps at least 500 us, whereas SelectSelector sleeps sometimes 30 us. |
|||
msg210915 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-11 09:38 | |
Windows 7, HPET disabled: asyncio: IocpProactor.select(100.0000 ms) took 99.871 ms (monotonic=109.000 ms, clock res=15.600 ms) asyncio: IocpProactor.select(10.0000 ms) took 3.779 ms (monotonic=16.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(100.0000 ms) took 99.608 ms (monotonic=109.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(10.0000 ms) took 2.830 ms (monotonic=16.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(1.0000 us) took 42.760 us (monotonic=0.000 us, clock res=15600.100 us) asyncio: SelectSelector.select(0.0100 us) took 42.220 us (monotonic=0.000 us, clock res=15600.100 us) asyncio: SelectSelector.select(0.0001 us) took 41.530 us (monotonic=0.000 us, clock res=15600.100 us) If you only look at the monotonic time, we always get elapsed > timeout with IocpProactor, but not with SelectSelector.select(). |
|||
msg210969 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-11 16:49 | |
test_timeout_rounding() pass on all major buildbots. test_asyncio hangs on fails on some buildbots, but there are dedicated issues (like #20495 and #20600). I'm closing this one. I reopened the rounding issue for select and kqueue, please help me to close this last issue! #20505 -- For the record, some debug info on Windows: AMD64 Windows7 SP1 3.x: asyncio: IocpProactor.select(10.0000 ms) took 6.764 ms (monotonic=15.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(100.0000 ms) took 97.473 ms (monotonic=109.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(100.0000 ms) took 97.397 ms (monotonic=109.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(0.9998 us) took 32.643 us (monotonic=0.000 us, clock res=15600.100 us) asyncio: SelectSelector.select(0.0098 us) took 28.419 us (monotonic=0.000 us, clock res=15600.100 us) x86 Windows Server 2003 [SB] 3.x: asyncio: IocpProactor.select(100.0000 ms) took 96.972 ms (monotonic=110.000 ms, clock res=15.625 ms) asyncio: IocpProactor.select(10.0000 ms) took 8.663 ms (monotonic=16.000 ms, clock res=15.625 ms) asyncio: SelectSelector.select(100.0000 ms) took 98.644 ms (monotonic=110.000 ms, clock res=15.625 ms) asyncio: SelectSelector.select(10.0000 ms) took 5.911 ms (monotonic=16.000 ms, clock res=15.625 ms) asyncio: SelectSelector.select(0.9998 us) took 75.635 us (monotonic=0.000 us, clock res=15625.000 us) asyncio: SelectSelector.select(0.0098 us) took 74.903 us (monotonic=0.000 us, clock res=15625.000 us) AMD64 Windows Server 2008 [SB] 3.x: asyncio: IocpProactor.select(100.0000 ms) took 95.484 ms (monotonic=109.000 ms, clock res=15.600 ms) asyncio: IocpProactor.select(10.0000 ms) took 1.819 ms (monotonic=15.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(100.0000 ms) took 97.741 ms (monotonic=109.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(10.0000 ms) took 1.536 ms (monotonic=16.000 ms, clock res=15.600 ms) asyncio: SelectSelector.select(0.0100 us) took 46.560 us (monotonic=0.000 us, clock res=15600.100 us) asyncio: SelectSelector.select(0.0001 us) took 46.080 us (monotonic=0.000 us, clock res=15600.100 us) |
|||
msg210970 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-02-11 16:50 | |
Thanks again Guido and Charles-François for your help on this tricky issue. Sorry for having flood your mail box :-) |
|||
msg210975 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-02-11 16:54 | |
New changeset 360976a6d8b9 by Victor Stinner in branch 'default': Issue #20505: Remove debug code http://hg.python.org/cpython/rev/360976a6d8b9 |
|||
msg213805 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-03-17 06:30 | |
New changeset 4f1df287392c by Victor Stinner in branch '3.4': Issue #20505: Use even shorter sleep in test_timeout_rounding() to make the http://hg.python.org/cpython/rev/4f1df287392c New changeset 6733d9dfffbb by Victor Stinner in branch '3.4': Issue #20505: BaseEventLoop uses again the resolution of the clock to decide if http://hg.python.org/cpython/rev/6733d9dfffbb New changeset 375e4f9c9732 by Victor Stinner in branch '3.4': Issue #20505: Fix TestLoop, set the clock resolution http://hg.python.org/cpython/rev/375e4f9c9732 New changeset e44ff3b7a497 by Victor Stinner in branch '3.4': Issue #20505: Improve debug info in asyncio event loop http://hg.python.org/cpython/rev/e44ff3b7a497 New changeset af840e781700 by Victor Stinner in branch '3.4': Issue #20505: Oops, only print debug info if selector.select(timeout) took less http://hg.python.org/cpython/rev/af840e781700 New changeset 2faf4b7c52ed by Victor Stinner in branch '3.4': Issue #20505: use also the monotonic time to decide if asyncio debug traces http://hg.python.org/cpython/rev/2faf4b7c52ed New changeset 702b20fa7af2 by Victor Stinner in branch '3.4': Issue #20505: Remove debug code http://hg.python.org/cpython/rev/702b20fa7af2 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:58 | admin | set | github: 64704 |
2014-09-26 02:02:58 | jcea | set | nosy:
+ jcea |
2014-03-17 06:30:49 | python-dev | set | messages: + msg213805 |
2014-02-11 16:54:19 | python-dev | set | messages: + msg210975 |
2014-02-11 16:50:24 | vstinner | set | status: open -> closed resolution: fixed messages: + msg210970 |
2014-02-11 16:49:50 | vstinner | set | messages: + msg210969 |
2014-02-11 09:38:04 | vstinner | set | messages: + msg210915 |
2014-02-11 09:33:50 | vstinner | set | messages: + msg210914 |
2014-02-11 09:29:02 | python-dev | set | messages: + msg210913 |
2014-02-11 09:14:10 | neologix | set | messages: + msg210912 |
2014-02-11 09:10:50 | python-dev | set | messages: + msg210911 |
2014-02-11 09:09:33 | python-dev | set | messages: + msg210910 |
2014-02-11 08:50:38 | vstinner | set | messages: + msg210908 |
2014-02-11 08:14:38 | vstinner | set | messages: + msg210906 |
2014-02-11 08:12:25 | vstinner | set | messages: + msg210905 |
2014-02-11 08:07:03 | neologix | set | messages: + msg210904 |
2014-02-11 08:04:34 | python-dev | set | messages: + msg210903 |
2014-02-10 22:59:00 | gvanrossum | set | messages: + msg210886 |
2014-02-10 22:53:04 | vstinner | set | messages: + msg210885 |
2014-02-10 22:43:18 | python-dev | set | messages: + msg210884 |
2014-02-10 22:12:24 | gvanrossum | set | messages: + msg210879 |
2014-02-10 22:01:45 | vstinner | set | files:
+ clock_resolution-2.patch messages: + msg210875 |
2014-02-10 18:29:32 | gvanrossum | set | messages: + msg210859 |
2014-02-10 18:22:31 | vstinner | set | files:
+ clock_resolution.patch messages: + msg210858 |
2014-02-10 18:18:33 | python-dev | set | messages: + msg210857 |
2014-02-10 18:12:24 | vstinner | set | messages: + msg210856 |
2014-02-10 17:56:44 | vstinner | set | messages: + msg210855 |
2014-02-10 10:48:24 | python-dev | set | messages: + msg210822 |
2014-02-10 10:40:47 | vstinner | set | status: closed -> open resolution: fixed -> (no value) messages: + msg210821 |
2014-02-07 22:49:19 | vstinner | set | status: open -> closed resolution: fixed messages: + msg210575 |
2014-02-07 22:43:38 | vstinner | set | messages: + msg210574 |
2014-02-07 22:37:00 | python-dev | set | messages: + msg210573 |
2014-02-07 21:22:21 | gvanrossum | set | messages: + msg210558 |
2014-02-07 19:25:49 | neologix | set | messages: + msg210532 |
2014-02-07 17:37:45 | vstinner | set | messages: + msg210501 |
2014-02-07 16:57:58 | python-dev | set | nosy:
+ python-dev messages: + msg210494 |
2014-02-07 16:54:39 | vstinner | set | messages: + msg210493 |
2014-02-07 16:39:32 | vstinner | set | messages: + msg210489 |
2014-02-07 16:18:22 | vstinner | set | messages: + msg210487 |
2014-02-07 15:48:35 | gvanrossum | set | messages: + msg210484 |
2014-02-07 09:13:31 | neologix | set | messages: + msg210442 |
2014-02-06 20:06:25 | gvanrossum | set | messages: + msg210411 |
2014-02-05 19:56:18 | neologix | set | messages: + msg210343 |
2014-02-04 09:21:55 | vstinner | set | files: + add_granularity.patch |
2014-02-04 09:20:43 | vstinner | set | files:
+ asyncio_accuracy.py messages: + msg210186 |
2014-02-03 23:34:16 | vstinner | create |