Message 210908 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	gvanrossum, neologix, python-dev, vstinner
Date	2014-02-11.08:50:35
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1392108638.15.0.275813153182.issue20505@psf.upfronthosting.co.za>
In-reply-to

Content
> Wow, 10 messages in one night... Could you try to gather all your finding at once, because reading so many messages in difficult to follow? It's not easy because I collect informations from various buildbots and different virtual machines. But I planned to try to summarize the overall work done on time in asyncio, select and selectors. The first problem is that clocks and selectors have a different resolution. On Linux, poll() has a resolution of 1 ms, whereas clock_gettime(CLOCK_MONOTONIC) has a resolution better than 1 us (0.3 us according to my last test). On Windows, GetTickCount64() has a resolution of 15.6 ms, whereas GetQueuedCompletionStatus() has a resolution of 1 ms when HPET is enabled. Note: It looks like GetQueuedCompletionStatus() has the same resolution than GetTickCount64() when HPET is disabled. The second problem is that before my changes asyncio expected selector.select(timeout) to take at least timeout seconds, whereas it may take 1 or 2 ms less than timeout. This difference is very small (1 ms), but it's enough to create a busy loop. This busy loop is quite short (1 ms), but it may occur each time than call_at() or call_later() is used (ex: each time syncio.sleep() is used). Theorical resolution of selectors (resolution of the C types): * poll, epoll, GetQueuedCompletionStatus: 1 ms (10^-3) * select: 1 us (10^-6) * kqueue: 1 ns (10^-9) Effective resolution of selectors: * poll, epoll: 1 ms... but may sleep 1 or 2 ms less than the timeout, ex: on Linux, epoll_wait(100) (100 ms) took 99.6 ms according to clock_gettime(CLOCK_MONOTONIC) * GetQueuedCompletionStatus, HPET enabled: 1 ms... but may sleep 1 ms less than the timeout. Example on Windows with HPET enabled, GetQueuedCompletionStatus(10 ms) took 9.971 ms according to the performance counter) * GetQueuedCompletionStatus, HPET disabled: 15.6 ms For select, kqueue, I don't have reliable numbers, but I'm less worried by them since a Python application is usually slower than the resolution of these selectors. Effective resolution of clocks (according to tables in PEP 418): * QueryPerformanceCounter(): 10 ns * clock_gettime(CLOCK_MONOTONIC) on Linux: 322 ns * GetTickCount64(): 15.6 ms To fix the busy loop, poll and epoll selectors have been modified to round the timeout away from zero, instead of rounding towards zero. So a timeout of 1 ns is converted to a timeout of 1 ms instead of 0. On Windows, the clock resolution (15.6 ms) is worse than the resolution of the selector (~1 ms). The problem is that for timeout smaller than 15.6 ms, the clock value is unchanged after the sleep. So asyncio now uses the resolution of the clock to check if a scheduled task should be executed. asyncio now has a unit test scheduling task with very short delay (between 10 ms and 0.1 ns) and counts the number of calls to _run_once() to detect busy loops. The unit test knows that the selector sometimes sleep 1 or 2 ms less than the timeout and tolerates that in its checks. For selectors, select and kqueue still towards zero. I reopened #20320 which has a patch to round aways from zero and so avoids very short busy loop (1 us). If the test_timeout_rounding() test still fail on UNIX even if #20320 is fixed, we should use a more coarse resolution in asyncio event loop (BaseEventLoop._clock_resolution).

> Wow, 10 messages in one night... Could you try to gather all your finding at once, because reading so many messages in difficult to follow?

It's not easy because I collect informations from various buildbots and different virtual machines. But I planned to try to summarize the overall work done on time in asyncio, select and selectors.

The first problem is that clocks and selectors have a different resolution. On Linux, poll() has a resolution of 1 ms, whereas clock_gettime(CLOCK_MONOTONIC) has a resolution better than 1 us (0.3 us according to my last test). On Windows, GetTickCount64() has a resolution of 15.6 ms, whereas GetQueuedCompletionStatus() has a resolution of 1 ms when HPET is enabled.

Note: It looks like GetQueuedCompletionStatus() has the same resolution than GetTickCount64() when HPET is disabled.

The second problem is that before my changes asyncio expected selector.select(timeout) to take at least timeout seconds, whereas it may take 1 or 2 ms less than timeout. This difference is very small (1 ms), but it's enough to create a busy loop. This busy loop is quite short (1 ms), but it may occur each time than call_at() or call_later() is used (ex: each time syncio.sleep() is used).


Theorical resolution of selectors (resolution of the C types):

* poll, epoll, GetQueuedCompletionStatus: 1 ms (10^-3)
* select: 1 us (10^-6)
* kqueue: 1 ns (10^-9)


Effective resolution of selectors:

* poll, epoll: 1 ms... but may sleep 1 or 2 ms less than the timeout, ex: on Linux, epoll_wait(100) (100 ms) took 99.6 ms according to clock_gettime(CLOCK_MONOTONIC)
* GetQueuedCompletionStatus, HPET enabled: 1 ms... but may sleep 1 ms less than the timeout. Example on Windows with HPET enabled, GetQueuedCompletionStatus(10 ms) took 9.971 ms according to the performance counter)
* GetQueuedCompletionStatus, HPET disabled: 15.6 ms

For select, kqueue, I don't have reliable numbers, but I'm less worried by them since a Python application is usually slower than the resolution of these selectors.


Effective resolution of clocks (according to tables in PEP 418):

* QueryPerformanceCounter(): 10 ns
* clock_gettime(CLOCK_MONOTONIC) on Linux: 322 ns
* GetTickCount64(): 15.6 ms


To fix the busy loop, poll and epoll selectors have been modified to round the timeout away from zero, instead of rounding towards zero. So a timeout of 1 ns is converted to a timeout of 1 ms instead of 0.


On Windows, the clock resolution (15.6 ms) is worse than the resolution of the selector (~1 ms). The problem is that for timeout smaller than 15.6 ms, the clock value is unchanged after the sleep. So asyncio now uses the resolution of the clock to check if a scheduled task should be executed.


asyncio now has a unit test scheduling task with very short delay (between 10 ms and 0.1 ns) and counts the number of calls to _run_once() to detect busy loops. The unit test knows that the selector sometimes sleep 1 or 2 ms less than the timeout and tolerates that in its checks.


For selectors, select and kqueue still towards zero. I reopened #20320 which has a patch to round aways from zero and so avoids very short busy loop (1 us).


If the test_timeout_rounding() test still fail on UNIX even if #20320 is fixed, we should use a more coarse resolution in asyncio event loop (BaseEventLoop._clock_resolution).

History
Date	User	Action	Args
2014-02-11 08:50:38	vstinner	set	recipients: + vstinner, gvanrossum, neologix, python-dev
2014-02-11 08:50:38	vstinner	set	messageid: <1392108638.15.0.275813153182.issue20505@psf.upfronthosting.co.za>
2014-02-11 08:50:38	vstinner	link	issue20505 messages
2014-02-11 08:50:35	vstinner	create