Message 373736 - Python tracker

Author	eryksun
Recipients	Dennis Sweeney, SD, eryksun, ned.deily, paul.moore, pitrou, steve.dower, tim.golden, zach.ware
Date	2020-07-16.02:54:00
Message-id	<1594868041.43.0.600433810581.issue41299@roundup.psfhosted.org>
In-reply-to

Content

> On the smaller scale, it looks quantized to multiples of ~15ms (?), 
> but then it gets more accurate as the times get larger. I don't 
> think it's a measurement error since the first measurement manages
> microseconds.

NT's default system timer resolution for thread dispatching is 15.625 ms. If a wait is for 17 ms, expect to see actual wait times mostly in the range 17 ms up to 31.25 ms. A few outliers may wait longer because the due time on the kernel wait is when the thread is ready to be dispatched, but other threads may preempt it.
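The figures quoted above can be reproduced with a line of arithmetic. This is only a sketch of where 31.25 ms comes from (the request rounded up to a whole number of 15.625 ms ticks), not a model of the NT scheduler; the function name and the tick constant are illustrative.

```python
import math

# Default NT timer interval: 1000 ms / 64 ticks per second.
TICK_MS = 15.625


def typical_wait_range(requested_ms, tick_ms=TICK_MS):
    """Return (requested, requested rounded up to a whole number of ticks)."""
    upper = math.ceil(requested_ms / tick_ms) * tick_ms
    return requested_ms, upper


print(typical_wait_range(17))  # (17, 31.25)
```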

The system timer resolution can be increased to about 1 ms (or 500 us with the undocumented NtSetTimerResolution system call). But stricter timing still requires combining a dispatcher wait for the long coarse-grained part of the wait with a performance-counter busy wait for the short precise part.
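A minimal sketch of that combination in pure Python, assuming time.sleep stands in for the dispatcher wait and time.perf_counter for the performance counter. The name precise_sleep and the spin_threshold parameter are hypothetical; the threshold should be at least one timer interval so that an overshooting coarse sleep still ends before the deadline.

```python
import time


def precise_sleep(seconds, spin_threshold=0.016):
    """Sleep coarsely for most of the interval, then busy-wait the rest."""
    deadline = time.perf_counter() + seconds
    coarse = seconds - spin_threshold
    if coarse > 0:
        # Coarse part: may overshoot by up to one timer interval.
        time.sleep(coarse)
    # Precise part: spin on the high-resolution counter until the deadline.
    while time.perf_counter() < deadline:
        pass


start = time.perf_counter()
precise_sleep(0.017)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"waited {elapsed_ms:.3f} ms")
```

The busy wait burns a CPU core for up to spin_threshold seconds per call, which is the price of the extra precision.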

---

But something else is also involved here. I set the system timer resolution to 1 ms, and this allowed time.sleep(0.017) to wait for 17-18 ms in about 95% of cases. But the wait for acquiring a _thread.lock stubbornly refused to cooperate. Ultimately it's just calling WaitForSingleObjectEx on a semaphore, so I used ctypes to make a simple alternative lock via CreateSemaphoreW, ReleaseSemaphore, and WaitForSingleObject. This simple implementation performed exactly like the time.sleep wait with regard to the system timer resolution, so the difference is in the _thread.lock wait. I traced it to the following code in EnterNonRecursiveMutex in Python/thread_nt.h:

        /* wait at least until the target */
        ULONGLONG now, target = GetTickCount64() + milliseconds;
        while (mutex->locked) {
            if (PyCOND_TIMEDWAIT(&mutex->cv, &mutex->cs, (long long)milliseconds*1000) < 0) {
                result = WAIT_FAILED;
                break;
            }
            now = GetTickCount64();
            if (target <= now)
                break;
            milliseconds = (DWORD)(target-now);
        }

GetTickCount64 is documented to be "limited to the resolution of the system timer", but what the documentation doesn't say is that changing the resolution of the system timer has no effect on the minimum increment of the tick count. It still increments by 15-16 ms even if the system timer resolution is set to 1 ms.
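To see why a coarse tick count stretches the wait even when the wait itself wakes on a 1 ms timer, here is a toy simulation of the retry loop above. It is a sketch under stated assumptions, not the real implementation: the lock is never released, the timed wait returns on the first timer interrupt at or after its due time, and the clock is truncated to whole milliseconds. All names are hypothetical.

```python
import math


def simulate_timed_wait(timeout_ms, clock_step_ms, wake_step_ms=1.0, start_ms=0.0):
    """Return the simulated real time (ms) spent in the retry loop."""

    def clock(t):
        # Tick count that only advances in steps of clock_step_ms.
        return int(t // clock_step_ms * clock_step_ms)

    def wait(t, ms):
        # The wait ends on the first timer interrupt at or after the due time.
        return math.ceil((t + ms) / wake_step_ms) * wake_step_ms

    t = start_ms
    target = clock(t) + timeout_ms
    remaining = timeout_ms
    while True:  # mutex->locked stays true: nobody releases the lock
        t = wait(t, remaining)
        now = clock(t)
        if target <= now:
            break
        remaining = target - now
    return t - start_ms


coarse = simulate_timed_wait(17, clock_step_ms=15.625)  # GetTickCount64-like clock
fine = simulate_timed_wait(17, clock_step_ms=1.0)       # interrupt-time-like clock
print(coarse, fine)
```

In this model the 17 ms wait with the coarse clock keeps retrying in 2 ms steps until the tick count has advanced a second time, so it ends only after 31.25 ms, while the fine clock lets the loop exit at 17 ms.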

OTOH, the values of QueryInterruptTime [1] and QueryUnbiasedInterruptTime [2] are incremented with the timer interrupt. The biased version [1] is preferred for long waits that may straddle a system sleep or hibernate, but it's only available in Windows 10. I patched EnterNonRecursiveMutex to call QueryInterruptTime instead of GetTickCount64, and this did enable increased precision when waiting on a lock. For example (patched behavior):

    >>> lock = _thread.allocate_lock()
    >>> lock.acquire()
    True
    >>> setup = 'from __main__ import lock'
    >>> stmt = 'lock.acquire(True, 0.017)'

15.625 ms system timer:

    >>> timeit.timeit(stmt, setup, number=1000)
    30.173713599999985

1 ms system timer:

    >>> with set_timer_resolution(0.001):
    ...     timeit.timeit(stmt, setup, number=1000)
    ...
    17.66828049999998
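
The set_timer_resolution helper used above is not shown in the message. One possible sketch, assuming it wraps the documented winmm timeBeginPeriod and timeEndPeriod calls (the original may well have used a different API), and assuming it should do nothing on non-Windows platforms:

```python
import contextlib
import ctypes
import sys


@contextlib.contextmanager
def set_timer_resolution(seconds):
    """Request a finer system timer resolution for the duration of the block."""
    period_ms = max(1, round(seconds * 1000))  # timeBeginPeriod takes whole ms
    if sys.platform != "win32":
        # Assumption for this sketch: no-op outside Windows.
        yield period_ms
        return
    winmm = ctypes.WinDLL("winmm")
    if winmm.timeBeginPeriod(period_ms) != 0:  # TIMERR_NOERROR is 0
        raise OSError(f"timeBeginPeriod({period_ms}) failed")
    try:
        yield period_ms
    finally:
        # Every timeBeginPeriod call must be matched by timeEndPeriod.
        winmm.timeEndPeriod(period_ms)
```

This variant cannot go below 1 ms, which matches the resolution used in the timing above.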

That said, increasing the timer resolution is discouraged in most cases, so we may simply document that lock waits are limited to the default system timer resolution of 15.625 ms, and that increasing the system timer resolution has no effect on this limit.

[1]: https://docs.microsoft.com/en-us/windows/win32/api/realtimeapiset/nf-realtimeapiset-queryinterrupttime
[2]: https://docs.microsoft.com/en-us/windows/win32/api/realtimeapiset/nf-realtimeapiset-queryunbiasedinterrupttime

---

Note that this is unrelated to cancel support via Ctrl+C. Windows Python has no support for canceling a wait on a _thread.lock. It's just a single-object wait in _PyCOND_WAIT_MS, not a multiple-object wait that we set up to include the SIGINT event when called on the main thread. (A variant that I like queues a user APC to the main thread for SIGINT instead of using an event, and switches to alertable waits with SleepEx and WaitForSingleObjectEx, without needing a wait slot in WaitForMultipleObjectsEx.) It's possible to implement a Ctrl+C cancel as long as the lock implementation waits on a kernel semaphore object. However, some effort has gone into developing a different implementation based on condition variables and SRW locks. I don't know whether there's a way to cancel SleepConditionVariableSRW, or whether maybe a different implementation could be used for _thread.lock instead of sharing an implementation with the GIL.
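
For reference, a lock that waits on a kernel semaphore object, along the lines of the ctypes comparison lock mentioned earlier, might look roughly like this. It is a sketch, not the code used for the measurements: the class name is hypothetical, it is Windows-only (the constructor raises OSError elsewhere), and it makes no attempt at Ctrl+C handling.

```python
import ctypes
import sys
from ctypes import wintypes

WAIT_OBJECT_0 = 0x00000000
WAIT_TIMEOUT = 0x00000102
INFINITE = 0xFFFFFFFF


class SemaphoreLock:
    """Non-recursive lock backed by a kernel semaphore (Windows only)."""

    def __init__(self):
        if sys.platform != "win32":
            raise OSError("SemaphoreLock requires Windows")
        k32 = ctypes.WinDLL("kernel32", use_last_error=True)
        k32.CreateSemaphoreW.restype = wintypes.HANDLE
        k32.CreateSemaphoreW.argtypes = (
            wintypes.LPVOID, wintypes.LONG, wintypes.LONG, wintypes.LPCWSTR)
        k32.ReleaseSemaphore.restype = wintypes.BOOL
        k32.ReleaseSemaphore.argtypes = (
            wintypes.HANDLE, wintypes.LONG, wintypes.PLONG)
        k32.WaitForSingleObject.restype = wintypes.DWORD
        k32.WaitForSingleObject.argtypes = (wintypes.HANDLE, wintypes.DWORD)
        self._k32 = k32
        # Initial count 1, maximum count 1: the lock starts out unlocked.
        self._handle = k32.CreateSemaphoreW(None, 1, 1, None)
        if not self._handle:
            raise ctypes.WinError(ctypes.get_last_error())

    def acquire(self, timeout=None):
        """Wait up to `timeout` seconds; return True if the lock was taken."""
        ms = INFINITE if timeout is None else round(timeout * 1000)
        rc = self._k32.WaitForSingleObject(self._handle, ms)
        if rc == WAIT_OBJECT_0:
            return True
        if rc == WAIT_TIMEOUT:
            return False
        raise ctypes.WinError(ctypes.get_last_error())

    def release(self):
        if not self._k32.ReleaseSemaphore(self._handle, 1, None):
            raise ctypes.WinError(ctypes.get_last_error())
```

Because the timeout goes straight to WaitForSingleObject with no tick-count retry loop around it, a lock of this shape follows the system timer resolution in the same way time.sleep does.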

History
Date	User	Action	Args
2020-07-16 02:54:01	eryksun	set	recipients: + eryksun, paul.moore, pitrou, tim.golden, ned.deily, zach.ware, steve.dower, Dennis Sweeney, SD
2020-07-16 02:54:01	eryksun	set	messageid: <1594868041.43.0.600433810581.issue41299@roundup.psfhosted.org>
2020-07-16 02:54:01	eryksun	link	issue41299 messages
2020-07-16 02:54:00	eryksun	create