classification
Title: time.monotonic() should use a different clock source on Windows
Type: performance
Stage:
Components: Library (Lib), Windows
Versions: Python 3.11
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: belopolsky, eryksun, lunixbochs2, p-ganssle, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords:

Created on 2021-06-06 22:05 by lunixbochs2, last changed 2021-06-14 21:10 by eryksun.

Messages (12)
msg395221 - Author: Ryan Hileman (lunixbochs2) * Date: 2021-06-06 22:05
Related to https://bugs.python.org/issue41299#msg395220

Presumably `time.monotonic()` on Windows historically used GetTickCount64() because QueryPerformanceCounter() could fail. However, that hasn't been the case since Windows XP: https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter

> On systems that run Windows XP or later, the function will always succeed and will thus never return zero

I've run into issues with this when porting Python-based applications to Windows. On other platforms, time.monotonic() had decent precision, so I used it. When I ported to Windows, I had to replace all of my time.monotonic() calls with time.perf_counter(). I would pretty much never knowingly call time.monotonic() if I knew ahead of time it could be quantized to 16ms.
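A quick way to see that quantization on a given machine is to measure the smallest step each clock reports. This is a rough sketch rather than a benchmark: `smallest_step` is a hypothetical helper name, and the roughly 16 ms step on Windows is what the GetTickCount64() behaviour described here would predict, not a guaranteed output.

```python
import time


def smallest_step(clock, samples=20):
    """Return the smallest nonzero increment seen between consecutive reads."""
    best = float("inf")
    for _ in range(samples):
        start = clock()
        now = clock()
        while now == start:  # spin until the clock visibly advances
            now = clock()
        best = min(best, now - start)
    return best


if __name__ == "__main__":
    for name in ("monotonic", "perf_counter"):
        step = smallest_step(getattr(time, name))
        print(f"time.{name}(): smallest observed step {step * 1e3:.6f} ms")
```

If time.monotonic() is backed by a tick counter, its smallest step should be on the order of the tick period, while time.perf_counter() should report a much smaller one.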

My opinion is that the GetTickCount64() monotonic time code in CPython should be removed entirely and only the QueryPerformanceCounter() path should be used.

I also think some of the failure checks could be removed from QueryPerformanceCounter() / QueryPerformanceFrequency(), as they're documented to never fail in modern Windows and CPython has been dropping support for older versions of Windows, but that's less of a firm opinion.
msg395238 - Author: Ryan Hileman (lunixbochs2) * Date: 2021-06-07 01:05
I found these two references:
- https://stackoverflow.com/questions/35601880/windows-timing-drift-of-performancecounter-c
- https://bugs.python.org/issue10278#msg143209

Which suggest QueryPerformanceCounter() may be bad because it can drift. However, these posts are fairly old and the StackOverflow post also says the drift is small on newer hardware / Windows.

Microsoft's current stance is that QueryPerformanceCounter() is good: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

> Guidance for acquiring time stamps
> Windows has and will continue to invest in providing a reliable and efficient performance counter. When you need time stamps with a resolution of 1 microsecond or better and you don't need the time stamps to be synchronized to an external time reference, choose QueryPerformanceCounter

I looked into how a few other languages provide monotonic time on Windows:

Golang seems to read the interrupt time (presumably equivalent to QueryInterruptTime) directly by address. https://github.com/golang/go/blob/a3868028ac8470d1ab7782614707bb90925e7fe3/src/runtime/sys_windows_amd64.s#L499

Rust uses QueryPerformanceCounter: https://github.com/rust-lang/rust/blob/38ec87c1885c62ed8c66320ad24c7e535535e4bd/library/std/src/time.rs#L91

V8 uses QueryPerformanceCounter after checking for old CPUs: https://github.com/v8/v8/blob/dc712da548c7fb433caed56af9a021d964952728/src/base/platform/time.cc#L672

Ruby uses QueryPerformanceCounter: https://github.com/ruby/ruby/blob/44cff500a0ad565952e84935bc98523c36a91b06/win32/win32.c#L4712

C# implements QueryPerformanceCounter on other platforms using CLOCK_MONOTONIC, indicating that they should be roughly equivalent: https://github.com/dotnet/runtime/blob/01b7e73cd378145264a7cb7a09365b41ed42b240/src/coreclr/pal/src/misc/time.cpp#L175

Swift originally used QueryPerformanceCounter, but switched to QueryUnbiasedInterruptTime() because they didn't want to count time the system spent asleep: https://github.com/apple/swift-corelibs-libdispatch/commit/766d64719cfdd07f97841092bec596669261a16f

------

Note that none of these languages use GetTickCount64(). Swift is an interesting counterpoint, and I noticed QueryUnbiasedInterruptTime() is available on Windows 8 while QueryInterruptTime() is new as of Windows 10. The "Unbiased" just refers to whether it advances during sleep.

I'm not actually sure whether time.monotonic() in Python counts time spent asleep, or whether that's desirable. Some kinds of timers using monotonic time should definitely freeze during sleep so they don't cause a flurry of activity on wake, but others definitely need to roughly track wall clock time, even during sleep.

Perhaps the long term answer would be to introduce separate "asleep" and "awake" monotonic clocks in Python, and possibly deprecate perf_counter() if it's redundant after this (as I think it's aliased to monotonic() on non-Windows platforms anyway).
msg395490 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-06-09 22:38
You resolved bpo-41299 using QueryPerformanceCounter(), so we're already a step toward making it the default monotonic clock. Personally, I've only relied on QPC for short intervals, but, as you've highlighted above, other language runtimes use it for their monotonic clock. Since Vista, it's apparently more reliable in terms of calibration and ensuring that a processor TSC is only used if it's known to be invariant and constant.

That said, Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress().

QueryInterruptTimePrecise() is about 1.38 times the cost of QPC (on average across 100 million calls). Both functions are significantly more expensive than QueryInterruptTime() and GetTickCount64(), which simply return a value that's read from shared memory (i.e. the KUSER_SHARED_DATA structure).

> QueryUnbiasedInterruptTime() is available on Windows 8 while 
> QueryInterruptTime() is new as of Windows 10. The "Unbiased" 
> just refers to whether it advances during sleep.

QueryInterruptTime() and QueryUnbiasedInterruptTime() don't provide high-resolution timestamps. They're updated by the system timer interrupt service routine, which defaults to 64 interrupts/second. The time increment depends on when the counter is read by the ISR, but it averages out to approximately the interrupt period (e.g. 15.625 ms).
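The arithmetic behind that figure, and its effect on short intervals, can be sketched with an idealized model. This assumes a perfectly regular tick at the default 64 interrupts/second; as noted above, the real increment varies around that average, and `coarse_reading_ms` is a hypothetical helper, not a Windows or CPython function.

```python
import math

TICKS_PER_SECOND = 64                      # default interrupt rate quoted above
TICK_PERIOD_MS = 1000 / TICKS_PER_SECOND   # 15.625 ms per tick


def coarse_reading_ms(true_time_ms, period_ms=TICK_PERIOD_MS):
    """Value a tick-updated clock would report: the time of the last tick."""
    return math.floor(true_time_ms / period_ms) * period_ms


# Two events 11 ms apart that fall inside the same tick look simultaneous.
elapsed = coarse_reading_ms(31.0) - coarse_reading_ms(20.0)
```

In this model `elapsed` is 0.0 even though 11 ms passed, which is the kind of error a caller sees when timing short intervals with a tick-based clock.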

> I'm not actually sure whether time.monotonic() in Python counts 
> time spent asleep, or whether that's desirable. 

POSIX doesn't specify whether CLOCK_MONOTONIC [1] should include the time that elapses while the system is in standby mode. In Linux, CLOCK_BOOTTIME includes this time, and CLOCK_MONOTONIC excludes it. Windows QueryUnbiasedInterruptTime[Precise]() excludes it.

> Perhaps the long term answer would be to introduce separate 
> "asleep" and "awake" monotonic clocks in Python

Both may not be supportable on all platforms, but they're supported in Linux, Windows 10, and macOS. The latter has mach_continuous_time(), which includes the time in standby mode, and mach_absolute_time(), which excludes it.
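On Linux the two behaviours are already reachable from Python through time.clock_gettime(), so the difference between them gives a rough estimate of time spent suspended since boot. This is a minimal sketch assuming the Linux semantics described above; `suspend_time_estimate` is a hypothetical helper, and it returns None where the constants are missing (for example on Windows).

```python
import time


def suspend_time_estimate():
    """Seconds spent suspended since boot, or None if it cannot be measured."""
    needed = ("clock_gettime", "CLOCK_MONOTONIC", "CLOCK_BOOTTIME")
    if not all(hasattr(time, name) for name in needed):
        return None
    try:
        # On Linux, CLOCK_MONOTONIC excludes suspend and CLOCK_BOOTTIME includes it.
        monotonic = time.clock_gettime(time.CLOCK_MONOTONIC)
        boottime = time.clock_gettime(time.CLOCK_BOOTTIME)
    except OSError:
        # The constant exists but the running kernel rejects the clock.
        return None
    return boottime - monotonic


if __name__ == "__main__":
    print("estimated suspend time (s):", suspend_time_estimate())
```

The two reads are not atomic, so the result carries a small error from the time between the calls.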

--- 
[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/clock_gettime.html
msg395493 - Author: Ryan Hileman (lunixbochs2) * Date: 2021-06-09 23:13
Great information, thanks!

> Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress()

My personal vote is to use the currently most common clock source (QPC) for now for monotonic(), because it's the same across Windows versions and the most likely to produce portable monotonic timestamps between apps/languages on the same system. It's also the easiest patch, as there's already a code path for QPC.

(As someone building multi-app experiences around Python, I don't want to check the Windows version to see which time base Python is using. I'd feel better about switching to QITP() if/when Python drops Windows 8 support.)

A later extension of this idea (maybe behind a PEP) could be to survey the existing timers available on each platform and consider whether it's worth extending `time` to expose them all, and unify cross-platform the ones that are exposed (e.g. better formalize/document which clocks will advance while the machine is asleep on each platform).
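As a starting point for such a survey, the existing time.get_clock_info() API already reports the implementation and nominal resolution behind each named clock. A small sketch (`survey_clocks` is a hypothetical helper name):

```python
import time

CLOCK_NAMES = ("time", "monotonic", "perf_counter", "process_time", "thread_time")


def survey_clocks(names=CLOCK_NAMES):
    """Map each supported clock name to its time.get_clock_info() result."""
    report = {}
    for name in names:
        try:
            report[name] = time.get_clock_info(name)
        except ValueError:  # clock not supported on this platform
            continue
    return report


if __name__ == "__main__":
    for name, info in survey_clocks().items():
        print(f"{name:13} {info.implementation:45} "
              f"resolution={info.resolution:g} monotonic={info.monotonic}")
```

This does not say whether a clock advances while the machine is asleep, which is exactly the part that would need to be formalized and documented.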
msg395681 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-12 08:47
Changing a clock is tricky. There are many things to consider:

* Is it really monotonic in all cases?
* Does it have a better resolution than the previous clock?
* Corner cases: does it include time spent in time.sleep() and while the system is suspended?
* etc.

--

When I designed PEP 418 (in 2012), QueryPerformanceCounter() was not reliable:

"It has a much higher resolution, but has lower long term precision than GetTickCount() and timeGetTime() clocks. For example, it will drift compared to the low precision clocks."
https://www.python.org/dev/peps/pep-0418/#windows-queryperformancecounter

And there were a few bugs like: "The performance counter value may unexpectedly leap forward because of a hardware bug".

A Microsoft blog article explains that users wanting a steady clock with precision higher than GetTickCount() should interpolate GetTickCount() using QueryPerformanceCounter(). If I recall correctly, this is what Firefox did for instance.
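The interpolation idea can be sketched roughly as follows. This is an illustration of the general technique under simplifying assumptions, not the approach from the blog article, Firefox, or Windows itself: `InterpolatedClock` is a hypothetical class, both clocks are assumed to report seconds, and the two reads are not atomic.

```python
import time


class InterpolatedClock:
    """Steady coarse clock refined with a high-resolution counter."""

    def __init__(self, coarse=time.monotonic, fine=time.perf_counter):
        self._coarse = coarse
        self._fine = fine
        self._last_coarse = coarse()
        self._fine_at_tick = fine()       # fine reading when the tick was first seen
        self._last_result = self._last_coarse

    def now(self):
        coarse = self._coarse()
        fine = self._fine()
        if coarse != self._last_coarse:   # a new tick: restart the interpolation
            self._last_coarse = coarse
            self._fine_at_tick = fine
        value = coarse + (fine - self._fine_at_tick)
        # Clamp so the result never steps backwards across a tick boundary.
        self._last_result = max(self._last_result, value)
        return self._last_result
```

Long-term drift is bounded by the coarse clock, because the fine counter only contributes the time since the last observed tick. The weakness of this naive version is that a tick noticed late makes the reading lag behind the true time until the next tick.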

Eryk: "That said, Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress()."

Oh, good that they provided an implementation for that :-)

--

> V8 uses QueryPerformanceCounter after checking for old CPUs: https://github.com/v8/v8/blob/dc712da548c7fb433caed56af9a021d964952728/src/base/platform/time.cc#L672

It uses CPUID to check for "non stoppable time stamp counter": 
https://github.com/v8/v8/blob/master/src/base/cpu.cc

  // Check if CPU has non stoppable time stamp counter.
  const unsigned parameter_containing_non_stop_time_stamp_counter = 0x80000007;
  if (num_ext_ids >= parameter_containing_non_stop_time_stamp_counter) {
    __cpuid(cpu_info, parameter_containing_non_stop_time_stamp_counter);
    has_non_stop_time_stamp_counter_ = (cpu_info[3] & (1 << 8)) != 0;
  }

Maybe we could use such a check in Python: use GetTickCount() on old CPUs, or QueryPerformanceCounter() otherwise. MSVC provides the __cpuid() function:
https://docs.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=msvc-160

--

> Swift originally used QueryPerformanceCounter, but switched to QueryUnbiasedInterruptTime() because they didn't want to count time the system spent asleep

Oh, I recall that it was a tricky question. PEP 418 simply says:
"The behaviour of clocks after a system suspend is not defined in the documentation of new functions."

See "Include Sleep" and "Include Suspend" columns of my table:
https://www.python.org/dev/peps/pep-0418/#monotonic-clocks
msg395683 - Author: Ryan Hileman (lunixbochs2) * Date: 2021-06-12 09:51
I think a lot of that is based on very outdated information. It's worth reading this article: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

I will repeat Microsoft's current recommendation (from that article):

> Windows has and will continue to invest in providing a reliable and efficient performance counter. When you need time stamps with a resolution of 1 microsecond or better and you don't need the time stamps to be synchronized to an external time reference, choose QueryPerformanceCounter, KeQueryPerformanceCounter, or KeQueryInterruptTimePrecise. When you need UTC-synchronized time stamps with a resolution of 1 microsecond or better, choose GetSystemTimePreciseAsFileTime or KeQuerySystemTimePrecise.

(Based on that, it may also be worth replacing time.time()'s GetSystemTimeAsFileTime with GetSystemTimePreciseAsFileTime in CPython, as GetSystemTimePreciseAsFileTime is available in Windows 8 and newer)

PEP 418:

> It has a much higher resolution, but has lower long term precision than GetTickCount() and timeGetTime() clocks. For example, it will drift compared to the low precision clocks.

Microsoft on drift (from the article above):

> To reduce the adverse effects of this frequency offset error, recent versions of Windows, particularly Windows 8, use multiple hardware timers to detect the frequency offset and compensate for it to the extent possible. This calibration process is performed when Windows is started.

Modern Windows also automatically detects and works around stoppable TSC, as well as several other issues:

> Some processors can vary the frequency of the TSC clock or stop the advancement of the TSC register, which makes the TSC unsuitable for timing purposes on these processors. These processors are said to have non-invariant TSC registers. (Windows will automatically detect this, and select an alternative time source for QPC).

It seems like Microsoft considers QPC to be a significantly better time source now, than when PEP 418 was written.

Another related conversation is whether Python can just expose all of the Windows clocks directly (through clock_gettime enums?), as that gives anyone who really wants full control over their timestamps a good escape hatch.
msg395719 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-12 23:49
> To reduce the adverse effects of this frequency offset error, recent versions of Windows, particularly Windows 8, use multiple hardware timers to detect the frequency offset and compensate for it to the extent possible. This calibration process is performed when Windows is started.

Technically, it remains possible to install Python on Windows 7, see: bpo-32592.
msg395769 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-06-14 01:03
On second thought, starting with Windows 8, WaitForSingleObject() and WaitForMultipleObjects() exclude time when the system is suspended. For consistency, an external deadline (e.g. for SIGINT support) should work the same way. The monotonic clock should thus be based on QueryUnbiasedInterruptTime(). We can conditionally use QueryUnbiasedInterruptTimePrecise() in Windows 10, which I presume includes most users of Python 3.9+ on Windows since Windows 8.1 only has a 3% share of desktop/laptop systems.

If we can agree on the above, then the change to use QueryPerformanceCounter() to resolve bpo-41299 should be reverted. The deadline should instead be computed with QueryUnbiasedInterruptTime(). It's limited to the resolution of the system interrupt time, but at least compared to GetTickCount64() it returns the real interrupt time instead of an idealized 64 ticks/second.
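For experimenting with this clock from Python before any change lands in CPython, QueryUnbiasedInterruptTime() can be called through ctypes. This is a sketch assuming the documented kernel32 export and its 100 ns units; `unbiased_interrupt_time` is a hypothetical helper, and it simply returns None on other platforms.

```python
import ctypes
import sys


def unbiased_interrupt_time():
    """Unbiased interrupt time in seconds on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    query = kernel32.QueryUnbiasedInterruptTime
    query.argtypes = [ctypes.POINTER(ctypes.c_ulonglong)]
    query.restype = ctypes.c_int  # Win32 BOOL
    value = ctypes.c_ulonglong()
    if not query(ctypes.byref(value)):
        raise ctypes.WinError(ctypes.get_last_error())
    return value.value / 10_000_000  # 100 ns units -> seconds


if __name__ == "__main__":
    print("unbiased interrupt time (s):", unbiased_interrupt_time())
```

Reading it in a tight loop should show the same coarse steps as the system interrupt time, which is the resolution limit mentioned above.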

> expose all of the Windows clocks directly (through clock_gettime enums?)

_Py_clock_gettime() and _Py_clock_getres() could be implemented in Python/pytime.c. For Windows we could implement the following clocks:

	CLOCK_REALTIME            GetSystemTimePreciseAsFileTime
	CLOCK_REALTIME_COARSE     GetSystemTimeAsFileTime
	CLOCK_MONOTONIC_COARSE    QueryUnbiasedInterruptTime
	CLOCK_PROCESS_CPUTIME_ID  GetProcessTimes
	CLOCK_THREAD_CPUTIME_ID   GetThreadTimes
	CLOCK_PERF_COUNTER        QueryPerformanceCounter

	Windows 10+
	CLOCK_MONOTONIC           QueryUnbiasedInterruptTimePrecise
	CLOCK_BOOTTIME            QueryInterruptTimePrecise
	CLOCK_BOOTTIME_COARSE     QueryInterruptTime

> it may also be worth replacing time.time()'s GetSystemTimeAsFileTime with 
> GetSystemTimePreciseAsFileTime

See bpo-19007, which is nearly 8 years old.
msg395771 - Author: Ryan Hileman (lunixbochs2) * Date: 2021-06-14 03:47
> The monotonic clock should thus be based on QueryUnbiasedInterruptTime

My primary complaint here is that Windows is the only major platform with a low resolution monotonic clock. Using QueryUnbiasedInterruptTime() on older OS versions wouldn't entirely help that, so we have the same consistency issue (just on a smaller scale). I would personally need to still use time.perf_counter() instead of time.monotonic() due to this, but I'm not totally against it.

> For consistency, an external deadline (e.g. for SIGINT support) should work the same way.

Have there been any issues filed about the deadline behaviors across system suspend?

> which I presume includes most users of Python 3.9+

Seems like Windows 7 may need to be considered as well, as per vstinner's bpo-32592 mention?

> starting with Windows 8, WaitForSingleObject() and WaitForMultipleObjects() exclude time when the system is suspended

Looks like Linux (CLOCK_MONOTONIC) and macOS (mach_absolute_time()) already don't track suspend time in time.monotonic(). I think that's enough to suggest that long-term Windows shouldn't either, but I don't know how to reconcile that with my desire for Windows not to be the only platform with low resolution monotonic time by default.

> then the change to use QueryPerformanceCounter() to resolve bpo-41299 should be reverted. The deadline should instead be computed with QueryUnbiasedInterruptTime()

I don't agree with this, as it would regress the fix. This is more of a topic for bpo-41299, but I tested QueryUnbiasedInterruptTime() and it exhibits the same 16ms jitter as GetTickCount64() (which I expected), so non-precise interrupt time can't solve this issue. I do think QueryUnbiasedInterruptTimePrecise() would be a good fit. I think making this particular timeout unbiased (which would be a new behavior) should be a lower priority than making it not jitter.

> For Windows we could implement the following clocks:

I think that list is great and making those enums work with clock_gettime on Windows sounds like a very clear improvement to the timing options available. Having the ability to query each clock source directly would also reduce the impact if time.monotonic() does not perfectly suit a specific application.

---

I think my current positions after writing all of this are:

- I would probably be in support of a 3.11+ change for time.monotonic() to use QueryUnbiasedInterruptTime() pre-Windows 10, and dynamically use QueryUnbiasedInterruptTimePrecise() on Windows 10+. Ideally the Windows clock_gettime() code lands in the same release, so users can directly pick their time source if necessary. This approach also helps my goal of making time.monotonic()'s suspend behavior more uniform across platforms.

- Please don't revert bpo-41299 (especially the backports), as it does fix the issue and tracking suspend time is the same (not a regression) as the previous GetTickCount64() code. I think the lock timeouts should stick with QPC pre-Windows-10 to fix the jitter, but could use QueryUnbiasedInterruptTimePrecise() on Windows 10+ (which needs the same runtime check as the time.monotonic() change, thus could probably land in the same patch set).

- I'm honestly left with more questions than I started with after diving into the GetSystemTimePreciseAsFileTime() rabbit hole. I assume it's not a catastrophic issue? Maybe it's a situation where adding the clock_gettime() enums would sufficiently help anyone who cares about the exact behavior during clock modification. I don't have strong opinions about it, besides it being a shame that Windows currently has lower precision timestamps in general. Could be worth doing a survey of other languages' choices, but any further discussion can probably go to bpo-19007.
msg395782 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-06-14 08:41
> Seems like Windows 7 may need to be considered as well, as 
> per vstinner's bpo-32592 mention?

Python 3.9 doesn't support Windows 7. Moreover, the interpreter DLL in 3.9 implicitly imports PathCchCanonicalizeEx, PathCchCombineEx, and PathCchSkipRoot, which were added in Windows 8. So it won't even load in Windows 7.

> Have there been any issues filed about the deadline behaviors 
> across system suspend?

Not that I'm aware of, but waits should be correct and consistent in principle. It shouldn't behave drastically differently just because the user closed the laptop lid for an hour.

> Looks like Linux (CLOCK_MONOTONIC) and macOS (mach_absolute_time())
> already don't track suspend time in time.monotonic(). I think that's
> enough to suggest that long-term Windows shouldn't either

I'm not overly concerned here with cross-platform consistency. If Windows hadn't changed the behavior of wait timeouts, then I wouldn't worry about it since most clocks in Windows are biased by the time spent suspended. It's a bonus that this change would also improve cross-platform consistency for time.monotonic(). 

> I tested QueryUnbiasedInterruptTime() and it exhibits the same 
> 16ms jitter as GetTickCount64() (which I expected), 

For bpo-41299, it occurs to me that we've only ever used _PY_EMULATED_WIN_CV, in which case PyCOND_TIMEDWAIT() returns 1 for a timeout, as implemented in _PyCOND_WAIT_MS(). Try changing EnterNonRecursiveMutex() to break out of the loop in this case. For example:

    } else if (milliseconds != 0) {
        /* wait at least until the target */
        ULONGLONG now, target;
        /* interrupt time is reported in 100 ns units (10000 per ms) */
        QueryUnbiasedInterruptTime(&target);
        target += (ULONGLONG)milliseconds * 10000;
        while (mutex->locked) {
            int ret = PyCOND_TIMEDWAIT(&mutex->cv, &mutex->cs,
                        (long long)milliseconds * 1000);
            if (ret < 0) {
                result = WAIT_FAILED;
                break;
            }
            if (ret == 1) { /* timeout */
                break;
            }
            QueryUnbiasedInterruptTime(&now);
            if (target <= now)
                break;
            /* remaining time, converted back to milliseconds */
            milliseconds = (DWORD)((target - now) / 10000);
        }
    }
msg395784 - Author: Ryan Hileman (lunixbochs2) * Date: 2021-06-14 09:15
> It shouldn't behave drastically different just because the user closed the laptop lid for an hour

I talked to someone who's been helping with the Go time APIs and it seems like that holds pretty well for interactive timeouts, but makes no sense for network related code. If you lost a network connection (with, say, a 30 second timeout) due to the lid being closed, you don't want to wait 30 seconds after opening the lid for the application to realize it needs to reconnect. (However there's probably no good way to design Python's locking system around both cases, so it's sufficient to say "lock timers won't advance during suspend" and make the application layer work around that on its own in the case of network code)

> Try changing EnterNonRecursiveMutex() to break out of the loop in this case

This does work, but unfortunately a little too well - in a single test I saw several instances where that approach returned _earlier_ than the timeout.

I assume the reason for this loop is the call can get interrupted with a "needs retry" state. If so, you'd still see 16ms of jitter anytime that happens as long as it's backed by a quantized time source.
msg395849 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-06-14 21:10
>> Try changing EnterNonRecursiveMutex() to break out of the loop in 
>> this case
>
> This does work, but unfortunately a little too well - in a single 
> test I saw several instances where that approach returned 
> _earlier_ than the timeout.

It's documented that a timeout between N and N+1 ticks can be satisfied anywhere in that range. In practice I see a wider range. In the kernel, variations in wait time could depend on when the due time is calculated in the interrupt cycle, when the next interrupt occurs and the interrupt time is updated, and when the thread is dispatched. A benefit of using a high-resolution external deadline is that waiting will never return early, but it may return later than it otherwise would, e.g. if re-waiting for a remaining 1 ms actually takes 20 ms.
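The external-deadline pattern can be sketched at the Python level with threading.Event standing in for the wait primitive. This is an illustration of the pattern only, not the CPython lock code; `wait_with_deadline` is a hypothetical helper.

```python
import threading
import time


def wait_with_deadline(event, timeout, clock=time.perf_counter):
    """Wait for event, re-waiting until a high-resolution deadline has passed.

    Returns True if the event was set, False on timeout. Because the deadline
    is checked against `clock` after every wakeup, a timeout is never reported
    before `timeout` seconds have elapsed on that clock.
    """
    deadline = clock() + timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return event.is_set()
        if event.wait(remaining):
            return True


if __name__ == "__main__":
    start = time.perf_counter()
    wait_with_deadline(threading.Event(), 0.05)
    print(f"waited {(time.perf_counter() - start) * 1e3:.3f} ms for a 50 ms timeout")
```

The cost is the one described above: if the underlying wait returns slightly early, the loop waits again for the small remainder, and that second wait may overshoot by a full tick.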

There are many unrelated WaitForSingleObject and WaitForMultipleObjects in the interpreter, extension modules, and code that uses _winapi.WaitForSingleObject and _winapi.WaitForMultipleObjects. For example, time.sleep() allows WAIT_TIMEOUT to override the deadline. I suggest measuring the performance-counter interval for time.sleep(0.001) on both the main thread (Sleep based) and a new thread (WaitForSingleObjectEx based).
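A sketch of that measurement, using time.perf_counter() around time.sleep(0.001) on the main thread and on a new thread. The helper names are hypothetical, and the Sleep versus WaitForSingleObjectEx distinction is taken from the description above, not verified by this code.

```python
import threading
import time


def measure_sleep(duration=0.001, repeats=100):
    """Return perf_counter intervals (seconds) observed for time.sleep(duration)."""
    intervals = []
    for _ in range(repeats):
        start = time.perf_counter()
        time.sleep(duration)
        intervals.append(time.perf_counter() - start)
    return intervals


def measure_sleep_in_thread(duration=0.001, repeats=100):
    """Run measure_sleep() on a new thread and return its intervals."""
    result = []
    worker = threading.Thread(
        target=lambda: result.extend(measure_sleep(duration, repeats)))
    worker.start()
    worker.join()
    return result


if __name__ == "__main__":
    for label, func in (("main thread", measure_sleep),
                        ("new thread", measure_sleep_in_thread)):
        data = func()
        print(f"{label}: min {min(data) * 1e3:.3f} ms, max {max(data) * 1e3:.3f} ms")
```

Comparing the minimum against the requested 1 ms shows whether either path returns early, and the maximum shows how far a wait can overshoot.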
History
Date                 User         Action  Args
2021-06-14 21:10:46  eryksun      set     messages: + msg395849
2021-06-14 09:15:47  lunixbochs2  set     messages: + msg395784
2021-06-14 08:41:41  eryksun      set     messages: + msg395782
2021-06-14 03:47:35  lunixbochs2  set     messages: + msg395771
2021-06-14 01:03:13  eryksun      set     messages: + msg395769
2021-06-12 23:49:27  vstinner     set     messages: + msg395719
2021-06-12 09:51:03  lunixbochs2  set     messages: + msg395683
2021-06-12 08:47:53  vstinner     set     messages: + msg395681
2021-06-12 00:44:28  terry.reedy  set     nosy: + belopolsky, vstinner, p-ganssle; versions: - Python 3.8, Python 3.9, Python 3.10
2021-06-09 23:13:59  lunixbochs2  set     messages: + msg395493
2021-06-09 22:38:05  eryksun      set     nosy: + eryksun; messages: + msg395490
2021-06-07 01:05:26  lunixbochs2  set     messages: + msg395238; title: time.monotonic() should use QueryPerformanceCounter() on Windows -> time.monotonic() should use a different clock source on Windows
2021-06-06 22:05:07  lunixbochs2  create